Skip to main content

Project Setup

Learning Objectives

In this lab, you'll:

  • Initialize a new Cloudflare Worker project for the Intelligence Collector
  • Configure KV storage for threat intelligence data
  • Set up automated scheduling with cron triggers
  • Understand the basic Worker architecture

Time Estimate

5 minutes


Step 1: Initialize the Worker Project

Create a new Cloudflare Worker project specifically for intelligence collection:

npm create cloudflare@latest intelligence-collector
cd intelligence-collector

When prompted:

  • Choose Hello World Example
  • Which template: Worker only
  • Use TypeScript: Yes
  • Use git: Yes (recommended)
  • Deploy now: No (we'll deploy manually)

This creates a dedicated project with:

  • src/index.ts - Your main Worker code
  • wrangler.jsonc - Worker configuration
  • package.json - Dependencies and scripts
  • tsconfig.json - TypeScript configuration

Step 2: Create KV Storage

Create a dedicated KV namespace for threat intelligence data:

npx wrangler kv namespace create "THREAT_INTEL"

Expected Output:

🌀 Creating namespace with title "intelligence-collector-THREAT_INTEL"
✨ Success!
Add the following to your configuration file:
kv_namespaces = [
{ binding = "THREAT_INTEL", id = "abc123def456ghi789" }
]

Step 3: Configure Your Worker

Update your wrangler.jsonc with the KV namespace and scheduling:

{
"name": "intelligence-collector",
"main": "src/index.ts",
"compatibility_date": "2025-08-19",
"kv_namespaces": [
{
"binding": "THREAT_INTEL",
"id": "YOUR_ACTUAL_ID_FROM_STEP_2"
}
],
"triggers": {
"crons": ["*/15 * * * *"]
},
"observability": {
"enabled": true,
"head_sampling_rate": 1
}
}

Important: Replace YOUR_ACTUAL_ID_FROM_STEP_2 with the actual ID from the command output.

Step 4: Understand the Worker Structure

Let's examine the basic Worker structure. Create src/types.ts and update src/index.ts:

src/types.ts
export interface Env {
THREAT_INTEL: KVNamespace;
}

export interface ThreatSource {
name: string;
url: string;
weight: number;
format: 'plain' | 'csv' | 'json';
timeout: number;
user_agent: string;
}

export interface ThreatIP {
ip: string;
score: number;
sources: string[];
first_seen: string;
last_seen: string;
is_whitelisted: boolean;
}

export interface CollectionResult {
threats: Map<string, ThreatIP>;
stats: {
total_sources: number;
successful_sources: string[];
failed_sources: string[];
total_raw_ips: number;
unique_ips: number;
processing_time_ms: number;
};
}

export interface WhitelistEntry {
ip: string;
reason: string;
added_by: string;
added_at: string;
type: 'cloudflare' | 'custom';
}

export interface WhitelistConfig {
cloudflare: string[];
custom: WhitelistEntry[];
last_updated: string;
}

export interface EnhancedThreatIP extends ThreatIP {
confidence_level: 'low' | 'medium' | 'high' | 'very_high';
risk_category: 'spam' | 'malware' | 'botnet' | 'scanning' | 'unknown';
geographic_info?: {
country?: string;
asn?: string;
org?: string;
};
last_validation: string;
expires_at: string;
}

export interface ThreatScoringConfig {
confidence_thresholds: {
low: number;
medium: number;
high: number;
very_high: number;
};
max_age_hours: number;
source_weights: Record<string, number>;
}


export interface ThreatIntelligenceStats {
collection: {
last_run: string;
duration_ms: number;
sources_attempted: number;
sources_successful: number;
sources_failed: string[];
};
processing: {
raw_ips_collected: number;
unique_ips_processed: number;
duplicates_removed: number;
validation_passed: number;
validation_failed: number;
};
scoring: {
confidence_distribution: Record<string, number>;
risk_distribution: Record<string, number>;
average_score: number;
highest_score: number;
};
whitelist: {
total_checked: number;
cloudflare_protected: number;
custom_protected: number;
active_threats: number;
};
data_quality: {
freshness_hours: number;
expired_removed: number;
stale_warnings: number;
data_coverage_percentage: number;
};
}

export interface APIResponse<T = any> {
success: boolean;
data?: T;
error?: {
code: string;
message: string;
details?: any;
};
metadata: {
timestamp: string;
request_id: string;
processing_time_ms: number;
version: string;
};
}

export interface PaginatedResponse<T> extends APIResponse<T[]> {
pagination: {
page: number;
limit: number;
total: number;
has_next: boolean;
has_previous: boolean;
};
}

export interface ThreatIPResponse {
ip: string;
score: number;
confidence_level: string;
risk_category: string;
sources: string[];
first_seen: string;
last_seen: string;
is_whitelisted: boolean;
geographic_info?: {
country?: string;
asn?: string;
org?: string;
};
}

export interface IPCheckResponse {
ip: string;
status: 'threat' | 'whitelisted' | 'clean' | 'unknown';
threat_info?: ThreatIPResponse;
whitelist_info?: {
type: 'cloudflare' | 'custom';
reason: string;
added_at: string;
};
recommendations: string[];
}
src/index.ts
// Main Worker export with two key functions
export default {
// Handles HTTP requests (API endpoints)
async fetch(request: Request, env: Env, ctx: ExecutionContext): Promise<Response> {
return new Response('Intelligence Collector API - Coming Soon!', {
headers: { 'Content-Type': 'text/plain' }
});
},

// Handles scheduled execution (automated data collection)
async scheduled(event: ScheduledEvent, env: Env, ctx: ExecutionContext): Promise<void> {
console.log('Scheduled collection triggered at:', new Date().toISOString());
// Data collection logic will go here
},
};

Step 5: Util functions

src/lib/ip.ts
export function isValidIP(ip: string): boolean {
const ipv4Regex = /^(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)$/;
return ipv4Regex.test(ip);
}

export function ipToInt(ip: string): number {
const parts = ip.split('.').map(Number);
return (parts[0] << 24) + (parts[1] << 16) + (parts[2] << 8) + parts[3];
}

export function isIpInCidr(ip: string, cidr: string): boolean {
try {
const [rangeIp, prefixLengthStr] = cidr.split('/');
const prefixLength = parseInt(prefixLengthStr, 10);

if (!isValidIP(ip) || !isValidIP(rangeIp) || isNaN(prefixLength)) {
return false;
}

if (prefixLength < 0 || prefixLength > 32) {
return false;
}

const mask = ~(Math.pow(2, 32 - prefixLength) - 1);

const ipInt = ipToInt(ip);
const rangeInt = ipToInt(rangeIp);

return (ipInt & mask) === (rangeInt & mask);
} catch (error) {
console.warn(`CIDR check failed for ${ip} in ${cidr}:`, error);
return false;
}
}

export function isIpInRanges(ip: string, ranges: string[]): boolean {
for (const range of ranges) {
if (isIpInCidr(ip, range)) {
return true;
}
}
return false;
}

export function isValidCidr(cidr: string): boolean {
try {
const [ip, prefixStr] = cidr.split('/');
const prefix = parseInt(prefixStr, 10);

return isValidIP(ip) && prefix >= 0 && prefix <= 32;
} catch {
return false;
}
}
src/lib/utils.ts
import { APIResponse, PaginatedResponse } from "../types";

export function generateRequestId(): string {
return `req_${Date.now()}_${Math.random().toString(36).substr(2, 9)}`;
}

export function createAPIResponse<T>(
data: T | null,
error: { code: string; message: string; details?: any } | null,
startTime: number,
requestId: string
): APIResponse<T> {
return {
success: error === null,
data: data || undefined,
error: error || undefined,
metadata: {
timestamp: new Date().toISOString(),
request_id: requestId,
processing_time_ms: Date.now() - startTime,
version: '1.0.0'
}
};
}

export function createPaginatedResponse<T>(
items: T[],
page: number,
limit: number,
total: number,
startTime: number,
requestId: string
): PaginatedResponse<T> {
const totalPages = Math.ceil(total / limit);

return {
success: true,
data: items,
pagination: {
page,
limit,
total,
has_next: page < totalPages,
has_previous: page > 1
},
metadata: {
timestamp: new Date().toISOString(),
request_id: requestId,
processing_time_ms: Date.now() - startTime,
version: '1.0.0'
}
};
}

export function parseQueryParams(url: URL): {
page: number;
limit: number;
confidence?: string;
category?: string;
sort?: string;
include_whitelisted?: boolean;
} {
const page = Math.max(1, parseInt(url.searchParams.get('page') || '1'));
const limit = Math.min(1000, Math.max(1, parseInt(url.searchParams.get('limit') || '100')));
const confidence = url.searchParams.get('confidence') || undefined;
const category = url.searchParams.get('category') || undefined;
const sort = url.searchParams.get('sort') || 'score_desc';
const include_whitelisted = url.searchParams.get('include_whitelisted') === 'true';

return { page, limit, confidence, category, sort, include_whitelisted };
}

Step 5: Generate Cloudflare Types

Generate TypeScript types for Cloudflare Workers to enable better IDE support:

npx wrangler types
# or
npm run cf-typegen

This creates worker-configuration.d.ts with type definitions for your Worker environment, including KV bindings and other Cloudflare-specific types.

Step 6: Deploy and Test Your Setup

Deploy and test your initial Worker:

# Deploy to Cloudflare
npx wrangler deploy

# Test the basic endpoint
curl "https://intelligence-collector.YOUR-SUBDOMAIN.workers.dev"

Expected Response:

Intelligence Collector API - Coming Soon!

Step 7: Verify Scheduled Function Setup

Test that your scheduled function can be triggered:

npm run dev

# In another terminal, manually trigger the scheduled function
curl "http://localhost:8787/cdn-cgi/handler/scheduled?cron=*+*+*+*+*"

ops-automation-test-1.png

You should see logs showing the scheduled function execution.

Architecture Understanding

Key Components

  1. Fetch Handler - Processes HTTP requests, serves API endpoints
  2. Scheduled Handler - Runs automatically based on cron expressions
  3. KV Storage - Global, low-latency data persistence
  4. Environment Interface - Type-safe access to Worker resources

Data Flow

Cron Trigger → Scheduled Function → Threat Collection → KV Storage

HTTP Request → Fetch Handler → Read from KV → API Response

Worker Execution Model

  • Isolates: Each request runs in an isolated V8 runtime
  • Edge Locations: Code runs in 300+ global locations
  • Zero Cold Start: Instant execution, no server spin-up time
  • Automatic Scaling: Handles millions of requests automatically

Configuration Explained

Cron Expression Breakdown

"crons": ["*/15 * * * *"]

Format: minute hour day month day-of-week

  • */15 - Every 15 minutes
  • * - Every hour
  • * - Every day
  • * - Every month
  • * - Every day of week

Common Patterns:

  • */15 * * * * - Every 15 minutes (recommended for threat feeds)
  • 0 */2 * * * - Every 2 hours
  • 0 0 * * * - Daily at midnight UTC
  • 0 0 * * 1 - Weekly on Monday

KV Namespace Benefits

  • Global Replication - Data available in all edge locations
  • Low Latency - Sub-millisecond read times
  • High Availability - 99.9% uptime SLA
  • Automatic Expiration - TTL support for data lifecycle management

Troubleshooting

Common Issues

1. KV Namespace Not Found

Error: KV namespace with binding "THREAT_INTEL" not found

Solution: Verify the KV namespace ID in wrangler.jsonc matches the output from step 2.

2. Cron Syntax Error

Error: Invalid cron expression

Solution: Verify the cron expression uses the correct format with 5 fields.

3. Deployment Permission Error

Error: You do not have permission to deploy

Solution: Run wrangler auth login to authenticate with Cloudflare.

Validation Commands

# Check Worker status
npx wrangler status

# List KV namespaces
npx wrangler kv namespace list

# View live logs
npx wrangler tail --format=pretty

Next Steps

Your Intelligence Collector foundation is now set up! You have:

  • ✅ A deployed Worker with proper TypeScript configuration
  • ✅ KV storage ready for threat intelligence data
  • ✅ Automated scheduling configured for data collection
  • ✅ Basic project structure understanding

In the next lab, you'll implement Multi-Source Threat Intelligence Feeds to start collecting real threat data from multiple sources.

Key Takeaways

  1. Separation of Concerns - HTTP handling vs. scheduled tasks
  2. Global Edge Deployment - Your code runs worldwide automatically
  3. Resource Bindings - Type-safe access to Cloudflare services
  4. Configuration as Code - Infrastructure defined in wrangler.jsonc