Project Setup

Learning Objectives

In this lab, you'll:

Initialize a new Cloudflare Worker project for the Intelligence Collector
Configure KV storage for threat intelligence data
Set up automated scheduling with cron triggers
Understand the basic Worker architecture

Time Estimate

5 minutes

Step 1: Initialize the Worker Project

Create a new Cloudflare Worker project specifically for intelligence collection:

npm create cloudflare@latest intelligence-collector
cd intelligence-collector

When prompted:

Choose Hello World Example
Which template: Worker only
Use TypeScript: Yes
Use git: Yes (recommended)
Deploy now: No (we'll deploy manually)

This creates a dedicated project with:

src/index.ts - Your main Worker code
wrangler.jsonc - Worker configuration
package.json - Dependencies and scripts
tsconfig.json - TypeScript configuration

Step 2: Create KV Storage

Create a dedicated KV namespace for threat intelligence data:

npx wrangler kv namespace create "THREAT_INTEL"

Expected Output:

🌀 Creating namespace with title "intelligence-collector-THREAT_INTEL"
✨ Success!
Add the following to your configuration file:
kv_namespaces = [
	{ binding = "THREAT_INTEL", id = "abc123def456ghi789" }
]

Step 3: Configure Your Worker

Update your wrangler.jsonc with the KV namespace and scheduling:

{
  "name": "intelligence-collector",
  "main": "src/index.ts",
  "compatibility_date": "2025-08-19",
  "kv_namespaces": [
    {
      "binding": "THREAT_INTEL",
      "id": "YOUR_ACTUAL_ID_FROM_STEP_2"
    }
  ],
  "triggers": {
    "crons": ["*/15 * * * *"]
  },
  "observability": {
    "enabled": true,
    "head_sampling_rate": 1
  }
}

Important: Replace YOUR_ACTUAL_ID_FROM_STEP_2 with the actual ID from the command output.

Step 4: Understand the Worker Structure

Let's examine the basic Worker structure. Create src/types.ts and update src/index.ts:

src/types.ts
export interface Env {
    THREAT_INTEL: KVNamespace;
}

export interface ThreatSource {
    name: string;           
    url: string;            
    weight: number;         
    format: 'plain' | 'csv' | 'json'; 
    timeout: number;        
    user_agent: string;     
}

export interface ThreatIP {
    ip: string;             
    score: number;          
    sources: string[];      
    first_seen: string;     
    last_seen: string;      
    is_whitelisted: boolean; 
}

export interface CollectionResult {
    threats: Map<string, ThreatIP>;
    stats: {
        total_sources: number;
        successful_sources: string[];
        failed_sources: string[];
        total_raw_ips: number;
        unique_ips: number;
        processing_time_ms: number;
    };
}

export interface WhitelistEntry {
    ip: string;             
    reason: string;         
    added_by: string;       
    added_at: string;       
    type: 'cloudflare' | 'custom'; 
}

export interface WhitelistConfig {
    cloudflare: string[];           
    custom: WhitelistEntry[];       
    last_updated: string;           
}

export interface EnhancedThreatIP extends ThreatIP {
    confidence_level: 'low' | 'medium' | 'high' | 'very_high';
    risk_category: 'spam' | 'malware' | 'botnet' | 'scanning' | 'unknown';
    geographic_info?: {
        country?: string;
        asn?: string;
        org?: string;
    };
    last_validation: string;
    expires_at: string;
}

export interface ThreatScoringConfig {
    confidence_thresholds: {
        low: number;
        medium: number;
        high: number;
        very_high: number;
    };
    max_age_hours: number;
    source_weights: Record<string, number>;
}


export interface ThreatIntelligenceStats {
    collection: {
        last_run: string;
        duration_ms: number;
        sources_attempted: number;
        sources_successful: number;
        sources_failed: string[];
    };
    processing: {
        raw_ips_collected: number;
        unique_ips_processed: number;
        duplicates_removed: number;
        validation_passed: number;
        validation_failed: number;
    };
    scoring: {
        confidence_distribution: Record<string, number>;
        risk_distribution: Record<string, number>;
        average_score: number;
        highest_score: number;
    };
    whitelist: {
        total_checked: number;
        cloudflare_protected: number;
        custom_protected: number;
        active_threats: number;
    };
    data_quality: {
        freshness_hours: number;
        expired_removed: number;
        stale_warnings: number;
        data_coverage_percentage: number;
    };
}

export interface APIResponse<T = any> {
    success: boolean;
    data?: T;
    error?: {
        code: string;
        message: string;
        details?: any;
    };
    metadata: {
        timestamp: string;
        request_id: string;
        processing_time_ms: number;
        version: string;
    };
}

export interface PaginatedResponse<T> extends APIResponse<T[]> {
    pagination: {
        page: number;
        limit: number;
        total: number;
        has_next: boolean;
        has_previous: boolean;
    };
}

export interface ThreatIPResponse {
    ip: string;
    score: number;
    confidence_level: string;
    risk_category: string;
    sources: string[];
    first_seen: string;
    last_seen: string;
    is_whitelisted: boolean;
    geographic_info?: {
        country?: string;
        asn?: string;
        org?: string;
    };
}

export interface IPCheckResponse {
    ip: string;
    status: 'threat' | 'whitelisted' | 'clean' | 'unknown';
    threat_info?: ThreatIPResponse;
    whitelist_info?: {
        type: 'cloudflare' | 'custom';
        reason: string;
        added_at: string;
    };
    recommendations: string[];
}

src/index.ts
// Main Worker export with two key functions
export default {
    // Handles HTTP requests (API endpoints)
    async fetch(request: Request, env: Env, ctx: ExecutionContext): Promise<Response> {
        return new Response('Intelligence Collector API - Coming Soon!', {
            headers: { 'Content-Type': 'text/plain' }
        });
    },
    
    // Handles scheduled execution (automated data collection)
    async scheduled(event: ScheduledEvent, env: Env, ctx: ExecutionContext): Promise<void> {
        console.log('Scheduled collection triggered at:', new Date().toISOString());
        // Data collection logic will go here
    },
};

Step 5: Util functions

src/lib/ip.ts
export function isValidIP(ip: string): boolean {
    const ipv4Regex = /^(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)$/;
    return ipv4Regex.test(ip);
}

export function ipToInt(ip: string): number {
    const parts = ip.split('.').map(Number);
    return (parts[0] << 24) + (parts[1] << 16) + (parts[2] << 8) + parts[3];
}

export function isIpInCidr(ip: string, cidr: string): boolean {
    try {
        const [rangeIp, prefixLengthStr] = cidr.split('/');
        const prefixLength = parseInt(prefixLengthStr, 10);
        
        if (!isValidIP(ip) || !isValidIP(rangeIp) || isNaN(prefixLength)) {
            return false;
        }
        
        if (prefixLength < 0 || prefixLength > 32) {
            return false;
        }
        
        const mask = ~(Math.pow(2, 32 - prefixLength) - 1);
        
        const ipInt = ipToInt(ip);
        const rangeInt = ipToInt(rangeIp);
        
        return (ipInt & mask) === (rangeInt & mask);
    } catch (error) {
        console.warn(`CIDR check failed for ${ip} in ${cidr}:`, error);
        return false;
    }
}

export function isIpInRanges(ip: string, ranges: string[]): boolean {
    for (const range of ranges) {
        if (isIpInCidr(ip, range)) {
            return true;
        }
    }
    return false;
}

export function isValidCidr(cidr: string): boolean {
    try {
        const [ip, prefixStr] = cidr.split('/');
        const prefix = parseInt(prefixStr, 10);
        
        return isValidIP(ip) && prefix >= 0 && prefix <= 32;
    } catch {
        return false;
    }
}

src/lib/utils.ts
import { APIResponse, PaginatedResponse } from "../types";

export function generateRequestId(): string {
    return `req_${Date.now()}_${Math.random().toString(36).substr(2, 9)}`;
}

export function createAPIResponse<T>(
    data: T | null,
    error: { code: string; message: string; details?: any } | null,
    startTime: number,
    requestId: string
): APIResponse<T> {
    return {
        success: error === null,
        data: data || undefined,
        error: error || undefined,
        metadata: {
            timestamp: new Date().toISOString(),
            request_id: requestId,
            processing_time_ms: Date.now() - startTime,
            version: '1.0.0'
        }
    };
}

export function createPaginatedResponse<T>(
    items: T[],
    page: number,
    limit: number,
    total: number,
    startTime: number,
    requestId: string
): PaginatedResponse<T> {
    const totalPages = Math.ceil(total / limit);
    
    return {
        success: true,
        data: items,
        pagination: {
            page,
            limit,
            total,
            has_next: page < totalPages,
            has_previous: page > 1
        },
        metadata: {
            timestamp: new Date().toISOString(),
            request_id: requestId,
            processing_time_ms: Date.now() - startTime,
            version: '1.0.0'
        }
    };
}

export function parseQueryParams(url: URL): {
    page: number;
    limit: number;
    confidence?: string;
    category?: string;
    sort?: string;
    include_whitelisted?: boolean;
} {
    const page = Math.max(1, parseInt(url.searchParams.get('page') || '1'));
    const limit = Math.min(1000, Math.max(1, parseInt(url.searchParams.get('limit') || '100')));
    const confidence = url.searchParams.get('confidence') || undefined;
    const category = url.searchParams.get('category') || undefined;
    const sort = url.searchParams.get('sort') || 'score_desc';
    const include_whitelisted = url.searchParams.get('include_whitelisted') === 'true';
    
    return { page, limit, confidence, category, sort, include_whitelisted };
}

Step 5: Generate Cloudflare Types

Generate TypeScript types for Cloudflare Workers to enable better IDE support:

npx wrangler types
# or
npm run cf-typegen

This creates worker-configuration.d.ts with type definitions for your Worker environment, including KV bindings and other Cloudflare-specific types.

Step 6: Deploy and Test Your Setup

Deploy and test your initial Worker:

# Deploy to Cloudflare
npx wrangler deploy

# Test the basic endpoint
curl "https://intelligence-collector.YOUR-SUBDOMAIN.workers.dev"

Expected Response:

Intelligence Collector API - Coming Soon!

Step 7: Verify Scheduled Function Setup

Test that your scheduled function can be triggered:

npm run dev

# In another terminal, manually trigger the scheduled function
curl "http://localhost:8787/cdn-cgi/handler/scheduled?cron=*+*+*+*+*"

You should see logs showing the scheduled function execution.

Architecture Understanding

Key Components

Fetch Handler - Processes HTTP requests, serves API endpoints
Scheduled Handler - Runs automatically based on cron expressions
KV Storage - Global, low-latency data persistence
Environment Interface - Type-safe access to Worker resources

Data Flow

Cron Trigger → Scheduled Function → Threat Collection → KV Storage
                                                            ↓
HTTP Request → Fetch Handler → Read from KV → API Response

Worker Execution Model

Isolates: Each request runs in an isolated V8 runtime
Edge Locations: Code runs in 300+ global locations
Zero Cold Start: Instant execution, no server spin-up time
Automatic Scaling: Handles millions of requests automatically

Configuration Explained

Cron Expression Breakdown

"crons": ["*/15 * * * *"]

Format: minute hour day month day-of-week

*/15 - Every 15 minutes
* - Every hour
* - Every day
* - Every month
* - Every day of week

Common Patterns:

*/15 * * * * - Every 15 minutes (recommended for threat feeds)
0 */2 * * * - Every 2 hours
0 0 * * * - Daily at midnight UTC
0 0 * * 1 - Weekly on Monday

KV Namespace Benefits

Global Replication - Data available in all edge locations
Low Latency - Sub-millisecond read times
High Availability - 99.9% uptime SLA
Automatic Expiration - TTL support for data lifecycle management

Troubleshooting

Common Issues

1. KV Namespace Not Found

Error: KV namespace with binding "THREAT_INTEL" not found

Solution: Verify the KV namespace ID in wrangler.jsonc matches the output from step 2.

2. Cron Syntax Error

Error: Invalid cron expression

Solution: Verify the cron expression uses the correct format with 5 fields.

3. Deployment Permission Error

Error: You do not have permission to deploy

Solution: Run wrangler auth login to authenticate with Cloudflare.

Validation Commands

# Check Worker status
npx wrangler status

# List KV namespaces
npx wrangler kv namespace list

# View live logs
npx wrangler tail --format=pretty

Next Steps

Your Intelligence Collector foundation is now set up! You have:

✅ A deployed Worker with proper TypeScript configuration
✅ KV storage ready for threat intelligence data
✅ Automated scheduling configured for data collection
✅ Basic project structure understanding

In the next lab, you'll implement Multi-Source Threat Intelligence Feeds to start collecting real threat data from multiple sources.

Key Takeaways

Separation of Concerns - HTTP handling vs. scheduled tasks
Global Edge Deployment - Your code runs worldwide automatically
Resource Bindings - Type-safe access to Cloudflare services
Configuration as Code - Infrastructure defined in wrangler.jsonc

Learning Objectives​

Time Estimate​

Step 1: Initialize the Worker Project​

Step 2: Create KV Storage​

Step 3: Configure Your Worker​

Step 4: Understand the Worker Structure​

Step 5: Util functions​

Step 5: Generate Cloudflare Types​

Step 6: Deploy and Test Your Setup​

Step 7: Verify Scheduled Function Setup​

Architecture Understanding​

Key Components​

Data Flow​

Worker Execution Model​

Configuration Explained​

Cron Expression Breakdown​

KV Namespace Benefits​

Troubleshooting​

Common Issues​

Validation Commands​

Next Steps​

Key Takeaways​