Skip to content

Webhook Delivery System

The webhook delivery system consists of a background worker that processes pending events, delivers them to subscriber URLs, and manages retries.

Architecture

go
// internal/core/webhook/worker.go

type Worker struct {
    db         *pgxpool.Pool
    httpClient *http.Client
}

func (w *Worker) Run(ctx context.Context) {
    ticker := time.NewTicker(5 * time.Second)
    defer ticker.Stop()

    for {
        select {
        case <-ctx.Done():
            return
        case <-ticker.C:
            w.processBatch(ctx)
        }
    }
}

Delivery Process

The worker runs every 5 seconds and:

  1. Fetches up to 50 pending events where status = 'pending' AND next_retry_at <= now()
  2. For each event:
    • Load subscription (get URL + signing_secret)
    • Marshal payload to JSON
    • Sign payload with HMAC-SHA256
    • POST to URL with signature headers
    • Update event status based on response
go
func (w *Worker) processBatch(ctx context.Context) {
    events := w.fetchPendingEvents(ctx, 50)

    for _, evt := range events {
        sub := w.loadSubscription(ctx, evt.SubscriptionID)
        if sub == nil || !sub.IsActive {
            w.markExhausted(ctx, evt.ID, "subscription inactive")
            continue
        }

        body, _ := json.Marshal(evt.Payload)
        signature := Sign(body, sub.SigningSecret)

        req, _ := http.NewRequestWithContext(ctx, "POST", sub.URL, bytes.NewReader(body))
        req.Header.Set("Content-Type", "application/json")
        req.Header.Set("X-Webhook-Signature", signature)
        req.Header.Set("X-Webhook-Event", evt.EventType)

        resp, err := w.httpClient.Do(req)
        if err != nil {
            w.recordFailure(ctx, evt, err.Error(), 0)
            continue
        }
        defer resp.Body.Close()

        if resp.StatusCode >= 200 && resp.StatusCode < 300 {
            w.markDelivered(ctx, evt.ID, resp.StatusCode)
        } else {
            w.recordFailure(ctx, evt, "", resp.StatusCode)
        }
    }
}

Retry Strategy

Exponential backoff with a maximum of 5 attempts:

AttemptDelayCumulative
1Immediate0
21 minute1 min
315 minutes16 min
41 hour1h 16m
56 hours~7h 16m

After 5 failed attempts, the event status moves to exhausted.

go
func nextRetryDelay(attempt int) time.Duration {
    delays := []time.Duration{
        0,
        1 * time.Minute,
        15 * time.Minute,
        1 * time.Hour,
        6 * time.Hour,
    }
    if attempt >= len(delays) {
        return 0 // exhausted
    }
    return delays[attempt]
}

Event States

Events move through the following states:

pending → delivered (2xx response)

        failed → pending (retry) → delivered

                              exhausted (after 5 attempts)
  • pending: Waiting for delivery or retry
  • delivered: Successfully delivered (2xx response)
  • failed: Last attempt failed (non-2xx response or network error)
  • exhausted: All retry attempts failed

HTTP Client Configuration

go
httpClient := &http.Client{
    Timeout: 10 * time.Second,
    Transport: &http.Transport{
        MaxIdleConns:        100,
        MaxIdleConnsPerHost: 10,
        IdleConnTimeout:     90 * time.Second,
    },
}
  • 10-second timeout — Slow receivers don't block the worker
  • Connection pooling — Reuse connections for performance
  • Follows redirects — Up to 10 redirects (Go default)

Success Criteria

A delivery is considered successful if:

  • HTTP status code is 2xx (200-299)
  • Response received within 10 seconds

Any other response (3xx, 4xx, 5xx) or timeout triggers a retry.

Failure Handling

When a delivery fails:

  1. Increment attempts counter
  2. Set last_attempt_at to current time
  3. Record last_status_code and last_error
  4. Calculate next_retry_at based on attempt number
  5. Update status:
    • status = 'pending' if more retries available
    • status = 'exhausted' if all retries exhausted

Dead Letter Queue

Events that reach exhausted status are effectively moved to a "dead letter queue":

  • They're no longer processed by the worker
  • Admins can view them via the API
  • They're retained for 90 days for debugging
  • Admins can manually retry them if needed (future feature)

Idempotency

Receivers should implement idempotency:

  • Each event has a unique id field (e.g., evt_01HQ3K5M7N8P9R0S)
  • Receivers can use this ID to deduplicate deliveries
  • If a delivery succeeds but the Core API doesn't receive the 2xx response (network issue), the event may be retried

Example receiver implementation:

go
func handleWebhook(w http.ResponseWriter, r *http.Request) {
    var event struct {
        ID string `json:"id"`
        // ... other fields
    }
    json.NewDecoder(r.Body).Decode(&event)

    // Check if we've already processed this event
    if alreadyProcessed(event.ID) {
        w.WriteHeader(http.StatusOK)
        return
    }

    // Process the event
    processEvent(event)

    // Mark as processed
    markProcessed(event.ID)

    w.WriteHeader(http.StatusOK)
}

Rate Limiting

The worker processes a maximum of 50 events per tick (5 seconds), resulting in:

  • Max throughput: 600 events/minute
  • Prevents thundering herd on bulk operations
  • Protects receiver endpoints from overload

If you need higher throughput, contact support to adjust the batch size.

Monitoring & Debugging

Delivery Logs

All delivery attempts are recorded in webhook_events:

sql
SELECT
    event_type,
    status,
    attempts,
    last_status_code,
    last_error,
    next_retry_at,
    created_at
FROM webhook_events
WHERE subscription_id = ?
ORDER BY created_at DESC
LIMIT 100;

Common Issues

Deliveries stuck in pending:

  • Check if next_retry_at is in the future (waiting for retry delay)
  • Verify subscription is is_active = true
  • Check worker logs for errors

All deliveries failing with timeout:

  • Receiver endpoint is too slow (> 10 seconds)
  • Receiver should return 2xx immediately and process async

Deliveries failing with 401/403:

  • Receiver is rejecting requests
  • Verify signature verification on receiver side

High failure rate:

  • Check receiver logs for errors
  • Verify receiver endpoint is correct
  • Test with /test endpoint

Performance Considerations

Database queries:

  • Index on (next_retry_at) WHERE status = 'pending' for fast batch fetch
  • Index on (subscription_id, created_at DESC) for delivery history queries
  • Index on (organization_id, created_at DESC) for org-wide views

Worker scaling:

  • Single worker instance handles 600 events/minute
  • For higher throughput, run multiple workers
  • Workers coordinate via database locks (no duplicate deliveries)

Future Enhancements

  • Manual retry — Admin UI to retry exhausted events
  • Webhook logs export — Download CSV of delivery history
  • Real-time delivery status — WebSocket updates for admins
  • Webhook analytics — Success rate, latency, error trends