Webhook Delivery System
The webhook delivery system consists of a background worker that processes pending events, delivers them to subscriber URLs, and manages retries.
Architecture
// internal/core/webhook/worker.go
type Worker struct {
db *pgxpool.Pool
httpClient *http.Client
}
func (w *Worker) Run(ctx context.Context) {
ticker := time.NewTicker(5 * time.Second)
defer ticker.Stop()
for {
select {
case <-ctx.Done():
return
case <-ticker.C:
w.processBatch(ctx)
}
}
}Delivery Process
The worker runs every 5 seconds and:
- Fetches up to 50 pending events where
status = 'pending'ANDnext_retry_at <= now() - For each event:
- Load subscription (get URL + signing_secret)
- Marshal payload to JSON
- Sign payload with HMAC-SHA256
- POST to URL with signature headers
- Update event status based on response
func (w *Worker) processBatch(ctx context.Context) {
events := w.fetchPendingEvents(ctx, 50)
for _, evt := range events {
sub := w.loadSubscription(ctx, evt.SubscriptionID)
if sub == nil || !sub.IsActive {
w.markExhausted(ctx, evt.ID, "subscription inactive")
continue
}
body, _ := json.Marshal(evt.Payload)
signature := Sign(body, sub.SigningSecret)
req, _ := http.NewRequestWithContext(ctx, "POST", sub.URL, bytes.NewReader(body))
req.Header.Set("Content-Type", "application/json")
req.Header.Set("X-Webhook-Signature", signature)
req.Header.Set("X-Webhook-Event", evt.EventType)
resp, err := w.httpClient.Do(req)
if err != nil {
w.recordFailure(ctx, evt, err.Error(), 0)
continue
}
defer resp.Body.Close()
if resp.StatusCode >= 200 && resp.StatusCode < 300 {
w.markDelivered(ctx, evt.ID, resp.StatusCode)
} else {
w.recordFailure(ctx, evt, "", resp.StatusCode)
}
}
}Retry Strategy
Exponential backoff with a maximum of 5 attempts:
| Attempt | Delay | Cumulative |
|---|---|---|
| 1 | Immediate | 0 |
| 2 | 1 minute | 1 min |
| 3 | 15 minutes | 16 min |
| 4 | 1 hour | 1h 16m |
| 5 | 6 hours | ~7h 16m |
After 5 failed attempts, the event status moves to exhausted.
func nextRetryDelay(attempt int) time.Duration {
delays := []time.Duration{
0,
1 * time.Minute,
15 * time.Minute,
1 * time.Hour,
6 * time.Hour,
}
if attempt >= len(delays) {
return 0 // exhausted
}
return delays[attempt]
}Event States
Events move through the following states:
pending → delivered (2xx response)
↓
failed → pending (retry) → delivered
↓
exhausted (after 5 attempts)- pending: Waiting for delivery or retry
- delivered: Successfully delivered (2xx response)
- failed: Last attempt failed (non-2xx response or network error)
- exhausted: All retry attempts failed
HTTP Client Configuration
httpClient := &http.Client{
Timeout: 10 * time.Second,
Transport: &http.Transport{
MaxIdleConns: 100,
MaxIdleConnsPerHost: 10,
IdleConnTimeout: 90 * time.Second,
},
}- 10-second timeout — Slow receivers don't block the worker
- Connection pooling — Reuse connections for performance
- Follows redirects — Up to 10 redirects (Go default)
Success Criteria
A delivery is considered successful if:
- HTTP status code is 2xx (200-299)
- Response received within 10 seconds
Any other response (3xx, 4xx, 5xx) or timeout triggers a retry.
Failure Handling
When a delivery fails:
- Increment
attemptscounter - Set
last_attempt_atto current time - Record
last_status_codeandlast_error - Calculate
next_retry_atbased on attempt number - Update status:
status = 'pending'if more retries availablestatus = 'exhausted'if all retries exhausted
Dead Letter Queue
Events that reach exhausted status are effectively moved to a "dead letter queue":
- They're no longer processed by the worker
- Admins can view them via the API
- They're retained for 90 days for debugging
- Admins can manually retry them if needed (future feature)
Idempotency
Receivers should implement idempotency:
- Each event has a unique
idfield (e.g.,evt_01HQ3K5M7N8P9R0S) - Receivers can use this ID to deduplicate deliveries
- If a delivery succeeds but the Core API doesn't receive the 2xx response (network issue), the event may be retried
Example receiver implementation:
func handleWebhook(w http.ResponseWriter, r *http.Request) {
var event struct {
ID string `json:"id"`
// ... other fields
}
json.NewDecoder(r.Body).Decode(&event)
// Check if we've already processed this event
if alreadyProcessed(event.ID) {
w.WriteHeader(http.StatusOK)
return
}
// Process the event
processEvent(event)
// Mark as processed
markProcessed(event.ID)
w.WriteHeader(http.StatusOK)
}Rate Limiting
The worker processes a maximum of 50 events per tick (5 seconds), resulting in:
- Max throughput: 600 events/minute
- Prevents thundering herd on bulk operations
- Protects receiver endpoints from overload
If you need higher throughput, contact support to adjust the batch size.
Monitoring & Debugging
Delivery Logs
All delivery attempts are recorded in webhook_events:
SELECT
event_type,
status,
attempts,
last_status_code,
last_error,
next_retry_at,
created_at
FROM webhook_events
WHERE subscription_id = ?
ORDER BY created_at DESC
LIMIT 100;Common Issues
Deliveries stuck in pending:
- Check if
next_retry_atis in the future (waiting for retry delay) - Verify subscription is
is_active = true - Check worker logs for errors
All deliveries failing with timeout:
- Receiver endpoint is too slow (> 10 seconds)
- Receiver should return 2xx immediately and process async
Deliveries failing with 401/403:
- Receiver is rejecting requests
- Verify signature verification on receiver side
High failure rate:
- Check receiver logs for errors
- Verify receiver endpoint is correct
- Test with
/testendpoint
Performance Considerations
Database queries:
- Index on
(next_retry_at)WHEREstatus = 'pending'for fast batch fetch - Index on
(subscription_id, created_at DESC)for delivery history queries - Index on
(organization_id, created_at DESC)for org-wide views
Worker scaling:
- Single worker instance handles 600 events/minute
- For higher throughput, run multiple workers
- Workers coordinate via database locks (no duplicate deliveries)
Future Enhancements
- Manual retry — Admin UI to retry exhausted events
- Webhook logs export — Download CSV of delivery history
- Real-time delivery status — WebSocket updates for admins
- Webhook analytics — Success rate, latency, error trends