
Hold System, Redis Architecture & SSE Protocol

Overview

The hold system prevents double-booking by allowing clients to temporarily reserve ("hold") a timeslot before confirming. Holds are backed by Redis with TTL-based auto-expiry, and state changes are streamed to clients in real-time via Server-Sent Events (SSE). This document reflects the merged scheduling domain architecture used in the platform.


Hold Lifecycle

Client                          Redis                           Other Clients (via SSE)
  │                               │                               │
  │  POST /v1/holds               │                               │
  │  ─────────────────────►       │                               │
  │                               │                               │
  │  1. Check client quota        │                               │
  │     SCARD client:{cid}:holds  │                               │
  │                               │                               │
  │  2. Pick specialist by priority│                               │
  │     (availability.go logic)   │                               │
  │                               │                               │
  │  3. Atomic claim              │                               │
  │     SET hold:{atid}:{slot}:{spid}                             │
  │         value PX 30000 NX ───►│                               │
  │                               │  PUBLISH holds:events:{atid}  │
  │                               │  ─────────────────────────────►│  "hold" event
  │  4. Index by client           │                               │
  │     SADD client:{cid}:holds   │                               │
  │                               │                               │
  │  ◄──── { holdId, specialistId }│                              │
  │                               │                               │
  │  PATCH /v1/holds (heartbeat)  │                               │
  │  ─────────────────────►       │                               │
  │     PEXPIRE hold key + set    │  PUBLISH "heartbeat" event    │
  │  ◄──── { ok: true }          │  ─────────────────────────────►│
  │                               │                               │
  │  POST /v1/appointment-types/{id}/book                         │
  │  ─────────────────────►       │                               │
  │     DEL hold key              │  PUBLISH "confirm" event      │
  │     SREM from client set      │  ─────────────────────────────►│
  │     INSERT appointment into DB│                               │
  │  ◄──── { appointment }       │                               │
  │                               │                               │
  │       ── OR (no heartbeat) ── │                               │
  │                               │                               │
  │                 TTL expires ──►│                               │
  │                 Key auto-deleted                               │
  │                 (no event published on expiry)                 │

Redis Key Patterns

Hold Storage

Key:    hold:{appointmentTypeId}:{slotStartDate}:{specialistId}
Value:  JSON HoldPayload
TTL:    30 seconds (default), extended by heartbeat
Set:    NX (atomic, fails if already exists)

Example:

Key:    hold:550e8400-...:2025-03-15T09:00:00Z:770a1200-...
Value:  {"holdId":"abc123","clientId":"sess_xyz","appointmentTypeId":"550e8400-...","specialistId":"770a1200-...","slotStartDate":"2025-03-15T09:00:00Z","slotEndDate":"2025-03-15T09:30:00Z","holdExpiresAt":"2025-03-15T08:55:30Z"}
TTL:    30000ms
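The key layout and NX claim can be sketched in Go. `buildHoldKey`, `setNXer`, and `claimSlot` are illustrative names only; the real key builders live in go/types.go and the claim path in go/holds.go.

```go
package main

import (
	"fmt"
	"time"
)

// buildHoldKey mirrors the hold:{appointmentTypeId}:{slotStartDate}:{specialistId}
// pattern. Name is illustrative; the real builder lives in go/types.go.
func buildHoldKey(appointmentTypeID, slotStartDate, specialistID string) string {
	return fmt.Sprintf("hold:%s:%s:%s", appointmentTypeID, slotStartDate, specialistID)
}

// setNXer abstracts the single Redis command the claim needs
// (e.g. go-redis's SetNX). Interface is a sketch for illustration.
type setNXer interface {
	SetNX(key, value string, ttl time.Duration) (bool, error)
}

// claimSlot attempts the atomic SET ... PX 30000 NX. A false return means
// another client already holds this specialist/slot pair.
func claimSlot(r setNXer, appointmentTypeID, slot, specialistID, payload string) (bool, error) {
	return r.SetNX(buildHoldKey(appointmentTypeID, slot, specialistID), payload, 30*time.Second)
}

func main() {
	fmt.Println(buildHoldKey("550e8400", "2025-03-15T09:00:00Z", "770a1200"))
}
```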

Client Hold Index

Key:    client:{clientId}:holds
Type:   SET of holdId strings
TTL:    Same as hold (re-set on each heartbeat)

Tracks which holds belong to a client. Same TTL ensures cleanup when holds expire.

Pub/Sub Channels

Channel:  holds:events:{appointmentTypeId}
Messages: JSON HoldEvent objects

All hold state changes are published here. SSE stream subscribers receive these events.

Timeslot Cache

Key:    timeslots:{appointmentTypeId}:{specialistId|pooled}
Value:  JSON timeslot response
TTL:    300 seconds (5 minutes, configurable)

Invalidated when appointments, weekly hours, or overrides change.

Rate Limiting

Key:    client_limit:{clientId}:{appointmentTypeId}
Value:  JSON { bookedAt, appointmentTypeId, cooldownMinutes, appointmentId }
TTL:    cooldownMinutes * 60 seconds

Hold Creation: Priority-Based Assignment with Retry

When a client requests a hold without specifying specialistId, the scheduling domain selects the best specialist automatically:

1. Get candidate specialists (available at this slot)
   └── For each specialist: check weekly hours, overrides, appointments

2. Filter by priority (highest wins)
   └── If multiple specialists share top priority → step 3

3. Deterministic tiebreak (FNV-1a hash)
   └── Hash seed = fnv1a32(appointmentTypeId + ":" + slotStartDate)
   └── Same slot always produces same ordering → consistent, unbiased

4. Attempt SET NX on the selected specialist
   └── Success → publish "hold" event, return
   └── Failure (already held) → add to exclusion set, go to step 2

5. Repeat until success or no candidates remain

Why retry? Between candidate discovery and SET NX, another client may have claimed the same specialist. The retry loop tries the next-best candidate without re-checking availability (it was just checked).

See go/assignment.go for the full algorithm.
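The seed-based tiebreak can be sketched as below. Hashing `seed + specialistId` per candidate and sorting is one plausible scheme, not necessarily the exact one in go/assignment.go; what matters is that the ordering depends only on the slot, so it is deterministic across clients.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

// fnv1a32 hashes a string with the stdlib FNV-1a 32-bit hash.
func fnv1a32(s string) uint32 {
	h := fnv.New32a()
	h.Write([]byte(s))
	return h.Sum32()
}

// tiebreak orders equally-prioritized specialists deterministically for a
// given slot. Combining the slot seed with each specialistId is one plausible
// scheme; see go/assignment.go for the actual one.
func tiebreak(appointmentTypeID, slotStartDate string, specialistIDs []string) []string {
	seed := appointmentTypeID + ":" + slotStartDate
	out := append([]string(nil), specialistIDs...)
	sort.Slice(out, func(i, j int) bool {
		return fnv1a32(seed+out[i]) < fnv1a32(seed+out[j])
	})
	return out
}

func main() {
	ids := []string{"sp-a", "sp-b", "sp-c"}
	// Same appointment type + slot always yields the same ordering.
	fmt.Println(tiebreak("550e8400", "2025-03-15T09:00:00Z", ids))
}
```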


Race Condition Analysis

| Operation | Mechanism | Safety |
| --- | --- | --- |
| Slot claim | SET NX (Redis atomic) | Safe — only one client wins |
| Client quota | SCARD then SADD (not atomic) | Soft limit — may exceed by 1 under concurrency. Acceptable. |
| Heartbeat during expiry | GET then PEXPIRE | Hold may expire between operations. Client sees false, retries. |
| Concurrent release | Two DEL on same key | First succeeds, second returns false. Idempotent. |
| Expiry cleanup | Redis TTL on both hold + client set | Guaranteed — no orphaned index entries |
| Hold expiry during booking | Hold expires between GET and INSERT | Edge case — booking handler should verify hold exists before creating appointment |
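The last row's verification can piggyback on DEL itself, which reports how many keys it removed: treating "0 removed" as an expired hold closes the race in one round trip. This is a sketch, not the repo code; `deleter`, `confirmHold`, and `fakeRedis` are illustrative names, and the real confirm path lives in go/holds.go.

```go
package main

import (
	"errors"
	"fmt"
)

// deleter abstracts Redis DEL, which returns how many keys were removed.
type deleter interface {
	Del(key string) (int, error)
}

var errHoldExpired = errors.New("hold expired before booking")

// confirmHold deletes the hold key and treats "0 keys removed" as an expired
// hold, so the appointment INSERT only runs while the hold was still live.
func confirmHold(r deleter, holdKey string) error {
	n, err := r.Del(holdKey)
	if err != nil {
		return err
	}
	if n == 0 {
		return errHoldExpired
	}
	return nil // safe to INSERT the appointment now
}

// fakeRedis stands in for Redis so the sketch runs without a server.
type fakeRedis map[string]struct{}

func (f fakeRedis) Del(key string) (int, error) {
	if _, ok := f[key]; ok {
		delete(f, key)
		return 1, nil
	}
	return 0, nil
}

func main() {
	r := fakeRedis{"hold:a:b:c": {}}
	fmt.Println(confirmHold(r, "hold:a:b:c")) // first confirm succeeds
	fmt.Println(confirmHold(r, "hold:a:b:c")) // second sees the hold as gone
}
```

The same "0 removed" behavior is what makes concurrent release idempotent in the table above.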

SSE Stream Protocol

Connection Setup

GET /v1/holds/stream?appointmentTypeId={id}&clientId={cid}&leaseMs={ms}

Response Headers:
  Content-Type: text/event-stream; charset=utf-8
  Cache-Control: no-cache, no-transform
  Connection: keep-alive
  X-Accel-Buffering: no

Connection Lifecycle

1. Generate unique connectionId
2. Deduplicate: if client already has a stream, send "replaced" end event
   to old connection and close it
3. Send retry hint: "retry: 5000\n\n"
4. Send init event
5. Load snapshot: listActiveHolds(appointmentTypeId)
   └── For each active hold: send "hold" event with isOwnHold flag
   └── For expired holds (TTL <= 0): send "release" event
6. Subscribe to Redis channel: holds:events:{appointmentTypeId}
7. Send connected event
8. Start ping interval (every 15s)
9. Start lease timeout (default 15 min, max 1 hour)
10. Forward Redis pub/sub events to client with filtering

Event Filtering

Not all events are forwarded to all clients:

| Event Type | Forward Rule |
| --- | --- |
| hold | Always (affects slot availability) |
| release | Always (affects slot availability) |
| confirm | Always (affects slot availability) |
| heartbeat | Only to the hold owner (isOwnHold=true) |
| System events (init, connected, ping, end) | Always |

Each forwarded event includes isOwnHold: true|false based on clientId match.
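The filtering rule reduces to a small predicate. `HoldEvent` here carries only the fields the filter needs, and `shouldForward` is an illustrative name, not the repo's.

```go
package main

import "fmt"

// HoldEvent carries just the fields the filter inspects; field names follow
// the event format shown in this document.
type HoldEvent struct {
	Type     string
	ClientID string
}

// shouldForward applies the table above: heartbeats go only to the hold
// owner; every other event type is broadcast to all subscribers.
func shouldForward(ev HoldEvent, streamClientID string) bool {
	if ev.Type == "heartbeat" {
		return ev.ClientID == streamClientID
	}
	return true
}

func main() {
	fmt.Println(shouldForward(HoldEvent{Type: "heartbeat", ClientID: "a"}, "b")) // false
	fmt.Println(shouldForward(HoldEvent{Type: "hold", ClientID: "a"}, "b"))      // true
}
```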

Event Format

All events are SSE-formatted:

data: {"type":"hold","holdId":"abc","clientId":"xyz","appointmentTypeId":"...","specialistId":"...","slotStartDate":"...","slotEndDate":"...","holdExpiresAt":"...","isOwnHold":false}\n\n
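Producing this wire format is a one-liner around `encoding/json`: a `data:` line followed by a blank line. `formatSSE` is an illustrative helper, not the repo's.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// formatSSE frames one JSON payload as a Server-Sent Event: a single
// "data:" line terminated by a blank line.
func formatSSE(payload any) (string, error) {
	b, err := json.Marshal(payload)
	if err != nil {
		return "", err
	}
	return fmt.Sprintf("data: %s\n\n", b), nil
}

func main() {
	msg, _ := formatSSE(map[string]string{"type": "ping"})
	fmt.Printf("%q\n", msg) // "data: {\"type\":\"ping\"}\n\n"
}
```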

Connection Termination

| Reason | Trigger | Client Action |
| --- | --- | --- |
| lease-expired | Server-side timeout (15 min default) | Reconnect |
| send-failed | Write to stream failed | Reconnect |
| init-failed | Snapshot/subscription setup failed | Reconnect |
| client-abort | Client closed connection | None |
| client-cancel | Stream cancelled | None |
| server-shutdown | Process termination | Reconnect |
| replaced | Same client opened new stream | Use new stream |
| unknown | Unexpected error | Reconnect |

Per-Client Deduplication

Only one SSE connection per clientId is allowed. If a client opens a new stream while one is active, the old stream receives an end event with reason: "replaced" and is closed.

Memory Management

The SSE handler monitors process memory:

  • Warning at heap > 500MB or RSS > 1GB
  • Connection count tracked per appointment type for metrics
  • Graceful cleanup on SIGTERM/SIGINT

Rate Limiting

Flow

1. Client requests hold → checkClientRateLimit(clientId, appointmentTypeId, cooldownMinutes)
   └── If limited: reject with remaining cooldown time
   └── If not: proceed

2. Client completes booking → setClientRateLimit(clientId, appointmentTypeId, cooldownMinutes, appointmentId)
   └── Redis SET with TTL = cooldownMinutes * 60

3. Appointment cancelled → clearRateLimitByAppointmentId(appointmentId, appointmentTypeId)
   └── SCAN for matching key, delete if found
   └── Client can book again immediately

Design Decisions

  • Rate limit is set after successful appointment creation, not before. Failed bookings don't consume cooldown.
  • Rate limit check is fail-open — Redis errors are swallowed, client proceeds.
  • Clearing by appointmentId requires a SCAN (searches all keys matching client_limit:*:{appointmentTypeId}). Acceptable at current scale.
  • Default cooldown: 1440 minutes (24 hours). Configurable per appointment type.

Timeslot Caching

Cache Key

timeslots:{appointmentTypeId}:{specialistId|pooled}

TTL

300 seconds (5 minutes), configurable via TIMESLOTS_CACHE_TTL_SECONDS env var.

Invalidation

Cache is invalidated (keys deleted via SCAN + DEL) when:

| Event | Keys Invalidated |
| --- | --- |
| Appointment created | timeslots:{appointmentTypeId}:* |
| Appointment cancelled | timeslots:{appointmentTypeId}:* |
| Appointment rescheduled | timeslots:{appointmentTypeId}:* |
| Weekly hours changed | All timeslots:* for affected appointment types |
| Override created/updated/deleted | timeslots:{appointmentTypeId}:* |
| Appointment type updated | timeslots:{appointmentTypeId}:* |
| Specialist deleted | All timeslots:* for affected appointment types |
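The glob handed to SCAN for the per-appointment-type rows can be built as below; `timeslotCachePattern` is an illustrative name (the real builder lives in go/types.go).

```go
package main

import "fmt"

// timeslotCachePattern builds the SCAN glob that matches every cached
// timeslot response (pooled and per-specialist) for one appointment type.
// Each matched key is then removed with DEL.
func timeslotCachePattern(appointmentTypeID string) string {
	return fmt.Sprintf("timeslots:%s:*", appointmentTypeID)
}

func main() {
	fmt.Println(timeslotCachePattern("550e8400"))
}
```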

Go Implementation Reference

| Concern | Go File |
| --- | --- |
| Hold CRUD, heartbeat, release, confirm | go/holds.go |
| Priority-based specialist assignment | go/assignment.go |
| Availability engine (slot calculation) | go/availability.go |
| Type definitions, Redis key builders | go/types.go |

SSE in Go

The SSE stream handler is not included in the Go files because it's primarily HTTP/infrastructure code. In Go, the implementation is simpler than the Node.js version:

```go
// Sketch — not full implementation
func (h *Handler) HandleHoldStream(w http.ResponseWriter, r *http.Request) {
	flusher, ok := w.(http.Flusher)
	if !ok {
		http.Error(w, "streaming not supported", http.StatusInternalServerError)
		return
	}

	appointmentTypeID := r.URL.Query().Get("appointmentTypeId")

	w.Header().Set("Content-Type", "text/event-stream")
	w.Header().Set("Cache-Control", "no-cache")
	w.Header().Set("Connection", "keep-alive")

	ctx := r.Context() // cancelled on client disconnect

	// Subscribe to Redis pub/sub
	sub := h.redis.Subscribe(ctx, EventChannel(appointmentTypeID))
	defer sub.Close()

	// Send snapshot here, then forward events until the client disconnects.
	for msg := range sub.Channel() {
		fmt.Fprintf(w, "data: %s\n\n", msg.Payload)
		flusher.Flush() // push each event to the client immediately
	}
	// ctx.Done() handles cleanup automatically — no manual memory management
}
```

Key advantages in Go:

  • context.Context handles cancellation and cleanup automatically
  • http.Flusher for SSE is built into net/http
  • Goroutines for per-connection handling (no manual connection registry needed)
  • go-redis pub/sub integrates cleanly with context cancellation