Hold System, Redis Architecture & SSE Protocol
Overview
The hold system prevents double-booking by allowing clients to temporarily reserve ("hold") a timeslot before confirming. Holds are backed by Redis with TTL-based auto-expiry, and state changes are streamed to clients in real-time via Server-Sent Events (SSE). This document reflects the merged scheduling domain architecture used in the platform.
Hold Lifecycle
Client Redis Other Clients (via SSE)
│ │ │
│ POST /v1/holds │ │
│ ─────────────────────► │ │
│ │ │
│ 1. Check client quota │ │
│ SCARD client:{cid}:holds │ │
│ │ │
│ 2. Pick specialist by priority│ │
│ (availability.go logic) │ │
│ │ │
│ 3. Atomic claim │ │
│ SET hold:{atid}:{slot}:{spid} │
│ value PX 30000 NX ───►│ │
│ │ PUBLISH holds:events:{atid} │
│ │ ─────────────────────────────►│ "hold" event
│ 4. Index by client │ │
│ SADD client:{cid}:holds │ │
│ │ │
│ ◄──── { holdId, specialistId }│ │
│ │ │
│ PATCH /v1/holds (heartbeat) │ │
│ ─────────────────────► │ │
│ PEXPIRE hold key + set │ PUBLISH "heartbeat" event │
│ ◄──── { ok: true } │ ─────────────────────────────►│
│ │ │
│ POST /v1/appointment-types/{id}/book │
│ ─────────────────────► │ │
│ DEL hold key │ PUBLISH "confirm" event │
│ SREM from client set │ ─────────────────────────────►│
│ INSERT appointment into DB│ │
│ ◄──── { appointment } │ │
│ │ │
│ ── OR (no heartbeat) ── │ │
│ │ │
│ TTL expires ──►│ │
│ Key auto-deleted │
│                 (no event published on expiry) │

Redis Key Patterns
Hold Storage
Key: hold:{appointmentTypeId}:{slotStartDate}:{specialistId}
Value: JSON HoldPayload
TTL: 30 seconds (default), extended by heartbeat
Set: NX (atomic; fails if the key already exists)
Example:
Key: hold:550e8400-...:2025-03-15T09:00:00Z:770a1200-...
Value: {"holdId":"abc123","clientId":"sess_xyz","appointmentTypeId":"550e8400-...","specialistId":"770a1200-...","slotStartDate":"2025-03-15T09:00:00Z","slotEndDate":"2025-03-15T09:30:00Z","holdExpiresAt":"2025-03-15T08:55:30Z"}
TTL: 30000 ms
Client Hold Index
Key: client:{clientId}:holds
Type: SET of holdId strings
TTL: Same as hold (re-set on each heartbeat)
Tracks which holds belong to a client. Sharing the hold's TTL ensures the index is cleaned up when its holds expire.
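The two key patterns above lend themselves to small builder helpers. A minimal sketch (the real builders live in go/types.go per the reference table below; these function names are illustrative):

```go
package main

import "fmt"

// holdKey builds hold:{appointmentTypeId}:{slotStartDate}:{specialistId}.
func holdKey(appointmentTypeID, slotStartDate, specialistID string) string {
	return fmt.Sprintf("hold:%s:%s:%s", appointmentTypeID, slotStartDate, specialistID)
}

// clientHoldsKey builds client:{clientId}:holds, the per-client hold index.
func clientHoldsKey(clientID string) string {
	return fmt.Sprintf("client:%s:holds", clientID)
}
```

Centralizing key construction keeps the hold, index, and pub/sub code from drifting apart on key formats.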
Pub/Sub Channels
Channel: holds:events:{appointmentTypeId}
Messages: JSON HoldEvent objects
All hold state changes are published here; SSE stream subscribers receive these events.
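A sketch of the event payload and channel builder. The field names are inferred from the event format shown later in this document, and the later Go sketch references an EventChannel helper of this shape:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// HoldEvent mirrors the JSON published on holds:events:{appointmentTypeId}.
type HoldEvent struct {
	Type              string `json:"type"` // "hold", "heartbeat", "release", or "confirm"
	HoldID            string `json:"holdId"`
	ClientID          string `json:"clientId"`
	AppointmentTypeID string `json:"appointmentTypeId"`
	SpecialistID      string `json:"specialistId"`
	SlotStartDate     string `json:"slotStartDate"`
	SlotEndDate       string `json:"slotEndDate"`
	HoldExpiresAt     string `json:"holdExpiresAt"`
}

// EventChannel builds the pub/sub channel name for one appointment type.
func EventChannel(appointmentTypeID string) string {
	return fmt.Sprintf("holds:events:%s", appointmentTypeID)
}

// encodeEvent serializes an event for PUBLISH.
func encodeEvent(e HoldEvent) (string, error) {
	b, err := json.Marshal(e)
	return string(b), err
}
```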
Timeslot Cache
Key: timeslots:{appointmentTypeId}:{specialistId|pooled}
Value: JSON timeslot response
TTL: 300 seconds (5 minutes, configurable)
Invalidated when appointments, weekly hours, or overrides change.
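The cache key and read-through lookup can be sketched as follows. Function-valued parameters stand in for the Redis client and the availability engine; the names are illustrative, while the key shape and pooled fallback come from the pattern above:

```go
package main

import "fmt"

// timeslotCacheKey builds timeslots:{appointmentTypeId}:{specialistId|pooled}.
// An empty specialistID selects the pooled variant.
func timeslotCacheKey(appointmentTypeID, specialistID string) string {
	if specialistID == "" {
		specialistID = "pooled"
	}
	return fmt.Sprintf("timeslots:%s:%s", appointmentTypeID, specialistID)
}

// cachedTimeslots is a read-through lookup: return the cached JSON if present,
// otherwise compute the response, store it, and return it.
func cachedTimeslots(key string,
	get func(key string) (string, bool),
	set func(key, value string),
	compute func() string) string {
	if v, ok := get(key); ok {
		return v
	}
	v := compute()
	set(key, v) // the real SET also attaches the 300 s TTL
	return v
}
```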
Rate Limiting
Key: client_limit:{clientId}:{appointmentTypeId}
Value: JSON { bookedAt, appointmentTypeId, cooldownMinutes, appointmentId }
TTL: cooldownMinutes * 60 seconds
Hold Creation: Priority-Based Assignment with Retry
When a client requests a hold without specifying specialistId, the scheduling domain selects the best specialist automatically:
1. Get candidate specialists (available at this slot)
└── For each specialist: check weekly hours, overrides, appointments
2. Filter by priority (highest wins)
└── If multiple specialists share top priority → step 3
3. Deterministic tiebreak (FNV-1a hash)
└── Hash seed = fnv1a32(appointmentTypeId + ":" + slotStartDate)
└── Same slot always produces same ordering → consistent, unbiased
4. Attempt SET NX on the selected specialist
└── Success → publish "hold" event, return
└── Failure (already held) → add to exclusion set, go to step 2
5. Repeat until success or no candidates remain
Why retry? Between candidate discovery and SET NX, another client may have claimed the same specialist. The retry loop tries the next-best candidate without re-checking availability (it was just checked).
See go/assignment.go for the full algorithm.
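The retry loop above can be sketched as below. The exact tiebreak combination is an assumption (this document specifies only that the seed is fnv1a32(appointmentTypeId + ":" + slotStartDate)); here each candidate is scored by hashing the seed input together with its ID, which yields a stable per-slot ordering. The SET NX claim is abstracted as a callback:

```go
package main

import (
	"errors"
	"hash/fnv"
	"sort"
)

// slotScore hashes the slot seed plus a specialist ID with FNV-1a (32-bit).
func slotScore(appointmentTypeID, slotStartDate, specialistID string) uint32 {
	h := fnv.New32a()
	h.Write([]byte(appointmentTypeID + ":" + slotStartDate + ":" + specialistID))
	return h.Sum32()
}

// rankCandidates orders same-priority specialists deterministically per slot.
func rankCandidates(appointmentTypeID, slotStartDate string, candidates []string) []string {
	ordered := append([]string(nil), candidates...)
	sort.Slice(ordered, func(i, j int) bool {
		return slotScore(appointmentTypeID, slotStartDate, ordered[i]) <
			slotScore(appointmentTypeID, slotStartDate, ordered[j])
	})
	return ordered
}

// assignWithRetry walks the ranked candidates, attempting the SET NX claim
// (abstracted as claim) until one succeeds or none remain.
func assignWithRetry(appointmentTypeID, slotStartDate string, candidates []string,
	claim func(specialistID string) (bool, error)) (string, error) {
	for _, spid := range rankCandidates(appointmentTypeID, slotStartDate, candidates) {
		won, err := claim(spid)
		if err != nil {
			return "", err
		}
		if won {
			return spid, nil
		}
		// Lost the race for this specialist; fall through to the next-best
		// candidate without re-checking availability.
	}
	return "", errors.New("no candidates remain")
}
```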
Race Condition Analysis
| Operation | Mechanism | Safety |
|---|---|---|
| Slot claim | SET NX (Redis atomic) | Safe — only one client wins |
| Client quota | SCARD then SADD (not atomic) | Soft limit — may exceed by 1 under concurrency. Acceptable. |
| Heartbeat during expiry | GET then PEXPIRE | Hold may expire between the two operations; the heartbeat returns false and the client retries with a new hold. |
| Concurrent release | Two DEL on same key | First succeeds, second returns false. Idempotent. |
| Expiry cleanup | Redis TTL on both hold + client set | Guaranteed — no orphaned index entries |
| Hold expiry during booking | Hold expires between GET and INSERT | Edge case — booking handler should verify hold exists before creating appointment |
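The mitigation in the last row can be sketched with callbacks standing in for the Redis GET and DEL (illustrative signatures; a small Lua script could fold both steps into one atomic call):

```go
package main

import "errors"

var errHoldExpired = errors.New("hold expired before booking")

// confirmHold re-verifies the hold immediately before the appointment INSERT.
// A missing key means the TTL fired after the last heartbeat, so the booking
// must be rejected rather than silently double-booking the slot.
func confirmHold(get func(key string) (string, bool), del func(key string) bool,
	holdKey string) (string, error) {
	payload, ok := get(holdKey)
	if !ok {
		return "", errHoldExpired
	}
	if !del(holdKey) {
		// The key expired between GET and DEL; treat that as expired too.
		return "", errHoldExpired
	}
	return payload, nil // safe to INSERT the appointment now
}
```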
SSE Stream Protocol
Connection Setup
GET /v1/holds/stream?appointmentTypeId={id}&clientId={cid}&leaseMs={ms}
Response Headers:
Content-Type: text/event-stream; charset=utf-8
Cache-Control: no-cache, no-transform
Connection: keep-alive
X-Accel-Buffering: no
Connection Lifecycle
1. Generate unique connectionId
2. Deduplicate: if client already has a stream, send "replaced" end event
to old connection and close it
3. Send retry hint: "retry: 5000\n\n"
4. Send init event
5. Load snapshot: listActiveHolds(appointmentTypeId)
└── For each active hold: send "hold" event with isOwnHold flag
└── For expired holds (TTL <= 0): send "release" event
6. Subscribe to Redis channel: holds:events:{appointmentTypeId}
7. Send connected event
8. Start ping interval (every 15s)
9. Start lease timeout (default 15 min, max 1 hour)
10. Forward Redis pub/sub events to the client, with filtering
Event Filtering
Not all events are forwarded to all clients:
| Event Type | Forward Rule |
|---|---|
| hold | Always (affects slot availability) |
| release | Always (affects slot availability) |
| confirm | Always (affects slot availability) |
| heartbeat | Only to the hold owner (isOwnHold=true) |
| System events (init, connected, ping, end) | Always |
Each forwarded event includes isOwnHold: true|false based on clientId match.
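The filtering rule and ownership flag reduce to two pure functions (a sketch; names are illustrative):

```go
package main

// shouldForward applies the table above: heartbeat events reach only the
// hold owner; hold/release/confirm and system events are broadcast.
func shouldForward(eventType, eventClientID, streamClientID string) bool {
	if eventType == "heartbeat" {
		return eventClientID == streamClientID
	}
	return true
}

// isOwnHold computes the per-stream ownership flag attached to each event.
func isOwnHold(eventClientID, streamClientID string) bool {
	return eventClientID == streamClientID
}
```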
Event Format
All events are SSE-formatted:
data: {"type":"hold","holdId":"abc","clientId":"xyz","appointmentTypeId":"...","specialistId":"...","slotStartDate":"...","slotEndDate":"...","holdExpiresAt":"...","isOwnHold":false}\n\n
Connection Termination
| Reason | Trigger | Client Action |
|---|---|---|
lease-expired | Server-side timeout (15 min default) | Reconnect |
send-failed | Write to stream failed | Reconnect |
init-failed | Snapshot/subscription setup failed | Reconnect |
client-abort | Client closed connection | — |
client-cancel | Stream cancelled | — |
server-shutdown | Process termination | Reconnect |
replaced | Same client opened new stream | Use new stream |
unknown | Unexpected error | Reconnect |
Per-Client Deduplication
Only one SSE connection per clientId is allowed. If a client opens a new stream while one is active, the old stream receives an end event with reason: "replaced" and is closed.
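One way to enforce this single-stream rule is a small mutex-guarded registry. This is a sketch: the channel-based signalling for delivering the "replaced" end event is an assumption, not the documented mechanism.

```go
package main

import "sync"

// streamRegistry enforces one SSE connection per clientId. Registering a new
// stream returns the previous connection's end-channel (if any), so the
// caller can send it an end event with reason "replaced" and close it.
type streamRegistry struct {
	mu      sync.Mutex
	streams map[string]chan string // clientId -> end-reason channel
}

func newStreamRegistry() *streamRegistry {
	return &streamRegistry{streams: make(map[string]chan string)}
}

// register swaps in the new stream's channel and returns the replaced one.
func (r *streamRegistry) register(clientID string, end chan string) (replaced chan string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	replaced = r.streams[clientID]
	r.streams[clientID] = end
	return replaced
}

// unregister removes the stream only if it is still the active one, so a
// late cleanup from a replaced connection cannot evict its successor.
func (r *streamRegistry) unregister(clientID string, end chan string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	if r.streams[clientID] == end {
		delete(r.streams, clientID)
	}
}
```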
Memory Management
The SSE handler monitors process memory:
- Warning at heap > 500MB or RSS > 1GB
- Connection count tracked per appointment type for metrics
- Graceful cleanup on SIGTERM/SIGINT
Rate Limiting
Flow
1. Client requests hold → checkClientRateLimit(clientId, appointmentTypeId, cooldownMinutes)
└── If limited: reject with remaining cooldown time
└── If not: proceed
2. Client completes booking → setClientRateLimit(clientId, appointmentTypeId, cooldownMinutes, appointmentId)
└── Redis SET with TTL = cooldownMinutes * 60
3. Appointment cancelled → clearRateLimitByAppointmentId(appointmentId, appointmentTypeId)
└── SCAN for matching key, delete if found
└── Client can book again immediately
Design Decisions
- Rate limit is set after successful appointment creation, not before. Failed bookings don't consume cooldown.
- Rate limit check is fail-open — Redis errors are swallowed, client proceeds.
- Clearing by appointmentId requires a SCAN (it searches all keys matching client_limit:*:{appointmentTypeId}). Acceptable at current scale.
- Default cooldown: 1440 minutes (24 hours). Configurable per appointment type.
Timeslot Caching
Cache Key
timeslots:{appointmentTypeId}:{specialistId|pooled}
TTL
300 seconds (5 minutes), configurable via TIMESLOTS_CACHE_TTL_SECONDS env var.
Invalidation
Cache is invalidated (keys deleted via SCAN + DEL) when:
| Event | Keys Invalidated |
|---|---|
| Appointment created | timeslots:{appointmentTypeId}:* |
| Appointment cancelled | timeslots:{appointmentTypeId}:* |
| Appointment rescheduled | timeslots:{appointmentTypeId}:* |
| Weekly hours changed | All timeslots:* for affected appointment types |
| Override created/updated/deleted | timeslots:{appointmentTypeId}:* |
| Appointment type updated | timeslots:{appointmentTypeId}:* |
| Specialist deleted | All timeslots:* for affected appointment types |
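The SCAN + DEL invalidation reduces to one helper. A sketch, with callbacks standing in for the Redis SCAN and DEL calls:

```go
package main

import "fmt"

// invalidateTimeslots deletes every cached variant (per-specialist and pooled)
// for an appointment type after a schedule-affecting change.
func invalidateTimeslots(appointmentTypeID string,
	scan func(pattern string) []string,
	del func(keys ...string)) {
	pattern := fmt.Sprintf("timeslots:%s:*", appointmentTypeID)
	if keys := scan(pattern); len(keys) > 0 {
		del(keys...)
	}
}
```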
Go Implementation Reference
| Concern | Go File |
|---|---|
| Hold CRUD, heartbeat, release, confirm | go/holds.go |
| Priority-based specialist assignment | go/assignment.go |
| Availability engine (slot calculation) | go/availability.go |
| Type definitions, Redis key builders | go/types.go |
SSE in Go
The SSE stream handler is not included in the Go files because it's primarily HTTP/infrastructure code. In Go, the implementation is simpler than the Node.js version:
// Sketch — not full implementation
func (h *Handler) HandleHoldStream(w http.ResponseWriter, r *http.Request) {
    flusher, ok := w.(http.Flusher)
    if !ok {
        http.Error(w, "streaming not supported", http.StatusInternalServerError)
        return
    }
    appointmentTypeID := r.URL.Query().Get("appointmentTypeId")

    w.Header().Set("Content-Type", "text/event-stream; charset=utf-8")
    w.Header().Set("Cache-Control", "no-cache, no-transform")
    w.Header().Set("Connection", "keep-alive")

    ctx := r.Context() // cancelled on client disconnect

    // Subscribe to Redis pub/sub for this appointment type
    sub := h.redis.Subscribe(ctx, EventChannel(appointmentTypeID))
    defer sub.Close()

    // Send snapshot here, then forward pub/sub events until disconnect.
    for {
        select {
        case <-ctx.Done():
            return // ctx.Done() handles cleanup automatically; no manual memory management
        case msg := <-sub.Channel():
            fmt.Fprintf(w, "data: %s\n\n", msg.Payload)
            flusher.Flush() // push the event to the client immediately
        }
    }
}
Key advantages in Go:
- context.Context handles cancellation and cleanup automatically
- http.Flusher for SSE is built into net/http
- Goroutines for per-connection handling (no manual connection registry needed)
- go-redis pub/sub integrates cleanly with context cancellation