Hold System, Redis Architecture & SSE Protocol
Overview
The hold system prevents double-booking by allowing clients to temporarily reserve ("hold") a timeslot before confirming. Holds are backed by Redis with TTL-based auto-expiry, and state changes are streamed to clients in real-time via Server-Sent Events (SSE). This document reflects the merged scheduling domain architecture used in the platform.
Hold Lifecycle
Client Redis Other Clients (via SSE)
│ │ │
│ POST /v1/holds │ │
│ ─────────────────────► │ │
│ │ │
│ 1. Check client quota │ │
│ SCARD client:{cid}:holds │ │
│ │ │
│ 2. Pick specialist by priority│ │
│ (availability.go logic) │ │
│ │ │
│ 3. Atomic claim │ │
│ SET hold:{atid}:{slot}:{spid} │
│ value PX 30000 NX ───►│ │
│ │ PUBLISH holds:events:{atid} │
│ │ ─────────────────────────────►│ "hold" event
│ 4. Index by client │ │
│ SADD client:{cid}:holds │ │
│ │ │
│ ◄──── { holdId, specialistId }│ │
│ │ │
│ PATCH /v1/holds (heartbeat) │ │
│ ─────────────────────► │ │
│ PEXPIRE hold key + set │ PUBLISH "heartbeat" event │
│ ◄──── { ok: true } │ ─────────────────────────────►│
│ │ │
│ POST /v1/appointment-types/{id}/book │
│ ─────────────────────► │ │
│ DEL hold key │ PUBLISH "confirm" event │
│ SREM from client set │ ─────────────────────────────►│
│ INSERT appointment into DB│ │
│ ◄──── { appointment } │ │
│ │ │
│ ── OR (no heartbeat) ── │ │
│ │ │
│ TTL expires ──►│ │
│ Key auto-deleted │
│                 (no event published on expiry) │

Redis Key Patterns
Hold Storage
Key: hold:{appointmentTypeId}:{slotStartDate}:{specialistId}
Value: JSON HoldPayload
TTL: 30 seconds (default), extended by heartbeat
Set: NX (atomic; fails if the key already exists)
Example:
Key: hold:550e8400-...:2025-03-15T09:00:00Z:770a1200-...
Value: {"holdId":"abc123","clientId":"sess_xyz","appointmentTypeId":"550e8400-...","specialistId":"770a1200-...","slotStartDate":"2025-03-15T09:00:00Z","slotEndDate":"2025-03-15T09:30:00Z","holdExpiresAt":"2025-03-15T08:55:30Z"}
TTL: 30000 ms
Client Hold Index
Key: client:{clientId}:holds
Type: SET of holdId strings
TTL: Same as hold (re-set on each heartbeat)
Tracks which holds belong to a client. Sharing the hold's TTL ensures the index is cleaned up when its holds expire.
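The two key patterns above lend themselves to small builder helpers. A minimal sketch (the real builders live in go/types.go per the reference table below; these function names are illustrative):

```go
package main

import "fmt"

// holdKey builds hold:{appointmentTypeId}:{slotStartDate}:{specialistId}.
func holdKey(appointmentTypeID, slotStartDate, specialistID string) string {
	return fmt.Sprintf("hold:%s:%s:%s", appointmentTypeID, slotStartDate, specialistID)
}

// clientHoldsKey builds client:{clientId}:holds, the per-client hold index.
func clientHoldsKey(clientID string) string {
	return fmt.Sprintf("client:%s:holds", clientID)
}
```

Centralizing key construction keeps the hold, index, and pub/sub code from drifting apart on key formats.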
Pub/Sub Channels
Channel: holds:events:{appointmentTypeId}
Messages: JSON HoldEvent objects
All hold state changes are published here; SSE stream subscribers receive these events.
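A sketch of the event payload and channel builder. The field names are inferred from the event format shown later in this document, and the later Go sketch references an EventChannel helper of this shape:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// HoldEvent mirrors the JSON published on holds:events:{appointmentTypeId}.
type HoldEvent struct {
	Type              string `json:"type"` // "hold", "heartbeat", "release", or "confirm"
	HoldID            string `json:"holdId"`
	ClientID          string `json:"clientId"`
	AppointmentTypeID string `json:"appointmentTypeId"`
	SpecialistID      string `json:"specialistId"`
	SlotStartDate     string `json:"slotStartDate"`
	SlotEndDate       string `json:"slotEndDate"`
	HoldExpiresAt     string `json:"holdExpiresAt"`
}

// EventChannel builds the pub/sub channel name for one appointment type.
func EventChannel(appointmentTypeID string) string {
	return fmt.Sprintf("holds:events:%s", appointmentTypeID)
}

// encodeEvent serializes an event for PUBLISH.
func encodeEvent(e HoldEvent) (string, error) {
	b, err := json.Marshal(e)
	return string(b), err
}
```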
Timeslot Cache
Key: timeslots:{appointmentTypeId}:{specialistId|pooled}
Value: JSON timeslot response
TTL: 300 seconds (5 minutes, configurable)
Invalidated when appointments, weekly hours, or overrides change.
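The cache key and read-through lookup can be sketched as follows. Function-valued parameters stand in for the Redis client and the availability engine; the names are illustrative, while the key shape and pooled fallback come from the pattern above:

```go
package main

import "fmt"

// timeslotCacheKey builds timeslots:{appointmentTypeId}:{specialistId|pooled}.
// An empty specialistID selects the pooled variant.
func timeslotCacheKey(appointmentTypeID, specialistID string) string {
	if specialistID == "" {
		specialistID = "pooled"
	}
	return fmt.Sprintf("timeslots:%s:%s", appointmentTypeID, specialistID)
}

// cachedTimeslots is a read-through lookup: return the cached JSON if present,
// otherwise compute the response, store it, and return it.
func cachedTimeslots(key string,
	get func(key string) (string, bool),
	set func(key, value string),
	compute func() string) string {
	if v, ok := get(key); ok {
		return v
	}
	v := compute()
	set(key, v) // the real SET also attaches the 300 s TTL
	return v
}
```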
Rate Limiting
Key: client_limit:{clientId}:{appointmentTypeId}
Value: JSON { bookedAt, appointmentTypeId, cooldownMinutes, appointmentId }
TTL: cooldownMinutes * 60 seconds
Hold Creation: Priority-Based Assignment with Retry
When a client requests a hold without specifying specialistId, the scheduling domain selects the best specialist automatically:
1. Get candidate specialists (available at this slot)
└── For each specialist: check weekly hours, overrides, appointments
2. Filter by priority (highest wins)
└── If multiple specialists share top priority → step 3
3. Deterministic tiebreak (FNV-1a hash)
└── Hash seed = fnv1a32(appointmentTypeId + ":" + slotStartDate)
└── Same slot always produces same ordering → consistent, unbiased
4. Attempt SET NX on the selected specialist
└── Success → publish "hold" event, return
└── Failure (already held) → add to exclusion set, go to step 2
5. Repeat until success or no candidates remain
Why retry? Between candidate discovery and SET NX, another client may have claimed the same specialist. The retry loop tries the next-best candidate without re-checking availability (it was just checked).
See go/assignment.go for the full algorithm.
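The retry loop above can be sketched as below. The exact tiebreak combination is an assumption (this document specifies only that the seed is fnv1a32(appointmentTypeId + ":" + slotStartDate)); here each candidate is scored by hashing the seed input together with its ID, which yields a stable per-slot ordering. The SET NX claim is abstracted as a callback:

```go
package main

import (
	"errors"
	"hash/fnv"
	"sort"
)

// slotScore hashes the slot seed plus a specialist ID with FNV-1a (32-bit).
func slotScore(appointmentTypeID, slotStartDate, specialistID string) uint32 {
	h := fnv.New32a()
	h.Write([]byte(appointmentTypeID + ":" + slotStartDate + ":" + specialistID))
	return h.Sum32()
}

// rankCandidates orders same-priority specialists deterministically per slot.
func rankCandidates(appointmentTypeID, slotStartDate string, candidates []string) []string {
	ordered := append([]string(nil), candidates...)
	sort.Slice(ordered, func(i, j int) bool {
		return slotScore(appointmentTypeID, slotStartDate, ordered[i]) <
			slotScore(appointmentTypeID, slotStartDate, ordered[j])
	})
	return ordered
}

// assignWithRetry walks the ranked candidates, attempting the SET NX claim
// (abstracted as claim) until one succeeds or none remain.
func assignWithRetry(appointmentTypeID, slotStartDate string, candidates []string,
	claim func(specialistID string) (bool, error)) (string, error) {
	for _, spid := range rankCandidates(appointmentTypeID, slotStartDate, candidates) {
		won, err := claim(spid)
		if err != nil {
			return "", err
		}
		if won {
			return spid, nil
		}
		// Lost the race for this specialist; fall through to the next-best
		// candidate without re-checking availability.
	}
	return "", errors.New("no candidates remain")
}
```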
Race Condition Analysis
| Operation | Mechanism | Safety |
|---|---|---|
| Slot claim | SET NX (Redis atomic) | Safe — only one client wins |
| Client quota | SCARD then SADD (not atomic) | Soft limit — may exceed by 1 under concurrency. Acceptable. |
| Heartbeat during expiry | GET then PEXPIRE | Hold may expire between the two operations; the heartbeat returns false and the client retries with a new hold. |
| Concurrent release | Two DEL on same key | First succeeds, second returns false. Idempotent. |
| Expiry cleanup | Redis TTL on both hold + client set | Guaranteed — no orphaned index entries |
| Hold expiry during booking | Hold expires between GET and INSERT | Edge case — booking handler should verify hold exists before creating appointment |
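The mitigation in the last row can be sketched with callbacks standing in for the Redis GET and DEL (illustrative signatures; a small Lua script could fold both steps into one atomic call):

```go
package main

import "errors"

var errHoldExpired = errors.New("hold expired before booking")

// confirmHold re-verifies the hold immediately before the appointment INSERT.
// A missing key means the TTL fired after the last heartbeat, so the booking
// must be rejected rather than silently double-booking the slot.
func confirmHold(get func(key string) (string, bool), del func(key string) bool,
	holdKey string) (string, error) {
	payload, ok := get(holdKey)
	if !ok {
		return "", errHoldExpired
	}
	if !del(holdKey) {
		// The key expired between GET and DEL; treat that as expired too.
		return "", errHoldExpired
	}
	return payload, nil // safe to INSERT the appointment now
}
```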
SSE Stream Protocol
Connection Setup
GET /v1/holds/stream?appointmentTypeId={id}&clientId={cid}&leaseMs={ms}
Response Headers:
Content-Type: text/event-stream; charset=utf-8
Cache-Control: no-cache, no-transform
Connection: keep-alive
X-Accel-Buffering: no
Connection Lifecycle
1. Generate unique connectionId
2. Deduplicate: if client already has a stream, send "replaced" end event
to old connection and close it
3. Send retry hint: "retry: 5000\n\n"
4. Send init event
5. Load snapshot: listActiveHolds(appointmentTypeId)
└── For each active hold: send "hold" event with isOwnHold flag
└── For expired holds (TTL <= 0): send "release" event
6. Subscribe to Redis channel: holds:events:{appointmentTypeId}
7. Send connected event
8. Start ping interval (every 15s)
9. Start lease timeout (default 15 min, max 1 hour)
10. Forward Redis pub/sub events to the client, with filtering
Event Filtering
Not all events are forwarded to all clients:
| Event Type | Forward Rule |
|---|---|
| hold | Always (affects slot availability) |
| release | Always (affects slot availability) |
| confirm | Always (affects slot availability) |
| heartbeat | Only to the hold owner (isOwnHold=true) |
| System events (init, connected, ping, end) | Always |
Each forwarded event includes isOwnHold: true|false based on clientId match.
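The filtering rule and ownership flag reduce to two pure functions (a sketch; names are illustrative):

```go
package main

// shouldForward applies the table above: heartbeat events reach only the
// hold owner; hold/release/confirm and system events are broadcast.
func shouldForward(eventType, eventClientID, streamClientID string) bool {
	if eventType == "heartbeat" {
		return eventClientID == streamClientID
	}
	return true
}

// isOwnHold computes the per-stream ownership flag attached to each event.
func isOwnHold(eventClientID, streamClientID string) bool {
	return eventClientID == streamClientID
}
```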
Event Format
All events are SSE-formatted:
data: {"type":"hold","holdId":"abc","clientId":"xyz","appointmentTypeId":"...","specialistId":"...","slotStartDate":"...","slotEndDate":"...","holdExpiresAt":"...","isOwnHold":false}\n\n
Connection Termination
| Reason | Trigger | Client Action |
|---|---|---|
lease-expired | Server-side timeout (15 min default) | Reconnect |
send-failed | Write to stream failed | Reconnect |
init-failed | Snapshot/subscription setup failed | Reconnect |
client-abort | Client closed connection | — |
client-cancel | Stream cancelled | — |
server-shutdown | Process termination | Reconnect |
replaced | Same client opened new stream | Use new stream |
unknown | Unexpected error | Reconnect |
Per-Client Deduplication
Only one SSE connection per clientId is allowed. If a client opens a new stream while one is active, the old stream receives an end event with reason: "replaced" and is closed.
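One way to enforce this single-stream rule is a small mutex-guarded registry. This is a sketch: the channel-based signalling for delivering the "replaced" end event is an assumption, not the documented mechanism.

```go
package main

import "sync"

// streamRegistry enforces one SSE connection per clientId. Registering a new
// stream returns the previous connection's end-channel (if any), so the
// caller can send it an end event with reason "replaced" and close it.
type streamRegistry struct {
	mu      sync.Mutex
	streams map[string]chan string // clientId -> end-reason channel
}

func newStreamRegistry() *streamRegistry {
	return &streamRegistry{streams: make(map[string]chan string)}
}

// register swaps in the new stream's channel and returns the replaced one.
func (r *streamRegistry) register(clientID string, end chan string) (replaced chan string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	replaced = r.streams[clientID]
	r.streams[clientID] = end
	return replaced
}

// unregister removes the stream only if it is still the active one, so a
// late cleanup from a replaced connection cannot evict its successor.
func (r *streamRegistry) unregister(clientID string, end chan string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	if r.streams[clientID] == end {
		delete(r.streams, clientID)
	}
}
```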
Memory Management
The SSE handler monitors process memory:
- Warning at heap > 500MB or RSS > 1GB
- Connection count tracked per appointment type for metrics
- Graceful cleanup on SIGTERM/SIGINT
Rate Limiting
Flow
1. Client requests hold → checkClientRateLimit(clientId, appointmentTypeId, cooldownMinutes)
└── If limited: reject with remaining cooldown time
└── If not: proceed
2. Client completes booking → setClientRateLimit(clientId, appointmentTypeId, cooldownMinutes, appointmentId)
└── Redis SET with TTL = cooldownMinutes * 60
3. Appointment cancelled → clearRateLimitByAppointmentId(appointmentId, appointmentTypeId)
└── SCAN for matching key, delete if found
└── Client can book again immediately
Design Decisions
- Rate limit is set after successful appointment creation, not before. Failed bookings don't consume cooldown.
- Rate limit check is fail-open — Redis errors are swallowed, client proceeds.
- Clearing by appointmentId requires a SCAN (it searches all keys matching client_limit:*:{appointmentTypeId}). Acceptable at current scale.
- Default cooldown: 1440 minutes (24 hours). Configurable per appointment type.
Timeslot Caching
Cache Key
timeslots:{appointmentTypeId}:{specialistId|pooled}
TTL
300 seconds (5 minutes), configurable via TIMESLOTS_CACHE_TTL_SECONDS env var.
Invalidation
Cache is invalidated (keys deleted via SCAN + DEL) when:
| Event | Keys Invalidated |
|---|---|
| Appointment created | timeslots:{appointmentTypeId}:* |
| Appointment cancelled | timeslots:{appointmentTypeId}:* |
| Appointment rescheduled | timeslots:{appointmentTypeId}:* |
| Weekly hours changed | All timeslots:* for affected appointment types |
| Override created/updated/deleted | timeslots:{appointmentTypeId}:* |
| Appointment type updated | timeslots:{appointmentTypeId}:* |
| Specialist deleted | All timeslots:* for affected appointment types |
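The SCAN + DEL invalidation reduces to one helper. A sketch, with callbacks standing in for the Redis SCAN and DEL calls:

```go
package main

import "fmt"

// invalidateTimeslots deletes every cached variant (per-specialist and pooled)
// for an appointment type after a schedule-affecting change.
func invalidateTimeslots(appointmentTypeID string,
	scan func(pattern string) []string,
	del func(keys ...string)) {
	pattern := fmt.Sprintf("timeslots:%s:*", appointmentTypeID)
	if keys := scan(pattern); len(keys) > 0 {
		del(keys...)
	}
}
```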
Go Implementation Reference
| Concern | Go File |
|---|---|
| Hold CRUD, heartbeat, release, confirm | go/holds.go |
| Priority-based specialist assignment | go/assignment.go |
| Availability engine (slot calculation) | go/availability.go |
| Type definitions, Redis key builders | go/types.go |
SSE in Go
The SSE stream handler is not included in the Go files because it's primarily HTTP/infrastructure code. In Go, the implementation is simpler than the Node.js version:
// Sketch — not full implementation
func (h *Handler) HandleHoldStream(w http.ResponseWriter, r *http.Request) {
    flusher, ok := w.(http.Flusher)
    if !ok {
        http.Error(w, "streaming not supported", http.StatusInternalServerError)
        return
    }
    appointmentTypeID := r.URL.Query().Get("appointmentTypeId")

    w.Header().Set("Content-Type", "text/event-stream; charset=utf-8")
    w.Header().Set("Cache-Control", "no-cache, no-transform")
    w.Header().Set("Connection", "keep-alive")

    ctx := r.Context() // cancelled on client disconnect

    // Subscribe to Redis pub/sub for this appointment type
    sub := h.redis.Subscribe(ctx, EventChannel(appointmentTypeID))
    defer sub.Close()

    // Send snapshot here, then forward pub/sub events until disconnect.
    for {
        select {
        case <-ctx.Done():
            return // ctx.Done() handles cleanup automatically; no manual memory management
        case msg := <-sub.Channel():
            fmt.Fprintf(w, "data: %s\n\n", msg.Payload)
            flusher.Flush() // push the event to the client immediately
        }
    }
}
Key advantages in Go:
- context.Context handles cancellation and cleanup automatically
- http.Flusher for SSE is built into net/http
- Goroutines for per-connection handling (no manual connection registry needed)
- go-redis pub/sub integrates cleanly with context cancellation