Telemetry
How patient engagement and pose-tracking data flow through the platform — what's tracked, where it goes, and how features integrate with the telemetry layer.
Layer 2 feature, not yet implemented. For the full architecture and rationale, see ../telemetry/index.md. For ADR-level reasoning, see decisions.md → Why telemetry is PG + S3, not ClickHouse. For infrastructure monitoring (Datadog, UptimeRobot), see external-providers.md.
What telemetry covers (and what it does NOT)
Telemetry exists for two concrete product needs:
| Domain | What's tracked | Storage | Retention |
|---|---|---|---|
| Exercise video engagement | Play/pause/seek/heartbeat/buffering/milestone/end events from Patient Portal | Postgres (media_session_metrics, media_buffering_events) | 2 years |
| Pose-detection data | MediaPipe landmark frames during pose-tracked exercises; server-computed form scores; full-session replay | Postgres aggregates + S3 replay blobs | Aggregates 7y (clinical), replay blobs 6mo |
Earlier specs called many things "telemetry" that aren't. They live elsewhere:
| Concern | Lives in |
|---|---|
| Compliance audit (HIPAA / GDPR / MDR forensic trail) | audit_log in Core API Postgres (P10, monthly partitioned per P41). Not telemetry. |
| Usage-based billing | usage_records / usage_quotas / usage_summaries (1C.7) |
| AI provenance | audit_ai_provenance sibling table to audit_log |
| Security signals | mostly audit_log; SIEM-shaped concerns deferred |
| Server observability (latency, traces, errors) | OTel → Datadog/Grafana, not bespoke telemetry |
This split is the core simplification — see ../telemetry/index.md → Scope.
How features integrate
Appointments → Audit (not telemetry)
Every appointment action is captured by Core API's audit middleware to the local audit_log table. This is not part of telemetry — it's the compliance audit trail. Telemetry never receives audit events; the previous "audit forwarding to telemetry" design is rejected — see ../telemetry/index.md.
Exercise Library → Video engagement events
When a patient watches an exercise video in the Patient Portal, the browser sends events to the Telemetry API.
Patient opens exercise video
→ session_start event (TTFB, load time, connection info)
→ heartbeat every 10s (buffering, bitrate, quality, dropped frames)
→ buffering_start/end (per-stall detail)
→ quality_change (ABR switches)
→ milestone (25%, 50%, 75%, 95% watched)
→ session_end (final stats, completion status)
All → POST /v1/media/events on Telemetry API
→ server aggregates into media_session_metrics + media_buffering_events (PG)
See ../telemetry/media-events.md for the full event specification.
Consent gate: the analytics per-purpose flag must be active for the patient in the org.
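The milestone step in the flow above can be sketched as follows. This is a minimal illustration in Python, not the Patient Portal's actual client code: the function name and the event fields (`type`, `session_id`, `percent`) are hypothetical, and the authoritative event shape is the one in ../telemetry/media-events.md. Only the thresholds (25, 50, 75, 95) come from this document.

```python
from __future__ import annotations

# Percent-watched thresholds listed in the flow above.
MILESTONES = (25, 50, 75, 95)


def milestone_events(session_id: str, watched_pct: float, sent: set[int]) -> list[dict]:
    """Return milestone events that are due and have not been sent yet.

    `sent` is mutated so each milestone is emitted at most once per session.
    The event field names are assumptions made for this sketch.
    """
    due = [m for m in MILESTONES if watched_pct >= m and m not in sent]
    sent.update(due)
    return [{"type": "milestone", "session_id": session_id, "percent": m} for m in due]
```

Calling it on every heartbeat with the same `sent` set keeps milestones idempotent: a patient who pauses at 30% and resumes does not trigger a second 25% event.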
Treatment Plans → Pose tracking + Video engagement
Pose-tracked exercise sessions combine two ingest paths:
Patient starts pose-tracked session
│
├── Exercise video plays
│ → media events (same as above)
│
├── MediaPipe runs in browser, outputs 33 landmarks per frame
│ → 1-second batches, binary float32 + gzip, ~3 MB per 30-min session
│ → POST /v1/pose/frames on Telemetry API (signed session token)
│ → Telemetry API appends to S3 multipart buffer
│
├── Session completes
│ → POST /v1/sessions/{id}/end
│ → Telemetry API computes form_score / ROM / rep_count from landmarks
│ → Finalizes S3 replay blob: s3://restartix-telemetry/{org_id}/{session_id}.bin.gz
│ → Publishes events.Bus event
│ → Core API subscriber writes pose_session_metrics, pose_rep_metrics, updates patient_exercise_logs
│
└── Specialist review
→ Clinic app GET /v1/exercise-sessions/{id} → PG aggregates
→ Clinic app GET /v1/exercise-sessions/{id}/replay → signed S3 URL → blob fetch in browser
Consent gate: the biometric per-purpose flag must be active for pose ingest, and the analytics flag for media events. Two named flags, no consent ladder.
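To make the "binary float32 + gzip" batch format concrete, here is a minimal Python sketch of packing and unpacking one pose batch. It rests on assumptions this document does not state: three coordinates (x, y, z) per landmark, little-endian byte order, and no batch header. The real wire format is whatever ../telemetry/api.md defines.

```python
import gzip
import struct

LANDMARKS_PER_FRAME = 33  # from the flow above
COORDS_PER_LANDMARK = 3   # assumption: x, y, z only
FLOATS_PER_FRAME = LANDMARKS_PER_FRAME * COORDS_PER_LANDMARK
FRAME_FORMAT = f"<{FLOATS_PER_FRAME}f"  # assumption: little-endian float32
FRAME_BYTES = struct.calcsize(FRAME_FORMAT)


def encode_batch(frames):
    """Pack frames (each a flat sequence of 99 floats) as float32, then gzip."""
    raw = bytearray()
    for frame in frames:
        if len(frame) != FLOATS_PER_FRAME:
            raise ValueError(f"expected {FLOATS_PER_FRAME} floats, got {len(frame)}")
        raw += struct.pack(FRAME_FORMAT, *frame)
    return gzip.compress(bytes(raw))


def decode_batch(blob):
    """Inverse of encode_batch: gunzip, then split into per-frame float lists."""
    raw = gzip.decompress(blob)
    if len(raw) % FRAME_BYTES:
        raise ValueError("payload is not a whole number of frames")
    return [
        list(struct.unpack_from(FRAME_FORMAT, raw, offset))
        for offset in range(0, len(raw), FRAME_BYTES)
    ]
```

Under these assumptions one uncompressed frame is 33 × 3 × 4 = 396 bytes; the per-session size then depends on the capture frame rate and on how well the data compresses, neither of which is specified here.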
Forms → Audit (not telemetry)
Form submissions are captured by audit middleware — same audit_log flow, no telemetry involvement.
Automations → events.Bus (not telemetry)
Automation executions publish on the internal events.Bus and are captured by audit middleware. They do not flow to Telemetry API. The earlier proposal for "automation analytics in ClickHouse" was rejected; automation effectiveness queries run directly against automation_executions in Core API Postgres.
Consent model
Two named per-purpose consent flags using the existing foundation per-purpose consent ledger (1B.9):
| Purpose code | Gates |
|---|---|
| analytics | Media events (video lifecycle from Patient Portal) |
| biometric | Pose ingest (MediaPipe landmark frames) |
Telemetry API rejects ingest with 403 if the matching consent flag is not active. Withdrawal takes effect immediately: once the flag flips in the consent ledger, the next batch is rejected.
The previous spec's 0–3 consent ladder is rejected — it does not match the platform's actual per-purpose model and was a poor fit for the GDPR Art. 6 lawful-basis structure.
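The consent gate described in this section amounts to a lookup from ingest endpoint to required purpose code. The Python sketch below is illustrative only (the Telemetry API is a separate Go service). The function name, the 202 success status, and the 404 for unknown paths are assumptions; the two purpose codes, the two endpoints, and the 403 on missing consent come from this document.

```python
from __future__ import annotations

# Endpoint → purpose code, per the table above.
REQUIRED_PURPOSE = {
    "/v1/media/events": "analytics",
    "/v1/pose/frames": "biometric",
}


def ingest_status(path: str, active_purposes: set[str]) -> int:
    """Return the HTTP status an ingest request would receive.

    `active_purposes` stands in for a lookup against the consent ledger
    for this patient in this org.
    """
    purpose = REQUIRED_PURPOSE.get(path)
    if purpose is None:
        return 404  # assumption: unknown ingest path
    if purpose not in active_purposes:
        return 403  # consent flag not active
    return 202  # assumption: accepted for asynchronous processing
```

Because the check runs on every batch, withdrawing `biometric` mid-session causes the very next pose batch to return 403 while media events continue if `analytics` is still active.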
Data flow summary
Patient Portal (browser)
│
├── Video player ────► media events (session_start, heartbeat, etc.)
├── Pose camera ─────► pose batches (binary float32 + gzip, 1-sec)
└── Session finalizer ► POST /v1/sessions/{id}/end
│
▼
Telemetry API (separate Go service, Cat F principal)
│
├─► S3 multipart: in-flight buffer + finalized replay blob
│
└─► server-side aggregation + events.Bus event
│
▼
Core API subscriber
│
▼
Postgres aggregates (RLS, audit, classified)
│
▼
Clinic app, Patient Portal, Console (reads via Core API)
Feature integration checklist
When adding a new feature, ask what it needs:
| Question | If yes | Action |
|---|---|---|
| Does it generate authenticated user actions? | Audit covers it (P10) — automatic | No code |
| Does it play exercise video? | Frontend work | Send media events to POST /v1/media/events (Telemetry API) |
| Does it use the camera for pose tracking? | Frontend work | Send pose batches to POST /v1/pose/frames (Telemetry API) |
| Does it need a per-org dashboard? | Backend + frontend | Read PG aggregates via Core API |
| Does it need cross-tenant aggregates? | Out of scope today | Defer or re-discuss — may be Tier 3 trigger |
| Does it need user-facing error reporting? | Off-the-shelf (Sentry-equivalent) | Not telemetry |
Failure modes
| Failure | Impact | Recovery |
|---|---|---|
| Telemetry API down | Patient session loses live ingest; client buffers in IndexedDB until reconnect | Buffered batches flush on reconnect |
| session_end never arrives (browser closed mid-session) | Server-side timeout (10 min silence) finalizes as incomplete; partial data preserved | Specialist sees incomplete flag |
| events.Bus delivery delayed | Aggregates appear in PG with delay | Outbox dispatcher (mirrors 1C.4 pattern) ensures eventual delivery |
| S3 unavailable for blob finalize | Aggregation event still publishes; replay blob URL marked pending | Retry job fetches from in-flight buffer |
| Consent withdrawn mid-session | Next batch rejected with 403 | Client stops sending; existing data retained per consent-revocation rules |
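The "session_end never arrives" row can be illustrated with a small sweep that finalizes silent sessions. This Python sketch is a hedged illustration, not the service's implementation: the `Session` fields, the status strings, and the idea of a periodic sweep are assumptions. Only the 10-minute silence threshold and the "incomplete" outcome come from the table above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

SILENCE_TIMEOUT = timedelta(minutes=10)  # from the failure-modes table


@dataclass
class Session:
    session_id: str
    last_batch_at: datetime
    status: str = "active"  # hypothetical values: "active", "complete", "incomplete"


def finalize_stale(sessions, now):
    """Mark active sessions with no ingest for SILENCE_TIMEOUT as incomplete.

    Returns the IDs finalized in this sweep; partial data is left untouched.
    """
    finalized = []
    for session in sessions:
        silent_for = now - session.last_batch_at
        if session.status == "active" and silent_for >= SILENCE_TIMEOUT:
            session.status = "incomplete"
            finalized.append(session.session_id)
    return finalized
```

Running the sweep twice is harmless: a session already marked incomplete is skipped, so a retried job does not finalize it again.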
Scaling roadmap
| Tier | Peak concurrent | Architecture | Trigger |
|---|---|---|---|
| 0 | up to ~1 000 | Single Telemetry API, PG primary, S3 | — |
| 1 | 1 000 – 10 000 | Telemetry API horizontal, PG read replica, materialized views | Dashboard p95 > 500ms |
| 2 | 10 000 – 50 000 | Monthly partitioning, Kinesis, Athena/Glue for ad-hoc | Replica lag, S3 multipart limits |
| 3 | 50 000+ | ClickHouse for cross-tenant analytics surfaces | Cross-tenant query > 1s after view tuning |
See ../telemetry/index.md → Scaling roadmap for full details and the swap-point interfaces that make tier transitions bounded.
Key docs
| Doc | What it covers |
|---|---|
| ../telemetry/index.md | Architecture, design rationale, scaling roadmap |
| ../telemetry/api.md | Three typed ingest endpoints + signed-token auth |
| ../telemetry/media-events.md | Video event taxonomy, frontend integration sketch |
| decisions.md → Why telemetry is PG + S3 | ADR for the redesign |
| external-providers.md | Infrastructure monitoring (Datadog, UptimeRobot) — separate concern from telemetry |