Telemetry

How patient engagement and pose-tracking data flow through the platform — what's tracked, where it goes, and how features integrate with the telemetry layer.

Layer 2 feature, not yet implemented. For the full architecture and rationale, see ../telemetry/index.md. For ADR-level reasoning, see decisions.md → Why telemetry is PG + S3, not ClickHouse. For infrastructure monitoring (Datadog, UptimeRobot), see external-providers.md.

What telemetry covers (and what it does NOT)

Telemetry exists for two concrete product needs:

Domain	What's tracked	Storage	Retention
Exercise video engagement	Play/pause/seek/heartbeat/buffering/milestone/end events from Patient Portal	Postgres (`media_session_metrics`, `media_buffering_events`)	2 years
Pose-detection data	MediaPipe landmark frames during pose-tracked exercises; server-computed form scores; full-session replay	Postgres aggregates + S3 replay blobs	Aggregates: 7 years (clinical); replay blobs: 6 months

Earlier specs called many things "telemetry" that aren't. They live elsewhere:

Concern	Lives in
Compliance audit (HIPAA / GDPR / MDR forensic trail)	`audit_log` in Core API Postgres (P10, monthly partitioned per P41). Not telemetry.
Usage-based billing	`usage_records` / `usage_quotas` / `usage_summaries` (1C.7)
AI provenance	`audit_ai_provenance` sibling table to `audit_log`
Security signals	mostly `audit_log`; SIEM-shaped concerns deferred
Server observability (latency, traces, errors)	OTel → Datadog/Grafana, not bespoke telemetry

This split is the core simplification — see ../telemetry/index.md → Scope.

How features integrate

Appointments → Audit (not telemetry)

Every appointment action is captured by Core API's audit middleware to the local audit_log table. This is not part of telemetry — it's the compliance audit trail. Telemetry never receives audit events; the previous "audit forwarding to telemetry" design is rejected — see ../telemetry/index.md.

Exercise Library → Video engagement events

When a patient watches an exercise video in the Patient Portal, the browser sends events to the Telemetry API.

Patient opens exercise video
  → session_start event (TTFB, load time, connection info)
  → heartbeat every 10s (buffering, bitrate, quality, dropped frames)
  → buffering_start/end (per-stall detail)
  → quality_change (ABR switches)
  → milestone (25%, 50%, 75%, 95% watched)
  → session_end (final stats, completion status)

All → POST /v1/media/events on Telemetry API
  → server aggregates into media_session_metrics + media_buffering_events (PG)

See ../telemetry/media-events.md for the full event specification.

Consent gate: the `analytics` per-purpose flag must be active for the patient in the org.
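
The milestone step in the flow above can be sketched on the client side as follows. This is a minimal illustration, not the specified implementation: the helper name `newlyCrossedMilestones` and the choice to derive milestones from playback position are assumptions, and ../telemetry/media-events.md remains the authority on event semantics.

```typescript
// Milestone thresholds from the flow above, as percentages watched.
const MILESTONE_PERCENTS: readonly number[] = [25, 50, 75, 95];

// Hypothetical helper: returns the milestones reached at the current playback
// position that have not been reported yet. The caller is expected to add the
// returned values to `alreadySent` after emitting the milestone events.
function newlyCrossedMilestones(
  positionSec: number,
  durationSec: number,
  alreadySent: ReadonlySet<number>,
): number[] {
  if (!(durationSec > 0)) return []; // unknown, zero, or NaN duration
  const percent = (positionSec / durationSec) * 100;
  return MILESTONE_PERCENTS.filter((m) => percent >= m && !alreadySent.has(m));
}

// Usage: 46 s into a 90 s video is about 51%, so 25 and 50 are both due.
const sentSoFar = new Set<number>();
const crossed = newlyCrossedMilestones(46, 90, sentSoFar); // [25, 50]
crossed.forEach((m) => sentSoFar.add(m));
```

Because this sketch is position-based, a forward seek would report every milestone it skips over. If milestones should reflect time actually watched, the client would need to track watched ranges instead.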

Treatment Plans → Pose tracking + Video engagement

Pose-tracked exercise sessions combine two ingest paths:

Patient starts pose-tracked session
  │
  ├── Exercise video plays
  │     → media events (same as above)
  │
  ├── MediaPipe runs in browser, outputs 33 landmarks per frame
  │     → 1-second batches, binary float32 + gzip, ~3 MB per 30-min session
  │     → POST /v1/pose/frames on Telemetry API (signed session token)
  │     → Telemetry API appends to S3 multipart buffer
  │
  ├── Session completes
  │     → POST /v1/sessions/{id}/end
  │     → Telemetry API computes form_score / ROM / rep_count from landmarks
  │     → Finalizes S3 replay blob: s3://restartix-telemetry/{org_id}/{session_id}.bin.gz
  │     → Publishes events.Bus event
  │     → Core API subscriber writes pose_session_metrics, pose_rep_metrics, updates patient_exercise_logs
  │
  └── Specialist review
        → Clinic app GET /v1/exercise-sessions/{id} → PG aggregates
        → Clinic app GET /v1/exercise-sessions/{id}/replay → signed S3 URL → blob fetch in browser

Consent gate: the `biometric` per-purpose flag must be active for pose ingest, and the `analytics` flag for media events. These are two named flags; there is no consent ladder.
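
To make the "binary float32" batch in the flow above concrete, here is a rough sketch of packing one batch of frames before compression. The per-landmark layout (x, y, z, visibility), the type `PoseLandmarkSample`, and the function `packPoseBatch` are assumptions for illustration; the actual wire format is whatever ../telemetry/api.md defines. Gzip and upload are deliberately left out.

```typescript
// 33 landmarks per frame, per the flow above.
const LANDMARKS_PER_FRAME = 33;
// Assumption: each landmark is stored as x, y, z, visibility (4 floats).
const FLOATS_PER_LANDMARK = 4;

// Hypothetical in-memory shape of one landmark.
type PoseLandmarkSample = { x: number; y: number; z: number; visibility: number };

// Hypothetical packer: flattens a batch of frames into one Float32Array,
// ordered frame by frame, then landmark by landmark.
function packPoseBatch(frames: PoseLandmarkSample[][]): Float32Array {
  const packed = new Float32Array(
    frames.length * LANDMARKS_PER_FRAME * FLOATS_PER_LANDMARK,
  );
  let offset = 0;
  for (const frame of frames) {
    if (frame.length !== LANDMARKS_PER_FRAME) {
      throw new Error(`expected ${LANDMARKS_PER_FRAME} landmarks, got ${frame.length}`);
    }
    for (const lm of frame) {
      packed[offset++] = lm.x;
      packed[offset++] = lm.y;
      packed[offset++] = lm.z;
      packed[offset++] = lm.visibility;
    }
  }
  return packed;
}
```

Under this assumed layout one frame occupies 33 × 4 × 4 = 528 bytes before compression, so `packPoseBatch(frames).byteLength` for a one-second batch is 528 times the number of frames captured in that second.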

Forms → Audit (not telemetry)

Form submissions are captured by audit middleware — same audit_log flow, no telemetry involvement.

Automations → events.Bus (not telemetry)

Automation executions publish on the internal events.Bus and are captured by audit middleware. They do not flow to Telemetry API. The earlier proposal for "automation analytics in ClickHouse" was rejected: automation effectiveness queries run directly against automation_executions in Core API Postgres.

Consent model

Telemetry uses two named per-purpose consent flags, built on the existing foundation per-purpose consent ledger (1B.9):

Purpose code	Gates
`analytics`	Media events (video lifecycle from Patient Portal)
`biometric`	Pose ingest (MediaPipe landmark frames)

Telemetry API rejects ingest with 403 if the matching consent flag is not active. Withdrawal takes effect immediately: once the consent ledger entry flips, the next batch is rejected.
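
The gate described above amounts to a lookup from ingest endpoint to required purpose. The sketch below shows that logic only; it is written in TypeScript for illustration and is not the Telemetry API's code. The function name `ingestRejectionStatus` and the `activePurposes` set, which stands in for a query against the consent ledger, are assumptions.

```typescript
type ConsentPurpose = "analytics" | "biometric";
type IngestRoute = "/v1/media/events" | "/v1/pose/frames";

// Purpose required by each ingest endpoint, per the table above.
const REQUIRED_PURPOSE: Record<IngestRoute, ConsentPurpose> = {
  "/v1/media/events": "analytics",
  "/v1/pose/frames": "biometric",
};

// Hypothetical gate: returns 403 when the matching flag is not active,
// or null when the request may proceed to ingest.
function ingestRejectionStatus(
  route: IngestRoute,
  activePurposes: ReadonlySet<ConsentPurpose>,
): 403 | null {
  return activePurposes.has(REQUIRED_PURPOSE[route]) ? null : 403;
}
```

Evaluating the gate on every batch, rather than once per session, is what lets a withdrawal take effect on the next batch.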

The previous spec's 0–3 consent ladder is rejected — it does not match the platform's actual per-purpose model and was a poor fit for the GDPR Art. 6 lawful-basis structure.

Data flow summary

Patient Portal (browser)
  │
  ├── Video player ────► media events (session_start, heartbeat, etc.)
  ├── Pose camera ─────► pose batches (binary float32 + gzip, 1-sec)
  └── Session finalizer ► POST /v1/sessions/{id}/end
                          │
                          ▼
                    Telemetry API (separate Go service, Cat F principal)
                          │
                          ├─► S3 multipart: in-flight buffer + finalized replay blob
                          │
                          └─► server-side aggregation + events.Bus event
                                    │
                                    ▼
                              Core API subscriber
                                    │
                                    ▼
                              Postgres aggregates (RLS, audit, classified)
                                    │
                                    ▼
                          Clinic app, Patient Portal, Console (reads via Core API)

Feature integration checklist

When adding a new feature, ask what it needs:

Question	If yes	Action
Does it generate authenticated user actions?	Audit covers it (P10) — automatic	No code
Does it play exercise video?	Frontend work	Send media events to `POST /v1/media/events` (Telemetry API)
Does it use the camera for pose tracking?	Frontend work	Send pose batches to `POST /v1/pose/frames` (Telemetry API)
Does it need a per-org dashboard?	Backend + frontend	Read PG aggregates via Core API
Does it need cross-tenant aggregates?	Out of scope today	Defer or re-discuss — may be Tier 3 trigger
Does it need user-facing error reporting?	Off-the-shelf (Sentry-equivalent)	Not telemetry

Failure modes

Failure	Impact	Recovery
Telemetry API down	Patient session loses live ingest; client buffers in IndexedDB until reconnect	Buffered batches flush on reconnect
`session_end` never arrives (browser closed mid-session)	Server-side timeout (10 min silence) finalizes as `incomplete`; partial data preserved	Specialist sees `incomplete` flag
events.Bus delivery delayed	Aggregates appear in PG with delay	Outbox dispatcher (mirrors 1C.4 pattern) ensures eventual delivery
S3 unavailable for blob finalize	Aggregation event still publishes; replay blob URL marked `pending`	Retry job fetches from in-flight buffer
Consent withdrawn mid-session	Next batch rejected with 403	Client stops sending; existing data retained per consent-revocation rules
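
As an illustration of the `session_end` timeout row above, the following sketch marks silent sessions as `incomplete`. Only the 10-minute threshold and the `incomplete` flag come from the table; the record shape `TrackedSession`, its field names, and the function `sweepSilentSessions` are hypothetical, and the logic is shown in TypeScript rather than in the service's own language.

```typescript
// 10 minutes of silence, per the failure-mode table above.
const SESSION_SILENCE_TIMEOUT_MS = 10 * 60 * 1000;

type TrackedSessionState = "active" | "complete" | "incomplete";

// Hypothetical record shape; field names are illustrative only.
type TrackedSession = {
  sessionId: string;
  lastIngestAtMs: number; // time of the most recent batch or event
  state: TrackedSessionState;
};

// Returns a new list in which active sessions that have been silent for the
// full timeout are marked `incomplete`. Finalized sessions are left unchanged.
function sweepSilentSessions(
  sessions: TrackedSession[],
  nowMs: number,
): TrackedSession[] {
  return sessions.map((s) =>
    s.state === "active" && nowMs - s.lastIngestAtMs >= SESSION_SILENCE_TIMEOUT_MS
      ? { ...s, state: "incomplete" as const }
      : s,
  );
}
```

A periodic job could run a sweep like this and then trigger the same finalize path as an explicit `session_end`, so partial data is preserved either way.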

Scaling roadmap

Tier	Peak concurrent	Architecture	Trigger
0	up to ~1 000	Single Telemetry API, PG primary, S3	—
1	1 000 – 10 000	Telemetry API horizontal, PG read replica, materialized views	Dashboard p95 > 500ms
2	10 000 – 50 000	Monthly partitioning, Kinesis, Athena/Glue for ad-hoc	Replica lag, S3 multipart limits
3	50 000+	ClickHouse for cross-tenant analytics surfaces	Cross-tenant query > 1s after view tuning

See ../telemetry/index.md → Scaling roadmap for full details and the swap-point interfaces that make tier transitions bounded.
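
For capacity-planning scripts, the concurrency bands in the table can be expressed as a small lookup. This is only a sketch of the "Peak concurrent" column: the function name `scalingTierForPeak` is an assumption, the table's ranges share endpoints so the boundary handling here is a choice, and actual tier transitions are driven by the listed triggers rather than by concurrency alone.

```typescript
type ScalingTier = 0 | 1 | 2 | 3;

// Maps peak concurrent sessions to the tier band from the table above.
// Assumption: each upper bound is exclusive (1 000 falls into tier 1).
function scalingTierForPeak(peakConcurrent: number): ScalingTier {
  if (peakConcurrent < 1000) return 0;
  if (peakConcurrent < 10000) return 1;
  if (peakConcurrent < 50000) return 2;
  return 3;
}
```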

Key docs

Doc	What it covers
../telemetry/index.md	Architecture, design rationale, scaling roadmap
../telemetry/api.md	Three typed ingest endpoints + signed-token auth
../telemetry/media-events.md	Video event taxonomy, frontend integration sketch
decisions.md → Why telemetry is PG + S3	ADR for the redesign
external-providers.md	Infrastructure monitoring (Datadog, UptimeRobot) — separate concern from telemetry
