
Telemetry

How patient engagement and pose-tracking data flow through the platform — what's tracked, where it goes, and how features integrate with the telemetry layer.

This is a Layer 2 feature, not yet implemented. For the full architecture and rationale, see ../telemetry/index.md; for ADR-level reasoning, see decisions.md → Why telemetry is PG + S3, not ClickHouse; for infrastructure monitoring (Datadog, UptimeRobot), see external-providers.md.


What telemetry covers (and what it does NOT)

Telemetry exists for two concrete product needs:

| Domain | What's tracked | Storage | Retention |
|---|---|---|---|
| Exercise video engagement | Play/pause/seek/heartbeat/buffering/milestone/end events from Patient Portal | Postgres (media_session_metrics, media_buffering_events) | 2 years |
| Pose-detection data | MediaPipe landmark frames during pose-tracked exercises; server-computed form scores; full-session replay | Postgres aggregates + S3 replay blobs | Aggregates 7 years (clinical); replay blobs 6 months |

Earlier specs called many things "telemetry" that aren't. They live elsewhere:

| Concern | Lives in |
|---|---|
| Compliance audit (HIPAA / GDPR / MDR forensic trail) | audit_log in Core API Postgres (P10, monthly partitioned per P41). Not telemetry. |
| Usage-based billing | usage_records / usage_quotas / usage_summaries (1C.7) |
| AI provenance | audit_ai_provenance, a sibling table to audit_log |
| Security signals | Mostly audit_log; SIEM-shaped concerns deferred |
| Server observability (latency, traces, errors) | OTel → Datadog/Grafana, not bespoke telemetry |

This split is the core simplification — see ../telemetry/index.md → Scope.


How features integrate

Appointments → Audit (not telemetry)

Every appointment action is captured by Core API's audit middleware to the local audit_log table. This is not part of telemetry — it's the compliance audit trail. Telemetry never receives audit events; the previous "audit forwarding to telemetry" design is rejected — see ../telemetry/index.md.

Exercise Library → Video engagement events

When a patient watches an exercise video in the Patient Portal, the browser sends events to the Telemetry API.

Patient opens exercise video
  → session_start event (TTFB, load time, connection info)
  → heartbeat every 10s (buffering, bitrate, quality, dropped frames)
  → buffering_start/end (per-stall detail)
  → quality_change (ABR switches)
  → milestone (25%, 50%, 75%, 95% watched)
  → session_end (final stats, completion status)

All → POST /v1/media/events on Telemetry API
  → server aggregates into media_session_metrics + media_buffering_events (PG)
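
The lifecycle above can be sketched from the player's side. This is a minimal illustration, not the frontend spec (which lives in ../telemetry/media-events.md): the event names and milestone thresholds come from the flow above, while the payload shape, class name, and `post` helper are assumptions.

```typescript
// Sketch of the Patient Portal media-event lifecycle. Event names and
// milestone percentages match the flow above; everything else (payload
// fields, class shape) is illustrative.

type MediaEvent = {
  type: "session_start" | "heartbeat" | "buffering_start" | "buffering_end"
      | "quality_change" | "milestone" | "session_end";
  sessionId: string;
  at: number;                      // ms since epoch
  data?: Record<string, unknown>;  // per-event detail
};

const MILESTONES = [25, 50, 75, 95];

// Which milestone events should have fired once watchedPct is reached.
function milestonesReached(watchedPct: number): number[] {
  return MILESTONES.filter((m) => watchedPct >= m);
}

class MediaSession {
  private fired = new Set<number>();

  constructor(
    private sessionId: string,
    private post: (e: MediaEvent) => void, // e.g. fetch POST /v1/media/events
  ) {}

  start() {
    this.post({ type: "session_start", sessionId: this.sessionId, at: Date.now() });
  }

  // Called every 10 s from a timer, per the flow above.
  heartbeat(stats: Record<string, unknown>) {
    this.post({ type: "heartbeat", sessionId: this.sessionId, at: Date.now(), data: stats });
  }

  // Called on playback progress; each milestone fires at most once.
  progress(watchedPct: number) {
    for (const m of milestonesReached(watchedPct)) {
      if (!this.fired.has(m)) {
        this.fired.add(m);
        this.post({ type: "milestone", sessionId: this.sessionId, at: Date.now(), data: { pct: m } });
      }
    }
  }

  end(completed: boolean) {
    this.post({ type: "session_end", sessionId: this.sessionId, at: Date.now(), data: { completed } });
  }
}
```

The deduplication in `progress` matters because timeupdate fires many times per second; the server should see each milestone once per session.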

See ../telemetry/media-events.md for the full event specification.

Consent gate: analytics per-purpose flag must be active for the patient in the org.

Treatment Plans → Pose tracking + Video engagement

Pose-tracked exercise sessions combine two ingest paths:

Patient starts pose-tracked session

  ├── Exercise video plays
  │     → media events (same as above)

  ├── MediaPipe runs in browser, outputs 33 landmarks per frame
  │     → 1-second batches, binary float32 + gzip, ~3 MB per 30-min session
  │     → POST /v1/pose/frames on Telemetry API (signed session token)
  │     → Telemetry API appends to S3 multipart buffer

  ├── Session completes
  │     → POST /v1/sessions/{id}/end
  │     → Telemetry API computes form_score / ROM / rep_count from landmarks
  │     → Finalizes S3 replay blob: s3://restartix-telemetry/{org_id}/{session_id}.bin.gz
  │     → Publishes events.Bus event
  │     → Core API subscriber writes pose_session_metrics, pose_rep_metrics, updates patient_exercise_logs

  └── Specialist review
        → Clinic app GET /v1/exercise-sessions/{id} → PG aggregates
        → Clinic app GET /v1/exercise-sessions/{id}/replay → signed S3 URL → blob fetch in browser
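
The "binary float32 + gzip" batch step above can be sketched as follows. The landmark count comes from the flow; the assumption that each landmark carries four values (x, y, z, visibility) is illustrative, since this doc only specifies 33 landmarks per frame. The gzip step is omitted for brevity (in the browser it would be a CompressionStream before the POST).

```typescript
// Sketch of a 1-second pose batch: 33 landmarks per frame, packed as
// float32 in platform byte order (little-endian on today's clients).
// Four values per landmark is an assumption; the doc fixes only the
// landmark count. gzip before POST /v1/pose/frames is omitted here.

const LANDMARKS = 33;
const FLOATS_PER_LANDMARK = 4; // x, y, z, visibility (assumed)

type Landmark = { x: number; y: number; z: number; visibility: number };

function encodeBatch(frames: Landmark[][]): Uint8Array {
  const out = new Float32Array(frames.length * LANDMARKS * FLOATS_PER_LANDMARK);
  let i = 0;
  for (const frame of frames) {
    if (frame.length !== LANDMARKS) throw new Error("expected 33 landmarks");
    for (const lm of frame) {
      out[i++] = lm.x;
      out[i++] = lm.y;
      out[i++] = lm.z;
      out[i++] = lm.visibility;
    }
  }
  return new Uint8Array(out.buffer);
}
```

At 30 fps this is 30 × 33 × 4 × 4 ≈ 15.8 KB of raw float32 per one-second batch, which is consistent with gzip bringing a 30-minute session down to the ~3 MB the flow above cites.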

Consent gate: the biometric per-purpose flag must be active for pose ingest, and analytics for media events. Two named flags, no consent ladder.
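
The form_score / ROM / rep_count computation in the flow above is not specified in this doc. As a hypothetical illustration of the kind of work the Telemetry API does at session end, one common approach derives a joint angle per frame from three landmarks and counts flexion/extension cycles with hysteresis; the thresholds and functions here are assumptions.

```typescript
// Hypothetical rep counting from landmark frames. The actual algorithms
// behind form_score / ROM / rep_count are defined elsewhere; this shows
// one common approach: a per-frame joint angle plus threshold crossings.

type Pt = { x: number; y: number };

// Angle at vertex b (degrees) formed by points a-b-c.
function jointAngle(a: Pt, b: Pt, c: Pt): number {
  const v1 = { x: a.x - b.x, y: a.y - b.y };
  const v2 = { x: c.x - b.x, y: c.y - b.y };
  const dot = v1.x * v2.x + v1.y * v2.y;
  const n1 = Math.hypot(v1.x, v1.y);
  const n2 = Math.hypot(v2.x, v2.y);
  return (Math.acos(dot / (n1 * n2)) * 180) / Math.PI;
}

// One rep = angle drops below `low` (flexion) then rises above `high`
// (extension). Hysteresis avoids double-counting jitter at one threshold.
function countReps(angles: number[], low = 70, high = 150): number {
  let reps = 0;
  let flexed = false;
  for (const a of angles) {
    if (!flexed && a < low) flexed = true;
    else if (flexed && a > high) { flexed = false; reps++; }
  }
  return reps;
}

// Range of motion: span of observed joint angles over the session.
function rangeOfMotion(angles: number[]): number {
  return Math.max(...angles) - Math.min(...angles);
}
```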

Forms → Audit (not telemetry)

Form submissions are captured by audit middleware — same audit_log flow, no telemetry involvement.

Automations → events.Bus (not telemetry)

Automation executions publish on the internal events.Bus and are captured by audit middleware. They do not flow to Telemetry API. Earlier docs that suggested "automation analytics in ClickHouse" were rejected — automation effectiveness queries run against automation_executions in Core API Postgres directly.


Consent

Telemetry uses two named per-purpose consent flags from the existing foundation per-purpose consent ledger (1B.9):

| Purpose code | Gates |
|---|---|
| analytics | Media events (video lifecycle from Patient Portal) |
| biometric | Pose ingest (MediaPipe landmark frames) |

Telemetry API rejects ingest with 403 if the matching consent flag is not active. Withdrawal takes effect immediately: the consents ledger flips, and the next batch is rejected.
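
A minimal sketch of that gate, assuming an in-memory stand-in for the consent ledger; the purpose codes, the 403, and the immediate-withdrawal semantics come from this doc, while the ledger shape and function names are illustrative.

```typescript
// Sketch of the Telemetry API consent gate. The check runs on every
// ingest batch, so flipping the ledger rejects the very next batch.

type Purpose = "analytics" | "biometric";

// Stand-in for the per-purpose consent ledger (1B.9):
// active purposes per (orgId, patientId).
type ConsentLedger = Map<string, Set<Purpose>>;

const key = (orgId: string, patientId: string) => `${orgId}:${patientId}`;

function consentActive(
  ledger: ConsentLedger, orgId: string, patientId: string, p: Purpose,
): boolean {
  return ledger.get(key(orgId, patientId))?.has(p) ?? false;
}

// Returns the HTTP status for an ingest batch: accepted or 403.
function gateIngest(
  ledger: ConsentLedger, orgId: string, patientId: string, p: Purpose,
): number {
  return consentActive(ledger, orgId, patientId, p) ? 202 : 403;
}
```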

The previous spec's 0–3 consent ladder is rejected — it does not match the platform's actual per-purpose model and was a poor fit for the GDPR Art. 6 lawful-basis structure.


Data flow summary

Patient Portal (browser)
  ├── Video player ────► media events (session_start, heartbeat, etc.)
  ├── Pose camera ─────► pose batches (binary float32 + gzip, 1-sec)
  └── Session finalizer ► POST /v1/sessions/{id}/end
              │
              ▼
Telemetry API (separate Go service, Cat F principal)
  ├─► S3 multipart: in-flight buffer + finalized replay blob
  └─► server-side aggregation + events.Bus event
              │
              ▼
Core API subscriber
              │
              ▼
Postgres aggregates (RLS, audit, classified)
              │
              ▼
Clinic app, Patient Portal, Console (reads via Core API)
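
The Core API subscriber step can be sketched as below. The table names (pose_session_metrics, pose_rep_metrics, patient_exercise_logs) come from this doc; the event payload, field names, and store interface are assumptions.

```typescript
// Sketch of the Core API subscriber consuming the session-finalized
// events.Bus event. Payload and Store are illustrative stand-ins.

type SessionFinalized = {
  sessionId: string;
  orgId: string;
  patientId: string;
  exerciseLogId: string;
  formScore: number;
  romDegrees: number;
  reps: { index: number; formScore: number }[];
  replayKey: string; // points at the finalized S3 replay blob
};

interface Store {
  insert(table: string, row: Record<string, unknown>): void;
  update(table: string, id: string, patch: Record<string, unknown>): void;
}

function onSessionFinalized(store: Store, e: SessionFinalized): void {
  // One session-level aggregate row…
  store.insert("pose_session_metrics", {
    session_id: e.sessionId, org_id: e.orgId, patient_id: e.patientId,
    form_score: e.formScore, rom_degrees: e.romDegrees,
    rep_count: e.reps.length, replay_key: e.replayKey,
  });
  // …one row per rep…
  for (const r of e.reps) {
    store.insert("pose_rep_metrics", {
      session_id: e.sessionId, rep_index: r.index, form_score: r.formScore,
    });
  }
  // …and the patient's exercise log is updated to reference the session.
  store.update("patient_exercise_logs", e.exerciseLogId, {
    completed_session_id: e.sessionId,
  });
}
```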

Feature integration checklist

When adding a new feature, ask what it needs:

| Question | If yes | Action |
|---|---|---|
| Does it generate authenticated user actions? | Audit covers it (P10) — automatic | No code |
| Does it play exercise video? | Frontend work | Send media events to POST /v1/media/events (Telemetry API) |
| Does it use the camera for pose tracking? | Frontend work | Send pose batches to POST /v1/pose/frames (Telemetry API) |
| Does it need a per-org dashboard? | Backend + frontend | Read PG aggregates via Core API |
| Does it need cross-tenant aggregates? | Out of scope today | Defer or re-discuss — may be Tier 3 trigger |
| Does it need user-facing error reporting? | Off-the-shelf (Sentry-equivalent) | Not telemetry |

Failure modes

| Failure | Impact | Recovery |
|---|---|---|
| Telemetry API down | Patient session loses live ingest; client buffers in IndexedDB until reconnect | Buffered batches flush on reconnect |
| session_end never arrives (browser closed mid-session) | Server-side timeout (10 min silence) finalizes as incomplete; partial data preserved | Specialist sees incomplete flag |
| events.Bus delivery delayed | Aggregates appear in PG with delay | Outbox dispatcher (mirrors 1C.4 pattern) ensures eventual delivery |
| S3 unavailable for blob finalize | Aggregation event still publishes; replay blob URL marked pending | Retry job fetches from in-flight buffer |
| Consent withdrawn mid-session | Next batch rejected with 403 | Client stops sending; existing data retained per consent-revocation rules |

Scaling roadmap

| Tier | Peak concurrent | Architecture | Trigger |
|---|---|---|---|
| 0 | up to ~1 000 | Single Telemetry API, PG primary, S3 | — |
| 1 | 1 000 – 10 000 | Telemetry API horizontal, PG read replica, materialized views | Dashboard p95 > 500 ms |
| 2 | 10 000 – 50 000 | Monthly partitioning, Kinesis, Athena/Glue for ad-hoc | Replica lag, S3 multipart limits |
| 3 | 50 000+ | ClickHouse for cross-tenant analytics surfaces | Cross-tenant query > 1 s after view tuning |

See ../telemetry/index.md → Scaling roadmap for full details and the swap-point interfaces that make tier transitions bounded.


Key docs

| Doc | What it covers |
|---|---|
| ../telemetry/index.md | Architecture, design rationale, scaling roadmap |
| ../telemetry/api.md | Three typed ingest endpoints + signed-token auth |
| ../telemetry/media-events.md | Video event taxonomy, frontend integration sketch |
| decisions.md → Why telemetry is PG + S3 | ADR for the redesign |
| external-providers.md | Infrastructure monitoring (Datadog, UptimeRobot) — a separate concern from telemetry |