# Telemetry API Endpoints
Layer 2 feature, not yet implemented. The Telemetry API is a separate Go service (`services/telemetry/`). All endpoints live there, not in Core API. This doc describes the locked design — see index.md for the full architecture and rationale.
The Telemetry API serves three concerns:
- Patient-facing ingest — pose batches, media events, session finalization. Authenticated via a short-lived signed session token issued by Core API at exercise-session start.
- Internal callbacks — Telemetry API calls Core API as a Cat F service-account principal to publish aggregation events via `events.Bus`. No public endpoint involved.
- Reads — none. All telemetry reads (specialist dashboards, patient progress, cohort views, replay blob fetch) flow through Core API. Telemetry API itself exposes no read endpoints to clients.
## Authentication
### Signed session token (hot path)
Pose-frame ingest hits 10k+ requests/sec at peak. Verifying a Clerk JWT per batch is wasteful. Instead:
- Patient starts an exercise session in Portal.
- Portal calls Core API: `POST /v1/exercise-sessions` (Clerk JWT auth as usual).
- Core API creates the session row and returns a short-lived signed token:

```json
{
  "session_id": "550e8400-e29b-41d4-a716-446655440000",
  "telemetry_token": "v1.eyJwcmluY2lwYWxfaWQiOiIuLi4iLCJvcmdfaWQiOiIuLi4iLCJleGVyY2lzZV9zZXNzaW9uX2lkIjoiLi4uIiwiZXhwIjoxNzU0OTk5OTk5fQ.signature",
  "telemetry_token_expires_at": "2026-05-07T14:00:00Z"
}
```

- Portal sends every pose/media batch to Telemetry API with `Authorization: Bearer <telemetry_token>`.
- Telemetry API verifies the signature only (no Clerk dependency on the hot path) and extracts `principal_id`, `org_id`, and `exercise_session_id` from the claims.
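For illustration, a minimal sketch of how Core API could mint this token in Go, assuming (per the Signing note below) HMAC-SHA256 over a `v1.<base64url claims>` prefix; the package, function name, and claim layout are hypothetical, not the locked implementation:

```go
package telemetryauth

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/base64"
	"encoding/json"
	"time"
)

// Mint issues a v1.<base64url claims>.<base64url signature> token like the
// example above. HMAC-SHA256 over the "v1.<claims>" prefix is assumed from
// the HS256 note in the Signing section; this is a sketch, not the spec.
func Mint(principalID, orgID, sessionID string, secret []byte, ttl time.Duration) (string, error) {
	now := time.Now()
	claims, err := json.Marshal(map[string]any{
		"principal_id":        principalID,
		"org_id":              orgID,
		"exercise_session_id": sessionID,
		"iat":                 now.Unix(),
		"exp":                 now.Add(ttl).Unix(), // e.g. session_start + 2h
	})
	if err != nil {
		return "", err
	}
	payload := "v1." + base64.RawURLEncoding.EncodeToString(claims)
	mac := hmac.New(sha256.New, secret)
	mac.Write([]byte(payload))
	return payload + "." + base64.RawURLEncoding.EncodeToString(mac.Sum(nil)), nil
}
```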
Token claims:
| Claim | Type | Purpose |
|---|---|---|
| `principal_id` | UUID | Whose session this is (patient principal) |
| `org_id` | UUID | Tenant scope (must match what Core API recorded) |
| `exercise_session_id` | UUID | The session the token authorizes ingest for |
| `iat` | int (unix) | Issue time |
| `exp` | int (unix) | Expiry — typically session_start + 2 hours, with a renewal endpoint if needed |
Signing: HS256 today (see index.md → Swap-point interfaces for upgrade path to Ed25519 if needed). Secret rotated via Secrets Manager; Telemetry API holds current + previous, accepts both during rotation window.
Validation: Telemetry API rejects ingest with 401 if signature fails, token expired, or exercise_session_id mismatch. Rejects with 403 if the matching per-purpose consent flag (analytics for media, biometric for pose) is not active for principal_id in org_id — checked via cached read from Core API.
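A minimal sketch of that hot-path verification, under the same assumed `v1.<claims>.<signature>` format; names are illustrative, and the cached consent check (the 403 path) is left to the caller:

```go
package telemetryauth

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/base64"
	"encoding/json"
	"errors"
	"strings"
	"time"
)

// Claims mirrors the token-claims table above.
type Claims struct {
	PrincipalID       string `json:"principal_id"`
	OrgID             string `json:"org_id"`
	ExerciseSessionID string `json:"exercise_session_id"`
	Iat               int64  `json:"iat"`
	Exp               int64  `json:"exp"`
}

// Verify checks the HMAC-SHA256 signature against the current secret and,
// during the rotation window, the previous one, then enforces expiry.
// The caller still compares ExerciseSessionID to the session being
// ingested (mismatch -> 401) and runs the consent check (-> 403).
func Verify(token string, current, previous []byte, now time.Time) (*Claims, error) {
	parts := strings.Split(token, ".")
	if len(parts) != 3 || parts[0] != "v1" {
		return nil, errors.New("malformed token") // -> 401
	}
	payload, err := base64.RawURLEncoding.DecodeString(parts[1])
	if err != nil {
		return nil, errors.New("malformed claims") // -> 401
	}
	sig, err := base64.RawURLEncoding.DecodeString(parts[2])
	if err != nil {
		return nil, errors.New("malformed signature") // -> 401
	}
	signed := []byte(parts[0] + "." + parts[1])
	if !validMAC(signed, sig, current) && !validMAC(signed, sig, previous) {
		return nil, errors.New("bad signature") // -> 401
	}
	var c Claims
	if err := json.Unmarshal(payload, &c); err != nil {
		return nil, errors.New("unparseable claims") // -> 401
	}
	if now.Unix() >= c.Exp {
		return nil, errors.New("token expired") // -> 401
	}
	return &c, nil
}

// validMAC does a constant-time comparison of the expected HMAC.
func validMAC(msg, sig, key []byte) bool {
	mac := hmac.New(sha256.New, key)
	mac.Write(msg)
	return hmac.Equal(mac.Sum(nil), sig)
}
```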
### Internal callbacks (Telemetry → Core API)
Telemetry API holds a Cat F service-account principal credential and calls Core API to publish aggregation events:
```
POST <core-api>/v1/internal/telemetry/session-aggregated
Authorization: Bearer <service-account-jwt>

{
  "session_id": "...",
  "principal_id": "...",
  "org_id": "...",
  "session_metrics": { ... },
  "rep_metrics": [ ... ],
  "media_metrics": { ... }
}
```

Core API verifies the service-account credential, validates the payload against the `events.Bus` event schema, and the subscriber writes to PG. RLS scope is set from `org_id` in the payload; the service-account principal has read-only RLS plus a narrow internal-write permission.
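For concreteness, a sketch of what the callback client could look like in Go; the struct field types and accepted status codes are assumptions, since the locked `events.Bus` schema lives elsewhere:

```go
package corecallback

import (
	"bytes"
	"context"
	"encoding/json"
	"fmt"
	"net/http"
	"time"
)

// SessionAggregated mirrors the callback payload above; the metric shapes
// are left as raw JSON placeholders, not the locked events.Bus schema.
type SessionAggregated struct {
	SessionID      string          `json:"session_id"`
	PrincipalID    string          `json:"principal_id"`
	OrgID          string          `json:"org_id"`
	SessionMetrics json.RawMessage `json:"session_metrics"`
	RepMetrics     json.RawMessage `json:"rep_metrics"`
	MediaMetrics   json.RawMessage `json:"media_metrics"`
}

// PublishAggregates POSTs the aggregation event to Core API using the
// Cat F service-account JWT.
func PublishAggregates(ctx context.Context, coreAPI, svcJWT string, ev SessionAggregated) error {
	body, err := json.Marshal(ev)
	if err != nil {
		return err
	}
	req, err := http.NewRequestWithContext(ctx, http.MethodPost,
		coreAPI+"/v1/internal/telemetry/session-aggregated", bytes.NewReader(body))
	if err != nil {
		return err
	}
	req.Header.Set("Authorization", "Bearer "+svcJWT)
	req.Header.Set("Content-Type", "application/json")
	client := &http.Client{Timeout: 10 * time.Second}
	resp, err := client.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK && resp.StatusCode != http.StatusAccepted {
		return fmt.Errorf("core api rejected aggregates: %s", resp.Status)
	}
	return nil
}
```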
## Endpoints
### POST /v1/pose/frames
Batched pose ingest. Client buffers ~1 second of frames and posts.
Request body (binary, `Content-Type: application/octet-stream`):

```
[1-byte version] [4-byte frame_count] [4-byte fps_hint]
[N × 33 × 4 × float32 landmarks]  ← x, y, z, visibility per landmark per frame
[N × 4-byte timestamp_ms]         ← elapsed ms since session_start, per frame
[gzip wrapper around the whole binary blob]
```

Plus an optional JSON sidecar with batch metadata in a `?meta=...` query param or `X-Batch-Meta` header (`pose_confidence` summary, `camera_resolution`, `processing_time_ms` — small, not per-frame).
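A sketch of a client-side encoder for this layout (endianness is not specified in this doc, so little-endian is assumed; package and type names are illustrative):

```go
package posebatch

import (
	"bytes"
	"compress/gzip"
	"encoding/binary"
	"fmt"
)

const landmarksPerFrame = 33 // x, y, z, visibility per landmark

// Frame is one pose sample: 33 landmarks × (x, y, z, visibility),
// plus elapsed ms since session_start.
type Frame struct {
	Landmarks   [landmarksPerFrame * 4]float32
	TimestampMS uint32
}

// Encode packs frames into the wire layout above and gzips the result.
func Encode(frames []Frame, fpsHint uint32) ([]byte, error) {
	var raw bytes.Buffer
	raw.WriteByte(1) // 1-byte version
	if err := binary.Write(&raw, binary.LittleEndian, uint32(len(frames))); err != nil {
		return nil, err
	}
	if err := binary.Write(&raw, binary.LittleEndian, fpsHint); err != nil {
		return nil, err
	}
	for _, f := range frames { // landmark block: N × 33 × 4 × float32
		if err := binary.Write(&raw, binary.LittleEndian, f.Landmarks); err != nil {
			return nil, err
		}
	}
	for _, f := range frames { // timestamp block: N × uint32
		if err := binary.Write(&raw, binary.LittleEndian, f.TimestampMS); err != nil {
			return nil, err
		}
	}
	var out bytes.Buffer
	zw := gzip.NewWriter(&out)
	if _, err := zw.Write(raw.Bytes()); err != nil {
		return nil, err
	}
	if err := zw.Close(); err != nil {
		return nil, err
	}
	if out.Len() > 64<<10 { // server would reply 413 past the ~64 KB cap
		return nil, fmt.Errorf("batch too large after gzip: %d bytes", out.Len())
	}
	return out.Bytes(), nil
}
```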
Response: 202 Accepted

```json
{
  "frames_accepted": 10,
  "session_id": "550e8400-e29b-41d4-a716-446655440000",
  "buffer_position_bytes": 5280
}
```

Failure modes:
- `400` — malformed binary, unsupported version
- `401` — token signature invalid / expired
- `403` — biometric consent not active
- `413` — batch too large (cap at ~64 KB after gzip)
- `429` — backpressure: drop policy says shed this batch (pose frames are droppable; aggregates are not)
Backpressure behavior: at high load, Telemetry API may drop pose-frame batches. Frames are fungible — losing 5% still gives 95% of an exercise's signal. The aggregator's `session_end` summary remains accurate within tolerance. Aggregate writes (`POST /v1/sessions/{id}/end`) MUST NOT drop; they have priority quota.
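One way to implement that split, sketched with two capacity pools: pose admission is non-blocking and sheds with 429, while finalize admission blocks on its own reserved pool. Pool sizes and names are illustrative, not the locked design:

```go
package backpressure

import "net/http"

// Limiter holds separate capacity pools: pose batches are shed when the
// pool is full (429), finalize requests queue on a reserved pool and
// never shed.
type Limiter struct {
	pose     chan struct{} // shed on overflow
	finalize chan struct{} // block, never shed
}

func New(poseSlots, finalizeSlots int) *Limiter {
	return &Limiter{
		pose:     make(chan struct{}, poseSlots),
		finalize: make(chan struct{}, finalizeSlots),
	}
}

// AdmitPose returns false when the pose pool is saturated; the handler
// replies 429 and the batch is dropped (frames are fungible).
func (l *Limiter) AdmitPose() bool {
	select {
	case l.pose <- struct{}{}:
		return true
	default:
		return false
	}
}

func (l *Limiter) DonePose() { <-l.pose }

// AdmitFinalize blocks until a slot frees: aggregate writes must not drop.
func (l *Limiter) AdmitFinalize() { l.finalize <- struct{}{} }
func (l *Limiter) DoneFinalize() { <-l.finalize }

// poseHandler shows the shed path wired into an HTTP handler.
func poseHandler(l *Limiter, next http.HandlerFunc) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		if !l.AdmitPose() {
			http.Error(w, "shed", http.StatusTooManyRequests)
			return
		}
		defer l.DonePose()
		next(w, r)
	}
}
```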
### POST /v1/media/events
Video lifecycle ingest. Volume is low (~110 events/sec at 1000 concurrent), so JSON is fine here.
Request body (`Content-Type: application/json`):

```json
{
  "event": "session_start",
  "session_id": "550e8400-e29b-41d4-a716-446655440000",
  "media_id": "exercise-uuid",
  "media_type": "video",
  "timestamp": "2026-05-07T10:00:00.000Z",
  "data": {
    "total_duration_seconds": 120.5,
    "ttfb_ms": 340,
    "video_load_time_ms": 1200,
    "cdn_response_time_ms": 280,
    "connection_type": "wifi",
    "effective_bandwidth": 12.5,
    "rtt_ms": 45,
    "initial_bitrate": 2500000,
    "initial_resolution": "720p"
  }
}
```

The full event taxonomy (`session_start`, `play`, `pause`, `seek`, `heartbeat`, `buffering_start`, `buffering_end`, `quality_change`, `milestone`, `error`, `session_end`) and per-event field schemas live in media-events.md.
Response: 202 Accepted

```json
{
  "status": "accepted",
  "session_id": "550e8400-e29b-41d4-a716-446655440000"
}
```

Failure modes:
- `400` — unknown event type, schema violation
- `401` — token invalid / expired
- `403` — `analytics` consent not active
- `429` — rate limit (per-session)
### POST /v1/sessions/{id}/end
Session finalizer. Triggers server-side aggregation from accumulated landmarks, writes the S3 replay blob, publishes the `events.Bus` event. Must not drop — separate priority quota from pose-frame ingest.
Request body (`Content-Type: application/json`):

```json
{
  "ended_at": "2026-05-07T10:30:00.000Z",
  "client_status": "completed",
  "exercise_id": "exercise-uuid",
  "total_frames_attempted": 18000
}
```

The server cross-checks `total_frames_attempted` against actual frames received to flag suspicious gaps.
Response: 200 OK

```json
{
  "session_id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "completed",
  "frames_received": 17850,
  "frames_dropped": 150,
  "replay_blob": "s3://restartix-telemetry/{org_id}/{session_id}.bin.gz",
  "aggregates_published_at": "2026-05-07T10:30:01.234Z"
}
```

The `replay_blob` URL is internal — clients fetch replay via Core API (`GET /v1/exercise-sessions/{id}/replay`), which mints a short-lived signed S3 URL.
Failure modes:
- `400` — bad payload
- `401` — token invalid
- `404` — session not found
- `409` — session already finalized (idempotent: same `client_status` returns 200; conflicting status returns 409)
- `500` — aggregation or S3 write failed; client retries with the same payload (idempotent)
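A small sketch of the finalize decision implied by those failure modes; the names here are hypothetical:

```go
package finalize

// Finalization is the stored finalize state; nil means the session has
// not been finalized yet.
type Finalization struct {
	ClientStatus string
}

// Outcome maps a finalize request onto the 200/409 behavior above.
type Outcome int

const (
	RunAggregation Outcome = iota // first call: aggregate, write blob, publish
	ReturnExisting                // 200: same client_status, idempotent replay
	Conflict                      // 409: conflicting client_status
)

// Decide picks the outcome from stored state and the incoming status.
func Decide(existing *Finalization, requestedStatus string) Outcome {
	switch {
	case existing == nil:
		return RunAggregation
	case existing.ClientStatus == requestedStatus:
		return ReturnExisting
	default:
		return Conflict
	}
}
```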
### GET /v1/healthz
Internal health check for the ALB. Returns `200 OK` with `{"status": "ok", "version": "..."}`. Not authenticated; not exposed publicly.
## Reads (none on Telemetry API)
There are no read endpoints on Telemetry API for clients. All reads flow through Core API:
| Reader → endpoint | Source data |
|---|---|
| Patient: `GET /v1/me/exercise-sessions` | `pose_session_metrics` + `media_session_metrics` (PG, RLS by principal) |
| Specialist: `GET /v1/patients/{id}/exercise-sessions` | Same tables, RLS by org + permission |
| Specialist: `GET /v1/exercise-sessions/{id}/replay` | Signed S3 URL to replay blob |
| Clinic admin: `GET /v1/analytics/cohort/exercise-adherence` | Materialized view over `pose_session_metrics` |
| Console: `GET /v1/admin/platform/exercise-aggregate-counts` | Anonymised aggregates only (no `principal_id` in response) |
This is by design: Core API is the single source of authentication, RLS, audit, classification, and per-org permission enforcement. Telemetry API is ingest-only.
## Session lifecycle (end-to-end)
```
Portal               Core API             Telemetry API         S3          events.Bus     Core API subscriber    PG
  │                     │                      │                 │               │                 │              │
  ├─ POST /v1/exercise-sessions ──►            │                 │               │                 │              │
  │                     │ create row,          │                 │               │                 │              │
  │                     │ mint signed token    │                 │               │                 │              │
  │ ◄─ 200 (session_id, telemetry_token) ──    │                 │               │                 │              │
  │                     │                      │                 │               │                 │              │
  ├─ POST /v1/pose/frames (token) ───────────► │ verify, append  │               │                 │              │
  │                     │                      │ to S3 multipart ──►             │                 │              │
  │ ◄─ 202 ──────────────────────────────────  │                 │               │                 │              │
  │                     │    ... repeat for 30 min ...           │               │                 │              │
  │                     │                      │                 │               │                 │              │
  ├─ POST /v1/sessions/{id}/end ─────────────► │ aggregate from  │               │                 │              │
  │                     │                      │ landmarks,      │               │                 │              │
  │                     │                      │ finalize S3 ──► │               │                 │              │
  │                     │                      │ publish event ──────────────────►                 │              │
  │                     │                      │                 │               ├─ deliver event ─► INSERT pose_*
  │                     │                      │                 │               │                 │ + media_*
  │                     │                      │                 │               │                 │ + UPDATE
  │                     │                      │                 │               │                 │ patient_exercise_logs
  │ ◄─ 200 (replay_blob URL) ──────────────── │                 │               │                 │              │
  │                     │                      │                 │               │                 │              │
  │    ... later, specialist views ...         │                 │               │                 │              │
  │                  ◄─ GET /v1/exercise-sessions/{id} (Clinic app)              │                 │              │
  │                     │ read PG aggregates ─────────────────────────────────────────────────────────────────── ►
  │                     │ mint signed S3 URL   │                 │               │                 │              │
  │                  ── 200 ──►                │                 │               │                 │              │
  │                  ◄─ GET /v1/exercise-sessions/{id}/replay    │               │                 │              │
  │                     │ S3 GET ───────────────────────────►  (browser)         │                 │              │
```

## Idempotency
- Pose batches are not idempotent — duplicate batches are appended (small cost). Clients should not retry pose batches; on failure, the next batch overlaps anyway.
- Media events are idempotent by `(session_id, event_type, timestamp)` — duplicates are dropped at ingest (see the dedup sketch below).
- `POST /v1/sessions/{id}/end` is idempotent — finalizing a finalized session returns the existing finalization state.
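A sketch of the media-event dedup implied by that tuple; whether the real store is in-memory, Redis, or a DB constraint is not specified here, so an in-memory stand-in is used and all names are illustrative:

```go
package dedup

import "sync"

// Key is the idempotency tuple from the bullet above; the timestamp is
// kept as the wire string to avoid clock-normalization subtleties.
type Key struct {
	SessionID string
	EventType string
	Timestamp string
}

// Seen is an in-memory stand-in for the real dedup store.
type Seen struct {
	mu   sync.Mutex
	keys map[Key]struct{}
}

func NewSeen() *Seen { return &Seen{keys: make(map[Key]struct{})} }

// FirstTime records the key and reports whether it was new; duplicates
// are silently dropped at ingest.
func (s *Seen) FirstTime(k Key) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	if _, dup := s.keys[k]; dup {
		return false
	}
	s.keys[k] = struct{}{}
	return true
}
```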
## Rate limiting
Per-session quota: ~1 batch/sec per endpoint (matches client batch cadence). Burst allowance ~5 batches/sec for reconnect-and-flush cases. Per-org global quota: based on entitlements + 1C.7 metering — exceeding the org's quota returns 429 across all sessions.
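A sketch of the per-session quota using `golang.org/x/time/rate` (sustained 1/sec, burst 5, matching the numbers above); the per-org entitlement and metering layer is omitted:

```go
package ratelimit

import (
	"sync"

	"golang.org/x/time/rate"
)

// PerSession keeps one limiter per session: ~1 batch/sec sustained,
// burst of 5 for reconnect-and-flush cases.
type PerSession struct {
	mu       sync.Mutex
	limiters map[string]*rate.Limiter
}

func New() *PerSession {
	return &PerSession{limiters: make(map[string]*rate.Limiter)}
}

// Allow reports whether this session may send another batch now;
// a false return maps to 429.
func (p *PerSession) Allow(sessionID string) bool {
	p.mu.Lock()
	lim, ok := p.limiters[sessionID]
	if !ok {
		lim = rate.NewLimiter(rate.Limit(1), 5) // 1/sec, burst 5
		p.limiters[sessionID] = lim
	}
	p.mu.Unlock()
	return lim.Allow()
}
```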
## Observability
Telemetry API itself emits OTel traces / metrics / logs to the platform's observability stack (Datadog at scale, CloudWatch at staging). These are operational signals about the service — separate concern from the product telemetry it ingests.
## Migration / older spec
Earlier versions of this doc described `POST /v1/analytics/track`, `POST /v1/errors/report`, `POST /v1/audit/ingest`, and an admin/dashboard surface. All four are out — see index.md → Scope for what each was replaced by (or rejected). When Layer 2 builds the service, this doc is the canonical reference; older references in feature specs will be cleaned up at that time.