
Telemetry API Endpoints

Layer 2 feature, not yet implemented. The Telemetry API is a separate Go service (services/telemetry/). All endpoints live there, not in Core API. This doc describes the locked design — see index.md for the full architecture and rationale.

The Telemetry API serves three concerns:

  1. Patient-facing ingest — pose batches, media events, session finalization. Authenticated via short-lived signed session token issued by Core API at exercise-session start.
  2. Internal callbacks — Telemetry API calls Core API as a Cat F service-account principal to publish aggregation events via events.Bus. No public endpoint involved.
  3. Reads — none. All telemetry reads (specialist dashboards, patient progress, cohort views, replay blob fetch) flow through Core API. Telemetry API itself exposes no read endpoints to clients.

Authentication

Signed session token (hot path)

Pose-frame ingest hits 10k+ requests/sec at peak. Verifying a Clerk JWT per batch is wasteful. Instead:

  1. Patient starts an exercise session in Portal.
  2. Portal calls Core API: POST /v1/exercise-sessions (Clerk JWT auth as usual).
  3. Core API creates the session row and returns a short-lived signed token:
    {
      "session_id": "550e8400-e29b-41d4-a716-446655440000",
      "telemetry_token": "v1.eyJwcmluY2lwYWxfaWQiOiIuLi4iLCJvcmdfaWQiOiIuLi4iLCJleGVyY2lzZV9zZXNzaW9uX2lkIjoiLi4uIiwiZXhwIjoxNzU0OTk5OTk5fQ.signature",
      "telemetry_token_expires_at": "2026-05-07T14:00:00Z"
    }
  4. Portal sends every pose/media batch to Telemetry API with Authorization: Bearer <telemetry_token>.
  5. Telemetry API verifies the signature only (no Clerk dependency on the hot path) and extracts principal_id, org_id, and exercise_session_id from the claims.

Token claims:

| Claim | Type | Purpose |
|---|---|---|
| principal_id | UUID | Whose session this is (patient principal) |
| org_id | UUID | Tenant scope (must match what Core API recorded) |
| exercise_session_id | UUID | The session the token authorizes ingest for |
| iat | int (unix) | Issue time |
| exp | int (unix) | Expiry; typically session_start + 2 hours, with a renewal endpoint if needed |

Signing: HS256 today (see index.md → Swap-point interfaces for upgrade path to Ed25519 if needed). Secret rotated via Secrets Manager; Telemetry API holds current + previous, accepts both during rotation window.
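A minimal sketch of minting and verifying such a token with a current and a previous secret. The wire layout (`v1.<base64url JSON claims>.<base64url HMAC-SHA256 over the first two parts>`) is an assumption inferred from the sample token above, and the names `Claims`, `Mint`, and `Verify` are hypothetical, not the service's actual API.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/base64"
	"encoding/json"
	"errors"
	"fmt"
	"strings"
)

// Claims mirrors the token claims table.
type Claims struct {
	PrincipalID       string `json:"principal_id"`
	OrgID             string `json:"org_id"`
	ExerciseSessionID string `json:"exercise_session_id"`
	Iat               int64  `json:"iat"`
	Exp               int64  `json:"exp"`
}

var enc = base64.RawURLEncoding

// sign returns the base64url HMAC-SHA256 of msg under key.
func sign(key []byte, msg string) string {
	m := hmac.New(sha256.New, key)
	m.Write([]byte(msg))
	return enc.EncodeToString(m.Sum(nil))
}

// Mint builds "v1.<claims>.<signature>" (assumed layout).
func Mint(key []byte, c Claims) (string, error) {
	b, err := json.Marshal(c)
	if err != nil {
		return "", err
	}
	msg := "v1." + enc.EncodeToString(b)
	return msg + "." + sign(key, msg), nil
}

// Verify accepts a token signed by any of keys (current first, then
// previous), so tokens minted before a rotation stay valid during the window.
func Verify(token string, nowUnix int64, keys ...[]byte) (Claims, error) {
	var c Claims
	parts := strings.Split(token, ".")
	if len(parts) != 3 || parts[0] != "v1" {
		return c, errors.New("malformed token")
	}
	msg := parts[0] + "." + parts[1]
	matched := false
	for _, k := range keys {
		// hmac.Equal compares in constant time.
		if hmac.Equal([]byte(sign(k, msg)), []byte(parts[2])) {
			matched = true
			break
		}
	}
	if !matched {
		return c, errors.New("bad signature")
	}
	raw, err := enc.DecodeString(parts[1])
	if err != nil {
		return c, err
	}
	if err := json.Unmarshal(raw, &c); err != nil {
		return c, err
	}
	if nowUnix >= c.Exp {
		return c, errors.New("token expired")
	}
	return c, nil
}

func main() {
	current, previous := []byte("current-secret"), []byte("previous-secret")
	const start = int64(1754992799) // session start, unix seconds
	tok, err := Mint(previous, Claims{PrincipalID: "p", OrgID: "o",
		ExerciseSessionID: "s", Iat: start, Exp: start + 2*60*60})
	if err != nil {
		panic(err)
	}
	_, err = Verify(tok, start+3600, current, previous)
	fmt.Println("accepted during rotation:", err == nil)
}
```

Trying the current key first keeps the common case to a single HMAC computation; the previous key is only consulted for tokens minted before the rotation.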

Validation: Telemetry API rejects ingest with 401 if the signature fails, the token has expired, or exercise_session_id does not match. It rejects with 403 if the matching per-purpose consent flag (analytics for media, biometric for pose) is not active for principal_id in org_id. Consent is checked via a cached read from Core API.
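The validation rules can be sketched as a pure decision function. The `Check` struct, the `Purpose` type, and the ordering (authentication failures before consent failures) are assumptions made for illustration; the consent map stands in for the cached read from Core API.

```go
package main

import (
	"fmt"
	"net/http"
)

// Purpose is the consent purpose an endpoint requires.
type Purpose string

const (
	Biometric Purpose = "biometric" // required for pose ingest
	Analytics Purpose = "analytics" // required for media ingest
)

// Check records the outcome of each validation step (hypothetical shape).
type Check struct {
	SignatureOK   bool
	Expired       bool
	SessionMatch  bool             // exercise_session_id matches the request
	ActiveConsent map[Purpose]bool // snapshot of the cached consent flags
}

// Status returns the rejection status for an endpoint that needs the given
// consent purpose, or 0 when the request may proceed.
func Status(c Check, need Purpose) int {
	if !c.SignatureOK || c.Expired || !c.SessionMatch {
		return http.StatusUnauthorized
	}
	if !c.ActiveConsent[need] {
		return http.StatusForbidden
	}
	return 0
}

func main() {
	c := Check{SignatureOK: true, SessionMatch: true,
		ActiveConsent: map[Purpose]bool{Analytics: true}}
	fmt.Println(Status(c, Analytics)) // media ingest proceeds
	fmt.Println(Status(c, Biometric)) // pose ingest is rejected with 403
}
```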

Internal callbacks (Telemetry → Core API)

Telemetry API holds a Cat F service-account principal credential and calls Core API to publish aggregation events:

POST <core-api>/v1/internal/telemetry/session-aggregated
Authorization: Bearer <service-account-jwt>
{
  "session_id": "...",
  "principal_id": "...",
  "org_id": "...",
  "session_metrics": { ... },
  "rep_metrics": [ ... ],
  "media_metrics": { ... }
}

Core API verifies the service-account credential, validates the payload against the events.Bus event schema, and publishes the event; the subscriber then writes to PG. RLS scope is set from org_id in the payload. The service-account principal has read-only RLS plus a narrow internal-write permission.
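As an illustration, the callback request could be assembled like this. The struct name, the use of generic maps for the elided metric objects, and the base URL in `main` are assumptions; only the path, the bearer header, and the top-level field names come from the example above.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// SessionAggregated mirrors the top-level fields of the callback payload.
// The metric shapes are elided in the spec, so they stay generic here.
type SessionAggregated struct {
	SessionID      string           `json:"session_id"`
	PrincipalID    string           `json:"principal_id"`
	OrgID          string           `json:"org_id"`
	SessionMetrics map[string]any   `json:"session_metrics"`
	RepMetrics     []map[string]any `json:"rep_metrics"`
	MediaMetrics   map[string]any   `json:"media_metrics"`
}

// NewCallback builds the POST to Core API with the service-account JWT.
func NewCallback(coreURL, jwt string, ev SessionAggregated) (*http.Request, error) {
	body, err := json.Marshal(ev)
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost,
		coreURL+"/v1/internal/telemetry/session-aggregated", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Authorization", "Bearer "+jwt)
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	// "https://core.internal" is a placeholder host, not a real address.
	req, err := NewCallback("https://core.internal", "service-account-jwt",
		SessionAggregated{SessionID: "s", PrincipalID: "p", OrgID: "o"})
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL.Path)
}
```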

Endpoints

POST /v1/pose/frames

Batched pose ingest. Client buffers ~1 second of frames and posts.

Request body (binary, Content-Type: application/octet-stream):

[1-byte version] [4-byte frame_count] [4-byte fps_hint]
[N × 33 × 4 × float32 landmarks]   ← x, y, z, visibility per landmark per frame
[N × 4 byte timestamp_ms]          ← elapsed ms since session_start, per frame
[gzip wrapper around the whole binary blob]

An optional JSON sidecar with batch metadata can be sent in the ?meta=... query param or the X-Batch-Meta header. It carries small per-batch values (pose_confidence summary, camera_resolution, processing_time_ms), not per-frame data.
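A client-side encoder for the layout above might look like the following sketch. The spec does not state a byte order, so little-endian is an assumption here, as are the `Frame` type and the function names.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"encoding/binary"
	"fmt"
)

const landmarksPerFrame = 33 // each landmark is x, y, z, visibility

// Frame is one pose sample (hypothetical client-side type).
type Frame struct {
	Landmarks   [landmarksPerFrame][4]float32
	TimestampMs uint32 // elapsed ms since session_start
}

// EncodeRaw writes the header, then all landmarks, then all timestamps.
// Byte order is assumed little-endian. Writes to a bytes.Buffer of
// fixed-size values do not fail, so the errors are not checked.
func EncodeRaw(version uint8, fpsHint uint32, frames []Frame) []byte {
	var buf bytes.Buffer
	le := binary.LittleEndian
	buf.WriteByte(version)
	binary.Write(&buf, le, uint32(len(frames)))
	binary.Write(&buf, le, fpsHint)
	for _, f := range frames {
		binary.Write(&buf, le, f.Landmarks)
	}
	for _, f := range frames {
		binary.Write(&buf, le, f.TimestampMs)
	}
	return buf.Bytes()
}

// Gzip wraps the whole binary blob, as the request format requires.
func Gzip(raw []byte) ([]byte, error) {
	var out bytes.Buffer
	zw := gzip.NewWriter(&out)
	if _, err := zw.Write(raw); err != nil {
		return nil, err
	}
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return out.Bytes(), nil
}

func main() {
	raw := EncodeRaw(1, 10, make([]Frame, 10))
	body, err := Gzip(raw)
	if err != nil {
		panic(err)
	}
	// 9-byte header + 10 × 33 × 4 × 4 landmark bytes + 10 × 4 timestamp bytes.
	fmt.Println("raw bytes:", len(raw)) // 5329
	fmt.Println("gzip magic ok:", body[0] == 0x1f && body[1] == 0x8b)
}
```

For a 10-frame batch the landmark section alone is 10 × 33 × 4 × 4 = 5280 bytes, which matches the buffer_position_bytes value in the sample response below.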

Response: 202 Accepted

{
  "frames_accepted": 10,
  "session_id": "550e8400-e29b-41d4-a716-446655440000",
  "buffer_position_bytes": 5280
}

Failure modes:

  • 400 — malformed binary, version unsupported
  • 401 — token signature invalid / expired
  • 403 — biometric consent not active
  • 413 — batch too large (cap at ~64 KB after gzip)
  • 429 — backpressure: drop policy says shed this batch (pose frames are droppable; aggregates are not)

Backpressure behavior: at high load, Telemetry API may drop pose-frame batches. Frames are fungible — losing 5% still gives 95% of an exercise's signal. The aggregator's session_end summary remains accurate within tolerance. Aggregate writes (POST /v1/sessions/{id}/end) MUST NOT drop; they have priority quota.
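One way to picture the drop policy is an admission check that sheds pose batches early while keeping headroom for finalizers. This is a sketch under assumptions: the `Shedder` type, the slot-based accounting, and the reserved-headroom mechanism are illustrative, and the spec only says that finalizers have a priority quota.

```go
package main

import "fmt"

// Kind distinguishes droppable pose batches from session finalizers.
type Kind int

const (
	PoseBatch Kind = iota
	SessionEnd
)

// Shedder tracks in-flight work. It is not safe for concurrent use; a real
// service would guard it with a mutex or use atomics.
type Shedder struct {
	Capacity int // slots the service is sized for
	Reserved int // headroom that pose batches may not consume
	InFlight int
}

// Admit reports whether to process the request. A false result for a pose
// batch corresponds to a 429 response. Finalizers are always admitted.
func (s *Shedder) Admit(k Kind) bool {
	if k == PoseBatch && s.InFlight >= s.Capacity-s.Reserved {
		return false
	}
	s.InFlight++
	return true
}

// Done releases a slot once a request has been handled.
func (s *Shedder) Done() {
	if s.InFlight > 0 {
		s.InFlight--
	}
}

func main() {
	s := &Shedder{Capacity: 3, Reserved: 1}
	fmt.Println(s.Admit(PoseBatch), s.Admit(PoseBatch)) // both fit
	fmt.Println(s.Admit(PoseBatch))                     // shed
	fmt.Println(s.Admit(SessionEnd))                    // never shed
}
```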

POST /v1/media/events

Video lifecycle ingest. Volume is low (~110 events/sec at 1000 concurrent), so JSON is fine here.

Request body (Content-Type: application/json):

{
  "event": "session_start",
  "session_id": "550e8400-e29b-41d4-a716-446655440000",
  "media_id": "exercise-uuid",
  "media_type": "video",
  "timestamp": "2026-05-07T10:00:00.000Z",
  "data": {
    "total_duration_seconds": 120.5,
    "ttfb_ms": 340,
    "video_load_time_ms": 1200,
    "cdn_response_time_ms": 280,
    "connection_type": "wifi",
    "effective_bandwidth": 12.5,
    "rtt_ms": 45,
    "initial_bitrate": 2500000,
    "initial_resolution": "720p"
  }
}

Full event taxonomy (session_start, play, pause, seek, heartbeat, buffering_start, buffering_end, quality_change, milestone, error, session_end) and per-event field schemas live in media-events.md.
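A sketch of how the envelope could be decoded and the event type checked against the taxonomy listed above. The `MediaEvent` struct and `ParseMediaEvent` are hypothetical names; per-event validation of `data` is left out because those schemas live in media-events.md.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// MediaEvent is the common envelope; Data stays raw because its schema
// depends on the event type.
type MediaEvent struct {
	Event     string          `json:"event"`
	SessionID string          `json:"session_id"`
	MediaID   string          `json:"media_id"`
	MediaType string          `json:"media_type"`
	Timestamp time.Time       `json:"timestamp"` // RFC 3339
	Data      json.RawMessage `json:"data"`
}

// knownEvents is the event taxonomy from this section.
var knownEvents = map[string]bool{
	"session_start": true, "play": true, "pause": true, "seek": true,
	"heartbeat": true, "buffering_start": true, "buffering_end": true,
	"quality_change": true, "milestone": true, "error": true,
	"session_end": true,
}

// ParseMediaEvent decodes the envelope; any error maps to a 400 response.
func ParseMediaEvent(body []byte) (MediaEvent, error) {
	var ev MediaEvent
	if err := json.Unmarshal(body, &ev); err != nil {
		return ev, fmt.Errorf("schema violation: %w", err)
	}
	if !knownEvents[ev.Event] {
		return ev, fmt.Errorf("unknown event type %q", ev.Event)
	}
	return ev, nil
}

func main() {
	ev, err := ParseMediaEvent([]byte(`{
		"event": "session_start",
		"session_id": "550e8400-e29b-41d4-a716-446655440000",
		"media_id": "exercise-uuid",
		"media_type": "video",
		"timestamp": "2026-05-07T10:00:00.000Z",
		"data": {"ttfb_ms": 340}
	}`))
	fmt.Println(ev.Event, err)
}
```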

Response: 202 Accepted

{
  "status": "accepted",
  "session_id": "550e8400-e29b-41d4-a716-446655440000"
}

Failure modes:

  • 400 — unknown event type, schema violation
  • 401 — token invalid / expired
  • 403 — analytics consent not active
  • 429 — rate limit (per-session)

POST /v1/sessions/{id}/end

Session finalizer. Triggers server-side aggregation from the accumulated landmarks, writes the S3 replay blob, and publishes the events.Bus event. This request must not be dropped: it has a priority quota separate from pose-frame ingest.

Request body (Content-Type: application/json):

{
  "ended_at": "2026-05-07T10:30:00.000Z",
  "client_status": "completed",
  "exercise_id": "exercise-uuid",
  "total_frames_attempted": 18000
}

Server cross-checks total_frames_attempted against actual frames received to flag suspicious gaps.

Response: 200 OK

{
  "session_id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "completed",
  "frames_received": 17850,
  "frames_dropped": 150,
  "replay_blob": "s3://restartix-telemetry/{org_id}/{session_id}.bin.gz",
  "aggregates_published_at": "2026-05-07T10:30:01.234Z"
}
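The cross-check between total_frames_attempted and the frames actually received can be sketched as below. The 5% threshold is a hypothetical value borrowed from the backpressure discussion, not a documented limit, and `CheckGap` is an illustrative name.

```go
package main

import "fmt"

// GapReport summarizes how many frames the client sent that never arrived.
type GapReport struct {
	Dropped    int
	Ratio      float64 // dropped / attempted
	Suspicious bool
}

// CheckGap compares the client's claim with the server's count. maxRatio is
// a hypothetical tolerance above which the session is flagged.
func CheckGap(attempted, received int, maxRatio float64) GapReport {
	dropped := attempted - received
	if dropped < 0 {
		dropped = 0 // duplicates can make received exceed attempted
	}
	var ratio float64
	if attempted > 0 {
		ratio = float64(dropped) / float64(attempted)
	}
	return GapReport{Dropped: dropped, Ratio: ratio, Suspicious: ratio > maxRatio}
}

func main() {
	// Numbers from the request and response examples above.
	r := CheckGap(18000, 17850, 0.05)
	fmt.Printf("dropped=%d ratio=%.2f%% suspicious=%v\n", r.Dropped, r.Ratio*100, r.Suspicious)
}
```

With the sample values, 18000 attempted and 17850 received gives 150 dropped frames, well under the assumed tolerance.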

The replay_blob URL is internal — clients fetch replay via Core API (GET /v1/exercise-sessions/{id}/replay) which mints a short-lived signed S3 URL.

Failure modes:

  • 400 — bad payload
  • 401 — token invalid
  • 404 — session not found
  • 409 — session already finalized (idempotent: same client_status returns 200; conflicting status returns 409)
  • 500 — aggregation or S3 write failed; client retries with the same payload (idempotent)
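The 200/404/409 rules can be sketched as a small state check. The in-memory `Store` is a stand-in for whatever the service actually persists, and the sketch leaves out aggregation and the S3 write, which are where a 500 could arise.

```go
package main

import (
	"fmt"
	"net/http"
)

// Store maps session_id to its finalized client_status; an empty string
// means the session exists but is still open. Hypothetical in-memory model.
type Store struct {
	status map[string]string
}

// Finalize returns the HTTP status the finalizer endpoint would answer with.
func (s *Store) Finalize(sessionID, clientStatus string) int {
	cur, ok := s.status[sessionID]
	switch {
	case !ok:
		return http.StatusNotFound
	case cur == "":
		s.status[sessionID] = clientStatus // first finalization
		return http.StatusOK
	case cur == clientStatus:
		return http.StatusOK // idempotent retry
	default:
		return http.StatusConflict // conflicting status
	}
}

func main() {
	s := &Store{status: map[string]string{"abc": ""}}
	fmt.Println(s.Finalize("abc", "completed")) // 200
	fmt.Println(s.Finalize("abc", "completed")) // 200
	fmt.Println(s.Finalize("abc", "abandoned")) // 409
	fmt.Println(s.Finalize("missing", "completed")) // 404
}
```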

GET /v1/healthz

Internal health check for ALB. Returns 200 OK with {"status": "ok", "version": "..."}. Not authenticated; not exposed publicly.

Reads (none on Telemetry API)

There are no read endpoints on Telemetry API for clients. All reads flow through Core API:

| Reader → endpoint | Source data |
|---|---|
| Patient: GET /v1/me/exercise-sessions | pose_session_metrics + media_session_metrics (PG, RLS by principal) |
| Specialist: GET /v1/patients/{id}/exercise-sessions | Same tables, RLS by org + permission |
| Specialist: GET /v1/exercise-sessions/{id}/replay | Signed S3 URL to replay blob |
| Clinic admin: GET /v1/analytics/cohort/exercise-adherence | Materialized view over pose_session_metrics |
| Console: GET /v1/admin/platform/exercise-aggregate-counts | Anonymised aggregates only (no principal_id in response) |

This is by design: Core API is the single source of authentication, RLS, audit, classification, and per-org permission enforcement. Telemetry API is ingest-only.

Session lifecycle (end-to-end)

Portal                    Core API              Telemetry API           S3              events.Bus      Core API subscriber       PG
  │                          │                       │                  │                   │                  │                   │
  ├── POST /v1/exercise-sessions ─►                  │                  │                   │                  │                   │
  │                          │ create row, mint      │                  │                   │                  │                   │
  │                          │ signed token          │                  │                   │                  │                   │
  ◄── 200 (session_id, telemetry_token) ──           │                  │                   │                  │                   │
  │                          │                       │                  │                   │                  │                   │
  ├── POST /v1/pose/frames (token) ────────────────► │ verify, append   │                   │                  │                   │
  │                          │                       │ to S3 multipart  │                   │                  │                   │
  │                          │                  ◄── 202                 │                   │                  │                   │
  │                          │                  ... repeat for 30 min ... │                  │                   │                  │
  │                          │                       │                  │                   │                  │                   │
  ├── POST /v1/sessions/{id}/end ──────────────────► │ aggregate from   │                   │                  │                   │
  │                          │                       │ landmarks,       │                   │                  │                   │
  │                          │                       │ finalize S3 ───► │                   │                  │                   │
  │                          │                       │ publish event ────────────────────► │                   │                  │
  │                          │                       │                  │                   │ deliver event ─► │ INSERT pose_*    │
  │                          │                       │                  │                   │                  │ + media_*        │
  │                          │                       │                  │                   │                  │ + UPDATE         │
  │                          │                       │                  │                   │                  │ patient_exercise_logs
  ◄── 200 (replay_blob URL) ─────────────────────────                    │                   │                  │                   │
  │                          │                       │                  │                   │                  │                   │
  │ ... later, specialist views ...                  │                  │                   │                  │                   │
  │                          ◄── GET /v1/exercise-sessions/{id} (Clinic app)               │                  │                   │
  │                          │ read PG aggregates ──────────────────────────────────────────────────────────────────────────────► │
  │                          │ mint signed S3 URL    │                  │                   │                  │                   │
  │                          ── 200 ──►              │                  │                   │                  │                   │
  │                          ◄── GET /v1/exercise-sessions/{id}/replay                     │                  │                   │
  │                          │ S3 GET ──────────────────────────────────► (browser)         │                  │                   │

Idempotency

  • Pose batches are not idempotent — duplicate batches are appended (small cost). Client should not retry pose batches; on failure, the next batch overlaps anyway.
  • Media events are idempotent by (session_id, event_type, timestamp), where event_type is the request's event field. Duplicates are dropped at ingest.
  • POST /v1/sessions/{id}/end is idempotent — finalizing a finalized session returns the existing finalization state.
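Deduplicating media events on that composite key could look like the following sketch. The in-memory set, the millisecond timestamp representation, and the `Deduper` name are assumptions; a real service would also need to bound or expire the set.

```go
package main

import "fmt"

// eventKey is the idempotency key for media events.
type eventKey struct {
	SessionID string
	EventType string
	UnixMs    int64 // event timestamp in milliseconds
}

// Deduper remembers keys it has seen. It grows without bound and is not
// safe for concurrent use; both are simplifications for this sketch.
type Deduper struct {
	seen map[eventKey]struct{}
}

func NewDeduper() *Deduper {
	return &Deduper{seen: make(map[eventKey]struct{})}
}

// Accept returns false when the same event was already ingested.
func (d *Deduper) Accept(sessionID, eventType string, unixMs int64) bool {
	k := eventKey{sessionID, eventType, unixMs}
	if _, dup := d.seen[k]; dup {
		return false
	}
	d.seen[k] = struct{}{}
	return true
}

func main() {
	d := NewDeduper()
	fmt.Println(d.Accept("s1", "play", 1000))  // first delivery
	fmt.Println(d.Accept("s1", "play", 1000))  // duplicate, dropped
	fmt.Println(d.Accept("s1", "pause", 1000)) // different event type
}
```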

Rate limiting

  • Per-session quota: ~1 batch/sec per endpoint (matches client batch cadence), with a burst allowance of ~5 batches/sec for reconnect-and-flush cases.
  • Per-org global quota: based on entitlements + 1C.7 metering. Exceeding the org's quota returns 429 across all sessions.
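The per-session quota maps naturally onto a token bucket. This sketch treats the burst allowance as a bucket capacity of 5 tokens refilled at 1 token per second, which is one interpretation of the numbers above rather than the specified algorithm. Time is passed in as seconds so the behavior is deterministic.

```go
package main

import (
	"fmt"
	"math"
)

// Bucket is a token bucket: it refills at rate tokens/sec up to burst.
type Bucket struct {
	rate, burst, tokens, last float64
}

// NewBucket starts full, so a reconnecting client can flush immediately.
func NewBucket(rate, burst, now float64) *Bucket {
	return &Bucket{rate: rate, burst: burst, tokens: burst, last: now}
}

// Allow consumes one token if available. now is a monotonic clock reading
// in seconds supplied by the caller. A false result corresponds to a 429.
func (b *Bucket) Allow(now float64) bool {
	if now > b.last {
		b.tokens = math.Min(b.burst, b.tokens+(now-b.last)*b.rate)
		b.last = now
	}
	if b.tokens < 1 {
		return false
	}
	b.tokens--
	return true
}

func main() {
	b := NewBucket(1, 5, 0) // ~1 batch/sec, burst of 5
	allowed := 0
	for i := 0; i < 6; i++ {
		if b.Allow(0) {
			allowed++
		}
	}
	fmt.Println("allowed in burst:", allowed)        // 5
	fmt.Println("allowed one second later:", b.Allow(1)) // true
}
```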

Observability

Telemetry API itself emits OTel traces / metrics / logs to the platform's observability stack (Datadog at scale, CloudWatch at staging). These are operational signals about the service — separate concern from the product telemetry it ingests.

Migration / older spec

Earlier versions of this doc described POST /v1/analytics/track, POST /v1/errors/report, POST /v1/audit/ingest, and an admin/dashboard surface. All four are out; see index.md → Scope for what each was replaced by (or why it was rejected). When Layer 2 builds the service, this doc is the canonical reference; older references in feature specs will be cleaned up at that time.