External Service Providers
Comprehensive inventory of all third-party services that RestartiX Platform integrates with, organized by domain. Each entry includes purpose, data exchanged, compliance requirements, and where it's documented.
Authentication & Identity
Clerk
| Purpose | User authentication, session management, organization switching |
| Used by | Auth feature, all authenticated endpoints |
| Data exchanged | User identity (email, name), session tokens, organization membership |
| Compliance | BAA required (handles user PII). HIPAA-eligible plan needed. |
| Environment variables | CLERK_SECRET_KEY, CLERK_PUBLISHABLE_KEY, CLERK_WEBHOOK_SECRET |
| Failure impact | Critical — no authentication = no API access |
| Docs | features/auth/ |
Integration pattern: Clerk issues JWTs → Core API validates via Clerk SDK → sets RLS session variables (app.current_user_id, app.current_org_id, app.current_role).
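The RLS handoff can be sketched as a single parameterized query run at the start of each request's transaction. This is a minimal sketch, not the actual Core API code: the claim field names (`userId`, `orgId`, `role`) and the helper name `rlsContextQuery` are assumptions; the `set_config(..., true)` call is standard PostgreSQL and scopes each variable to the current transaction, so pooled connections cannot leak another tenant's context.

```typescript
// Claims extracted from a Clerk-verified JWT (field names are assumptions).
interface SessionClaims {
  userId: string;
  orgId: string;
  role: string;
}

// Build the parameterized statement that pins RLS context to this transaction.
// set_config's third argument (true) makes the setting transaction-local.
function rlsContextQuery(claims: SessionClaims): { text: string; values: string[] } {
  return {
    text: `SELECT set_config('app.current_user_id', $1, true),
                  set_config('app.current_org_id', $2, true),
                  set_config('app.current_role', $3, true)`,
    values: [claims.userId, claims.orgId, claims.role],
  };
}
```

Running this inside the same transaction as the business query is what lets RLS policies reference `current_setting('app.current_org_id')` safely.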
Video & Communication
Daily.co
| Purpose | Video call rooms for telehealth appointments |
| Used by | Appointments feature (video appointments) |
| Data exchanged | Room creation, meeting tokens, participant joins/leaves |
| Compliance | BAA required (video sessions may contain PHI). HIPAA-eligible plan needed. |
| Environment variables | DAILY_API_KEY |
| Failure impact | High — video appointments fail, in-person appointments unaffected |
| Docs | features/appointments/, features/integrations/ |
Integration pattern: Core API creates Daily rooms on appointment creation → generates time-limited meeting tokens per participant → frontend joins via Daily SDK.
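A sketch of the room-creation step, using Daily's REST endpoint (`POST /v1/rooms`) directly rather than an SDK. The room-naming scheme (`appt-{id}`), the 2-hour expiry, and the helper names are assumptions for illustration, not the platform's actual conventions.

```typescript
const DAILY_API = "https://api.daily.co/v1";

// Deterministic room name per appointment (naming scheme is an assumption).
function roomNameFor(appointmentId: string): string {
  return `appt-${appointmentId}`;
}

// Unix-timestamp expiry for time-limited rooms and meeting tokens.
function tokenExpiry(minutesFromNow: number): number {
  return Math.floor(Date.now() / 1000) + minutesFromNow * 60;
}

// Create a Daily room that self-expires, so stale rooms cannot be joined.
async function createRoom(apiKey: string, appointmentId: string): Promise<unknown> {
  const res = await fetch(`${DAILY_API}/rooms`, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      name: roomNameFor(appointmentId),
      properties: { exp: tokenExpiry(120) }, // room expires 2h out (assumed policy)
    }),
  });
  return res.json();
}
```

Per-participant meeting tokens follow the same pattern against `POST /v1/meeting-tokens`, with `room_name` and a shorter `exp` in the token properties.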
Bunny Stream
| Purpose | Exercise video hosting, adaptive streaming (HLS), CDN delivery |
| Used by | Exercise library, treatment plan sessions |
| Data exchanged | Video files (upload), stream URLs (playback), video metadata |
| Compliance | No PHI in video content (exercise demonstrations only, not patient recordings). |
| Environment variables | BUNNY_STREAM_API_KEY, BUNNY_STREAM_LIBRARY_ID, BUNNY_STREAM_CDN_HOSTNAME |
| Failure impact | High — exercise videos don't play, treatment plan sessions degraded |
| Docs | features/exercise-library/ |
Integration pattern: Admin uploads video → Bunny Stream processes (transcoding, HLS packaging) → CDN URL stored in exercises.video_url → patient streams via HLS.js on frontend.
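The playback side of this pipeline can be sketched as follows. The `/{videoId}/playlist.m3u8` path is the typical Bunny Stream HLS layout but should be verified against the library's actual playback URL format; the helper name is an assumption.

```typescript
// Build the HLS playlist URL served by the Bunny CDN hostname.
// Path layout is an assumption -- confirm against your Stream library settings.
function hlsPlaylistUrl(cdnHostname: string, videoId: string): string {
  return `https://${cdnHostname}/${videoId}/playlist.m3u8`;
}

// Frontend sketch (hls.js, not executed here):
//   const hls = new Hls();
//   hls.loadSource(hlsPlaylistUrl(cdnHostname, exercise.video_url));
//   hls.attachMedia(videoElement);
```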
File Storage
AWS S3
| Purpose | Document storage (appointment files, PDF reports, form attachments) |
| Used by | Documents, appointments, forms, PDF templates |
| Data exchanged | Files (upload/download), signed URLs |
| Compliance | BAA required (files may contain PHI — reports, prescriptions). S3 encryption at rest enabled. |
| Environment variables | AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_S3_BUCKET, AWS_REGION |
| Failure impact | High — file uploads/downloads fail, PDF generation fails |
| Docs | features/documents/, features/integrations/ |
Integration pattern: Core API generates presigned URLs for upload/download → files encrypted at rest (AES-256) → org-scoped key prefixes (org-{id}/...) for tenant isolation.
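The tenant-isolation key scheme can be sketched as a pure helper. `objectKeyFor` is an assumed name; the `org-{id}/` prefix comes from the integration pattern above. Presigning itself would use the AWS SDK (`@aws-sdk/s3-request-presigner`'s `getSignedUrl`) against a key built this way.

```typescript
// Build an org-scoped S3 object key so every file lives under its
// tenant's prefix. Path separators are stripped from the filename so a
// crafted name cannot escape the org prefix.
function objectKeyFor(orgId: string, filename: string): string {
  const safe = filename.replace(/[/\\]/g, "_");
  return `org-${orgId}/${safe}`;
}
```

Keeping the prefix logic in one helper means IAM policies and presigned-URL generation can both assume the `org-{id}/` invariant holds.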
AWS CloudFront (Optional)
| Purpose | CDN for S3-hosted files (faster delivery, signed URLs) |
| Used by | Document downloads |
| Data exchanged | Cached file delivery |
| Compliance | Same as S3 (pass-through). |
| Environment variables | CLOUDFRONT_DISTRIBUTION_ID, CLOUDFRONT_KEY_PAIR_ID, CLOUDFRONT_PRIVATE_KEY |
| Failure impact | Low — falls back to direct S3 access |
| Docs | features/integrations/ |
Databases
PostgreSQL (Main)
| Purpose | Primary business database — all feature data, RLS-enforced multi-tenancy |
| Provider | AWS RDS PostgreSQL |
| Data stored | All business entities (55 tables): patients, appointments, forms, services, etc. |
| Compliance | BAA required (contains PHI). Encryption at rest. 7-year audit retention. |
| Environment variables | DATABASE_URL |
| Failure impact | Critical — entire API down |
| Docs | database-overview.md, scaling-architecture.md |
Telemetry storage
Telemetry rides on the same RDS Postgres as the main business DB, plus an S3 bucket for replay blobs. There is no separate Telemetry Postgres, no ClickHouse, no TimescaleDB — see decisions.md → Why telemetry is PG + S3, not ClickHouse. Aggregate tables (pose_session_metrics, pose_rep_metrics, media_session_metrics, media_buffering_events) are part of the Core API migration set. The replay-blob bucket is provisioned alongside the existing S3 buckets.
ClickHouse becomes relevant only at Tier 3 — see ../telemetry/index.md → Scaling roadmap — and is reachable via the swap-point interfaces without re-architecting the ingest pipeline.
Redis
| Purpose | Caching, scheduling hold system, worker queues (audit/analytics), real-time features |
| Provider | AWS ElastiCache Redis |
| Data stored | Scheduling holds (TTL), worker job queues, cached query results, SSE channel state |
| Compliance | No persistent PHI (transient data only, TTL-enforced). |
| Environment variables | REDIS_URL |
| Failure impact | High — scheduling holds fail, worker processing stops, caching disabled |
| Docs | features/scheduling/ |
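The scheduling-hold mechanism maps naturally onto Redis's atomic `SET ... NX EX`. This sketch assumes a hypothetical `hold:{orgId}:{slotId}` key layout; the real key scheme lives in features/scheduling/. `NX` means the hold is only taken if nobody else holds the slot, and `EX` guarantees it self-expires so abandoned holds free up without cleanup jobs.

```typescript
// Build the Redis command for an atomic acquire-with-TTL scheduling hold.
// Key layout (hold:{orgId}:{slotId}) is an assumption for illustration.
function holdCommand(
  orgId: string,
  slotId: string,
  userId: string,
  ttlSeconds: number
): string[] {
  return ["SET", `hold:${orgId}:${slotId}`, userId, "NX", "EX", String(ttlSeconds)];
}

// With a client like ioredis this is equivalent to:
//   await redis.set(`hold:${orgId}:${slotId}`, userId, "EX", ttl, "NX");
// A null reply means someone else already holds the slot.
```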
Infrastructure & Hosting
AWS
| Purpose | Production infrastructure: ECS Fargate, RDS / Aurora Serverless v2, ElastiCache, S3, KMS, SES, ALB, VPC, Secrets Manager, CloudWatch, ECR |
| Used by | All services (compute, databases, caching, file storage, encryption, email, monitoring) |
| Services hosted | Core API + Telemetry API + 3 Next.js apps + pgbouncer (all on ECS Fargate), PostgreSQL (RDS Multi-AZ in production / Aurora Serverless v2 in staging), Redis (ElastiCache) |
| Compliance | BAA available. HIPAA-eligible services used. |
| Environment variables | Managed via AWS Secrets Manager |
| Failure impact | Critical — entire platform down |
| Docs | scaling-architecture.md, monitoring.md, AWS Infrastructure |
Monitoring & Alerting (Infrastructure)
Not to be confused with Telemetry. These tools monitor the servers and infrastructure — CPU, memory, connection pools, request latency, uptime. Telemetry monitors the product — patient exercise engagement and pose-tracking data. They are complementary layers:
| | Telemetry (yours, Layer 2) | Monitoring (third-party) |
|---|---|---|
| Question | "How is this patient progressing through their treatment plan?" | "Why is the server slow?" |
| Audience | Specialists, clinic admins, patients themselves | DevOps, on-call engineer |
| Data | Pose landmarks, video engagement events, per-rep aggregates | Server metrics, logs, traces |
| Storage | Postgres (same RDS as Core API) + S3 replay blobs | Provider's SaaS cloud |
| Required? | Yes (Layer 2 product feature) | Recommended (ops visibility) |

See ../telemetry/ for the product telemetry layer; Why telemetry is PG + S3, not ClickHouse for the rationale.
What You Need By Phase
| Phase | Infrastructure | Monitoring Stack | Cost |
|---|---|---|---|
| Phase 1 (1-10 orgs, AWS) | CloudWatch built-in metrics | UptimeRobot (free) + Slack alerts | Free |
| Phase 2 (10-50 orgs, AWS) | AWS RDS Multi-AZ + ECS Fargate + CloudWatch | CloudWatch alarms + Sentry (+ Datadog optional later) | ~$50-150/mo |
| At scale (50+ orgs) | Multi-instance, read replicas | Full Datadog APM + PagerDuty on-call | ~$500-800/mo |
Phase 1 is enough to launch. AWS CloudWatch gives you CPU/memory/logs in the console. UptimeRobot pings /health every 5 minutes and alerts via Slack if it's down. That covers the basics.
UptimeRobot / Pingdom (Phase 1 — Start Here)
| Purpose | External health check monitoring — "is the API up?" |
| Used by | Production availability monitoring |
| Data exchanged | HTTP GET to /health endpoint every 5 minutes |
| Cost | Free tier (50 monitors, 5-min interval) |
| Failure impact | None — monitoring only, platform unaffected |
| Docs | monitoring.md |
Why start here: This gives you external uptime monitoring. If AWS itself is experiencing issues, you won't know from CloudWatch alone. UptimeRobot checks from outside and alerts you.
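The `/health` endpoint those pings hit can be as small as this sketch. The checks shown (`db`, `redis`) are assumptions about what "healthy" means for the Core API; the real endpoint may check more or fewer dependencies.

```typescript
// Shape of the health response UptimeRobot evaluates.
interface HealthReport {
  status: "ok" | "degraded";
  checks: Record<string, boolean>;
}

// Aggregate dependency checks into a single status: any failing check
// flips the overall status to "degraded", which the monitor alerts on.
function healthReport(checks: Record<string, boolean>): HealthReport {
  const allUp = Object.values(checks).every(Boolean);
  return { status: allUp ? "ok" : "degraded", checks };
}

// A route handler would run the real probes and serialize the report:
//   res.status(report.status === "ok" ? 200 : 503).json(report);
```

Returning a non-200 status on degradation matters: UptimeRobot's default check keys off the HTTP status code, not the body.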
Slack
| Purpose | Alert delivery channel for all phases |
| Used by | UptimeRobot alerts (Phase 1), Datadog warnings (Phase 2+), deployment notifications |
| Data exchanged | Alert messages via incoming webhooks (no PHI) |
| Cost | Free (existing workspace) |
| Failure impact | None — notifications delayed |
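Posting an alert to a Slack incoming webhook is a single JSON POST. The `{ text: ... }` payload is the standard incoming-webhook format; `alertPayload` and the message shape are assumptions. Consistent with the table above, no PHI belongs in these messages.

```typescript
// Build the minimal Slack incoming-webhook payload (no PHI).
function alertPayload(service: string, message: string): { text: string } {
  return { text: `:rotating_light: [${service}] ${message}` };
}

// Deliver an alert via the webhook URL (URL comes from Slack app config).
async function sendAlert(webhookUrl: string, service: string, message: string): Promise<void> {
  await fetch(webhookUrl, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(alertPayload(service, message)),
  });
}
```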
Datadog (Phase 2+)
| Purpose | Server metrics, structured logging, APM traces, custom dashboards |
| Used by | All services — Core API, databases, workers |
| Data exchanged | Application logs (PHI sanitized before shipping), CPU/memory/connection metrics, request traces |
| Compliance | No PHI in logs (sanitized at application level before export). |
| Environment variables | DD_API_KEY, DD_SITE |
| Cost | ~$31/mo (2 hosts, 50 metrics, 10GB logs) → scales with infrastructure |
| Failure impact | None — observability degraded, platform unaffected |
| Docs | monitoring.md |
Why Phase 2: In Phase 1, CloudWatch + UptimeRobot cover the basics. When you scale to read replicas and more complex connection pool management, you may want deeper APM metrics — connection pool utilization, query latency percentiles, replication lag. That's when Datadog earns its cost.
What Datadog gives you that the product Telemetry layer doesn't:
- Connection pool utilization trending toward exhaustion
- p95/p99 API response times across all endpoints
- Slow query detection and alerting
- Server CPU/memory/goroutine monitoring
- Correlated request traces (which query made this endpoint slow?)
PagerDuty (Phase 2+)
| Purpose | Critical alert escalation — pages the on-call engineer at 3am |
| Used by | Datadog critical alerts → PagerDuty → phone call/SMS |
| Data exchanged | Alert metadata: "Connection pool exhaustion on core-api" (no PHI) |
| Cost | ~$21/user/mo |
| Failure impact | None — alerts delayed, not lost (Datadog retries) |
| Docs | monitoring.md |
Why Phase 2: With 1-10 orgs, you can check Slack in the morning. With 50+ orgs and paying customers, you need someone woken up when the database runs out of connections.
Planned (Not Yet Integrated)
Payment Processor (Stripe / equivalent)
| Purpose | Service plan billing, product order payments |
| Will be used by | Service plans, patient product orders |
| Data exchanged | Payment intents, subscription management, webhook events |
| Compliance | PCI DSS Level 1. No card data stored locally (tokenized). |
| Status | Not yet chosen. Required before billing features go live. |
| Docs | features/services/ (billing model defined, payment integration TBD) |
Email Service (Resend / SendGrid / equivalent)
| Purpose | Transactional emails (appointment confirmations, form requests, automation actions) |
| Will be used by | Automations (send_email action), appointment notifications |
| Data exchanged | Email addresses, email content (may reference patient names/appointments) |
| Compliance | BAA required if email content includes PHI. |
| Status | Not yet chosen. Required for automation send_email action. |
| Docs | features/automations/ |
SMS & WhatsApp Service (Twilio / equivalent)
| Purpose | Appointment reminders, session reminders, treatment plan notifications |
| Will be used by | Automations (send_sms and send_whatsapp actions) |
| Data exchanged | Phone numbers, message content |
| Compliance | BAA required if messages include PHI. |
| Status | Not yet chosen. Options: Twilio (SMS + WhatsApp Business API), MessageBird, Vonage. WhatsApp Business API requires Meta business verification. |
| Docs | features/automations/ |
Summary
By Criticality
| Level | Services | Impact if Down |
|---|---|---|
| Critical | PostgreSQL (Main), Clerk, AWS (ECS Fargate + RDS), Cloudflare (DNS + edge) | API completely unavailable |
| High | Redis, AWS S3, Daily.co, Bunny Stream | Major features degraded |
| Medium | Telemetry API (Layer 2) | Pose ingest pauses; client buffers in IndexedDB and retries on reconnect; existing aggregates unaffected |
| Low | CloudFront | CDN bypassed; clients fall back to direct S3 access |
| None | Datadog, PagerDuty, Slack, UptimeRobot | Observability only |
By Compliance Requirement
| Requirement | Services |
|---|---|
| BAA required | Clerk, Daily.co, AWS (S3/RDS — covers main + telemetry replay bucket) |
| BAA required (when chosen) | Payment processor, Email service, SMS service |
| No BAA needed | Bunny Stream (no PHI), Redis (transient), Monitoring stack |
Environment Variables (Complete)
# Auth
CLERK_SECRET_KEY=sk_live_...
CLERK_PUBLISHABLE_KEY=pk_live_...
CLERK_WEBHOOK_SECRET=whsec_...
# Databases
DATABASE_URL=postgresql://... # Postgres (Core API + Telemetry aggregates share the same instance)
REDIS_URL=redis://... # Redis
# File Storage
AWS_ACCESS_KEY_ID=AKIA...
AWS_SECRET_ACCESS_KEY=...
AWS_S3_BUCKET=restartix-files
AWS_REGION=eu-central-1
# CDN (optional)
CLOUDFRONT_DISTRIBUTION_ID=E...
CLOUDFRONT_KEY_PAIR_ID=K...
CLOUDFRONT_PRIVATE_KEY=-----BEGIN RSA PRIVATE KEY-----...
# Video Streaming
BUNNY_STREAM_API_KEY=...
BUNNY_STREAM_LIBRARY_ID=...
BUNNY_STREAM_CDN_HOSTNAME=...
# Video Calls
DAILY_API_KEY=...
# Telemetry (Layer 2)
TELEMETRY_SESSION_TOKEN_SECRET=... # HS256 secret for signed session tokens
TELEMETRY_S3_BUCKET=restartix-telemetry-prod # Replay-blob bucket
TELEMETRY_INTERNAL_SVC_ACCT_TOKEN=... # Cat F service-account credential for callbacks to Core API
# Monitoring
DD_API_KEY=... # Datadog (Phase 2+)
DD_SITE=datadoghq.com
# Payments (planned)
# STRIPE_SECRET_KEY=sk_live_...
# STRIPE_WEBHOOK_SECRET=whsec_...
# Email (planned)
# RESEND_API_KEY=re_...
# SMS (planned)
# TWILIO_ACCOUNT_SID=AC...
# TWILIO_AUTH_TOKEN=...
Total Count
| Status | Count |
|---|---|
| Active | 8 services (Clerk, Daily.co, Bunny Stream, AWS S3, Postgres (single RDS, hosts Core API + Telemetry aggregates), Redis, AWS (compute/networking), Datadog) |
| Optional | 3 services (CloudFront, UptimeRobot/Pingdom, PagerDuty) |
| Planned | 3 services (Payment, Email, SMS) |