
Scaling Plan

The platform is designed to scale in defined phases — each triggered by concrete metrics, not guesswork. Infrastructure gets more complex only when it needs to.

For the full technical breakdown with connection math, cost estimates, migration checklists, and implementation code, see the Scaling Architecture Reference →


The constraint to understand

The platform uses PostgreSQL Row-Level Security for data isolation. This requires holding a dedicated database connection for the entire duration of each request. This is the right approach for security, but it shapes how the database scales.

In plain terms: if 100 users are active simultaneously, the database needs at least 100 connections. The scaling plan is largely about managing this constraint as the user count grows.
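The arithmetic behind that constraint can be made concrete. This is a back-of-envelope sizing helper, not platform code; the headroom factor is an illustrative assumption covering replication, maintenance, and monitoring sessions.

```go
package main

import "fmt"

// requiredConnections estimates how many database connections a phase
// needs. Because RLS pins one dedicated connection per request for its
// full duration, concurrent users translate roughly 1:1 into
// connections; headroom (an assumed factor, not a platform constant)
// leaves room for replication, maintenance, and monitoring sessions.
func requiredConnections(concurrentUsers int, headroom float64) int {
	return int(float64(concurrentUsers) * headroom)
}

func main() {
	// Phase 1 sizing: ~100 concurrent users against max_connections: 200.
	fmt.Println(requiredConnections(100, 1.5)) // 150, still under 200
}
```

When this estimate consistently crosses 80% of `max_connections`, that is the Phase 2 trigger described below.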


Phase 1 — Single instance (current)

Who: 1–10 clinics, up to 100,000 patients
Infrastructure: AWS App Runner (Core API + Telemetry) + RDS PostgreSQL + ElastiCache Redis
Cost: ~$475–565/month
Timeline: Launch through month 12

                        ┌──────────────┐
                        │  Cloudflare  │
                        │  DDoS + WAF  │
                        └──────┬───────┘
                               │ HTTPS

                     ┌─────────────────────┐
                     │    AWS App Runner    │
                     ├─────────┬───────────┤
                     │ Core API│Telemetry  │
                     │ (1-3)   │ API (1-2) │
                     └────┬────┴─────┬─────┘
                          │          │
              VPC ────────┼──────────┼──────────────
                          │          │
                    ┌─────▼──────────▼─────┐
                    │   RDS PostgreSQL      │
                    │   (single instance)   │
                    │                       │
                    │   db.t4g.medium       │
                    │   max_connections:200 │
                    │                       │
                    │   All tables:         │
                    │   ├── organizations   │
                    │   ├── patient_persons │
                    │   ├── patients        │
                    │   ├── appointments    │
                    │   ├── forms, files... │
                    │   └── audit_log       │
                    └──────────────────────┘

                    ┌──────────────────────┐
                    │   ElastiCache Redis   │
                    │   Rate limits,        │
                    │   sessions, cache     │
                    └──────────────────────┘

Everything runs on AWS. App Runner hosts both services and connects to RDS and Redis through a private VPC. All data lives in one database — patient_persons, patients, and all clinical records side by side. No sharding, no routing logic. See AWS Infrastructure → for the full setup.

Move to Phase 2 when:

  • Database connections are consistently above 80% of capacity
  • Database size exceeds 100GB
  • More than 10 clinics are active

Phase 2 — Read replicas (months 12–24)

Who: 10–50 clinics, up to 500,000 patients
Key change: Add read replicas — separate database servers that handle read-only queries
Cost: ~$1,235–1,385/month

                        ┌──────────────┐
                        │  Cloudflare  │
                        └──────┬───────┘


                     ┌─────────────────────┐
                     │    AWS App Runner    │
                     ├─────────┬───────────┤
                     │ Core API│Telemetry  │
                     │ (3-5)   │ API (2-3) │
                     └────┬────┴─────┬─────┘
                          │          │
                          │  DatabaseRouter middleware
                          │  routes by HTTP method:

              ┌───────────┴──────────────────────────┐
              │ POST/PUT/PATCH/DELETE                  │ GET
              ▼                                       ▼
   ┌─────────────────────┐              ┌──────────────────────┐
   │   RDS Primary        │  ◄── WAL ──►│  RDS Read Replica 1  │
   │   (read-write)       │  replication │  (read-only)         │
   │                      │              └──────────────────────┘
   │   db.r6g.large       │              ┌──────────────────────┐
   │   max_connections:500│  ◄── WAL ──►│  RDS Read Replica 2  │
   │                      │  replication │  (read-only)         │
   └─────────────────────┘              └──────────────────────┘

Most requests in the platform are reads (~70%). By routing reads to dedicated replicas and writes to the primary database, we roughly triple the system's capacity without changing any application logic.

The DatabaseRouter middleware inspects the HTTP method: mutations go to the primary, queries go round-robin across replicas. RLS session variables are set on all connections, so security behavior is identical.
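The routing rule can be sketched in a few lines. This is a minimal illustration of the method-based split, not the platform's actual middleware; the type and field names are assumptions.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// databaseRouter is an illustrative sketch of the DatabaseRouter rule:
// mutations go to the primary, reads round-robin across replicas.
type databaseRouter struct {
	primary  string   // DSN of the read-write primary
	replicas []string // DSNs of the read-only replicas
	next     uint64   // round-robin counter (atomic)
}

// target picks the database a request should use based on HTTP method.
func (r *databaseRouter) target(method string) string {
	switch method {
	case "POST", "PUT", "PATCH", "DELETE":
		return r.primary
	default: // GET and other read-only methods
		if len(r.replicas) == 0 {
			return r.primary // Phase 1 behavior: no replicas yet
		}
		n := atomic.AddUint64(&r.next, 1)
		return r.replicas[(n-1)%uint64(len(r.replicas))]
	}
}

func main() {
	r := &databaseRouter{
		primary:  "rds-primary",
		replicas: []string{"replica-1", "replica-2"},
	}
	fmt.Println(r.target("POST")) // rds-primary
	fmt.Println(r.target("GET"))  // replica-1
	fmt.Println(r.target("GET"))  // replica-2
}
```

Because RLS session variables are set on every connection regardless of which server it points at, the security behavior is identical on primary and replicas.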

This is an infrastructure upgrade. Clinics experience no downtime or change in behavior.

Move to Phase 3 when:

  • Any single clinic exceeds 50,000 patients
  • An enterprise clinic requires a dedicated SLA guarantee

Phase 3 — Enterprise tier (months 24–36)

Who: 50–100 clinics, up to 1,000,000 patients
Key change: Two tiers — shared infrastructure and dedicated enterprise infrastructure
Cost: ~$4,000/month infrastructure | ~$28,500/month projected revenue

                             ┌──────────────┐
                             │  Cloudflare  │
                             └──────┬───────┘


                          ┌─────────────────┐
                          │   App Runner     │
                          │   TenantRouter   │ ◄── "Which shard does this org belong to?"
                          │   middleware     │
                          └────────┬────────┘

                          ┌────────▼────────┐
                          │   Routing DB     │
                          │   (tiny instance)│
                          │                  │
                          │  ┌────────────┐  │
                          │  │tenant_shards│  │
                          │  │org → shard  │  │
                          │  └────────────┘  │
                          └────────┬────────┘

                    ┌──────────────┴──────────────┐
                    │                              │
                    ▼                              ▼
   ┌────────────────────────────┐   ┌────────────────────────────┐
   │ SHARED TIER                │   │ ENTERPRISE TIER             │
   │ (90 small/medium clinics)  │   │ (per large clinic)          │
   │                            │   │                             │
   │  App Runner: shared        │   │  App Runner: org-101        │
   │  ├── Core API (5)          │   │  ├── Core API (2)           │
   │  └── Telemetry API (3)     │   │  └── Telemetry API (1)      │
   │                            │   │                             │
   │  RDS: shared-cluster       │   │  RDS: org-101               │
   │  ├── Primary (r6g.xlarge)  │   │  └── Primary (r6g.large)    │
   │  ├── Read Replica 1        │   │                             │
   │  └── Read Replica 2        │   │  Redis: dedicated           │
   │                            │   │                             │
   │  Redis: shared             │   │  (repeat per enterprise     │
   │                            │   │   clinic)                   │
   └────────────────────────────┘   └────────────────────────────┘

Standard clinics continue on shared infrastructure. Enterprise clinics — those with large patient volumes or specific compliance requirements — get their own dedicated database and compute resources.

Tier         Price                         Infrastructure
Standard     $99–199/month per clinic      Shared — multiple clinics on one database
Enterprise   $999–2,999/month per clinic   Dedicated — own database and servers

The TenantRouter middleware resolves the current organization to a shard using the routing DB (cached in Redis for 5 minutes). All subsequent database operations use the correct shard's connection pool. New enterprise clinics are provisioned automatically — no manual setup required.

This is also when cross-shard patient identity becomes relevant — a patient registered at a shared-tier clinic may later visit an enterprise clinic on a different shard.

Move to Phase 4 when:

  • More than 100 clinics on the shared tier
  • EU clinics require data to remain on servers in the EU (GDPR data residency)

Phase 4 — Multi-region (months 36+)

Who: 100–1,000+ clinics, millions of patients
Key change: Geographic sharding — US clinics on US servers, EU clinics on EU servers
Cost: ~$16,000/month infrastructure | ~$90,000/month projected revenue

                             ┌──────────────┐
                             │  Cloudflare  │
                             │  Geo routing  │
                             └──────┬───────┘

                          ┌─────────▼─────────┐
                          │   Global Routing   │
                          │   DB               │
                          │                    │
                          │  ┌──────────────┐  │
                          │  │ tenant_shards │  │
                          │  │ + region col  │  │
                          │  └──────────────┘  │
                          │  ┌──────────────┐  │
                          │  │ patient_     │  │
                          │  │ person_      │  │
                          │  │ registry     │  │
                          │  └──────────────┘  │
                          └─────────┬─────────┘

              ┌─────────────────────┴─────────────────────┐
              │                                            │
              ▼                                            ▼
┌───────────────────────────────┐       ┌───────────────────────────────┐
│ US REGION (us-east-1)          │       │ EU REGION (eu-west-1)          │
├───────────────────────────────┤       ├───────────────────────────────┤
│                                │       │                                │
│  Shared shards:                │       │  Shared shards:                │
│  ┌────────────┐ ┌────────────┐│       │  ┌────────────┐               │
│  │ US-1       │ │ US-2       ││       │  │ EU-1       │               │
│  │ 50 clinics │ │ 50 clinics ││       │  │ 30 clinics │               │
│  │ RDS+replicas│ │RDS+replicas││       │  │ RDS+replicas│               │
│  └────────────┘ └────────────┘│       │  └────────────┘               │
│                                │       │                                │
│  Enterprise (30 dedicated):    │       │  Enterprise (10 dedicated):    │
│  ┌──────┐┌──────┐┌──────┐    │       │  ┌──────┐┌──────┐            │
│  │org101││org102││ ...  │    │       │  │org201││ ...  │            │
│  │ RDS  ││ RDS  ││      │    │       │  │ RDS  ││      │            │
│  └──────┘└──────┘└──────┘    │       │  └──────┘└──────┘            │
│                                │       │                                │
│  ClickHouse (analytics)        │       │  ClickHouse (analytics)        │
│  Telemetry PostgreSQL (audit)  │       │  Telemetry PostgreSQL (audit)  │
└───────────────────────────────┘       └───────────────────────────────┘

This phase addresses two needs:

  1. Performance — clinics access servers closer to their physical location
  2. Compliance — GDPR requires EU patient data to remain in the EU

Each geographic region runs its own independent cluster of shards. The global routing DB directs each request to the right region and shard. Cross-shard analytics are handled by the Telemetry service (ClickHouse), which aggregates data from all shards — superadmin dashboards query Telemetry, not individual shards.
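The region-and-shard resolution amounts to a lookup against `tenant_shards` with its region column. A minimal sketch, with assumed names and illustrative data; the failure mode matters: an unknown org should be an error, never a silent fallback to a default region, because GDPR residency forbids an EU org landing on a US shard.

```go
package main

import "fmt"

// shardLocation mirrors a tenant_shards row with its region column.
// The org IDs and values below are illustrative, not real tenants.
type shardLocation struct {
	region string // e.g. "us-east-1", "eu-west-1"
	shard  string // e.g. "US-1", "EU-1"
}

var tenantShards = map[string]shardLocation{
	"clinic-madrid": {region: "eu-west-1", shard: "EU-1"},
	"clinic-boston": {region: "us-east-1", shard: "US-2"},
}

// resolve routes a request to its region and shard. Unknown orgs are an
// error by design: residency rules rule out a default-region fallback.
func resolve(orgID string) (shardLocation, error) {
	loc, ok := tenantShards[orgID]
	if !ok {
		return shardLocation{}, fmt.Errorf("unknown org %q", orgID)
	}
	return loc, nil
}

func main() {
	loc, _ := resolve("clinic-madrid")
	fmt.Println(loc.region, loc.shard) // eu-west-1 EU-1
}
```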


Cross-shard patient identity

Starting in Phase 3, clinics can live on different shards. But a patient's portable profile (patient_persons) has no organization_id — it's owned by the patient, not any clinic. This creates a question: when a patient registered at Clinic A (Shard 1) walks into Clinic B (Shard 2), how does Shard 2 find them?

┌─────────────────────────────────────────────────────────┐
│ GLOBAL ROUTING DB                                        │
│                                                          │
│  patient_person_registry (lightweight, hashes only)      │
│  ┌─────────────────┬──────────┬────────────────────┐    │
│  │ patient_person_id│phone_hash│ home_shard_id      │    │
│  ├─────────────────┼──────────┼────────────────────┤    │
│  │ 99              │ a3f8...  │ shard_1            │    │
│  │ 100             │ b7c2...  │ shard_2            │    │
│  └─────────────────┴──────────┴────────────────────┘    │
└─────────────────────────────────────────────────────────┘

     DISCOVER           FETCH                  LINK
     "Does this      "Get their             "Register them
      patient          profile from            at this clinic"
      exist?"          the home shard"

┌─ Shard 1 ──────────────────┐  ┌─ Shard 2 ──────────────────┐
│                             │  │                             │
│  patient_persons            │  │  patients (org-link only)   │
│  ┌───────────────────────┐  │  │  ┌───────────────────────┐  │
│  │ id: 99                │◄─┼──┼──│ patient_person_id: 99 │  │
│  │ name: "Maria López"   │  │  │  │ org_id: Clinic B      │  │
│  │ dob, blood_type, ...  │  │  │  │ profile_shared: false │  │
│  └───────────────────────┘  │  │  └───────────────────────┘  │
│                             │  │                             │
│  patients (org-link)        │  │  appointments, forms...     │
│  ┌───────────────────────┐  │  │  (Clinic B's clinical data) │
│  │ patient_person_id: 99 │  │  │                             │
│  │ org_id: Clinic A      │  │  └─────────────────────────────┘
│  │ profile_shared: true  │  │
│  └───────────────────────┘  │
│                             │
│  appointments, forms...     │
│  (Clinic A's clinical data) │
└─────────────────────────────┘

The solution:

  1. Home shard — the patient's full profile stays in whichever shard they first registered in
  2. Global registry — a lightweight lookup table (hashes only, no PII) in the routing DB maps patients to their home shard
  3. Cross-shard fetch — when a clinic on Shard 2 needs the profile, it reads a single row by primary key from Shard 1 (sub-millisecond)
  4. Clinical records never cross shards — appointments, forms, and files stay local to the shard where they were created
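The DISCOVER step above can be sketched as a hash lookup against the registry. This is an illustration under assumed names, not the implementation the reference describes; the phone number is made-up sample data, and the registry map stands in for the `patient_person_registry` table.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// registryEntry mirrors a patient_person_registry row: an ID and a home
// shard, keyed by hash only, so the routing DB never stores PII.
type registryEntry struct {
	patientPersonID int
	homeShard       string
}

// registry stands in for the global patient_person_registry table.
var registry = map[string]registryEntry{}

// phoneHash derives the lookup key. SHA-256 here is an illustrative
// choice; the real scheme lives in the Cross-Shard Patient Identity doc.
func phoneHash(phone string) string {
	sum := sha256.Sum256([]byte(phone))
	return hex.EncodeToString(sum[:])
}

// discover answers "does this patient exist anywhere?" without moving
// any PII through the routing DB.
func discover(phone string) (registryEntry, bool) {
	e, ok := registry[phoneHash(phone)]
	return e, ok
}

func main() {
	// Maria registered at Clinic A, so her home shard is shard_1.
	registry[phoneHash("+34 600 000 001")] =
		registryEntry{patientPersonID: 99, homeShard: "shard_1"}

	// Clinic B (on shard_2) checks the registry before creating a
	// duplicate profile. FETCH then reads one row by primary key from
	// the home shard; LINK inserts a patients row on shard_2.
	if e, ok := discover("+34 600 000 001"); ok {
		fmt.Println(e.homeShard, e.patientPersonID) // shard_1 99
	}
}
```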

This requires no changes in Phases 1–2, where everything lives in one database. The registry is added as a one-time migration when entering Phase 3.

For the full implementation including Go code, SQL schema, and the cross-shard registration flow, see Cross-Shard Patient Identity →


Infrastructure cost summary

Phase   Clinics       Patients      Monthly cost    Monthly revenue
1       1–10          up to 100k    $475–565        Early stage
2       10–50         up to 500k    $1,235–1,385    Growing
3       50–100        up to 1M      ~$4,000         ~$28,500
4       100–1,000+    millions      ~$16,000        ~$90,000