Scaling Plan

The platform is designed to scale in defined phases — each triggered by concrete metrics, not guesswork. Infrastructure gets more complex only when it needs to.

For the full technical breakdown with connection math, cost estimates, migration checklists, and implementation code, see the Scaling Architecture Reference →


The constraint to understand

The platform uses PostgreSQL Row-Level Security for data isolation. This requires holding a dedicated database connection for the entire duration of each request. This is the right approach for security, but it shapes how the database scales.

In plain terms: if 100 users are active simultaneously, the database needs at least 100 connections held for the duration of each request. pgbouncer in transaction-pool mode is the primary lever that makes this manageable — it multiplexes many short-lived application connections onto a small pool of backend Postgres connections, so the fleet can scale request volume without scaling Postgres max_connections linearly. Pattern reference: P44 — Connection Pooling via pgbouncer. All Phase-2+ scaling targets below assume pgbouncer is active in front of RDS.
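For illustration, a transaction-pool pgbouncer configuration might look like the sketch below — hostnames, database names, and pool sizes are placeholders, not the deployed values:

```ini
; pgbouncer.ini — an illustrative sketch, not the deployed configuration
[databases]
; the platform's logical database, proxied through the pooler (names assumed)
app = host=rds-primary.internal port=5432 dbname=app

[pgbouncer]
listen_port = 6432
; return the backend connection to the pool after each transaction
pool_mode = transaction
; many cheap client connections in, few expensive Postgres connections out
max_client_conn = 2000
default_pool_size = 20
```

With a small default_pool_size per pooler task, the 2× pgbouncer per AZ in the Phase 1 diagram stays well below Postgres max_connections while absorbing the fleet's client connections.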


Phase 1 — Single instance (launch)

Who: 1–10 clinics, up to 100,000 patients
Infrastructure: ECS Fargate (Core API + Telemetry API + 3 Next.js apps + pgbouncer) + RDS Postgres Multi-AZ + ElastiCache Redis + Cloudflare edge
Cost: ~$545/month AWS + Cloudflare; telemetry TBD (see aws-infrastructure.md → Cost: production day 1)
Timeline: Launch through month 12

                        ┌──────────────┐
                        │  Cloudflare  │
                        │  DDoS + WAF  │
                        │  CDN + DNS   │
                        │  SaaS custom │
                        │  domains     │
                        └──────┬───────┘
                               │ HTTPS

                     ┌──────────────────────────┐
                     │    AWS ALB                │
                     │    (host-based routing)   │
                     └────────────┬──────────────┘

                     ┌────────────▼──────────────┐
                     │    ECS Fargate cluster    │
                     │  ┌────────┐ ┌──────────┐  │
                     │  │ Core   │ │ Telemetry│  │
                     │  │ API    │ │ API      │  │
                     │  │ (2× +  │ │ (TBD)    │  │
                     │  │ scale) │ │          │  │
                     │  └────────┘ └──────────┘  │
                     │  ┌────────┐ ┌──────────┐  │
                     │  │ Clinic │ │ Portal   │  │
                     │  │ (2×)   │ │ (2×)     │  │
                     │  └────────┘ └──────────┘  │
                     │  ┌────────┐ ┌──────────┐  │
                     │  │Console │ │pgbouncer │  │
                     │  │ (1×)   │ │ (2× per  │  │
                     │  │        │ │ AZ)      │  │
                     │  └────────┘ └──────────┘  │
                     └────────────┬──────────────┘

              VPC ────────────────┼─────────────────

                            ┌─────▼─────┐
                            │ RDS       │
                            │ Postgres  │
                            │ Multi-AZ  │
                            │ db.t4g    │
                            │ .medium   │
                            └───────────┘
                            ┌──────────────────┐
                            │  ElastiCache      │
                            │  Redis            │
                            │  + replica        │
                            └──────────────────┘

Compute, database, and edge run on AWS + Cloudflare. All business data lives in one Postgres instance — organizations, patient_profiles, patients, and all clinical records side by side. No sharding, no routing logic. The Telemetry API is a sibling Fargate service with separate storage; its internal architecture is being scoped (see aws-infrastructure.md → Telemetry sub-stack).

Horizontal scaling of the application tier

Every Fargate service has an Application Auto Scaling target with a target-tracking policy on average CPU utilization (typical target: 70%). Fast scale-out (60s cooldown), slow scale-in (5min cooldown) to avoid flapping.

```hcl
resource "aws_appautoscaling_target" "core_api" {
  min_capacity = 2
  max_capacity = 10
  ...
}
```
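
The matching target-tracking policy is sketched below with illustrative resource names; the 70% CPU target and the 60-second/5-minute cooldowns mirror the values described above:

```hcl
# Sketch only — resource names are illustrative; values mirror the prose above.
resource "aws_appautoscaling_policy" "core_api_cpu" {
  name               = "core-api-cpu-target-tracking"
  policy_type        = "TargetTrackingScaling"
  resource_id        = aws_appautoscaling_target.core_api.resource_id
  scalable_dimension = aws_appautoscaling_target.core_api.scalable_dimension
  service_namespace  = aws_appautoscaling_target.core_api.service_namespace

  target_tracking_scaling_policy_configuration {
    predefined_metric_specification {
      predefined_metric_type = "ECSServiceAverageCPUUtilization"
    }
    target_value       = 70  # average CPU across the service's tasks
    scale_out_cooldown = 60  # fast scale-out
    scale_in_cooldown  = 300 # slow scale-in to avoid flapping
  }
}
```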

When traffic rises, ECS adds tasks automatically — 2 → 4 → 6 → 10 — with no deploy. When it drops, tasks drain and exit. Adjusting the bounds is a Terraform PR + apply, no service restart.

Move to Phase 2 when:

  • Database connections are consistently above 80% of max_connections after pgbouncer multiplexing
  • Database size exceeds 100GB
  • More than 10 clinics are active and read latency on aggregation queries climbs past comfort

Phase 2 — Read replicas (months 12–24)

Who: 10–50 clinics, up to 500,000 patients
Key change: Add read replicas — separate database servers that handle read-only queries
Cost: ~$1,300–1,500/month AWS + Cloudflare; telemetry TBD (see aws-infrastructure.md → Phase 2 estimate)

                        ┌──────────────┐
                        │  Cloudflare  │
                        └──────┬───────┘


                     ┌──────────────────────────┐
                     │    AWS ALB                │
                     └────────────┬──────────────┘

                     ┌────────────▼──────────────┐
                     │    ECS Fargate cluster    │
                     │  Core API (3-8 tasks)     │
                     │  Telemetry API (2-4)      │
                     │  Clinic / Portal (2-6 ea) │
                     │  Console (1-2)            │
                     │  pgbouncer (2-4)          │
                     └────────────┬──────────────┘
                                  │  DatabaseRouter middleware
                                  │  routes by HTTP method:

              ┌───────────────────┴─────────────────────┐
              │ POST/PUT/PATCH/DELETE                    │ GET
              ▼                                          ▼
    ┌──────────────────────┐               ┌──────────────────────┐
    │   RDS Primary        │  ── WAL ──►   │  RDS Read Replica 1  │
    │   (read-write)       │  replication  │  (read-only)         │
    │                      │               └──────────────────────┘
    │   db.r6g.large       │               ┌──────────────────────┐
    │   max_connections:500│  ── WAL ──►   │  RDS Read Replica 2  │
    │                      │  replication  │  (read-only)         │
    └──────────────────────┘               └──────────────────────┘

Most requests in the platform are reads (~70%). By routing reads to dedicated replicas and writes to the primary database, we roughly triple the system's capacity without changing any application logic.

The DatabaseRouter middleware inspects the HTTP method: mutations go to the primary, queries go round-robin across replicas. RLS session variables are set on all connections, so security behavior is identical.

This is an infrastructure upgrade. Clinics experience no downtime or change in behavior.
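
On the infrastructure side, each replica is a small Terraform addition pointing at the existing primary. A sketch, with assumed resource and identifier names:

```hcl
# Sketch — resource names and identifiers are illustrative.
resource "aws_db_instance" "read_replica_1" {
  identifier          = "core-db-replica-1"
  replicate_source_db = aws_db_instance.primary.identifier
  instance_class      = "db.r6g.large"
  skip_final_snapshot = true
  # Engine, storage, and credentials are inherited from the source instance.
  # A second replica is the same block with a different identifier.
}
```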

Move to vertical scaling when:

  • The primary RDS instance runs near the CPU, memory, or I/O limits of its current instance class under sustained load
  • Read replicas alone can't keep up with read traffic

Beyond Phase 2 — single-DB scaling levers

The platform serves SMB clinics — both tenancy modes (shared and dedicated) run on the same shared Postgres instance with logical isolation via RLS, per-tenant auth-provider organisation for dedicated tenants (when provisioned), and tenant-scope DPA exclusion. Dedicated infrastructure per tenant, sharded deployments, and per-tenant database routing are permanently out of scope (see CLAUDE.md → Project Overview and features/platform/tenant-isolation.md).

The available scaling levers are all single-DB compatible and largely already foundational:

  • Vertical RDS scaling — db.r6g.large → db.r6g.xlarge → db.r6g.2xlarge and beyond. AWS-managed; no application code changes (see the Terraform sketch after this list).
  • Read replicas — native Postgres feature; route read-heavy queries to a replica in the same region.
  • pgbouncer connection pooling — already foundational (P44). Multiplexes connections so the Core API fleet scales horizontally without exhausting the DB.
  • Redis cache — already foundational (P45). Offloads hot reads from the DB.
  • Event-table partitioning — already foundational (P41). audit_log and any future event tables are monthly-range-partitioned so they don't bloat indefinitely.
  • Multi-AZ failover + backups — single AWS region, primary + standby in different availability zones; standard RDS features.
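
To make the first and last levers concrete, both are attribute changes on the existing primary instance. A sketch with illustrative names and values, not the deployed configuration:

```hcl
# Sketch — names and values are illustrative; unrelated required arguments omitted.
resource "aws_db_instance" "primary" {
  identifier              = "core-db"
  engine                  = "postgres"
  instance_class          = "db.r6g.xlarge" # vertical scaling: bump the class, plan, apply
  multi_az                = true            # primary + standby in separate AZs
  backup_retention_period = 14              # days of automated backups
  # ... engine_version, storage, credentials, networking omitted
}
```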

A single tuned Aurora/RDS Postgres instance comfortably handles 10TB+ of data and hundreds of thousands of pooled connections. For SMB-clinic workload (hundreds of clinics, hundreds of thousands of patients across the network) this is a 5–10+ year ceiling. Real production data will inform any future architectural change long before then.

Future considered: multi-region for data residency

Some EU clinics may eventually require their data to reside in a specific region (e.g., German clinics on eu-central-1 while Romanian clinics stay on eu-west-3). This is not the same as horizontal sharding — it's per-tenant region selection on logically-isolated data, not multi-DB performance scaling. Today no customer requires it. If/when one does, the design will land as its own ADR; the platform stays single-DB until then.