Scaling Architecture Reference
The technical companion to scaling.md. This document covers the numbers — connection math, Fargate task sizing, RDS / Aurora capacity, when each scaling lever fires — that the higher-level scaling plan abstracts over.
The architecture is deliberately conventional: single Postgres instance behind pgbouncer, horizontally-scaling stateless Fargate fleet, with read replicas and instance-class upsizing as the only Phase 2 levers. Sharding, per-tenant infrastructure, and TenantRouter-style request fanout are permanently out of scope per CLAUDE.md → Project Overview.
This document assumes the topology in aws-infrastructure.md. Numbers below reflect production sizing in eu-central-1.
Connection math
The single most load-bearing number in the architecture is how many database connections the fleet actually opens to RDS. This is what fails first under load if you size it wrong.
The fan-out
```
Each Fargate task                    pgbouncer fleet               RDS Postgres
                                                                   (max_connections=200)

Core API task ─┬──── 25 admin pool ──┐
               └──── 25 app pool ────┤
                                     │
Core API task ─┬──── 25 admin pool ──┤
               └──── 25 app pool ────┤
                                     │
... (5–10 tasks)                     ├──── 2 pgbouncer tasks ───── ~50 backend conns
                                     │     (default_pool_size=25
Telemetry API task ─── pool ─────────┤      per task,
                                     │      transaction mode,
Telemetry API task ─── pool ─────────┤      max_client_conn=1000)
                                     │
Migration runner ──── 1–2 conns ─────┘     (DIRECT, bypasses pgbouncer)
  via DATABASE_DIRECT_URL ──────────────── ~5 conns
+ monitoring / Performance Insights ────── ~5 conns
+ ad-hoc psql via SSM ──────────────────── ~2 conns
                                           ────────
                                           ~62 conns used
                                           ~138 headroom
```

Without pgbouncer, the same 5–10 Core API tasks × 50 connections each = 250–500 connections fanned out to RDS, each holding ~10 MB RSS server-side — RDS would OOM or refuse connections. pgbouncer in transaction-pool mode multiplexes, so the application tier scales horizontally while the backend connection count stays small.
Numbers, by environment
| Source | Production | Staging | Notes |
|---|---|---|---|
| Core API tasks | 2–10 (auto-scaling) | 1 (Fargate Spot) | Each runs pgx with two pools |
| pgx admin pool per task | DB_POOL_MAX=25 | 5 | Owner-role pool, RLS-bypassed |
| pgx app pool per task | DB_POOL_MAX=25 | 5 | Restricted-role pool, RLS-enforced |
| Telemetry API tasks | 2 (Multi-AZ) | 1 | Pool sizing TBD per aws-infrastructure.md → Telemetry sub-stack |
| pgbouncer tasks | 2 (one per AZ) | 1 | Each has its own backend pool |
| pgbouncer default_pool_size | 25 | 25 | Backend conns per pgbouncer task per (user, db) |
| pgbouncer max_client_conn | 1000 | 1000 | Inbound conn ceiling per pgbouncer task |
| pgbouncer max_prepared_statements | 200 | 200 | Required for pgx prepared-statement caching |
| RDS / Aurora max_connections | 200 (RDS db.t4g.medium) | Aurora-managed | Aurora Serverless v2 derives this from ACU |
Why these numbers
- `DB_POOL_MAX=25` per pool, two pools per task — pgx pools are per-process. With 5 tasks the application has 250 client-side connections to pgbouncer, plenty of headroom. With 10 tasks (auto-scaled) it's 500. Both are well under pgbouncer's `max_client_conn=1000` (see the sketch below).
- `default_pool_size=25` per pgbouncer task — 2 pgbouncer tasks × 25 = 50 backend connections to RDS. With `max_connections=200`, that's 25% utilized — comfortable headroom for migrations, monitoring, and occasional admin connections.
- `max_client_conn=1000` per pgbouncer task — well above any realistic Core API fleet size. The cap exists to prevent runaway misconfiguration, not to be hit in practice.
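The same arithmetic written out as a quick sanity check. This is a standalone sketch; the task counts and pool sizes are hard-coded from the tables above rather than read from any live configuration.

```go
package main

import "fmt"

// Connection fan-out sanity check using the production numbers documented above.
func main() {
	const (
		coreAPITasksMax   = 10 // auto-scaling ceiling
		poolsPerTask      = 2  // pgx admin pool + app pool
		dbPoolMax         = 25 // DB_POOL_MAX per pgx pool
		pgbouncerTasks    = 2  // one per AZ
		defaultPoolSize   = 25 // backend conns per pgbouncer task
		maxClientConn     = 1000
		rdsMaxConnections = 200
		directConns       = 12 // migrations + monitoring + ad-hoc psql (~5+5+2)
	)

	// Client side: every Core API task opens poolsPerTask × dbPoolMax conns to pgbouncer.
	clientConns := coreAPITasksMax * poolsPerTask * dbPoolMax
	fmt.Printf("client conns to pgbouncer: %d (ceiling %d per pgbouncer task)\n",
		clientConns, maxClientConn)

	// Server side: pgbouncer multiplexes those onto a small backend pool.
	backendConns := pgbouncerTasks*defaultPoolSize + directConns
	fmt.Printf("backend conns to RDS: %d of %d (%d headroom)\n",
		backendConns, rdsMaxConnections, rdsMaxConnections-backendConns)
}
```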
Where the numbers come from
- pgx pool sizing: internal/core/database/ — the `Connect` calls construct `*pgxpool.Pool` with `MaxConns=DB_POOL_MAX` (see the sketch after this list).
- pgbouncer config: services/api/deploy/pgbouncer/pgbouncer.ini — same file used in local docker-compose and Fargate.
- RDS `max_connections`: production parameter group, set to 200 explicitly (the default for db.t4g.medium would be lower).
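For orientation, a minimal sketch of that construction pattern. The function name and env-var handling here are illustrative; the real Connect helpers in internal/core/database/ may be shaped differently.

```go
package database

import (
	"context"
	"fmt"
	"os"
	"strconv"

	"github.com/jackc/pgx/v5/pgxpool"
)

// connectPool is a hedged sketch of the pattern described above: one
// *pgxpool.Pool per role (admin and app), each capped at DB_POOL_MAX.
func connectPool(ctx context.Context, connString string) (*pgxpool.Pool, error) {
	cfg, err := pgxpool.ParseConfig(connString)
	if err != nil {
		return nil, fmt.Errorf("parse pool config: %w", err)
	}

	// DB_POOL_MAX caps client-side connections from this task to pgbouncer.
	maxConns := 25 // documented production default
	if v := os.Getenv("DB_POOL_MAX"); v != "" {
		if n, err := strconv.Atoi(v); err == nil {
			maxConns = n
		}
	}
	cfg.MaxConns = int32(maxConns)

	return pgxpool.NewWithConfig(ctx, cfg)
}
```

Each Core API task would call something like this twice, once with the owner-role DSN and once with the restricted-role DSN, which is where the 2 × DB_POOL_MAX client connections counted above come from.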
Fargate task sizing
Each service's task definition specifies cpu and memory. Sizing principles:
| Service | CPU | Memory | Why |
|---|---|---|---|
| Core API (prod) | 1 vCPU | 2 GB | Go binary handles ~50–100 concurrent reqs per task; memory headroom for the pgx pools + envelope-encrypted key cache |
| Core API (staging) | 0.5 vCPU | 1 GB | Half-sized; staging traffic is rare enough that one task on Spot is fine |
| Telemetry API | TBD | TBD | Pending aws-infrastructure.md → Telemetry sub-stack decisions |
| Clinic / Portal (prod) | 0.5 vCPU | 1 GB | Server-rendered Next.js handles ~30–50 concurrent SSR renders per task |
| Console (prod) | 0.25 vCPU | 0.5 GB | Single-task fixed; superadmin-only, low concurrency |
| pgbouncer | 0.25 vCPU | 0.5 GB | Single static binary, very small footprint, throughput-bound by network not CPU |
Fargate is billed per vCPU-hour and per GB-hour in eu-central-1. See aws-infrastructure.md → Cost: production day 1 for the all-in monthly numbers.
Auto-scaling parameters
Every horizontally-scaling service uses Application Auto Scaling with target-tracking on average CPU utilization:
| Parameter | Production value | Why |
|---|---|---|
| target_value (CPU %) | 70 | Above 70% sustained, latency starts climbing; below 70% there's headroom for spikes |
| scale_out_cooldown | 60 s | Add tasks fast — under-scaling is user-visible |
| scale_in_cooldown | 300 s | Remove tasks slowly — avoids flapping after a brief lull |
| min_capacity | 2 (Multi-AZ HA) or 1 (Console) | One task per AZ for HA; Console is a deliberate exception |
| max_capacity | 10 (Core API), 8 (clinic/portal), 2 (Console) | Cost ceiling — review annually as actual traffic data accumulates |
Adjusting bounds is a Terraform PR + apply, no service restart. The auto-scaling target is updated in place.
Database sizing
Production: RDS Postgres
| Phase | Instance class | vCPU | RAM | max_connections | Storage |
|---|---|---|---|---|---|
| Phase 1 (launch, 1–10 clinics) | db.t4g.medium Multi-AZ | 2 | 4 GB | 200 | 50 GB gp3, auto-scale to 200 GB |
| Phase 2 (10–50 clinics) | db.r6g.large Multi-AZ | 2 | 16 GB | 500 | 250 GB gp3 |
| Phase 2 + read replicas | + 2× db.r6g.large replicas | 2 each | 16 GB each | 500 each | Async WAL replication |
| Vertical ceiling | db.r6g.16xlarge | 64 | 512 GB | ~5000 | up to 64 TiB |
The vertical ceiling is included to make the headroom visible — the platform is not expected to approach it within the documented multi-year horizon. Single-DB scaling is a 5–10+ year story for the SMB-clinic shape.
Staging: Aurora Serverless v2
| Setting | Value | Why |
|---|---|---|
| Engine | aurora-postgresql 17 | Same wire protocol + extension surface as RDS |
| Capacity | 0.5–2 ACU, scale-to-zero | Idle = $0/hr compute; wake in 5–15 s on first request |
| Multi-AZ | Disabled (single-AZ) | Staging accepts the loss of HA for cost |
| Backup retention | 1 day | Staging-grade |
ACU sizing reference (Aurora Serverless v2):
- 0.5 ACU ≈ ~1 GB RAM, ~0.25 vCPU equivalent — sufficient for staging idle and light dev usage
- 1 ACU ≈ ~2 GB RAM, ~0.5 vCPU
- 2 ACU ≈ ~4 GB RAM, ~1 vCPU — burst ceiling for staging when devs are stressing it
The cluster auto-scales between min and max ACU based on connection count, CPU, and active sessions. Scale-to-zero kicks in after ~5 minutes of inactivity.
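One practical consequence of scale-to-zero: the first request after an idle period can fail to connect or hang for several seconds while the cluster wakes. A hedged sketch of a wake-tolerant connect, assuming pgx v5 (the function name and backoff values are illustrative, not taken from the codebase):

```go
package database

import (
	"context"
	"fmt"
	"time"

	"github.com/jackc/pgx/v5/pgxpool"
)

// connectWithWakeRetry tolerates the 5–15 s Aurora Serverless v2 wake-up by
// retrying the initial connection with a capped backoff.
func connectWithWakeRetry(ctx context.Context, connString string) (*pgxpool.Pool, error) {
	backoff := time.Second
	for {
		pool, err := pgxpool.New(ctx, connString)
		if err == nil {
			// pgxpool.New does not guarantee connectivity; Ping forces a real
			// connection and is what actually waits out the wake-up.
			err = pool.Ping(ctx)
			if err == nil {
				return pool, nil
			}
			pool.Close()
		}

		select {
		case <-ctx.Done():
			return nil, fmt.Errorf("aurora did not wake in time: %w", err)
		case <-time.After(backoff):
		}
		if backoff < 8*time.Second {
			backoff *= 2
		}
	}
}
```

Production RDS never scales to zero, so this path only matters against staging.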
When each scaling lever fires
The triggers are deliberately metric-based, not time-based. Each lever has a clear "scale this when X" criterion.
Lever 1: Auto-scale Fargate tasks (automatic, no action)
- Trigger: average CPU > 70% sustained
- Action: ECS adds tasks within the configured min/max range
- Cost: linear with task count
Lever 2: Raise auto-scaling ceiling (Terraform PR)
- Trigger: auto-scaling already at `max_capacity` and CPU still elevated
- Action: edit `max_capacity` in Terraform, `terraform apply`. No service restart.
- Cost: no immediate change; the new ceiling only matters under sustained load
Lever 3: Resize Fargate tasks (Terraform PR + rolling deploy)
- Trigger: CPU stays high even with many tasks (suggests per-request CPU is the bottleneck, not concurrency)
- Action: raise `cpu` / `memory` in the task definition, `terraform apply`. ECS rolls a new task definition revision and does a rolling deploy.
- Cost: linear with new task size × task count
Lever 4: Vertical RDS instance upgrade (downtime via Multi-AZ failover)
- Trigger:
  - RDS CPU > 70% sustained, OR
  - DB connection count > 80% of `max_connections` after pgbouncer is sized correctly, OR
  - Memory pressure (low free memory, page cache pressure)
- Action: Modify-instance to a larger class (db.t4g.medium → db.r6g.large). RDS uses Multi-AZ failover to apply with ~30s downtime.
- Cost: approximately doubles per class step
Lever 5: Add read replicas (Terraform PR + application code change)
- Trigger: read:write ratio is read-dominated (>70%) and primary CPU is constrained
- Action: provision 1–2 read replicas in Terraform. Application middleware (`DatabaseRouter`) routes GETs to replicas, mutations to primary (a sketch of the shape follows below).
- Cost: ~equal to one primary instance per replica
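The document names DatabaseRouter but does not pin down its shape. The sketch below is one plausible form as Go HTTP middleware with round-robin replica selection; the type names and the routing rule (GET/HEAD to replicas, everything else to primary) are illustrative assumptions.

```go
package database

import (
	"context"
	"net/http"
	"sync/atomic"

	"github.com/jackc/pgx/v5/pgxpool"
)

type ctxKey struct{}

// DatabaseRouter (illustrative): read-only requests get a replica pool,
// everything else gets the primary. A real router may need read-your-writes
// exceptions, e.g. a GET issued immediately after a mutation.
type DatabaseRouter struct {
	primary  *pgxpool.Pool
	replicas []*pgxpool.Pool
	next     atomic.Uint64 // round-robin cursor over replicas
}

func (r *DatabaseRouter) Middleware(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, req *http.Request) {
		pool := r.primary
		if (req.Method == http.MethodGet || req.Method == http.MethodHead) && len(r.replicas) > 0 {
			idx := r.next.Add(1) % uint64(len(r.replicas))
			pool = r.replicas[idx]
		}
		ctx := context.WithValue(req.Context(), ctxKey{}, pool)
		next.ServeHTTP(w, req.WithContext(ctx))
	})
}

// PoolFrom returns whichever pool the router selected for this request.
func PoolFrom(ctx context.Context) *pgxpool.Pool {
	pool, _ := ctx.Value(ctxKey{}).(*pgxpool.Pool)
	return pool
}
```

Replica lag means a GET issued right after a mutation can read stale data, which is one reason the checklist below calls for keeping replica lag under ~100 ms before leaning on this path.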
Lever 6: Tune pgbouncer pool sizes (Terraform PR + pgbouncer task restart)
- Trigger: pgbouncer queue depth rising (clients waiting for a backend conn), but RDS connection count well under the ceiling — see the sketch below for reading this from pgbouncer's admin console
- Action: raise `default_pool_size` per pgbouncer task. Apply via Terraform → ECS rolls the pgbouncer tasks (one at a time per AZ to maintain availability).
- Cost: none directly; uses existing RDS connection headroom
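Queue depth shows up on pgbouncer's admin console as the cl_waiting column of SHOW POOLS. A hedged sketch of reading it from Go with pgx v5; the admin DSN, user, and host are assumptions, and the console only speaks the simple query protocol, hence the exec-mode parameter:

```go
package main

import (
	"context"
	"fmt"
	"log"

	"github.com/jackc/pgx/v5"
)

func main() {
	ctx := context.Background()

	// Hypothetical admin DSN: pgbouncer exposes a virtual "pgbouncer" database
	// for admin commands and only supports the simple query protocol.
	dsn := "postgres://pgbouncer_admin:secret@pgbouncer.internal:6432/pgbouncer" +
		"?default_query_exec_mode=simple_protocol"

	conn, err := pgx.Connect(ctx, dsn)
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close(ctx)

	rows, err := conn.Query(ctx, "SHOW POOLS")
	if err != nil {
		log.Fatal(err)
	}
	defer rows.Close()

	// Column order varies across pgbouncer versions, so locate cl_waiting by name.
	waitingIdx := -1
	for i, fd := range rows.FieldDescriptions() {
		if fd.Name == "cl_waiting" {
			waitingIdx = i
		}
	}

	for rows.Next() {
		vals, err := rows.Values()
		if err != nil {
			log.Fatal(err)
		}
		if waitingIdx >= 0 {
			// A persistently non-zero cl_waiting is the Lever 6 trigger.
			fmt.Printf("db=%v user=%v cl_waiting=%v\n", vals[0], vals[1], vals[waitingIdx])
		}
	}
	if rows.Err() != nil {
		log.Fatal(rows.Err())
	}
}
```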
Lever 7: Storage growth (auto-scaling, no action)
- Trigger: RDS free storage drops below threshold
- Action: RDS auto-scales gp3 storage up to the configured maximum (currently 200 GB). No downtime.
- Cost: linear with storage GB
Lever 8: Storage IOPS upgrade (Terraform PR)
- Trigger: sustained read or write IOPS approaching the gp3 baseline ceiling (3000 IOPS for the default 50 GB)
- Action: raise provisioned IOPS in Terraform; gp3 supports up to 16,000 IOPS independent of size
- Cost: small additional charge per provisioned IOPS
Migration checklist (Phase 1 → Phase 2)
When the metrics in the scaling plan say "move to Phase 2," execute in order:
Pre-migration
- [ ] Confirm the trigger condition has been sustained for at least 2 weeks (not a transient spike)
- [ ] Take a manual RDS snapshot of the production database (named `pre-phase-2-upgrade-YYYYMMDD`)
- [ ] Update the runbook with any environment-specific notes from the staging dry run
RDS instance upgrade
- [ ] In Terraform, change the `instance_class` from `db.t4g.medium` to `db.r6g.large`
- [ ] Run `terraform plan` and review the diff (should show only the instance modification)
- [ ] Schedule the change for a low-traffic window — Multi-AZ failover causes ~30s downtime
- [ ] Run `terraform apply`
- [ ] Watch for the failover event in CloudWatch; verify the application reconnects cleanly
- [ ] Confirm Performance Insights shows the new instance class
Add read replicas
- [ ] In Terraform, add 1–2 `aws_db_instance` resources with `replicate_source_db` pointing at the primary
- [ ] Run `terraform apply` — replicas spin up in 10–20 minutes
- [ ] Wire up the application's `DatabaseRouter` middleware (if not already present) and configure `DATABASE_REPLICA_URLS`
- [ ] Deploy a Core API release that uses the new middleware
- [ ] Monitor replica lag in CloudWatch — should stay under 100ms in steady state
Scale Fargate fleet
- [ ] Raise `max_capacity` for Core API to match anticipated load (e.g., 8 → 15)
- [ ] Same for Clinic and Portal Next.js services
- [ ] Increase `min_capacity` if base load justifies (e.g., 2 → 3 or 4)
- [ ] Verify auto-scaling reacts as expected via a synthetic load test (a minimal generator sketch follows this list)
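A deliberately crude load generator is enough to confirm the target-tracking policy scales out and back in. The sketch below is illustrative only; the endpoint URL and worker counts are placeholders, and a real dry run would more likely use a dedicated tool.

```go
package main

import (
	"flag"
	"fmt"
	"net/http"
	"sync"
	"sync/atomic"
	"time"
)

// Minimal synthetic load generator: N workers hammering one endpoint so the
// CPU target-tracking policy has something to react to.
func main() {
	url := flag.String("url", "https://staging.example.invalid/healthz", "endpoint to hit (placeholder)")
	workers := flag.Int("workers", 50, "concurrent workers")
	duration := flag.Duration("for", 10*time.Minute, "how long to sustain load")
	flag.Parse()

	var ok, failed atomic.Int64
	deadline := time.Now().Add(*duration)

	var wg sync.WaitGroup
	for i := 0; i < *workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			client := &http.Client{Timeout: 10 * time.Second}
			for time.Now().Before(deadline) {
				resp, err := client.Get(*url)
				if err != nil || resp.StatusCode >= 500 {
					failed.Add(1)
				} else {
					ok.Add(1)
				}
				if resp != nil {
					resp.Body.Close()
				}
			}
		}()
	}
	wg.Wait()
	fmt.Printf("ok=%d failed=%d\n", ok.Load(), failed.Load())
}
```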
Verify
- [ ] Run end-to-end acceptance against staging (which mirrors the new shape) before flipping any production setting
- [ ] Update aws-infrastructure.md → Cost: production day 1 with the new sizing if it has drifted from the documented values
What changes between staging and production
Sizing aside, the architectural shape is identical between environments — the same Terraform modules, the same task definitions, the same wire protocols. Differences:
| Dimension | Staging | Production |
|---|---|---|
| Database | Aurora Serverless v2 (scale-to-zero) | RDS db.t4g.medium Multi-AZ |
| Database backup retention | 1 day | 7 days + manual snapshots |
| Compute pricing | Fargate Spot for app services | On-demand for everything |
| AZ posture | Single-AZ | Multi-AZ (RDS, pgbouncer, ALB) |
| NAT | t4g.nano NAT instance | NAT Gateway |
| Auto-scaling bounds | 1 task min, 1 task max (most services) | 2 task min, 8–10 task max |
| Cache replica | None (single Redis node) | 1 replica for HA |
| Log retention | 30 days | 90 days |
| Cost target | Under $100/mo idle | ~$545/mo + telemetry TBD |
This is on purpose. Staging exists to validate the topology cheaply; production runs the same topology with redundancy added.
Related documentation
- Scaling plan — high-level phase narrative
- AWS infrastructure — full topology and cost
- P44 — Connection pooling via pgbouncer
- P45 — Redis-backed query cache (cache-aside)
- Backup & DR — how Phase 2 storage/DR posture differs from Phase 1
- IaC layout — where the auto-scaling and instance-class settings live in Terraform
- Monitoring — alarms that fire the scaling-trigger conditions documented above