Scaling Architecture Reference
The technical companion to scaling.md. This document covers the numbers — connection math, Fargate task sizing, RDS / Aurora capacity, when each scaling lever fires — that the higher-level scaling plan abstracts over.
The architecture is deliberately conventional: single Postgres instance behind pgbouncer, horizontally-scaling stateless Fargate fleet, with read replicas and instance-class upsizing as the only Phase 2 levers. Sharding, per-tenant infrastructure, and TenantRouter-style request fanout are permanently out of scope per CLAUDE.md → Project Overview.
This document assumes the topology in aws-infrastructure.md. Numbers below reflect production sizing in eu-central-1.
Connection math
The single most load-bearing number in the architecture is how many database connections the fleet actually opens to RDS. This is what fails first under load if you size it wrong.
The fan-out
```
Each Fargate task                    pgbouncer fleet               RDS Postgres
                                                                   (max_connections=200)

Core API task ─┬──── 25 admin pool ──┐
               └──── 25 app pool ────┤
                                     │
Core API task ─┬──── 25 admin pool ──┤
               └──── 25 app pool ────┤
                                     │
... (5–10 tasks)                     ├──── 2 pgbouncer tasks ───── ~50 backend conns
                                     │     (default_pool_size=25
Telemetry API task ─── pool ─────────┤      per task,
                                     │      transaction mode,
Telemetry API task ─── pool ─────────┤      max_client_conn=1000)
                                     │
Migration runner ──── 1–2 conns ─────┘     (DIRECT, bypasses pgbouncer)
  via DATABASE_DIRECT_URL ──────────────── ~5 conns
+ monitoring / Performance Insights ────── ~5 conns
+ ad-hoc psql via SSM ──────────────────── ~2 conns
                                           ────────
                                           ~62 conns used
                                           ~138 headroom
```

Without pgbouncer, the same 5–10 Core API tasks × 50 connections each = 250–500 connections fanned out to RDS, each holding ~10 MB RSS server-side — RDS would OOM or refuse connections. pgbouncer in transaction-pool mode multiplexes, so the application tier scales horizontally while the backend connection count stays small.
Numbers, by environment
| Source | Production | Staging | Notes |
|---|---|---|---|
| Core API tasks | 2–10 (auto-scaling) | 1 (Fargate Spot) | Each runs pgx with two pools |
| pgx admin pool per task | DB_POOL_MAX=25 | 5 | Owner-role pool, RLS-bypassed |
| pgx app pool per task | DB_POOL_MAX=25 | 5 | Restricted-role pool, RLS-enforced |
| Telemetry API tasks | 2 (Multi-AZ) | 1 | Pool sizing TBD per aws-infrastructure.md → Telemetry sub-stack |
| pgbouncer tasks | 2 (one per AZ) | 1 | Each has its own backend pool |
| pgbouncer default_pool_size | 25 | 25 | Backend conns per pgbouncer task per (user, db) |
| pgbouncer max_client_conn | 1000 | 1000 | Inbound conn ceiling per pgbouncer task |
| pgbouncer max_prepared_statements | 200 | 200 | Required for pgx prepared-statement caching |
| RDS / Aurora max_connections | 200 (RDS db.t4g.medium) | Aurora-managed | Aurora Serverless v2 derives this from ACU |
Why these numbers
- `DB_POOL_MAX=25` per pool, two pools per task — pgx pools are per-process. With 5 tasks the application has 250 client-side connections to pgbouncer, plenty of headroom. With 10 tasks (auto-scaled) it's 500. Both are well under pgbouncer's `max_client_conn=1000` (see the sketch below).
- `default_pool_size=25` per pgbouncer task — 2 pgbouncer tasks × 25 = 50 backend connections to RDS. With `max_connections=200`, that's 25% utilized — comfortable headroom for migrations, monitoring, and occasional admin connections.
- `max_client_conn=1000` per pgbouncer task — well above any realistic Core API fleet size. The cap exists to prevent runaway misconfiguration, not to be hit in practice.
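The same arithmetic written out as a quick sanity check. This is a standalone sketch; the task counts and pool sizes are hard-coded from the tables above rather than read from any live configuration.

```go
package main

import "fmt"

// Connection fan-out sanity check using the production numbers documented above.
func main() {
	const (
		coreAPITasksMax   = 10 // auto-scaling ceiling
		poolsPerTask      = 2  // pgx admin pool + app pool
		dbPoolMax         = 25 // DB_POOL_MAX per pgx pool
		pgbouncerTasks    = 2  // one per AZ
		defaultPoolSize   = 25 // backend conns per pgbouncer task
		maxClientConn     = 1000
		rdsMaxConnections = 200
		directConns       = 12 // migrations + monitoring + ad-hoc psql (~5+5+2)
	)

	// Client side: every Core API task opens poolsPerTask × dbPoolMax conns to pgbouncer.
	clientConns := coreAPITasksMax * poolsPerTask * dbPoolMax
	fmt.Printf("client conns to pgbouncer: %d (ceiling %d per pgbouncer task)\n",
		clientConns, maxClientConn)

	// Server side: pgbouncer multiplexes those onto a small backend pool.
	backendConns := pgbouncerTasks*defaultPoolSize + directConns
	fmt.Printf("backend conns to RDS: %d of %d (%d headroom)\n",
		backendConns, rdsMaxConnections, rdsMaxConnections-backendConns)
}
```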
Where the numbers come from
- pgx pool sizing: internal/core/database/ — the `Connect` calls construct `*pgxpool.Pool` with `MaxConns=DB_POOL_MAX` (see the sketch after this list).
- pgbouncer config: services/api/deploy/pgbouncer/pgbouncer.ini — same file used in local docker-compose and Fargate.
- RDS `max_connections`: production parameter group, set to 200 explicitly (the default for db.t4g.medium would be lower).
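For orientation, a minimal sketch of that construction pattern. The function name and env-var handling here are illustrative; the real Connect helpers in internal/core/database/ may be shaped differently.

```go
package database

import (
	"context"
	"fmt"
	"os"
	"strconv"

	"github.com/jackc/pgx/v5/pgxpool"
)

// connectPool is a hedged sketch of the pattern described above: one
// *pgxpool.Pool per role (admin and app), each capped at DB_POOL_MAX.
func connectPool(ctx context.Context, connString string) (*pgxpool.Pool, error) {
	cfg, err := pgxpool.ParseConfig(connString)
	if err != nil {
		return nil, fmt.Errorf("parse pool config: %w", err)
	}

	// DB_POOL_MAX caps client-side connections from this task to pgbouncer.
	maxConns := 25 // documented production default
	if v := os.Getenv("DB_POOL_MAX"); v != "" {
		if n, err := strconv.Atoi(v); err == nil {
			maxConns = n
		}
	}
	cfg.MaxConns = int32(maxConns)

	return pgxpool.NewWithConfig(ctx, cfg)
}
```

Each Core API task would call something like this twice, once with the owner-role DSN and once with the restricted-role DSN, which is where the 2 × DB_POOL_MAX client connections counted above come from.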
Fargate task sizing
Each service's task definition specifies cpu and memory. Sizing principles:
| Service | CPU | Memory | Why |
|---|---|---|---|
| Core API (prod) | 1 vCPU | 2 GB | Go binary handles ~50–100 concurrent reqs per task; memory headroom for the pgx pools + envelope-encrypted key cache |
| Core API (staging) | 0.5 vCPU | 1 GB | Half-sized; staging traffic is rare enough that one task on Spot is fine |
| Telemetry API | TBD | TBD | Pending aws-infrastructure.md → Telemetry sub-stack decisions |
| Clinic / Portal (prod) | 0.5 vCPU | 1 GB | Server-rendered Next.js handles ~30–50 concurrent SSR renders per task |
| Console (prod) | 0.25 vCPU | 0.5 GB | Single-task fixed; superadmin-only, low concurrency |
| pgbouncer | 0.25 vCPU | 0.5 GB | Single static binary, very small footprint, throughput-bound by network not CPU |
Fargate is billed per vCPU-hour and per GB-hour in eu-central-1. See aws-infrastructure.md → Cost: production day 1 for the all-in monthly numbers.
Auto-scaling parameters
Every horizontally-scaling service uses Application Auto Scaling with target-tracking on average CPU utilization:
| Parameter | Production value | Why |
|---|---|---|
| target_value (CPU %) | 70 | Above 70% sustained, latency starts climbing; below 70% there's headroom for spikes |
| scale_out_cooldown | 60 s | Add tasks fast — under-scaling is user-visible |
| scale_in_cooldown | 300 s | Remove tasks slowly — avoids flapping after a brief lull |
| min_capacity | 2 (Multi-AZ HA) or 1 (Console) | One task per AZ for HA; Console is a deliberate exception |
| max_capacity | 10 (Core API), 8 (clinic/portal), 2 (Console) | Cost ceiling — review annually as actual traffic data accumulates |
Adjusting bounds is a Terraform PR + apply, no service restart. The auto-scaling target is updated in place.
Database sizing
Production: RDS Postgres
| Phase | Instance class | vCPU | RAM | max_connections | Storage |
|---|---|---|---|---|---|
| Phase 1 (launch, 1–10 clinics) | db.t4g.medium Multi-AZ | 2 | 4 GB | 200 | 50 GB gp3, auto-scale to 200 GB |
| Phase 2 (10–50 clinics) | db.r6g.large Multi-AZ | 2 | 16 GB | 500 | 250 GB gp3 |
| Phase 2 + read replicas | + 2× db.r6g.large replicas | 2 each | 16 GB each | 500 each | Async WAL replication |
| Vertical ceiling | db.r6g.16xlarge | 64 | 512 GB | ~5000 | up to 64 TiB |
The vertical ceiling is included to make the headroom visible — the platform is not expected to approach it within the documented multi-year horizon. Single-DB scaling is a 5–10+ year story for the SMB-clinic shape.
Staging: Aurora Serverless v2
| Setting | Value | Why |
|---|---|---|
| Engine | aurora-postgresql 17 | Same wire protocol + extension surface as RDS |
| Capacity | 0.5–2 ACU, scale-to-zero | Idle = $0/hr compute; wake in 5–15 s on first request |
| Multi-AZ | Disabled (single-AZ) | Staging accepts the loss of HA for cost |
| Backup retention | 1 day | Staging-grade |
ACU sizing reference (Aurora Serverless v2):
- 0.5 ACU ≈ ~1 GB RAM, ~0.25 vCPU equivalent — sufficient for staging idle and light dev usage
- 1 ACU ≈ ~2 GB RAM, ~0.5 vCPU
- 2 ACU ≈ ~4 GB RAM, ~1 vCPU — burst ceiling for staging when devs are stressing it
The cluster auto-scales between min and max ACU based on connection count, CPU, and active sessions. Scale-to-zero kicks in after ~5 minutes of inactivity.
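One practical consequence of scale-to-zero: the first request after an idle period can fail to connect or hang for several seconds while the cluster wakes. A hedged sketch of a wake-tolerant connect, assuming pgx v5 (the function name and backoff values are illustrative, not taken from the codebase):

```go
package database

import (
	"context"
	"fmt"
	"time"

	"github.com/jackc/pgx/v5/pgxpool"
)

// connectWithWakeRetry tolerates the 5–15 s Aurora Serverless v2 wake-up by
// retrying the initial connection with a capped backoff.
func connectWithWakeRetry(ctx context.Context, connString string) (*pgxpool.Pool, error) {
	backoff := time.Second
	for {
		pool, err := pgxpool.New(ctx, connString)
		if err == nil {
			// pgxpool.New does not guarantee connectivity; Ping forces a real
			// connection and is what actually waits out the wake-up.
			err = pool.Ping(ctx)
			if err == nil {
				return pool, nil
			}
			pool.Close()
		}

		select {
		case <-ctx.Done():
			return nil, fmt.Errorf("aurora did not wake in time: %w", err)
		case <-time.After(backoff):
		}
		if backoff < 8*time.Second {
			backoff *= 2
		}
	}
}
```

Production RDS never scales to zero, so this path only matters against staging.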
When each scaling lever fires
The triggers are deliberately metric-based, not time-based. Each lever has a clear "scale this when X" criterion.
Lever 1: Auto-scale Fargate tasks (automatic, no action)
- Trigger: average CPU > 70% sustained
- Action: ECS adds tasks within the configured min/max range
- Cost: linear with task count
Lever 2: Raise auto-scaling ceiling (Terraform PR)
- Trigger: auto-scaling already at `max_capacity` and CPU still elevated
- Action: edit `max_capacity` in Terraform, `terraform apply`. No service restart.
- Cost: no immediate change; the new ceiling only matters under sustained load
Lever 3: Resize Fargate tasks (Terraform PR + rolling deploy)
- Trigger: CPU stays high even with many tasks (suggests per-request CPU is the bottleneck, not concurrency)
- Action: raise `cpu` / `memory` in the task definition, `terraform apply`. ECS rolls a new task definition revision and does a rolling deploy.
- Cost: linear with new task size × task count
Lever 4: Vertical RDS instance upgrade (downtime via Multi-AZ failover)
- Trigger:
  - RDS CPU > 70% sustained, OR
  - DB connection count > 80% of `max_connections` after pgbouncer is sized correctly, OR
  - Memory pressure (low free memory, page cache pressure)
- Action: Modify-instance to a larger class (db.t4g.medium → db.r6g.large). RDS uses Multi-AZ failover to apply with ~30s downtime.
- Cost: approximately doubles per class step
Lever 5: Add read replicas (Terraform PR + application code change)
- Trigger: read:write ratio is read-dominated (>70%) and primary CPU is constrained
- Action: provision 1–2 read replicas in Terraform. Application middleware (`DatabaseRouter`) routes GETs to replicas, mutations to primary (a sketch of the shape follows below).
- Cost: ~equal to one primary instance per replica
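The document names DatabaseRouter but does not pin down its shape. The sketch below is one plausible form as Go HTTP middleware with round-robin replica selection; the type names and the routing rule (GET/HEAD to replicas, everything else to primary) are illustrative assumptions.

```go
package database

import (
	"context"
	"net/http"
	"sync/atomic"

	"github.com/jackc/pgx/v5/pgxpool"
)

type ctxKey struct{}

// DatabaseRouter (illustrative): read-only requests get a replica pool,
// everything else gets the primary. A real router may need read-your-writes
// exceptions, e.g. a GET issued immediately after a mutation.
type DatabaseRouter struct {
	primary  *pgxpool.Pool
	replicas []*pgxpool.Pool
	next     atomic.Uint64 // round-robin cursor over replicas
}

func (r *DatabaseRouter) Middleware(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, req *http.Request) {
		pool := r.primary
		if (req.Method == http.MethodGet || req.Method == http.MethodHead) && len(r.replicas) > 0 {
			idx := r.next.Add(1) % uint64(len(r.replicas))
			pool = r.replicas[idx]
		}
		ctx := context.WithValue(req.Context(), ctxKey{}, pool)
		next.ServeHTTP(w, req.WithContext(ctx))
	})
}

// PoolFrom returns whichever pool the router selected for this request.
func PoolFrom(ctx context.Context) *pgxpool.Pool {
	pool, _ := ctx.Value(ctxKey{}).(*pgxpool.Pool)
	return pool
}
```

Replica lag means a GET issued right after a mutation can read stale data, which is one reason the checklist below calls for keeping replica lag under ~100 ms before leaning on this path.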
Lever 6: Tune pgbouncer pool sizes (Terraform PR + pgbouncer task restart)
- Trigger: pgbouncer queue depth rising (clients waiting for a backend conn), but RDS connection count well under the ceiling — see the sketch below for reading this from pgbouncer's admin console
- Action: raise `default_pool_size` per pgbouncer task. Apply via Terraform → ECS rolls the pgbouncer tasks (one at a time per AZ to maintain availability).
- Cost: none directly; uses existing RDS connection headroom
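Queue depth shows up on pgbouncer's admin console as the cl_waiting column of SHOW POOLS. A hedged sketch of reading it from Go with pgx v5; the admin DSN, user, and host are assumptions, and the console only speaks the simple query protocol, hence the exec-mode parameter:

```go
package main

import (
	"context"
	"fmt"
	"log"

	"github.com/jackc/pgx/v5"
)

func main() {
	ctx := context.Background()

	// Hypothetical admin DSN: pgbouncer exposes a virtual "pgbouncer" database
	// for admin commands and only supports the simple query protocol.
	dsn := "postgres://pgbouncer_admin:secret@pgbouncer.internal:6432/pgbouncer" +
		"?default_query_exec_mode=simple_protocol"

	conn, err := pgx.Connect(ctx, dsn)
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close(ctx)

	rows, err := conn.Query(ctx, "SHOW POOLS")
	if err != nil {
		log.Fatal(err)
	}
	defer rows.Close()

	// Column order varies across pgbouncer versions, so locate cl_waiting by name.
	waitingIdx := -1
	for i, fd := range rows.FieldDescriptions() {
		if fd.Name == "cl_waiting" {
			waitingIdx = i
		}
	}

	for rows.Next() {
		vals, err := rows.Values()
		if err != nil {
			log.Fatal(err)
		}
		if waitingIdx >= 0 {
			// A persistently non-zero cl_waiting is the Lever 6 trigger.
			fmt.Printf("db=%v user=%v cl_waiting=%v\n", vals[0], vals[1], vals[waitingIdx])
		}
	}
	if rows.Err() != nil {
		log.Fatal(rows.Err())
	}
}
```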
Lever 7: Storage growth (auto-scaling, no action)
- Trigger: RDS free storage drops below threshold
- Action: RDS auto-scales gp3 storage up to the configured maximum (currently 200 GB). No downtime.
- Cost: linear with storage GB
Lever 8: Storage IOPS upgrade (Terraform PR)
- Trigger: sustained read or write IOPS approaching the gp3 baseline ceiling (3000 IOPS for the default 50 GB)
- Action: raise provisioned IOPS in Terraform; gp3 supports up to 16,000 IOPS independent of size
- Cost: small additional charge per provisioned IOPS
Migration checklist (Phase 1 → Phase 2)
When the metrics in the scaling plan say "move to Phase 2," execute in order:
Pre-migration
- [ ] Confirm the trigger condition has been sustained for at least 2 weeks (not a transient spike)
- [ ] Take a manual RDS snapshot of the production database (named `pre-phase-2-upgrade-YYYYMMDD`)
- [ ] Update the runbook with any environment-specific notes from the staging dry run
RDS instance upgrade
- [ ] In Terraform, change the `instance_class` from `db.t4g.medium` to `db.r6g.large`
- [ ] Run `terraform plan` and review the diff (should show only the instance modification)
- [ ] Schedule the change for a low-traffic window — Multi-AZ failover causes ~30s downtime
- [ ] Run `terraform apply`
- [ ] Watch for the failover event in CloudWatch; verify the application reconnects cleanly
- [ ] Confirm Performance Insights shows the new instance class
Add read replicas
- [ ] In Terraform, add 1–2 `aws_db_instance` resources with `replicate_source_db` pointing at the primary
- [ ] Run `terraform apply` — replicas spin up in 10–20 minutes
- [ ] Wire up the application's `DatabaseRouter` middleware (if not already present) and configure `DATABASE_REPLICA_URLS`
- [ ] Deploy a Core API release that uses the new middleware
- [ ] Monitor replica lag in CloudWatch — should stay under 100ms in steady state
Scale Fargate fleet
- [ ] Raise `max_capacity` for Core API to match anticipated load (e.g., 8 → 15)
- [ ] Same for Clinic and Portal Next.js services
- [ ] Increase `min_capacity` if base load justifies (e.g., 2 → 3 or 4)
- [ ] Verify auto-scaling reacts as expected via a synthetic load test (a minimal generator sketch follows this list)
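A deliberately crude load generator is enough to confirm the target-tracking policy scales out and back in. The sketch below is illustrative only; the endpoint URL and worker counts are placeholders, and a real dry run would more likely use a dedicated tool.

```go
package main

import (
	"flag"
	"fmt"
	"net/http"
	"sync"
	"sync/atomic"
	"time"
)

// Minimal synthetic load generator: N workers hammering one endpoint so the
// CPU target-tracking policy has something to react to.
func main() {
	url := flag.String("url", "https://staging.example.invalid/healthz", "endpoint to hit (placeholder)")
	workers := flag.Int("workers", 50, "concurrent workers")
	duration := flag.Duration("for", 10*time.Minute, "how long to sustain load")
	flag.Parse()

	var ok, failed atomic.Int64
	deadline := time.Now().Add(*duration)

	var wg sync.WaitGroup
	for i := 0; i < *workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			client := &http.Client{Timeout: 10 * time.Second}
			for time.Now().Before(deadline) {
				resp, err := client.Get(*url)
				if err != nil || resp.StatusCode >= 500 {
					failed.Add(1)
				} else {
					ok.Add(1)
				}
				if resp != nil {
					resp.Body.Close()
				}
			}
		}()
	}
	wg.Wait()
	fmt.Printf("ok=%d failed=%d\n", ok.Load(), failed.Load())
}
```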
Verify
- [ ] Run end-to-end acceptance against staging (which mirrors the new shape) before flipping any production setting
- [ ] Update aws-infrastructure.md → Cost: production day 1 with the new sizing if it has drifted from the documented values
What changes between staging and production
Sizing aside, the architectural shape is identical between environments — the same Terraform modules, the same task definitions, the same wire protocols. Differences:
| Dimension | Staging | Production |
|---|---|---|
| Database | Aurora Serverless v2 (scale-to-zero) | RDS db.t4g.medium Multi-AZ |
| Database backup retention | 1 day | 7 days + manual snapshots |
| Compute pricing | Fargate Spot for app services | On-demand for everything |
| AZ posture | Single-AZ | Multi-AZ (RDS, pgbouncer, ALB) |
| NAT | t4g.nano NAT instance | NAT Gateway |
| Auto-scaling bounds | 1 task min, 1 task max (most services) | 2 task min, 8–10 task max |
| Cache replica | None (single Redis node) | 1 replica for HA |
| Log retention | 30 days | 90 days |
| Cost target | Under $100/mo idle | ~$545/mo + telemetry TBD |
This is on purpose. Staging exists to validate the topology cheaply; production runs the same topology with redundancy added.
Related documentation
- Scaling plan — high-level phase narrative
- AWS infrastructure — full topology and cost
- P44 — Connection pooling via pgbouncer
- P45 — Redis-backed query cache (cache-aside)
- Backup & DR — how Phase 2 storage/DR posture differs from Phase 1
- IaC layout — where the auto-scaling and instance-class settings live in Terraform
- Monitoring — alarms that fire the scaling-trigger conditions documented above