Scaling Architecture Reference

The technical companion to scaling.md. This document covers the numbers — connection math, Fargate task sizing, RDS / Aurora capacity, when each scaling lever fires — that the higher-level scaling plan abstracts over.

The architecture is deliberately conventional: single Postgres instance behind pgbouncer, horizontally-scaling stateless Fargate fleet, with read replicas and instance-class upsizing as the only Phase 2 levers. Sharding, per-tenant infrastructure, and TenantRouter-style request fanout are permanently out of scope per CLAUDE.md → Project Overview.

This document assumes the topology in aws-infrastructure.md. Numbers below reflect production sizing in eu-central-1.


Connection math

The single most load-bearing number in the architecture is how many database connections the fleet actually opens to RDS. This is what fails first under load if you size it wrong.

The fan-out

   Each Fargate task                    pgbouncer fleet                    RDS Postgres
                                                                          (max_connections=200)

   Core API task ─┬──── 25 admin pool ──┐
                  └──── 25 app pool   ──┤

   Core API task ─┬──── 25 admin pool ──┤
                  └──── 25 app pool   ──┤

   ... (5–10 tasks)                     ├──── 2 pgbouncer tasks ───── ~50 backend conns
                                        │     (default_pool_size=25
   Telemetry API task ─── pool ────────┤      per task,
                                        │      transaction mode,
   Telemetry API task ─── pool ────────┤      max_client_conn=1000)

   Migration runner ─── 1–2 conns ─────┘ (DIRECT, bypasses pgbouncer)
                                          via DATABASE_DIRECT_URL ────── ~5 conns

                                          + monitoring / Performance Insights ── ~5 conns
                                          + ad-hoc psql via SSM ──────────────── ~2 conns
                                                                                  ────────
                                                                                  ~62 conns used
                                                                                  ~138 headroom

Without pgbouncer, the same 5–10 Core API tasks × 50 connections each = 250–500 connections fanned out directly to RDS, each holding ~10 MB RSS server-side — 2.5–5 GB of connection overhead on a 4 GB instance, so RDS would OOM or start refusing connections. pgbouncer in transaction-pool mode multiplexes those client connections over a small, fixed backend pool, so the application tier scales horizontally while the backend connection count stays small.

Numbers, by environment

| Source | Production | Staging | Notes |
|---|---|---|---|
| Core API tasks | 2–10 (auto-scaling) | 1 (Fargate Spot) | Each runs pgx with two pools |
| pgx admin pool per task | DB_POOL_MAX=25 | 5 | Owner-role pool, RLS-bypassed |
| pgx app pool per task | DB_POOL_MAX=25 | 5 | Restricted-role pool, RLS-enforced |
| Telemetry API tasks | 2 (Multi-AZ) | 1 | Pool sizing TBD per aws-infrastructure.md → Telemetry sub-stack |
| pgbouncer tasks | 2 (one per AZ) | 1 | Each has its own backend pool |
| pgbouncer default_pool_size | 25 | 25 | Backend conns per pgbouncer task per (user, db) |
| pgbouncer max_client_conn | 1000 | 1000 | Inbound conn ceiling per pgbouncer task |
| pgbouncer max_prepared_statements | 200 | 200 | Required for pgx prepared-statement caching |
| RDS / Aurora max_connections | 200 (RDS db.t4g.medium) | Aurora-managed | Aurora SLv2 derives this from ACU |

Why these numbers

  • DB_POOL_MAX=25 per pool, two pools per task — pgx pools are per-process. With 5 tasks the application has 250 client-side connections to pgbouncer, plenty of headroom. With 10 tasks (auto-scaled) it's 500. Both well under pgbouncer's max_client_conn=1000.
  • default_pool_size=25 per pgbouncer task — 2 pgbouncer tasks × 25 = 50 backend connections to RDS. With max_connections=200, that's 25% utilized — comfortable headroom for migrations, monitoring, occasional admin connections.
  • max_client_conn=1000 per pgbouncer task — well above any realistic Core API fleet size. The cap exists to prevent runaway misconfiguration, not to be hit in practice.

Where the numbers come from

  • pgx pool sizing: internal/core/database/Connect calls construct *pgxpool.Pool with MaxConns=DB_POOL_MAX.
  • pgbouncer config: services/api/deploy/pgbouncer/pgbouncer.ini — same file used in local docker-compose and Fargate.
  • RDS max_connections: production parameter group, set to 200 explicitly (default for db.t4g.medium would be lower).

Fargate task sizing

Each service's task definition specifies cpu and memory. Sizing principles:

| Service | CPU | Memory | Why |
|---|---|---|---|
| Core API (prod) | 1 vCPU | 2 GB | Go binary handles ~50–100 concurrent reqs per task; memory headroom for the pgx pools + envelope-encrypted key cache |
| Core API (staging) | 0.5 vCPU | 1 GB | Half-sized; staging traffic is rare enough that one task on Spot is fine |
| Telemetry API | TBD | TBD | Pending aws-infrastructure.md → Telemetry sub-stack decisions |
| Clinic / Portal (prod) | 0.5 vCPU | 1 GB | Server-rendered Next.js handles ~30–50 concurrent SSR renders per task |
| Console (prod) | 0.25 vCPU | 0.5 GB | Single-task fixed; superadmin-only, low concurrency |
| pgbouncer | 0.25 vCPU | 0.5 GB | Single static binary, very small footprint, throughput-bound by network not CPU |

Fargate bills per vCPU-hour and per GB-hour in eu-central-1. See aws-infrastructure.md → Cost: production day 1 for the all-in monthly numbers.
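
In the task definition, cpu and memory are expressed in Fargate's integer units (1024 CPU units per vCPU, memory in MiB). A minimal Terraform sketch of the production Core API shape — resource names, family, and image URI below are illustrative placeholders, not the actual module:

```hcl
# Sketch only: names and image URI are placeholders, not the real module.
resource "aws_ecs_task_definition" "core_api" {
  family                   = "core-api"
  requires_compatibilities = ["FARGATE"]
  network_mode             = "awsvpc"
  cpu                      = "1024" # 1 vCPU
  memory                   = "2048" # 2 GB

  container_definitions = jsonencode([{
    name      = "core-api"
    image     = "ACCOUNT_ID.dkr.ecr.eu-central-1.amazonaws.com/core-api:latest" # placeholder
    essential = true
  }])
}
```

Resizing a task (Lever 3 below) is an edit to those two values; ECS registers a new task definition revision and rolls the service.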

Auto-scaling parameters

Every horizontally-scaling service uses Application Auto Scaling with target-tracking on average CPU utilization:

| Parameter | Production value | Why |
|---|---|---|
| target_value (CPU %) | 70 | Above 70% sustained, latency starts climbing; below 70% there's headroom for spikes |
| scale_out_cooldown | 60 s | Add tasks fast — under-scaled is user-visible |
| scale_in_cooldown | 300 s | Remove tasks slowly — to avoid flapping after a brief lull |
| min_capacity | 2 (Multi-AZ HA) or 1 (Console) | One task per AZ for HA; Console is a deliberate exception |
| max_capacity | 10 (Core API), 8 (clinic/portal), 2 (Console) | Cost ceiling — review these annually as actual traffic data accumulates |

Adjusting bounds is a Terraform PR + apply, no service restart. The auto-scaling target is updated in place.
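
As a sketch of how those parameters fit together in Terraform — resource names and the cluster/service identifiers are illustrative, not the actual module:

```hcl
# Sketch only: identifiers are placeholders for the real cluster/service names.
resource "aws_appautoscaling_target" "core_api" {
  service_namespace  = "ecs"
  resource_id        = "service/prod-cluster/core-api"
  scalable_dimension = "ecs:service:DesiredCount"
  min_capacity       = 2
  max_capacity       = 10
}

resource "aws_appautoscaling_policy" "core_api_cpu" {
  name               = "core-api-cpu-target-tracking"
  policy_type        = "TargetTrackingScaling"
  service_namespace  = aws_appautoscaling_target.core_api.service_namespace
  resource_id        = aws_appautoscaling_target.core_api.resource_id
  scalable_dimension = aws_appautoscaling_target.core_api.scalable_dimension

  target_tracking_scaling_policy_configuration {
    predefined_metric_specification {
      predefined_metric_type = "ECSServiceAverageCPUUtilization"
    }
    target_value       = 70
    scale_out_cooldown = 60
    scale_in_cooldown  = 300
  }
}
```

Raising the ceiling (Lever 2) is an edit to max_capacity on the scaling target; the policy itself is untouched.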


Database sizing

Production: RDS Postgres

| Phase | Instance class | vCPU | RAM | max_connections | Storage |
|---|---|---|---|---|---|
| Phase 1 (launch, 1–10 clinics) | db.t4g.medium Multi-AZ | 2 | 4 GB | 200 | 50 GB gp3, auto-scale to 200 GB |
| Phase 2 (10–50 clinics) | db.r6g.large Multi-AZ | 2 | 16 GB | 500 | 250 GB gp3 |
| Phase 2 + read replicas | + 2× db.r6g.large replicas | 2 each | 16 GB each | 500 each | Async WAL replication |
| Vertical ceiling | db.r6g.16xlarge | 64 | 512 GB | ~5000 | up to 64 TiB |

The vertical-ceiling row is included to make the headroom visible — the platform never approaches it on any documented multi-year horizon. Single-DB scaling is a 5–10+ year story for the SMB-clinic shape.
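
A minimal Terraform sketch of the Phase 1 primary, with illustrative resource names — it ties together the explicit max_connections parameter-group setting, the 50 GB gp3 volume, and the 200 GB storage auto-scaling ceiling from the table above:

```hcl
# Sketch only: identifiers are placeholders; networking, backups, and credentials
# handling are omitted.
resource "aws_db_parameter_group" "postgres" {
  name   = "prod-postgres17"
  family = "postgres17"

  parameter {
    name         = "max_connections"
    value        = "200"
    apply_method = "pending-reboot" # static parameter, needs a restart to take effect
  }
}

resource "aws_db_instance" "primary" {
  identifier                  = "prod-primary"
  engine                      = "postgres"
  engine_version              = "17" # any 17.x minor
  instance_class              = "db.t4g.medium" # Phase 2: db.r6g.large (Lever 4)
  multi_az                    = true
  storage_type                = "gp3"
  allocated_storage           = 50  # GB
  max_allocated_storage       = 200 # storage auto-scaling ceiling (Lever 7)
  parameter_group_name        = aws_db_parameter_group.postgres.name
  username                    = "postgres"
  manage_master_user_password = true
}
```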

Staging: Aurora Serverless v2

| Setting | Value | Why |
|---|---|---|
| Engine | aurora-postgresql 17 | Same wire protocol + extension surface as RDS |
| Capacity | 0.5–2 ACU, scale-to-zero | Idle = $0/hr compute; wake in 5–15 s on first request |
| Multi-AZ | Disabled (single-AZ) | Staging accepts the loss of HA for cost |
| Backup retention | 1 day | Staging-grade |

ACU sizing reference (Aurora Serverless v2):

  • 0.5 ACU ≈ ~1 GB RAM, ~0.25 vCPU equivalent — sufficient for staging idle and light dev usage
  • 1 ACU ≈ ~2 GB RAM, ~0.5 vCPU
  • 2 ACU ≈ ~4 GB RAM, ~1 vCPU — burst ceiling for staging when devs are stressing it

The cluster auto-scales between min and max ACU based on connection count, CPU, and active sessions. Scale-to-zero kicks in after ~5 minutes of inactivity.
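
A hedged Terraform sketch of the staging cluster, with illustrative identifiers — Serverless v2 is expressed as a provisioned-mode cluster with an ACU range plus a db.serverless instance:

```hcl
# Sketch only: identifiers are placeholders; networking omitted.
resource "aws_rds_cluster" "staging" {
  cluster_identifier          = "staging"
  engine                      = "aurora-postgresql"
  engine_version              = "17.4" # illustrative; any available 17.x
  engine_mode                 = "provisioned" # Serverless v2 = provisioned mode + ACU config
  master_username             = "postgres"
  manage_master_user_password = true
  backup_retention_period     = 1
  skip_final_snapshot         = true

  serverlessv2_scaling_configuration {
    min_capacity = 0 # 0 enables auto-pause (scale-to-zero); 0.5 ACU is the working floor once awake
    max_capacity = 2
    # seconds_until_auto_pause = 300  # ~5 min idle before pausing, on provider versions that expose it
  }
}

resource "aws_rds_cluster_instance" "staging" {
  identifier         = "staging-1"
  cluster_identifier = aws_rds_cluster.staging.id
  instance_class     = "db.serverless"
  engine             = aws_rds_cluster.staging.engine
}
```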


When each scaling lever fires

The triggers are deliberately metric-based, not time-based. Each lever has a clear "scale this when X" criterion.

Lever 1: Auto-scale Fargate tasks (automatic, no action)

  • Trigger: average CPU > 70% sustained
  • Action: ECS adds tasks within the configured min/max range
  • Cost: linear with task count

Lever 2: Raise auto-scaling ceiling (Terraform PR)

  • Trigger: auto-scaling already at max_capacity and CPU still elevated
  • Action: edit max_capacity in Terraform, terraform apply. No service restart.
  • Cost: no immediate change; the new ceiling only matters under sustained load

Lever 3: Resize Fargate tasks (Terraform PR + rolling deploy)

  • Trigger: CPU stays high even with many tasks (suggests per-request CPU is the bottleneck, not concurrency)
  • Action: raise cpu / memory in the task definition, terraform apply. ECS rolls a new task definition revision and does a rolling deploy.
  • Cost: linear with new task size × task count

Lever 4: Vertical RDS instance upgrade (downtime via Multi-AZ failover)

  • Trigger:
    • RDS CPU > 70% sustained, OR
    • DB connection count > 80% of max_connections after pgbouncer is sized correctly, OR
    • Memory pressure (low free memory, page cache pressure)
  • Action: Modify-instance to a larger class (db.t4g.medium → db.r6g.large). RDS uses Multi-AZ failover to apply with ~30s downtime.
  • Cost: approximately doubles per class step

Lever 5: Add read replicas (Terraform PR + application code change)

  • Trigger: read:write ratio is read-dominated (>70%) and primary CPU is constrained
  • Action: provision 1–2 read replicas in Terraform (see the sketch after this list). Application middleware (DatabaseRouter) routes GETs to replicas, mutations to primary.
  • Cost: ~equal to one primary instance per replica
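
A sketch of the replica resources, with illustrative names — a replica inherits engine and storage configuration from its source, so only the class and source need specifying:

```hcl
# Sketch only: names are placeholders; networking and monitoring omitted.
resource "aws_db_instance" "replica" {
  count               = 2
  identifier          = "prod-replica-${count.index + 1}"
  replicate_source_db = aws_db_instance.primary.identifier
  instance_class      = "db.r6g.large"
  skip_final_snapshot = true
}
```

Each replica streams asynchronously from the primary (the "Async WAL replication" row in the Phase 2 table); the application-side routing and DATABASE_REPLICA_URLS wiring are covered in the migration checklist below.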

Lever 6: Tune pgbouncer pool sizes (Terraform PR + pgbouncer task restart)

  • Trigger: pgbouncer queue depth rising (clients waiting for backend conn), but RDS connection count is well under ceiling
  • Action: raise default_pool_size per pgbouncer task. Apply via Terraform → ECS rolls pgbouncer tasks (one at a time per AZ to maintain availability).
  • Cost: none directly; uses existing RDS connection headroom

Lever 7: Storage growth (auto-scaling, no action)

  • Trigger: RDS free storage drops below threshold
  • Action: RDS auto-scales gp3 storage up to the configured maximum (currently 200 GB). No downtime.
  • Cost: linear with storage GB

Lever 8: Storage IOPS upgrade (Terraform PR)

  • Trigger: sustained read or write IOPS approaching the gp3 baseline ceiling (3000 IOPS for the default 50 GB)
  • Action: raise provisioned IOPS in Terraform. On RDS, gp3 volumes below 400 GiB are fixed at the 3,000 IOPS baseline, so this lever usually pairs with growing the volume to 400 GiB or more, after which IOPS can be provisioned independently of further size increases
  • Cost: small additional charge per provisioned IOPS

Migration checklist (Phase 1 → Phase 2)

When the metrics in the scaling plan say "move to Phase 2," execute in order:

Pre-migration

  • [ ] Confirm the trigger condition has been sustained for at least 2 weeks (not a transient spike)
  • [ ] Take a manual RDS snapshot of the production database (named pre-phase-2-upgrade-YYYYMMDD)
  • [ ] Update the runbook with any environment-specific notes from the staging dry run

RDS instance upgrade

  • [ ] In Terraform, change the instance_class from db.t4g.medium to db.r6g.large
  • [ ] Run terraform plan and review the diff (should show only the instance modification)
  • [ ] Schedule the change for a low-traffic window — Multi-AZ failover causes ~30s downtime
  • [ ] Run terraform apply
  • [ ] Watch for the failover event in CloudWatch; verify the application reconnects cleanly
  • [ ] Confirm Performance Insights shows the new instance class

Add read replicas

  • [ ] In Terraform, add 1–2 aws_db_instance resources with replicate_source_db pointing at the primary
  • [ ] Run terraform apply — replicas spin up in 10–20 minutes
  • [ ] Wire up the application's DatabaseRouter middleware (if not already present) and configure DATABASE_REPLICA_URLS
  • [ ] Deploy a Core API release that uses the new middleware
  • [ ] Monitor replica lag in CloudWatch — should stay under 100ms in steady state

Scale Fargate fleet

  • [ ] Raise max_capacity for Core API to match anticipated load (e.g., 10 → 15)
  • [ ] Same for Clinic and Portal Next.js services
  • [ ] Increase min_capacity if base load justifies (e.g., 2 → 3 or 4)
  • [ ] Verify auto-scaling reacts as expected via a synthetic load test

Verify


What changes between staging and production

Sizing aside, the architectural shape is identical between environments — the same Terraform modules, the same task definitions, the same wire protocols. Differences:

| Dimension | Staging | Production |
|---|---|---|
| Database | Aurora Serverless v2 (scale-to-zero) | RDS db.t4g.medium Multi-AZ |
| Database backup retention | 1 day | 7 days + manual snapshots |
| Compute pricing | Fargate Spot for app services | On-demand for everything |
| AZ posture | Single-AZ | Multi-AZ (RDS, pgbouncer, ALB) |
| NAT | t4g.nano NAT instance | NAT Gateway |
| Auto-scaling bounds | 1 task min, 1 task max (most services) | 2 task min, 8–10 task max |
| Cache replica | None (single Redis node) | 1 replica for HA |
| Log retention | 30 days | 90 days |
| Cost target | Under $100/mo idle | ~$545/mo + telemetry TBD |

This is on purpose. Staging exists to validate the topology cheaply; production runs the same topology with redundancy added.