Scaling Architecture & Infrastructure Strategy

Overview

RestartiX Platform is designed for multi-tenant healthcare SaaS. This document outlines the infrastructure evolution from Phase 1 (few organizations, 10k patients each) through Phase 4 (1000+ organizations, millions of patients).

Key Constraint: Our RLS (Row-Level Security) pattern requires holding a dedicated PostgreSQL connection for the entire request duration. This is the correct approach for security but creates connection pool pressure that shapes our scaling strategy.


Core Architecture Pattern

Request Flow with RLS

1. Request arrives → ClerkAuth middleware validates session
2. OrganizationContext middleware:
   - Acquires dedicated connection from pool
   - Sets RLS session variables: app.current_user_id, app.current_org_id, app.current_role
   - Stores connection in request context
3. All queries use this connection (via database.ConnFromContext)
4. Connection released when request completes

Why this matters for scaling:

  • Connection held for entire request duration (not just query time)
  • Slow request = connection blocked for seconds
  • DB_POOL_MAX must be >= peak concurrent requests
  • PostgreSQL max_connections limit becomes critical bottleneck

Phase 1: Single Shared Instance (1-10 Organizations)

Target Profile

  • Organizations: 1-10
  • Total patients: 10k-100k
  • Database size: 10-50GB
  • Peak concurrent requests: 50-100
  • Timeline: Launch → First 12 months

Infrastructure

AWS App Runner Services
├── Core API
│   ├── Instances: 1-3 (auto-scale on concurrent requests)
│   ├── RAM per instance: 512MB-1GB
│   └── Connection pool: DB_POOL_MAX=30

├── Telemetry API
│   ├── Instances: 1-2
│   ├── RAM per instance: 512MB
│   └── Connection pool: DB_POOL_MAX=20

├── AWS RDS PostgreSQL
│   ├── Instance: db.t4g.medium (2 vCPU, 4GB RAM)
│   ├── Storage: 50GB gp3
│   ├── Max connections: 200
│   ├── Backups: Automated daily, 7-day retention + PITR
│   └── HIPAA: BAA available (accept via AWS Artifact)

├── AWS ElastiCache Redis
│   ├── Instance: cache.t4g.micro (1GB RAM)
│   └── Usage: Rate limiting, sessions, webhook idempotency

└── VPC + VPC Connector
    └── App Runner connects to RDS/Redis via private VPC

Cloudflare (Edge)
├── DDoS protection
├── WAF + OWASP rulesets
├── TLS termination
└── Edge rate limiting: 1000 req/min per IP

AWS S3
├── Bucket: restartix-uploads-prod
├── Encryption: SSE-S3
└── Lifecycle: Archive to Glacier after 1 year

For the full AWS infrastructure setup, see AWS Infrastructure.

Connection Math

Total PostgreSQL connections needed:
- Core API: 3 instances × 30 pool (DB_POOL_MAX=30) = 90 connections
- Telemetry API: 2 instances × 20 pool (DB_POOL_MAX=20) = 40 connections
- Background jobs: 5 connections
- PgAdmin/monitoring: 5 connections
────────────────────────────────────────────────
Total: 140 connections

PostgreSQL max_connections: 200
Headroom: 60 connections (30%)

Expected Performance

| Metric | Target | Alert Threshold |
| --- | --- | --- |
| API response time (p99) | < 500ms | > 1s |
| Database active connections | 50-100 | > 160 (80% of max) |
| Connection pool wait count | 0 | > 0 for 1+ min |
| Database size | < 50GB | > 80GB |
| Error rate (5xx) | < 0.1% | > 1% |

RDS Configuration (Production)

Critical settings:

bash
# 1. Enable automated backups with PITR
aws rds modify-db-instance \
  --db-instance-identifier restartix-prod \
  --backup-retention-period 7

# 2. Enable encryption at rest (set at creation)
# --storage-encrypted flag on create-db-instance

# 3. Enable Multi-AZ for high availability
aws rds modify-db-instance \
  --db-instance-identifier restartix-prod \
  --multi-az

# 4. Verify connection via VPC Connector
# App Runner services connect to RDS through the private VPC

HIPAA Compliance:

bash
# Accept the AWS BAA through AWS Artifact in the console (free, self-service)
# RDS encrypts data at rest when the instance is created with --storage-encrypted
# No additional enterprise plan is needed for the BAA

Cost Estimate

AWS App Runner:
- Core API (3 instances): ~$45/mo
- Telemetry API (2 instances): ~$30/mo
App Runner Total: ~$75/mo

AWS RDS PostgreSQL:
- db.t4g.medium instance: ~$60/mo
- Storage (50GB gp3): ~$10/mo
- Automated backups: included
RDS Total: ~$70/mo

AWS ElastiCache Redis:
- cache.t4g.micro: ~$15/mo

AWS Secrets Manager: ~$5/mo

AWS S3: ~$10-50/mo (depends on upload volume)
Cloudflare: Free tier (sufficient for Phase 1)
Clerk: $200/mo (Enterprise with HIPAA BAA)
Daily.co: ~$100/mo

TOTAL: ~$475-515/month
Cost per patient: ~$0.06-0.62/patient/year (10k-100k patients)

When to Graduate to Phase 2

Hard limits (must upgrade):

  • ✅ PostgreSQL active connections consistently > 150
  • ✅ Database size > 100GB
  • ✅ API p99 latency > 1s
  • ✅ Any organization > 50k patients

Soft signals (plan upgrade):

  • ✅ More than 10 organizations
  • ✅ Enterprise customer requiring SLA guarantees
  • ✅ Connection pool wait events occurring

Phase 2: Shared Tier + Read Replicas (10-50 Organizations)

Target Profile

  • Organizations: 10-50
  • Total patients: 100k-500k
  • Database size: 50-250GB
  • Peak concurrent requests: 200-500
  • Timeline: Months 12-24

Key Change: Add Read Replicas

Problem: Write queries (mutations) + read queries (GET requests) compete for the same connection pool.

Solution: Route read-only queries to read replicas.

Infrastructure Evolution: RDS Read Replicas

AWS App Runner Services (Core API + Telemetry)
↓ (via VPC Connector)
AWS RDS PostgreSQL
├── Primary (read-write)
│   ├── Instance: db.r6g.large (2 vCPU, 16GB RAM)
│   ├── Storage: 250GB gp3 (10k IOPS)
│   ├── max_connections: 500
│   └── Endpoint: restartix-prod.xxxxx.rds.amazonaws.com

├── Read Replica 1 (read-only, same spec as primary)
│   └── Endpoint: restartix-prod-replica1.xxxxx.rds.amazonaws.com

└── Read Replica 2 (read-only, same spec as primary)
    └── Endpoint: restartix-prod-replica2.xxxxx.rds.amazonaws.com

Connection Routing:
- Mutations (POST/PUT/PATCH/DELETE) → Primary
- Queries (GET) → Round-robin across read replicas
- RLS still works (session vars set on all connections)

Cost: ~$600-800/mo

Read Replica Routing Middleware

go
// internal/middleware/read_replica.go

type DatabaseRouter struct {
    primary  *pgxpool.Pool
    replicas []*pgxpool.Pool
    rr       uint32 // round-robin counter (atomic)
}

func (dr *DatabaseRouter) AcquireConnection(ctx context.Context, r *http.Request) (*pgxpool.Conn, error) {
    // Mutations always use primary; also fall back to primary when no
    // replicas are configured (e.g. Phase 1 deployments)
    if r.Method != http.MethodGet || len(dr.replicas) == 0 {
        return dr.primary.Acquire(ctx)
    }

    // Reads use round-robin across replicas
    idx := atomic.AddUint32(&dr.rr, 1) % uint32(len(dr.replicas))
    return dr.replicas[idx].Acquire(ctx)
}

Connection Math (with replicas)

Primary (write path):
- Mutations: ~30% of requests
- Core API: 3 instances × 10 primary pool = 30 connections
- Telemetry API: 2 instances × 5 primary pool = 10 connections
Total primary: 40 connections

Read Replicas 1 + 2 (read path):
- Queries: ~70% of requests
- Core API: 3 instances × 10 replica pool = 30 connections per replica
- Telemetry API: 2 instances × 10 replica pool = 20 connections per replica
Total per replica: 50 connections

max_connections: 500 (plenty of headroom)

Setup Procedure: Add RDS Read Replicas

bash
# Step 1: Create read replicas via AWS CLI
aws rds create-db-instance-read-replica \
  --db-instance-identifier restartix-prod-replica1 \
  --source-db-instance-identifier restartix-prod

aws rds create-db-instance-read-replica \
  --db-instance-identifier restartix-prod-replica2 \
  --source-db-instance-identifier restartix-prod

# Step 2: Wait for replicas to become available
aws rds wait db-instance-available \
  --db-instance-identifier restartix-prod-replica1
aws rds wait db-instance-available \
  --db-instance-identifier restartix-prod-replica2

# Step 3: Get connection endpoints
aws rds describe-db-instances \
  --db-instance-identifier restartix-prod \
  --query 'DBInstances[0].Endpoint'

aws rds describe-db-instances \
  --db-instance-identifier restartix-prod-replica1 \
  --query 'DBInstances[0].Endpoint'

aws rds describe-db-instances \
  --db-instance-identifier restartix-prod-replica2 \
  --query 'DBInstances[0].Endpoint'

# Step 4: Update environment variables in AWS Secrets Manager
aws secretsmanager update-secret \
  --secret-id restartix-prod/env \
  --secret-string '{"DATABASE_URL":"postgres://restartix_admin:PASSWORD@restartix-prod.xxxxx.rds.amazonaws.com/restartix","DATABASE_REPLICA_1_URL":"postgres://restartix_admin:PASSWORD@restartix-prod-replica1.xxxxx.rds.amazonaws.com/restartix","DATABASE_REPLICA_2_URL":"postgres://restartix_admin:PASSWORD@restartix-prod-replica2.xxxxx.rds.amazonaws.com/restartix",...}'

# Step 5: Deploy with read replica support
git push origin main  # GitHub Actions handles ECR build + App Runner deploy

# Step 6: Verify replication lag < 1s on each replica
psql "$DATABASE_REPLICA_1_URL" -c \
  "SELECT EXTRACT(EPOCH FROM (now() - pg_last_xact_replay_timestamp())) AS lag_seconds;"
# Expected: < 1 second

Cost Estimate

AWS App Runner:
- Core API (5 instances): $75/mo
- Telemetry API (3 instances): $45/mo
App Runner Total: $120/mo

AWS RDS PostgreSQL:
- Primary (db.r6g.large): $200/mo
- Read Replica 1: $200/mo
- Read Replica 2: $200/mo
- Backup storage (50GB): $10/mo
RDS Total: $610/mo

AWS ElastiCache Redis: ~$30/mo
AWS Secrets Manager: ~$5/mo

AWS S3: ~$50-100/mo
Cloudflare: Free or $20/mo (Pro)
Clerk: $200/mo
Daily.co: ~$200/mo (more orgs)

TOTAL: ~$1,215-1,285/month
Cost per patient: ~$0.03-0.15/patient/year (100k-500k patients)

When to Graduate to Phase 3

Hard limits:

  • ✅ More than 50 organizations
  • ✅ Database size > 500GB
  • ✅ Any single organization > 100k patients
  • ✅ Enterprise customer requesting dedicated infrastructure

Soft signals:

  • ✅ Primary database CPU consistently > 70%
  • ✅ Replication lag > 5 seconds
  • ✅ Customer complaints about "noisy neighbor" performance

Phase 3: Hybrid Model (50-100 Organizations)

Target Profile

  • Shared tier: 90 small/medium organizations (< 10k patients each)
  • Enterprise tier: 10 large organizations (50k+ patients each)
  • Total patients: 500k-1M
  • Timeline: Months 24-36

Architecture: Two-Tier System

┌─────────────────────────────────────────────────────────────┐
│ SHARED TIER (90 organizations)                              │
├─────────────────────────────────────────────────────────────┤
│ AWS App Runner: restartix-shared                            │
│ ├── Core API (5 instances)                                  │
│ ├── Telemetry API (3 instances)                              │
│ └── ElastiCache Redis                                       │
│                                                              │
│ AWS RDS: restartix-shared-cluster                           │
│ ├── Primary (db.r6g.xlarge: 4 vCPU, 32GB RAM)              │
│ ├── Read Replica 1                                          │
│ └── Read Replica 2                                          │
└─────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────┐
│ ENTERPRISE TIER (10 organizations, dedicated services)      │
├─────────────────────────────────────────────────────────────┤
│ AWS App Runner: restartix-org-101                           │
│ ├── Core API (2 instances)                                  │
│ ├── Telemetry API (1 instance)                               │
│ └── ElastiCache Redis (dedicated)                           │
│                                                              │
│ AWS RDS: restartix-org-101                                  │
│ └── Primary (db.r6g.large: 2 vCPU, 16GB RAM)               │
│                                                              │
│ [Repeat for each enterprise org...]                         │
└─────────────────────────────────────────────────────────────┘

Tenant Assignment Logic

Shared Tier (default):

  • Organizations with < 50k patients
  • No SLA guarantees (best-effort)
  • Pricing: $99-199/month per org

Enterprise Tier (opt-in):

  • Organizations with > 50k patients OR
  • Organizations requiring custom SLA OR
  • Regulatory requirement for data isolation (rare)
  • Pricing: $999-2,999/month per org

Tenant Routing Table

sql
-- Stored in a separate "routing database" (tiny PostgreSQL instance)
CREATE TABLE tenant_shards (
    organization_id BIGINT PRIMARY KEY,
    shard_type      TEXT NOT NULL,  -- 'shared' or 'enterprise'
    shard_id        TEXT NOT NULL,  -- 'shared' or 'org-101', 'org-102', etc.

    -- Connection details
    database_primary_url    TEXT NOT NULL,
    database_replica_urls   TEXT[],  -- Array of replica URLs

    -- Metadata
    max_patients            INT,
    sla_tier                TEXT,  -- 'standard', 'enterprise'
    billing_tier            TEXT,  -- 'shared-99', 'enterprise-999', etc.

    created_at              TIMESTAMPTZ DEFAULT NOW(),
    updated_at              TIMESTAMPTZ DEFAULT NOW()
);

-- Example data
INSERT INTO tenant_shards (organization_id, shard_type, shard_id, database_primary_url) VALUES
(1, 'shared', 'shared', 'postgres://...restartix-shared-cluster...'),
(2, 'shared', 'shared', 'postgres://...restartix-shared-cluster...'),
-- ... 90 orgs on shared
(101, 'enterprise', 'org-101', 'postgres://...restartix-org-101...'),
(102, 'enterprise', 'org-102', 'postgres://...restartix-org-102...');

Routing Middleware

go
// internal/middleware/tenant_router.go

type TenantRouter struct {
    routingDB *pgxpool.Pool               // Tiny DB with tenant_shards table
    redis     *redis.Client               // Shard-lookup cache (used below)
    pools     map[string]*DatabaseRouter  // shard_id → pool(s)
    mu        sync.RWMutex
}

func (tr *TenantRouter) GetPool(ctx context.Context, orgID int64) (*DatabaseRouter, error) {
    // 1. Lookup shard assignment (cached in Redis, TTL 5 min)
    cacheKey := fmt.Sprintf("shard:%d", orgID)
    if cached, err := tr.redis.Get(ctx, cacheKey).Result(); err == nil {
        tr.mu.RLock()
        pool, ok := tr.pools[cached]
        tr.mu.RUnlock()
        if ok {
            return pool, nil
        }
        // Shard ID cached but pool not built yet; fall through and build it
    }

    // 2. Query routing database
    var shard struct {
        ShardID              string
        DatabasePrimaryURL   string
        DatabaseReplicaURLs  []string
    }
    err := tr.routingDB.QueryRow(ctx, `
        SELECT shard_id, database_primary_url, database_replica_urls
        FROM tenant_shards
        WHERE organization_id = $1
    `, orgID).Scan(&shard.ShardID, &shard.DatabasePrimaryURL, &shard.DatabaseReplicaURLs)

    if err != nil {
        return nil, fmt.Errorf("org %d not found in routing table", orgID)
    }

    // 3. Cache for 5 minutes
    tr.redis.Set(ctx, cacheKey, shard.ShardID, 5*time.Minute)

    // 4. Return existing pool or create new one
    tr.mu.RLock()
    pool, exists := tr.pools[shard.ShardID]
    tr.mu.RUnlock()

    if !exists {
        tr.mu.Lock()
        // Re-check under the write lock so concurrent requests
        // don't build duplicate pools for the same shard
        if pool, exists = tr.pools[shard.ShardID]; !exists {
            pool = tr.createPoolForShard(shard)
            tr.pools[shard.ShardID] = pool
        }
        tr.mu.Unlock()
    }

    return pool, nil
}

// Updated OrganizationContext middleware
func OrganizationContext(router *TenantRouter) func(http.Handler) http.Handler {
    return func(next http.Handler) http.Handler {
        return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
            user := UserFromContext(r.Context())
            orgID := user.CurrentOrganizationID

            // Get the correct database pool for this org
            dbRouter, err := router.GetPool(r.Context(), orgID)
            if err != nil {
                slog.Error("failed to route tenant", "org_id", orgID, "error", err)
                httputil.InternalError(w)
                return
            }

            // Acquire connection from the correct shard
            conn, err := dbRouter.AcquireConnection(r.Context(), r)
            if err != nil {
                slog.Error("failed to acquire connection", "error", err)
                httputil.InternalError(w)
                return
            }
            defer conn.Release()

            // Set RLS context as before
            database.SetRLSContext(r.Context(), conn, user.ID, orgID, user.Role)

            ctx := context.WithValue(r.Context(), dbConnKey, conn)
            next.ServeHTTP(w, r.WithContext(ctx))
        })
    }
}

Automated Enterprise Provisioning

go
// internal/jobs/provision_enterprise.go

// rdsClient, secretsClient, appRunnerClient are package-level AWS SDK v1 clients

func ProvisionEnterpriseOrg(ctx context.Context, orgID int64) error {
    id := fmt.Sprintf("restartix-org-%d", orgID)

    // 1. Provision the dedicated AWS RDS instance
    if _, err := rdsClient.CreateDBInstance(&rds.CreateDBInstanceInput{
        DBInstanceIdentifier: aws.String(id),
        DBInstanceClass:      aws.String("db.r6g.large"),
        Engine:               aws.String("postgres"),
        EngineVersion:        aws.String("16.2"),
        AllocatedStorage:     aws.Int64(250),
        StorageEncrypted:     aws.Bool(true),
        MultiAZ:              aws.Bool(true),
    }); err != nil {
        return fmt.Errorf("create RDS instance: %w", err)
    }

    // 2. Wait for RDS to become available (the endpoint only exists afterwards)
    if err := rdsClient.WaitUntilDBInstanceAvailable(&rds.DescribeDBInstancesInput{
        DBInstanceIdentifier: aws.String(id),
    }); err != nil {
        return fmt.Errorf("wait for RDS: %w", err)
    }
    endpoint := describeEndpoint(id) // helper (not shown): DescribeDBInstances → Endpoint.Address

    // 3. Run migrations against the new instance
    runMigrations(endpoint)

    // 4. Store secrets in AWS Secrets Manager
    if _, err := secretsClient.CreateSecret(&secretsmanager.CreateSecretInput{
        Name:         aws.String(fmt.Sprintf("restartix-org-%d/env", orgID)),
        SecretString: aws.String(fmt.Sprintf(`{"ORGANIZATION_ID":"%d","DATABASE_URL":"%s"}`, orgID, endpoint)),
    }); err != nil {
        return fmt.Errorf("store secret: %w", err)
    }

    // 5. Update tenant_shards table
    if _, err := routingDB.Exec(ctx, `
        INSERT INTO tenant_shards (organization_id, shard_type, shard_id, database_primary_url)
        VALUES ($1, 'enterprise', $2, $3)
    `, orgID, fmt.Sprintf("org-%d", orgID), endpoint); err != nil {
        return fmt.Errorf("update routing table: %w", err)
    }

    // 6. Create App Runner services for the Core API + Telemetry
    _, err := appRunnerClient.CreateService(&apprunner.CreateServiceInput{
        ServiceName: aws.String(fmt.Sprintf("restartix-org-%d-api", orgID)),
        // ... ECR image, VPC connector, etc.
    })
    return err
}

Cost Estimate

Shared Tier:
- App Runner (Core API + Telemetry): $120/mo
- AWS RDS (db.r6g.xlarge + 2 replicas): $1,200/mo
- ElastiCache Redis: $30/mo
Shared Total: $1,350/mo
Revenue: 90 orgs × $150/mo = $13,500/mo
Margin: $12,150/mo (90%)

Enterprise Tier (10 orgs):
- App Runner per org: $50/mo
- AWS RDS per org: $200/mo
- ElastiCache Redis per org: $15/mo
Per-org cost: $265/mo
Total: 10 × $265 = $2,650/mo
Revenue: 10 orgs × $1,500/mo = $15,000/mo
Margin: $12,350/mo (82%)

COMBINED TOTAL: $4,000/mo infrastructure cost
COMBINED REVENUE: $28,500/mo
GROSS MARGIN: $24,500/mo (86%)

When to Graduate to Phase 4

Hard limits:

  • ✅ More than 100 organizations on shared tier
  • ✅ Shared tier database size > 1TB
  • ✅ More than 20 enterprise organizations (manual provisioning too slow)

Soft signals:

  • ✅ Enterprise provisioning takes > 2 hours (needs automation)
  • ✅ Support burden from shared tier "noisy neighbor" issues
  • ✅ Regulatory requirement for geographic data residency (EU vs US shards)

Phase 4: Multi-Shard with Geographic Distribution (100-1000 Organizations)

Target Profile

  • Shared tiers: 5-10 regional shards (20-50 orgs each)
  • Enterprise tier: 50-100 dedicated projects
  • Total patients: 1M-10M
  • Geographic regions: US, EU, possibly APAC
  • Timeline: Months 36+

Architecture: Geographic Sharding

┌──────────────────────────────────────────────────────────┐
│ US REGION                                                │
├──────────────────────────────────────────────────────────┤
│ Shared Shard: US-1 (50 orgs)                            │
│ Shared Shard: US-2 (50 orgs)                            │
│ Enterprise: 30 dedicated projects                        │
└──────────────────────────────────────────────────────────┘

┌──────────────────────────────────────────────────────────┐
│ EU REGION (GDPR data residency)                          │
├──────────────────────────────────────────────────────────┤
│ Shared Shard: EU-1 (30 orgs)                            │
│ Enterprise: 10 dedicated projects                        │
└──────────────────────────────────────────────────────────┘

Global Routing Table (DynamoDB or managed PostgreSQL)
├── Organization → Shard assignment
├── Shard → Database endpoints
└── Cached in Redis (1 min TTL)

Shard Assignment Strategy

Factors:

  1. Geographic preference (GDPR compliance)
  2. Organization size (large orgs → dedicated)
  3. Load balancing (distribute evenly across shared shards)
sql
-- Enhanced tenant_shards table
ALTER TABLE tenant_shards ADD COLUMN region TEXT;  -- 'us-east-1', 'eu-west-1'
ALTER TABLE tenant_shards ADD COLUMN shard_load_score INT;  -- For load balancing

-- Shard assignment algorithm
CREATE OR REPLACE FUNCTION assign_shard(
    p_org_id BIGINT,
    p_region TEXT,
    p_estimated_patients INT
) RETURNS TEXT AS $$
DECLARE
    v_shard_id TEXT;
BEGIN
    -- Enterprise tier if > 50k patients
    IF p_estimated_patients > 50000 THEN
        v_shard_id := 'org-' || p_org_id;
        -- Trigger provisioning job
        PERFORM pg_notify('provision_enterprise', p_org_id::text);
        RETURN v_shard_id;
    END IF;

    -- Shared tier: find least-loaded shard in region
    SELECT shard_id INTO v_shard_id
    FROM shard_capacity
    WHERE region = p_region
      AND shard_type = 'shared'
      AND current_orgs < max_orgs
      AND current_patients + p_estimated_patients <= max_patients
    ORDER BY current_orgs ASC
    LIMIT 1;

    -- If no capacity, create new shared shard
    IF v_shard_id IS NULL THEN
        v_shard_id := p_region || '-' || (
            SELECT COUNT(*) + 1 FROM tenant_shards
            WHERE region = p_region AND shard_type = 'shared'
        );
        PERFORM pg_notify('provision_shared_shard', v_shard_id);
    END IF;

    RETURN v_shard_id;
END;
$$ LANGUAGE plpgsql;

Cross-Shard Operations

Problem: Some queries need to aggregate across shards (analytics, superadmin dashboards).

Solutions:

  1. Telemetry Service Aggregation

    • Telemetry collects audit events from all shards
    • ClickHouse aggregates across organizations
    • Superadmin dashboard queries Telemetry, not individual shards
  2. Federated Queries (when needed)

    go
    // internal/superadmin/cross_shard_query.go
    
    func GetAllOrganizations(ctx context.Context) ([]*Organization, error) {
        shards := getAllShards() // Query routing DB
    
        results := make([]*Organization, 0)
        errChan := make(chan error, len(shards))
        resultChan := make(chan []*Organization, len(shards))
    
        // Fan-out: query all shards in parallel
        for _, shard := range shards {
            go func(s Shard) {
                rows, err := s.Pool.Query(ctx, "SELECT id, name FROM organizations")
                if err != nil {
                    errChan <- err
                    return
                }
                defer rows.Close()
    
                // Scan each row into an Organization before fanning in
                var orgs []*Organization
                for rows.Next() {
                    var o Organization
                    if err := rows.Scan(&o.ID, &o.Name); err != nil {
                        errChan <- err
                        return
                    }
                    orgs = append(orgs, &o)
                }
                resultChan <- orgs
            }(shard)
        }
    
        // Fan-in: collect results
        for i := 0; i < len(shards); i++ {
            select {
            case orgs := <-resultChan:
                results = append(results, orgs...)
            case err := <-errChan:
                return nil, err
            }
        }
    
        return results, nil
    }
  3. Materialized Views in Routing DB

    • Nightly ETL job aggregates key metrics from all shards
    • Routing DB has organization_summary table
    • Fast reads, eventual consistency (good enough for dashboards)

Cost Estimate (100 orgs shared, 50 enterprise)

Shared Tier (5 shards × 20 orgs each):
- App Runner (5 service sets): 5 × $120 = $600/mo
- AWS RDS (5 × db.r6g.xlarge): 5 × $400 = $2,000/mo
- ElastiCache Redis (5 clusters): 5 × $30 = $150/mo
Shared Total: $2,750/mo
Revenue: 100 orgs × $150/mo = $15,000/mo

Enterprise Tier (50 dedicated service sets):
- Infrastructure: 50 × $265 = $13,250/mo
Revenue: 50 orgs × $1,500/mo = $75,000/mo

COMBINED TOTAL: $16,000/mo infrastructure cost
COMBINED REVENUE: $90,000/mo
GROSS MARGIN: $74,000/mo (82%)

Cross-Shard Patient Identity

The problem

patient_persons is the patient's portable profile — it has no organization_id. When every shard is scoped to organizations, where does this table live? And when a patient registered at Clinic A (Shard 1) walks into Clinic B (Shard 2), how does Shard 2 find them?

Architecture: Global Patient Registry + Home Shard

The solution splits into two parts:

  1. Patient data stays in a "home shard" — whichever shard the patient first registered in
  2. A lightweight registry in the routing DB enables cross-shard discovery
┌──────────────────────────────────────────────────────────────────────┐
│ ROUTING / GLOBAL DB                                                  │
│ (already exists for tenant_shards)                                   │
├──────────────────────────────────────────────────────────────────────┤
│                                                                      │
│  tenant_shards                    patient_person_registry             │
│  ┌──────────────────────┐         ┌───────────────────────────────┐  │
│  │ organization_id (PK) │         │ patient_person_id (PK)        │  │
│  │ shard_id             │         │ user_id (unique, nullable)    │  │
│  │ database_primary_url │         │ phone_hash (indexed)          │  │
│  │ ...                  │         │ email_hash (indexed, nullable)│  │
│  └──────────────────────┘         │ home_shard_id                 │  │
│                                   │ created_at                    │  │
│                                   └───────────────────────────────┘  │
│                                                                      │
└──────────────────────────────────────────────────────────────────────┘

┌─────────────────────────────┐     ┌─────────────────────────────────┐
│ SHARD 1                     │     │ SHARD 2                         │
├─────────────────────────────┤     ├─────────────────────────────────┤
│                             │     │                                 │
│  patient_persons            │     │  patient_persons                │
│  ┌───────────────────────┐  │     │  ┌───────────────────────┐     │
│  │ id: 99                │  │     │  │ (empty — patient 99   │     │
│  │ name: "Maria López"   │  │     │  │  lives in Shard 1)    │     │
│  │ dob, blood_type, ...  │  │     │  └───────────────────────┘     │
│  └───────────────────────┘  │     │                                 │
│                             │     │  patients (org-link)            │
│  patients (org-link)        │     │  ┌───────────────────────┐     │
│  ┌───────────────────────┐  │     │  │ org_id: Clinic B      │     │
│  │ org_id: Clinic A      │  │     │  │ patient_person_id: 99 │     │
│  │ patient_person_id: 99 │  │     │  │ profile_shared: false │     │
│  │ profile_shared: true  │  │     │  └───────────────────────┘     │
│  └───────────────────────┘  │     │                                 │
│                             │     │  appointments, forms, files...  │
│  appointments, forms, ...   │     │  (Clinic B's clinical records)  │
│  (Clinic A's clinical data) │     │                                 │
└─────────────────────────────┘     └─────────────────────────────────┘

Key principle: The full patient_persons row stays in the home shard. Other shards reference it by ID and fetch on demand. Clinical records (appointments, forms, files) always stay local to the shard where they were created.

Global patient registry schema

sql
-- Stored in the routing database (same DB as tenant_shards)
-- This is a lightweight lookup table, NOT the full patient profile

CREATE TABLE patient_person_registry (
    patient_person_id   BIGINT PRIMARY KEY,
    user_id             BIGINT UNIQUE,              -- for login-based lookup
    phone_hash          TEXT,                        -- SHA-256 of normalized phone
    email_hash          TEXT,                        -- SHA-256 of email (if available)
    home_shard_id       TEXT NOT NULL,               -- shard where patient_persons row lives
    created_at          TIMESTAMPTZ DEFAULT NOW()
);

CREATE INDEX idx_patient_registry_phone ON patient_person_registry (phone_hash);
CREATE INDEX idx_patient_registry_email ON patient_person_registry (email_hash)
    WHERE email_hash IS NOT NULL;
CREATE INDEX idx_patient_registry_user  ON patient_person_registry (user_id)
    WHERE user_id IS NOT NULL;

This table stores hashes only — no PII. It answers one question: "Does this patient exist, and which shard holds their profile?"

Cross-shard registration flow

When a patient registered at Clinic A (Shard 1) registers at Clinic B (Shard 2):

Patient walks into Clinic B, gives phone number

    ├── 1. DISCOVER: Query routing DB
    │      SELECT home_shard_id, patient_person_id
    │      FROM patient_person_registry
    │      WHERE phone_hash = sha256(normalize(phone))
    │      → Found: patient_person_id=99, home_shard=shard_1

    ├── 2. FETCH: Read portable profile from home shard
    │      Shard 1 → SELECT name, date_of_birth, blood_type, ...
    │                FROM patient_persons WHERE id = 99
    │      → Single row by PK (sub-millisecond)

    ├── 3. LINK: Create org-patient link in Shard 2
    │      INSERT INTO patients (organization_id, patient_person_id, profile_shared)
    │      VALUES (clinic_b_id, 99, false)
    │      → Consent pending, Clinic B sees name only

    └── 4. CONSENT: Normal consent flow
           Patient signs consent form at Clinic B
           → profile_shared = true
           → Clinic B staff now sees full portable profile

Implementation

go
// internal/patients/cross_shard.go

// DiscoverPatient checks the global registry for an existing patient identity.
// Returns the patient_person_id and home shard, or empty if new patient.
func DiscoverPatient(ctx context.Context, routingDB *pgxpool.Pool, phone string) (*PatientRef, error) {
    phoneHash := sha256Hex(normalizePhone(phone))

    var ref PatientRef
    err := routingDB.QueryRow(ctx, `
        SELECT patient_person_id, home_shard_id
        FROM patient_person_registry
        WHERE phone_hash = $1
    `, phoneHash).Scan(&ref.PatientPersonID, &ref.HomeShardID)

    if errors.Is(err, pgx.ErrNoRows) {
        return nil, nil // New patient — no existing identity
    }
    return &ref, err
}

// FetchPortableProfile reads the patient_persons row from the home shard.
// Used when a patient registers at a clinic on a different shard.
func FetchPortableProfile(ctx context.Context, shards *TenantRouter, ref *PatientRef) (*PatientPerson, error) {
    pool, err := shards.GetPoolByShard(ctx, ref.HomeShardID)
    if err != nil {
        return nil, fmt.Errorf("resolve home shard %s: %w", ref.HomeShardID, err)
    }

    var p PatientPerson
    err = pool.QueryRow(ctx, `
        SELECT id, name, date_of_birth, sex, blood_type, allergies, chronic_conditions,
               emergency_contact_name, insurance_entries
        FROM patient_persons
        WHERE id = $1
    `, ref.PatientPersonID).Scan(
        &p.ID, &p.Name, &p.DateOfBirth, &p.Sex, &p.BloodType, &p.Allergies,
        &p.ChronicConditions, &p.EmergencyContactName, &p.InsuranceEntries,
    )
    return &p, err
}

// RegisterPatientAtOrg creates the org-patient link in the current shard
// and registers in the global registry if this is a new patient.
func RegisterPatientAtOrg(ctx context.Context, routingDB *pgxpool.Pool, localConn *pgxpool.Conn, req RegisterRequest) error {
    ref, err := DiscoverPatient(ctx, routingDB, req.Phone)
    if err != nil {
        return err
    }

    if ref == nil {
        // New patient — create patient_persons in current shard + register globally
        var personID int64
        err = localConn.QueryRow(ctx, `
            INSERT INTO patient_persons (name, date_of_birth, sex, phone_encrypted)
            VALUES ($1, $2, $3, $4)
            RETURNING id
        `, req.Name, req.DOB, req.Sex, encrypt(req.Phone)).Scan(&personID)
        if err != nil {
            return err
        }

        // Register in global lookup
        _, err = routingDB.Exec(ctx, `
            INSERT INTO patient_person_registry (patient_person_id, phone_hash, home_shard_id)
            VALUES ($1, $2, $3)
        `, personID, sha256Hex(normalizePhone(req.Phone)), req.CurrentShardID)
        if err != nil {
            return err
        }

        ref = &PatientRef{PatientPersonID: personID, HomeShardID: req.CurrentShardID}
    }

    // Create org-patient link in current shard
    _, err = localConn.Exec(ctx, `
        INSERT INTO patients (organization_id, patient_person_id, profile_shared)
        VALUES ($1, $2, false)
    `, req.OrgID, ref.PatientPersonID)
    return err
}

When to introduce this

Phase | What changes
----- | ------------
1-2   | Nothing. Single database; patient_persons and patients are in the same DB. The two-table split already exists for the consent model.
3     | Add patient_person_registry to the routing DB. Populate it from existing patient_persons rows (one-time migration). Cross-shard reads are only needed for enterprise-tier orgs on separate shards.
4     | Same pattern, more shards. Add a Redis cache (5 min TTL) for registry lookups and fetched profiles if latency matters.
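The registry insert earlier keys on `sha256Hex(normalizePhone(...))`. Those helpers are referenced but not defined in this document; below is a minimal sketch of one plausible implementation. Digits-only normalization is an assumption — real code should canonicalize country codes before hashing so the same patient always produces the same registry key.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// normalizePhone strips formatting so the same number always hashes the
// same. Assumed behavior: keep digits only; production code would also
// normalize country-code prefixes (e.g. via a libphonenumber port).
func normalizePhone(raw string) string {
	var b strings.Builder
	for _, r := range raw {
		if r >= '0' && r <= '9' {
			b.WriteRune(r)
		}
	}
	return b.String()
}

// sha256Hex returns the lowercase hex SHA-256 of s. Only the hash reaches
// the routing DB; the raw phone stays encrypted in the shard.
func sha256Hex(s string) string {
	sum := sha256.Sum256([]byte(s))
	return hex.EncodeToString(sum[:])
}

func main() {
	fmt.Println(sha256Hex(normalizePhone("+1 (555) 123-4567")))
}
```

The important property is determinism: any clinic on any shard can compute the same lookup key without contacting the home shard.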

Profile reads at the application layer

When Clinic B (Shard 2) needs to display a patient whose profile lives in Shard 1, the read path is:

GET /v1/patients/:id

    ├── 1. Local query (Shard 2): patients + patient_person_id

    ├── 2. Is patient_person_id in this shard's patient_persons table?
    │      YES → local join (fast, normal path)
    │      NO  → cross-shard fetch via home_shard_id from registry

    └── 3. Apply profile_shared gate as usual
           → profile_shared=false: return name only
           → profile_shared=true: return full profile

The cross-shard fetch is a single row by primary key — sub-millisecond on the remote shard. For frequently accessed profiles, cache in Redis with a short TTL (5 minutes). Invalidate on patient profile update via a pub/sub event from the home shard.

What does NOT cross shards

Clinical records never leave their shard. This is enforced architecturally, not just by policy:

  • Appointments created at Clinic B → stored in Shard 2, scoped to Clinic B's organization_id
  • Forms filled at Clinic B → stored in Shard 2
  • Files uploaded at Clinic B → stored in Shard 2's S3 prefix
  • Treatment plans from Clinic B → stored in Shard 2

Shard 1 has no mechanism to query Shard 2's clinical records, and vice versa. The only cross-shard data flow is the portable profile read described above.


Database Optimization Strategies

1. Audit Log Partitioning (Critical at Scale)

Problem: audit_log table grows to billions of rows. Queries slow down, backups take hours.

Solution: Monthly partitions with automated archival.

sql
-- Migration: Convert audit_log to partitioned table
CREATE TABLE audit_log_new (
    id              BIGSERIAL,
    organization_id BIGINT,
    user_id         BIGINT,
    action          TEXT,
    entity_type     TEXT,
    entity_id       BIGINT,
    changes         JSONB,
    ip_address      INET,
    user_agent      TEXT,
    request_path    TEXT,
    status_code     INT,
    created_at      TIMESTAMPTZ NOT NULL DEFAULT NOW(),

    -- Additional audit context columns (created_at below is the partition key)
    action_context  TEXT,
    break_glass_id  BIGINT,
    impersonation_id BIGINT,

    PRIMARY KEY (id, created_at)  -- Include partition key in PK
) PARTITION BY RANGE (created_at);

-- Create partitions for the current and next month
-- (subsequent months are created by the cron job below)
CREATE TABLE audit_log_2026_02 PARTITION OF audit_log_new
    FOR VALUES FROM ('2026-02-01') TO ('2026-03-01');

CREATE TABLE audit_log_2026_03 PARTITION OF audit_log_new
    FOR VALUES FROM ('2026-03-01') TO ('2026-04-01');

-- Automated partition management
CREATE OR REPLACE FUNCTION create_audit_log_partition()
RETURNS void AS $$
DECLARE
    v_next_month DATE := date_trunc('month', CURRENT_DATE + interval '1 month');
    v_partition_name TEXT := 'audit_log_' || to_char(v_next_month, 'YYYY_MM');
    v_start_date TEXT := v_next_month::text;
    v_end_date TEXT := (v_next_month + interval '1 month')::text;
BEGIN
    EXECUTE format(
        'CREATE TABLE IF NOT EXISTS %I PARTITION OF audit_log_new
         FOR VALUES FROM (%L) TO (%L)',
        v_partition_name, v_start_date, v_end_date
    );
END;
$$ LANGUAGE plpgsql;

-- Run monthly via cron
SELECT cron.schedule('create-audit-partition', '0 0 1 * *', 'SELECT create_audit_log_partition()');

2. Index Optimization

sql
-- Composite indexes for common query patterns
CREATE INDEX idx_audit_org_created ON audit_log_new (organization_id, created_at DESC);
CREATE INDEX idx_audit_entity ON audit_log_new (entity_type, entity_id, created_at DESC);
CREATE INDEX idx_audit_user ON audit_log_new (user_id, created_at DESC);

-- Partial indexes for break-glass and impersonation (rare events)
CREATE INDEX idx_audit_break_glass ON audit_log_new (break_glass_id)
    WHERE break_glass_id IS NOT NULL;

CREATE INDEX idx_audit_impersonation ON audit_log_new (impersonation_id)
    WHERE impersonation_id IS NOT NULL;

-- GIN index for JSONB changes column (if querying change contents)
CREATE INDEX idx_audit_changes ON audit_log_new USING GIN (changes);

3. Connection Pool Tuning

go
// internal/config/config.go

type PoolConfig struct {
    DatabaseURL string

    // Phase 1: Conservative
    MinConns int `default:"2"`
    MaxConns int `default:"20"`

    // Phase 2+: Aggressive (with read replicas)
    // MinConns int `default:"5"`
    // MaxConns int `default:"50"`

    MaxConnLifetime time.Duration `default:"1h"`
    MaxConnIdleTime time.Duration `default:"30m"`

    // Health check interval
    HealthCheckPeriod time.Duration `default:"1m"`
}

// internal/database/postgres.go
func NewPool(ctx context.Context, cfg *config.PoolConfig) (*pgxpool.Pool, error) {
    poolCfg, err := pgxpool.ParseConfig(cfg.DatabaseURL)
    if err != nil {
        return nil, err
    }

    poolCfg.MinConns = int32(cfg.MinConns)
    poolCfg.MaxConns = int32(cfg.MaxConns)
    poolCfg.MaxConnLifetime = cfg.MaxConnLifetime
    poolCfg.MaxConnIdleTime = cfg.MaxConnIdleTime
    poolCfg.HealthCheckPeriod = cfg.HealthCheckPeriod

    // Runs once per new physical connection (not once per acquire) —
    // a good place to export connection-churn metrics to Telemetry.
    poolCfg.AfterConnect = func(ctx context.Context, conn *pgx.Conn) error {
        slog.Debug("connection established", "remote_addr", conn.PgConn().Conn().RemoteAddr())
        return nil
    }

    // Return true to return the connection to the pool, false to destroy it.
    // Idle and lifetime expiry are already enforced by the pool via the
    // settings above, so healthy connections are always kept.
    poolCfg.AfterRelease = func(conn *pgx.Conn) bool {
        return true
    }

    return pgxpool.NewWithConfig(ctx, poolCfg)
}

4. Query Performance Monitoring

go
// internal/middleware/query_logger.go

func QueryPerformanceMonitor(next http.Handler) http.Handler {
    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        start := time.Now()

        next.ServeHTTP(w, r)

        duration := time.Since(start)
        if duration > 1*time.Second {
            slog.Warn("slow request",
                "path", r.URL.Path,
                "method", r.Method,
                "duration_ms", duration.Milliseconds(),
                "request_id", middleware.RequestIDFromContext(r.Context()),
            )
        }
    })
}

// queryTracer logs individual slow statements. pgx reads the tracer from
// the connection config at connect time, so install it once on the pool
// config (poolCfg.ConnConfig.Tracer = &queryTracer{}) rather than mutating
// a live connection per request. The request ID is recovered from the
// query context, which pgx threads through from the caller.
type queryTracer struct{}

type queryStartKey struct{}

func (qt *queryTracer) TraceQueryStart(ctx context.Context, conn *pgx.Conn, data pgx.TraceQueryStartData) context.Context {
    return context.WithValue(ctx, queryStartKey{}, time.Now())
}

func (qt *queryTracer) TraceQueryEnd(ctx context.Context, conn *pgx.Conn, data pgx.TraceQueryEndData) {
    start, ok := ctx.Value(queryStartKey{}).(time.Time)
    if !ok {
        return
    }
    duration := time.Since(start)

    if duration > 500*time.Millisecond {
        slog.Warn("slow query",
            "request_id", middleware.RequestIDFromContext(ctx),
            "duration_ms", duration.Milliseconds(),
            "sql", data.SQL[:min(200, len(data.SQL))], // truncate long queries
        )
    }
}

Monitoring & Observability

Critical Metrics to Track

Metric | Source | Alert Threshold | Phase
------ | ------ | --------------- | -----
Connection pool wait count | pool.Stat().WaitCount | > 0 for 1+ min | All
Active DB connections | pool.Stat().AcquiredConns() | > 80% of max | All
Query latency (p99) | Query tracer | > 500ms | All
Request latency (p99) | Middleware | > 1s | All
Database size | pg_database_size() | > 80% of allocated | All
Replication lag | pg_last_xact_replay_timestamp() | > 5s | Phase 2+
Error rate (5xx) | HTTP middleware | > 1% | All
Shard imbalance | Routing DB | Max shard load > 2× min | Phase 3+
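Several of the thresholds above are p99 values derived from tracer samples. A minimal nearest-rank sketch of that computation (illustrative; the production aggregation would live in Datadog/CloudWatch, not application code):

```go
package main

import (
	"fmt"
	"sort"
)

// percentile returns the p-th percentile (0-100) of latency samples in ms
// using the nearest-rank method — adequate for dashboard-style p99 checks.
func percentile(samplesMs []float64, p float64) float64 {
	if len(samplesMs) == 0 {
		return 0
	}
	s := append([]float64(nil), samplesMs...) // avoid mutating the caller's slice
	sort.Float64s(s)
	rank := int(float64(len(s))*p/100 + 0.5) // nearest rank, 1-based
	if rank < 1 {
		rank = 1
	}
	if rank > len(s) {
		rank = len(s)
	}
	return s[rank-1]
}

func main() {
	latencies := []float64{12, 15, 14, 900, 13, 16, 11, 14, 15, 12}
	fmt.Println(percentile(latencies, 99))
}
```

The example also shows why p99 (not the mean) is the right alert signal: one 900ms outlier barely moves the average but immediately surfaces at p99.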

Observability Stack

Application Logs (slog JSON)
    → CloudWatch Logs
    → CloudWatch / Datadog / Grafana Loki
    → Dashboards + Alerts

Example Datadog Monitors

yaml
# datadog-monitors.yaml

- name: "PostgreSQL Connection Pool Exhaustion"
  type: metric alert
  query: "avg(last_5m):avg:postgres.pool.wait_count{env:prod} > 10"
  message: |
    Connection pool is experiencing waits. Immediate action required.

    Troubleshooting:
    1. Check active connections: SELECT count(*) FROM pg_stat_activity
    2. Check long-running queries: SELECT * FROM pg_stat_activity WHERE state != 'idle' AND query_start < now() - interval '30 seconds'
    3. Consider scaling DB replicas or increasing pool size

  escalation:
    - "@pagerduty-engineering"
    - "@slack-alerts"

- name: "Slow Query Detection"
  type: log alert
  query: 'logs("slow query").rollup("count").last("5m") > 100'
  message: |
    High volume of slow queries detected (> 500ms).

    Check query tracer logs for offending queries.
    May need to add indexes or optimize query patterns.

- name: "Database Size Growth"
  type: metric alert
  query: "avg(last_1h):avg:postgres.database_size{env:prod} > 800000000000"  # 800GB
  message: |
    Database approaching 1TB. Plan migration to larger instance or sharding.

Migration Checklists

Phase 1 → Phase 2 (Add Read Replicas)

Pre-migration:

  • [ ] Verify RDS primary instance health and backups
  • [ ] Estimate downtime window (typically minimal for adding replicas)
  • [ ] Schedule maintenance window (communicate to customers)

Migration:

  • [ ] Create read replicas: aws rds create-db-instance-read-replica
  • [ ] Wait for replication to catch up
  • [ ] Update AWS Secrets Manager with primary + replica URLs
  • [ ] Deploy the Core API + Telemetry with read replica support (git push origin main)
  • [ ] Verify queries routing to replicas: check RDS connection count
  • [ ] Monitor for 24 hours

Rollback plan:

  • [ ] If issues detected, revert environment variables to single primary URL
  • [ ] Re-deploy without read replica code

Post-migration:

  • [ ] Verify replication lag < 1s
  • [ ] Update disaster recovery runbook

Phase 2 → Phase 3 (Add Enterprise Tier)

Pre-migration:

  • [ ] Implement tenant routing middleware
  • [ ] Create routing database (separate PostgreSQL instance)
  • [ ] Test enterprise provisioning script (dry-run)
  • [ ] Identify first enterprise customer (largest org)

Migration:

  • [ ] Provision first enterprise project (manually)
  • [ ] Migrate customer data: pg_dump --schema=public --table=appointments --table=patients ... | psql
  • [ ] Update tenant_shards routing table
  • [ ] Deploy tenant routing middleware to production
  • [ ] Verify customer routed to enterprise shard
  • [ ] Monitor for 48 hours
  • [ ] Automate provisioning script
  • [ ] Migrate remaining enterprise customers (1 per week)

Post-migration:

  • [ ] Document enterprise provisioning runbook
  • [ ] Train support team on tier differences
  • [ ] Update billing system to track shard assignments

Cost Optimization Strategies

1. Reserved Instances (20-30% savings)

bash
# AWS RDS Reserved Instances (1-year or 3-year commitment)
# Example: db.r6g.large, 1-year, all upfront
# On-demand: $200/mo → Reserved: $140/mo (30% savings)

aws rds purchase-reserved-db-instances-offering \
  --reserved-db-instances-offering-id <offering-id> \
  --reserved-db-instance-id restartix-reserved-1 \
  --db-instance-count 3

2. Storage Tiering

bash
# Hot data (0-12 months): gp3 SSD (fast, expensive)
# Warm data (1-6 years): S3 JSONL archives (cheap)
# Cold data (6+ years): S3 Glacier (very cheap)

# S3 Lifecycle Policy (expiration: 2190 days = 6 years)
{
  "Rules": [
    {
      "Id": "Archive audit logs",
      "Prefix": "audit-archive/",
      "Status": "Enabled",
      "Transitions": [
        {
          "Days": 365,
          "StorageClass": "GLACIER"
        }
      ],
      "Expiration": {
        "Days": 2190
      }
    }
  ]
}

3. Autoscaling Core API Replicas

bash
# App Runner autoscaling config (set via AWS CLI or console)
aws apprunner update-service \
  --service-arn <service-arn> \
  --auto-scaling-configuration-arn <config-arn>

# Create autoscaling configuration
aws apprunner create-auto-scaling-configuration \
  --auto-scaling-configuration-name restartix-platform-scaling \
  --max-concurrency 100 \
  --min-size 2 \
  --max-size 10

4. Database Connection Pooler (External)

Not recommended for RLS-heavy apps, but worth mentioning:

  • PgBouncer in session mode (preserves RLS session variables)
  • Deploy as sidecar container or use RDS Proxy
  • Reduces actual PostgreSQL connections by 50-70%
  • Tradeoff: Adds 1-5ms latency per query

Core API (100 connections) → PgBouncer (50 connections) → PostgreSQL

Our verdict: Skip for now. RLS requires connection-scoped state, and PgBouncer session mode doesn't offer enough savings to justify the complexity.


Disaster Recovery

Backup Strategy

Tier | Method | Frequency | Retention | RTO | RPO
---- | ------ | --------- | --------- | --- | ---
Phase 1 (RDS) | RDS automated backups | Continuous WAL | 7 days PITR | 15 min | 5 min
Phase 2+ (RDS) | RDS automated backups | Continuous WAL | 7 days PITR | 15 min | 5 min
All phases | Manual pre-migration backups | Before each deploy | Permanent (S3) | 2 hours | 0 (planned)

Recovery Procedures

Scenario 1: Accidental data deletion

bash
# RDS Point-in-Time Recovery (AWS CLI)
aws rds restore-db-instance-to-point-in-time \
  --source-db-instance-identifier restartix-prod \
  --target-db-instance-identifier restartix-prod-recovered \
  --restore-time 2026-02-13T14:30:00Z

Scenario 2: Database corruption

bash
# Restore from latest snapshot
aws rds restore-db-instance-from-db-snapshot \
  --db-instance-identifier restartix-prod-restored \
  --db-snapshot-identifier rds:restartix-prod-2026-02-13-06-00

# Update DNS or connection string to point to restored instance

Scenario 3: Complete region failure

  • Cross-region RDS read replica (Phase 4)
  • Promote replica to primary
  • Update Cloudflare DNS
  • RTO: 30 minutes, RPO: 5 minutes

Summary: Scaling Decision Tree

START: How many organizations?

├─ 1-10 orgs
│  └─ Phase 1: Single App Runner instance ✅

├─ 10-50 orgs
│  ├─ Is DB size > 100GB? YES → Phase 2 (Read replicas)
│  ├─ Are connections maxing out? YES → Phase 2
│  └─ Otherwise → Stay in Phase 1, monitor closely

├─ 50-100 orgs
│  ├─ Any org > 50k patients? YES → Phase 3 (Hybrid model)
│  ├─ Enterprise SLA required? YES → Phase 3
│  └─ Otherwise → Phase 2 with larger RDS instance

└─ 100+ orgs
   └─ Phase 4: Multi-shard + geographic distribution

Special cases:
├─ GDPR data residency required? → Phase 4 (regional shards)
├─ Customer willing to pay $1000+/mo? → Enterprise tier (any phase)
└─ Regulatory requirement for isolation? → Enterprise tier
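The tree above can be encoded directly, which is handy for a capacity-review script. Thresholds mirror the tree; the struct and function names are illustrative, not from the codebase (the special cases route to the enterprise tier orthogonally and are omitted here):

```go
package main

import "fmt"

// Profile captures the inputs the decision tree branches on.
type Profile struct {
	Orgs               int
	DBSizeGB           int
	ConnectionsMaxed   bool
	LargestOrgPatients int
	EnterpriseSLA      bool
}

// recommendPhase returns the target infrastructure phase (1-4).
func recommendPhase(p Profile) int {
	switch {
	case p.Orgs <= 10:
		return 1
	case p.Orgs <= 50:
		if p.DBSizeGB > 100 || p.ConnectionsMaxed {
			return 2 // read replicas
		}
		return 1 // stay in Phase 1, monitor closely
	case p.Orgs <= 100:
		if p.LargestOrgPatients > 50000 || p.EnterpriseSLA {
			return 3 // hybrid model
		}
		return 2 // Phase 2 with a larger RDS instance
	default:
		return 4 // multi-shard + geographic distribution
	}
}

func main() {
	fmt.Println(recommendPhase(Profile{Orgs: 30, DBSizeGB: 150}))
}
```

Keeping the thresholds in one function makes phase reviews reproducible: feed in last month's metrics, diff the recommendation against current infrastructure.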

Next Steps

Immediate (Phase 1 - this week)

  1. ✅ Implement connection pool monitoring (see next document)
  2. ✅ Add query timeout middleware
  3. ✅ Set up Datadog/CloudWatch alerts for connection pool metrics
  4. ✅ Document current connection pool configuration

Short-term (Phase 1 - next 3 months)

  1. Monitor metrics weekly
  2. Tune DB_POOL_MAX based on actual usage
  3. Plan Phase 2 migration when hitting trigger thresholds
  4. Set up automated audit log partitioning

Long-term (Phase 2+)

  1. Scale RDS when > 10 orgs or > 100GB
  2. Implement tenant routing for enterprise tier
  3. Build automated provisioning scripts
  4. Plan geographic expansion (EU shard)