Session Management & RLS Setup
Stale since the Layer-1.24 Go cleanup — pending a holistic rewrite. Type/middleware/column names below are pre-cleanup: `PrincipalContext` is now `principal.Subject`, `ClerkAuth` is now `middleware.Authenticate(verifier, resolver, loader)`, `humanRepo.LoadPermissions` is now `principal.SubjectLoader.LoadPermissions`, and `humans.clerk_user_id` is now `humans.provider_subject_id`. The flow shape (verify → resolve → load Subject → attach RLS tx) is unchanged; only names and packages moved. Verify against the code before quoting. Authoritative pointer: decisions.md → Auth chain shape.
Describes the request-scoped session model that's actually shipped today (Layer 0 + Layer 1.1). Identifiers are UUIDv7 throughout — no `int64` IDs anywhere in this path. Authorization is per-org permissions, not role-string compares.
Overview
Every authenticated request establishes a session context by:
- Verifying the Clerk session token (`ClerkAuth` middleware).
- Loading the internal `humans` row (joined with the matching `principals` row) from the database via the AdminPool (bypasses RLS) and JIT-provisioning both rows in a single transaction on first sign-in.
- Loading the principal's memberships and platform-role grants — `IsSuperadmin`, `Memberships[]` — onto a `PrincipalContext` stored in the request context.
- Resolving the current org for this request (`OrganizationContext` middleware), validating membership, then loading the per-org permission set from `role_permissions` for the role the principal holds in that org.
- Routing to the correct connection pool: AdminPool (owner role, bypasses RLS) for superadmins; AppPool (restricted role, RLS enforced) for everyone else.
- Acquiring a connection, opening an explicit `pgx.Tx`, and binding transaction-scoped Postgres session variables to it via the `set_app_*_context` SECURITY DEFINER wrappers so RLS policies can scope queries to the current principal / actor type / org / role.
The same pgx.Tx is reused by every repository call in the request — repositories pull it out via database.TxFromContext(ctx). The transaction commits at the end of the request (or rolls back on a 5xx / panic), at which point Postgres wipes the session variables automatically — there is no clear-state function callable from AppPool, so a returning pooled connection cannot leak authorization state to the next request.
The result: tenant isolation is enforced by Postgres RLS, not application filtering, and authorization decisions are made against a permission code (organizations.update, audit_log.view_org, …) — never a hardcoded role-string comparison.
Session Variables
Four Postgres session variables drive RLS. They're typed via the current_app_*() helpers in migrations/core/000001_init.up.sql.
| Variable | Type (after NULLIF/cast) | Set by | Read by |
|---|---|---|---|
| `app.current_principal_id` | UUID | OrganizationContext | `current_app_principal_id()` — used in self-scoped RLS |
| `app.current_actor_type` | TEXT | OrganizationContext | `current_app_principal_type()` — used by policies that branch on `'human' \| 'agent' \| 'service_account' \| 'system'` |
| `app.current_org_id` | UUID | OrganizationContext | `current_app_org_id()` — used by every tenant-scoped policy |
| `app.current_role` | TEXT | OrganizationContext | `current_app_role()` — kept for diagnostic logging; no longer read by any RLS policy (see "Permission-based, never role-string" below) |
They're set inside the request's transaction with `set_config(..., true)` — the `true` third argument makes the setting transaction-local — so Postgres clears them at COMMIT/ROLLBACK, and the next caller using this pooled connection cannot inherit a previous request's identity. Combined with the lock-once gate inside each `set_app_*_context` wrapper (refuses to re-bind `principal_id` to a different value within the same transaction), this is the load-bearing defense against an SQLi attacker on AppPool spoofing into another tenant's identity.
uuid.Nil is rendered as the empty string when set; the helpers use NULLIF so the SQL side sees NULL. Public /v1/public/* endpoints (e.g. GET /v1/public/organizations/resolve) run on AdminPool — owner role, RLS bypassed — with the handler doing a single-row equality match on slug or verified domain. AppPool with no session vars intentionally sees zero rows on every authenticated table; there are no public RLS carve-outs anymore.
Middleware Stack
The pipeline is wired in services/api/internal/core/server/routes.go for the /v1 group: ClerkAuth → OrganizationContext → ActivityTracker → audit.Middleware.
1. ClerkAuth (internal/core/middleware/auth.go)
Verifies the bearer JWT, JIT-provisions the principals + humans rows if this is the first sign-in, and assembles a PrincipalContext.
```go
// internal/core/auth/context.go (current shape, simplified)
type Membership struct {
    OrganizationID uuid.UUID
    RoleID         uuid.UUID
    RoleCode       string
}

type PrincipalContext struct {
    PrincipalID           uuid.UUID           // FK to principals.id
    ActorType             string              // 'human' | 'agent' | 'service_account' | 'system'
    ClerkUserID           string              // populated when ActorType = 'human'
    Email                 string
    IsSuperadmin          bool                // derived from platform_roles (humans only)
    PlatformRoles         []string            // ["superadmin"] when present
    Memberships           []Membership        // every (org, role) pair
    CurrentOrganizationID uuid.UUID           // populated by OrganizationContext
    CurrentRoleCode       string              // role held in CurrentOrganizationID
    Permissions           map[string]struct{} // codes for that role; nil for superadmin
}

func (p *PrincipalContext) HasPermission(code string) bool {
    if p.IsSuperadmin {
        return true
    }
    _, ok := p.Permissions[code]
    return ok
}
```

What it does end-to-end:
- Extract the `Authorization: Bearer <jwt>` header. Missing / malformed → 401.
- Verify the token via the Clerk SDK. Bad signature, expired, blocked session → 401.
- Look up `humans.clerk_user_id`. If absent, run `provisionHuman` (insert a `principals` row + matching `humans` row in one transaction). Patient onboarding (which provisions the `patient_profiles` + `patients` rows for the org indicated by `X-Organization-ID`) runs separately via the `/v1/portal/onboard` flow — see patients/onboarding.md. Patients are not memberships post-1.26.
- If `humans.blocked = TRUE`, return 403 — and audit-log the failed request.
- Load the principal's `platform_roles` and `organization_memberships` rows.
- Store the assembled `PrincipalContext` on `r.Context()` via `auth.ContextWithPrincipal`. Retrieve it downstream with `auth.PrincipalFromContext`.
The lookup uses the AdminPool (bypasses RLS) because the humans / principals rows may not yet have an organization scope when the request arrives. Permissions are not loaded here — they depend on which org the request is scoped to, and that's resolved in the next middleware.
2. OrganizationContext (internal/core/middleware/organization.go)
Resolves the current org, validates membership, loads permissions for that org, picks the pool, acquires a connection, and sets the RLS session variables.
```go
// services/api/internal/core/middleware/organization.go (signature)
func OrganizationContext(
    adminPool, appPool *pgxpool.Pool,
    humanRepo *human.Repository, // needed for LoadPermissions(principalID, orgID)
) func(http.Handler) http.Handler
```

The third argument matters: permissions are loaded per request, scoped to the current org, because the same principal can hold different roles in different orgs (admin at clinic A, customer_support at clinic B). They cannot be cached on PrincipalContext ahead of time without binding the cache to (principal, org), which we don't bother with — Postgres serves the lookup quickly enough.
Order of operations for each request:
- Pull the `PrincipalContext` from the request context. Missing → 401 (`ClerkAuth` should have run first).
- Resolve the org ID, in this order: `X-Organization-ID` header (set by the frontend proxy.ts from the resolved subdomain or custom domain) → `humans.current_organization_id` if the principal is still a member of it → first membership → `uuid.Nil` (safe default).
- Validate membership. A non-superadmin who passes a real org ID they aren't a member of gets 403. `uuid.Nil` is a valid pass-through (used by patients on first sign-up before auto-link runs, or by superadmins operating without org context).
- Load permissions for `(principalID, orgID)` via `humanRepo.LoadPermissions`, which joins `organization_memberships → roles → role_permissions → permissions`. The result populates `PrincipalContext.CurrentRoleCode` and `PrincipalContext.Permissions`. Skipped for superadmins (`HasPermission` short-circuits to `true` for them).
- Pick the pool: AdminPool for `IsSuperadmin`, AppPool otherwise.
- Acquire a connection, `BeginTx`, defer the cleanup that decides commit-vs-rollback from the response status (commit on `< 500`, rollback on `5xx` or panic).
- Set the RLS session variables by calling one of the SECURITY DEFINER wrappers (`set_app_staff_context`, `set_app_patient_context`, or `set_app_principal`) inside the transaction. The wrapper validates the (principal, org, role) tuple against real `organization_memberships` / `patients` rows, normalizes `actor_type` from the principal record, refuses to attach a session for a blocked / soft-deleted / revoked principal, and binds the four GUCs via `set_config(..., true)`.
- Stash the transaction on the context via `database.ContextWithTx`. Repositories use this same `pgx.Tx` for the rest of the request via `database.TxFromContext(ctx)`.
- Tell the audit middleware which org this request is scoped to (`audit.SetCurrentOrg(ctx, orgID)`) so 403/500 failure-path rows downstream are attributed correctly.
3. ActivityTracker (internal/core/middleware/activity.go)
Bumps organization_memberships.last_used_at for the (principal, org) of the current request. Throttled by an in-process 60-second cache so most requests are zero-DB-cost. See reference/activity-tracking.md.
4. audit.Middleware (internal/core/audit/middleware.go)
Captures request metadata, lets the handler run, then writes one audit row for failure paths it owns: every 5xx, every 403, and 401s only when the request carried a Bearer token (so unauthenticated polling doesn't pollute the log). Successful mutations are audited from inside the handler — the recorder writes a single row per logical event, with the diff in changes and sensitive keys redacted (P11).
Repository Pattern
Repositories don't reach for the pool — they pull the request-scoped pgx.Tx from the context so RLS session variables are in scope:
```go
// internal/core/database/postgres.go
func TxFromContext(ctx context.Context) pgx.Tx { /* ... */ }
```

```go
// Pattern used by every repo (e.g. services/api/internal/core/domain/organization/repository.go)
func (r *Repository) FindByID(ctx context.Context, id uuid.UUID) (*Organization, error) {
    tx := database.TxFromContext(ctx)
    if tx == nil {
        // Background path — no request context. Caller is responsible for
        // beginning a transaction and setting session variables.
        return nil, errors.New("no transaction in context")
    }
    var org Organization
    err := tx.QueryRow(ctx, `
        SELECT id, name, slug, /* ... */
        FROM organizations
        WHERE id = $1
    `, id).Scan(&org.ID, &org.Name, &org.Slug /* ... */)
    return &org, err
}
```

RLS does the org filtering automatically — even if a future bug ever hands a different org's ID to the query, the policy `id = current_app_org_id()` means a non-superadmin transaction sees zero rows back.
Atomicity is a free side benefit: every mutation inside one request runs in the same transaction, so a handler that does INSERT A → UPDATE B → INSERT C and errors on the third leaves no partial state behind. The audit log is written through Recorder on AdminPool and is therefore independent of this transaction — failed requests still leave their 401/403/500 audit row even when the request transaction rolls back.
RLS Helper Functions
Defined in migrations/core/000001_init.up.sql. They return UUID / TEXT typed values, with NULLIF so uuid.Nil reaches Postgres as SQL NULL:
```sql
CREATE OR REPLACE FUNCTION current_app_principal_id() RETURNS UUID AS $$
    SELECT NULLIF(current_setting('app.current_principal_id', true), '')::UUID;
$$ LANGUAGE SQL STABLE;

CREATE OR REPLACE FUNCTION current_app_principal_type() RETURNS TEXT AS $$
    SELECT NULLIF(current_setting('app.current_actor_type', true), '');
$$ LANGUAGE SQL STABLE;

CREATE OR REPLACE FUNCTION current_app_org_id() RETURNS UUID AS $$
    SELECT NULLIF(current_setting('app.current_org_id', true), '')::UUID;
$$ LANGUAGE SQL STABLE;

CREATE OR REPLACE FUNCTION current_app_role() RETURNS TEXT AS $$
    SELECT NULLIF(current_setting('app.current_role', true), '');
$$ LANGUAGE SQL STABLE;
```

`current_app_role()` is no longer read by any RLS policy — the previous bare-staff-signal `current_app_role() <> ''` qualifiers were replaced by permission checks (`current_app_has_permission('organizations', 'view_directory')` for staff-directory tables, `current_app_has_permission('patients', 'view')` for the patient roster) so a custom no-permission per-org role doesn't over-grant visibility. `app.current_role` is still set by the wrappers for diagnostic logging. Tenant policies branch on `current_app_has_permission(resource, action)` (defined in 000002_tenancy_rbac.up.sql), which joins through `organization_memberships → role_permissions → permissions` to answer "does the current (principal, org) hold this capability?".
There is intentionally no is_superadmin() helper. Superadmins bypass RLS via the AdminPool's owner role, not via a policy clause.
Permission Loading
`humanRepo.LoadPermissions(ctx, principalID, orgID)` runs:

```sql
SELECT p.code
FROM organization_memberships po
JOIN role_permissions rp ON rp.role_id = po.role_id
JOIN permissions p ON p.code = rp.permission_code
WHERE po.principal_id = $1
  AND po.organization_id = $2;
```

The result becomes `PrincipalContext.Permissions`. `RequirePermission(code)` middleware in internal/core/middleware/rbac.go then checks membership in this set and returns 403 if absent. `RequireSuperadmin()` checks the platform-level `IsSuperadmin` flag.
Don't compare PrincipalContext.CurrentRoleCode against a literal — it's exposed for diagnostics, not authorization. Every authorization decision goes through HasPermission / RequirePermission. See reference/rbac-permissions.md for the catalog.
Connection Pool Management
Each request holds its connection (and its single `pgx.Tx`) for its full duration so the RLS GUCs stay in scope. Pool sizing must therefore match peak concurrency:

```
DB_POOL_MIN=5
DB_POOL_MAX=25   # raise if you see waits at the pool
```

Critical metric: `Pool.Stat().EmptyAcquireCount()` in pgxpool — acquires that had to wait because the pool was empty. If it climbs under sustained load, raise `DB_POOL_MAX`.
Scaling note: because the session GUCs are bound to the request's transaction (not the pooled connection), pgbouncer in transaction-pooling mode is a drop-in fit when concurrent connections push past ~1k. Each client request maps to one pgbouncer transaction lease — the unit our authorization context lives in. See Scaling Architecture → PgBouncer.
Security Guarantees
- Tenant isolation is enforced by Postgres, not by handler code. RLS rewrites the query before execution; even if a handler forgets a `WHERE org_id = ?`, the policy adds `organization_id = current_app_org_id()`.
- Superadmin bypass is at the connection layer, not in policies. A superadmin request runs on the owner pool, where Postgres does not enforce RLS at all. No policy uses an `is_superadmin()` clause.
- Session variables are transaction-scoped. Postgres clears them at COMMIT/ROLLBACK, so a connection returned to the pool cannot leak state to the next caller. There is no `clear_app_context()` callable from AppPool — an SQLi attacker cannot reset the lock-once gate to spoof a different principal because the function doesn't exist; within the same transaction, the lock-once check on the wrappers blocks rebind.
- Public endpoints run on AdminPool, not AppPool. The hostname-resolve endpoint (`GET /v1/public/organizations/resolve`) runs as the owner role with a single-row equality match on slug or verified domain — RLS is bypassed for that one path. AppPool with no session variables intentionally sees zero rows on every authenticated table.
- Authorization is permission-based, not role-string-based. Custom per-org roles can hold a subset of permissions and the same code paths gate them — no application changes needed.
- Actor-type-aware policies can read `current_app_principal_type()` when a policy needs to differentiate humans from agents / service accounts (e.g., AI-driven writes that should record provenance).
Failure Modes
| Failure | Status | Where it's detected |
|---|---|---|
| Missing `Authorization` header | 401 | ClerkAuth |
| Invalid / expired Clerk JWT | 401 | ClerkAuth |
| `humans.blocked = TRUE` | 403 (audited via audit.Middleware) | ClerkAuth |
| `X-Organization-ID` for an org the principal is not a member of | 403 | OrganizationContext |
| `humanRepo.LoadPermissions` fails | 500 (generic envelope, original error logged) | OrganizationContext |
| Pool acquire fails | 500 | OrganizationContext |
| `set_config(...)` fails | 500 | OrganizationContext |
| Caller lacks the gated permission | 403 | RequirePermission |
Testing RLS
The Layer 1.2 harness at services/api/internal/test/rlstest/ runs against a real Postgres in testcontainers, seeds two orgs and three humans (admin of A, patient of B, superadmin), and exercises the policies directly. Run it with:
```
make test-integration
```

Manual psql verification still works — wrap the work in an explicit transaction and call the production wrapper, exactly like middleware does:

```sql
-- Run as the restricted role to actually observe RLS.
SET ROLE restartix_app;
BEGIN;
SELECT set_app_staff_context('<principal-uuid>', '<org-uuid>', 'admin');
SELECT count(*) FROM organizations;                          -- 1 (the current org only)
SELECT count(*) FROM organizations WHERE id <> '<org-uuid>'; -- 0
COMMIT; -- Postgres wipes the GUCs here; a subsequent BEGIN starts clean.
RESET ROLE;
```

Calling `set_config` directly at the psql prompt also works for ad-hoc probing, but the wrapper exercises the real validation path (membership existence, principal liveness, lock-once) and is what production hits — preferring it keeps the manual probe and the production codepath in sync.
Related Documentation
- api.md — `/v1/me`, `/v1/me/switch-organization`
- clerk-integration.md — Clerk token verification + JIT provisioning
- reference/rbac-permissions.md — Permission catalog and seeded grants
- reference/rls-policies.md — System-wide RLS reference
- architecture/data-model.md — Table shapes for `principals`, `humans`, `organization_memberships`, `roles`, `role_permissions`, `permissions`, `platform_roles`