
Foundation

The actor-identity stack, cross-cutting runtime, capability + integration architecture, and admin surfaces — across all three audiences (superadmin, clinic staff, patient). Built once, before any feature lands. See the implementation plan index for the framing.

Acceptance. A superadmin manages orgs and platform-level templates (privacy notices, plans, entitlement flags) from Console; a clinic admin manages their org's settings, billing, members, roles, domains, tiers, privacy notice, integrations, and audit log from Clinic; a patient signs up, accepts the platform + clinic consents, onboards, manages their own profile, subscription, consent toggles, and data export from Portal. All three flows pass on AWS staging with real Clerk, real KMS, real S3.

Sub-phases. 1A (cross-cutting runtime), 1B (identity & tenancy), 1C (capabilities, integrations & metering — new), 1D (admin surfaces), 1E (foundation gate). Taxonomy used here is canonical per glossary.md.

Discipline. Foundation gaps that surface after a sub-phase ships live as new sub-sections under that sub-phase. Layer 2 (Features) does not start until 1E closes.


Phase 0 -- Boot (DONE)

Project scaffolding, DB foundation, three Next.js apps with Clerk auth, multi-tenant orgs, per-org RBAC, RLS, custom domain routing, role-gated dashboards. Everything that shipped in commit 3ade31c. Detail in the git history and in the architecture docs.

Status: shipped. Do not re-implement.


1A. Cross-Cutting Runtime

Every layer above uses these. Built once, hardened in 1E against staging.

1A.1 Audit Logging Infrastructure

Status: shipped. Centralised audit recorder writes one row per mutation with redaction; failed-request logging covers 401/403/5xx; append-only at the DB layer. Detail: patterns.md P10/P11, internal/core/audit/.

1A.2 RLS Integration Test Harness

Status: shipped. testcontainers-based harness exercises the full identity stack under the renamed RLS helpers. make test-integration runs 33+ test cases against real Postgres in ~2s warm. Detail: internal/test/rlstest/.

1A.3 Encryption Helper

Status: shipped. AES-256-GCM helper with versioned ciphertext, multi-version dual-decrypt rotation, in-memory keyring loaded from ENCRYPTION_KEYS (Phase 1 production path: keys live in a KMS-envelope-protected Secrets Manager secret). The kmsKeyring stub is reserved for Phase 2 (direct per-data-key KMS calls + BYOK) — see aws-infrastructure.md → Direct-KMS keyring + BYOK (Phase 2+). Detail: reference/encryption.md, internal/core/crypto/.

  • [ ] Customer-managed CMK provisioned + restartix/{env}/encryption SM secret created under that envelope (closes in 1E.3)
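A minimal sketch of the versioned-ciphertext shape described above, assuming a one-byte version prefix ahead of the GCM nonce; the keyring type and method names are illustrative, not the actual internal/core/crypto API, and the real helper additionally loads keys from ENCRYPTION_KEYS:

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"fmt"
)

// keyring holds one 32-byte AES key per version; the current version
// encrypts, every loaded version can still decrypt (dual-decrypt rotation).
type keyring struct {
	keys    map[byte][]byte
	current byte
}

func (k *keyring) Encrypt(plaintext []byte) ([]byte, error) {
	block, err := aes.NewCipher(k.keys[k.current])
	if err != nil {
		return nil, err
	}
	gcm, err := cipher.NewGCM(block)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	// Layout: [version][nonce][ciphertext+tag]
	out := append([]byte{k.current}, nonce...)
	return gcm.Seal(out, nonce, plaintext, nil), nil
}

func (k *keyring) Decrypt(data []byte) ([]byte, error) {
	if len(data) < 1 {
		return nil, fmt.Errorf("short ciphertext")
	}
	key, ok := k.keys[data[0]]
	if !ok {
		return nil, fmt.Errorf("unknown key version %d", data[0])
	}
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	gcm, err := cipher.NewGCM(block)
	if err != nil {
		return nil, err
	}
	ns := gcm.NonceSize()
	if len(data) < 1+ns {
		return nil, fmt.Errorf("short ciphertext")
	}
	return gcm.Open(nil, data[1:1+ns], data[1+ns:], nil)
}

func main() {
	oldKey := make([]byte, 32)
	newKey := make([]byte, 32)
	rand.Read(oldKey)
	rand.Read(newKey)
	// Rotation state: version 2 encrypts, version 1 still decrypts.
	kr := &keyring{keys: map[byte][]byte{1: oldKey, 2: newKey}, current: 2}
	ct, _ := kr.Encrypt([]byte("patient note"))
	pt, _ := kr.Decrypt(ct)
	fmt.Println(string(pt), ct[0])
}
```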

1A.4 Soft-Delete Pattern

Status: shipped. deleted_at TIMESTAMPTZ NULL convention, partial-index pattern, repo helpers, RLS exclusion of deleted rows unless caller has data.view_deleted, GDPR anonymisation primitive. Detail: patterns.md P13, internal/shared/softdelete/.

1A.5 PII Redaction in Logs

Status: shipped. slog redaction of sensitive keys (password, secret, token, etc.), audit JSONB walker shares the same predicate, every log call site audited for raw PII, telemetry pseudonymisation helper ready for F10. Detail: patterns.md P11, internal/shared/redact/.

1A.6 Error-Response Surface Audit

Status: shipped. *AppError envelope everywhere, recovery middleware → generic 500 with request_id, validation 422 with field-level reasons, no leaky DB internals. Detail: reference/error-envelope.md.

1A.7 API Contract Conventions

Status: shipped. Pagination + sort + filter conventions, idempotency-key middleware, OpenAPI spec-first (oapi-codegen Go + openapi-typescript frontends) with three-way drift test, picker + date-range URL conventions. Detail: reference/api-conventions.md, apps/docs/openapi.yaml.

1A.8 File Storage (S3)

Status: shipped (bucket provisioning deferred to 1E). Org-scoped key prefixes, signed URLs, MIME validation, file-size caps per surface, bucket-policy authored, LocalStack integration test for cross-org isolation. Detail: reference/file-storage.md, internal/integration/s3/.

  • [ ] AWS S3 bucket provisioned in staging (closed by 1E)

1A.9 Internal Event Bus

Status: shipped. In-process pub/sub with envelope, panic recovery, backpressure policy, schedule-stub interface, code → catalog drift check. Org lifecycle publishers wired. Detail: patterns.md P28, events.md, internal/core/events/.

1A.10 Translation Infrastructure (UI)

Status: shipped. next-intl wired in all three apps, en + ro seeded, org language_code drives default locale, convention documented. Detail: reference/i18n.md.

1A.11 Activity Tracking Columns

Status: shipped. Middleware bumps organization_memberships.last_used_at and patients.last_used_at with throttled in-process cache. Convention: best-effort, ~minutes precision, never a substitute for audit. Detail: reference/activity-tracking.md.

1A.12 Reserved Column Inventory

Status: shipped. audit_log reserved columns verified, organization_memberships.invited_at / invited_by / accepted_at reserved for 1B's invite flow. Detail: architecture/reserved-columns.md.

1A.13 Sensitive-Endpoint Rate Limiting + SOUP List

Status: shipped. Redis-based rate limiter with per-IP and per-principal extractors, auth_verify and public_resolve policies wired, SOUP inventory for backend + frontend deps with AI/ML model schema, cmd/check-soup enforces append-on-add at PR time. Detail: reference/soup.md, internal/core/ratelimit/.

1A.14 Column-Level Data Classification

Status: shipped. Every column carries class + egress allow-list in data-classification.md; cmd/check-classification fails the build when a column lacks a registry entry. Default is block. Detail: patterns.md P39.

1A.15 Audit Log Partitioning

Status: shipped (staging cron wiring deferred to 1E). audit_log and audit_ai_provenance are range-partitioned monthly on created_at / audit_log_created_at. PK on audit_log is (id, created_at); the AI-provenance FK is composite (audit_log_id, audit_log_created_at) → audit_log(id, created_at) so both tables hand off the same monthly window together when archived. No DEFAULT partition — missed rollover is loud, not silent. The migration seeds only the current month so the rollover cron is exercised in real environments rather than masked by a long pre-seeded runway; audit.EnsurePartitions + cmd/audit-partition-roll (default -ahead=3) maintain the forward window. Detail: internal/core/audit/partitions.go, cmd/audit-partition-roll/main.go.

Why a foundation item. The audit_log table carries the longest retention on the platform (≥ 6 years), grows monotonically with every mutation × every tenant, and the hot/warm/cold tier hand-off in CLAUDE.md is exactly what partitioning is built for. Retrofitting a partitioned-from-unpartitioned table after launch is a multi-day operation on a billion-row append-only table with no allowable write gap (audit_log gaps are themselves a compliance finding). Foundation discipline says fix it once, in pre-prod, where the cost is a 60-line migration edit.

  • [ ] Wire audit-partition-roll into the staging scheduler (k8s CronJob or GitHub Actions, monthly cadence; default -ahead=3). Closed by 1E.
  • [ ] Add a staging alert when the newest existing partition is less than 1 month ahead — early warning that the rollover stopped firing.

1A.16 Postgres Extension Preinstall

Status: shipped (RDS parameter group deferred to 1E). Enables unaccent + pg_trgm for diacritic-folded fuzzy search (Romanian picker UIs need unaccent('Stefan') = unaccent('Ștefan')), vector for AI-feature embedding columns, and pg_stat_statements for top-N slow-query observability. Extensions live in 000001_init.up.sql; local docker-compose.yml switched to pgvector/pgvector:pg17 and configured shared_preload_libraries=pg_stat_statements.

Why a foundation item. Each of these costs more to retrofit than to enable preemptively. unaccent + pg_trgm need to be present before the first picker migration adds a GIN trigram index; vector needs the right Postgres image at the infrastructure layer, not a per-feature decision; pg_stat_statements needs shared_preload_libraries configured at server start, which means a Postgres restart in production. Doing all three in pre-prod is a 6-line migration edit; doing them piecemeal post-launch is three separate operational coordinations.

Conventions for feature migrations:

  • Picker-queryable text columns: CREATE INDEX ... ON tbl USING GIN (col gin_trgm_ops); and query with unaccent(col) ILIKE unaccent('%' || $1 || '%'). The AsyncMultiSelectFilter typeahead path expects this shape.

  • Embedding columns: embedding vector(N) with N matching the model's output dimensionality. Add an HNSW or IVFFlat index when the column is queried at scale; small tables can scan.

  • Slow-query inspection (locally): SELECT query, calls, mean_exec_time FROM pg_stat_statements ORDER BY mean_exec_time DESC LIMIT 20;.

  • [ ] AWS RDS parameter group sets shared_preload_libraries=pg_stat_statements. Closed by 1E.

1A.17 Frontend Performance Foundation

Status: shipped. Five composable patterns + one security middleware, all live-verified end-to-end against a production Console build. The bundle's load-bearing pieces:

  • P44 — Connection pooling via pgbouncer (patterns.md). Local docker-compose service mirrors the AWS ECS Fargate setup; same pgbouncer.ini. DATABASE_URL / DATABASE_APP_URL route through :6432; migrations bypass via DATABASE_DIRECT_URL because golang-migrate uses session-scoped pg_advisory_lock. min_pool_size=5 keeps backends warm, eliminating the 1.3s cold-start spike we observed live.
  • P43 — Tuned undici dispatcher (patterns.md). Each Next.js app's instrumentation.ts installs a global Agent with 30s keep-alive + 64-conn pool per origin. Default 4s keep-alive cycles TCP under bursty traffic; this kills the TIME_WAIT pile-up at fleet scale.
  • P42 — Server-side response caching with scope-keyed tags (patterns.md). Tagged GETs in packages/api-client go through unstable_cache from next/cache; server actions invalidate via updateTag() (Next.js 16+). Tag taxonomy in packages/api-client/src/cache-tags.ts. Live-discovery: Next.js's built-in fetch tags hash Authorization into the cache key, so rotating Clerk JWTs make every request a miss — unstable_cache is the only mechanism that works with auth-walled APIs.
  • P45 — Redis-backed query cache (cache-aside in repo layer) (patterns.md). services/api/internal/core/cache/ provides Aside + Invalidate + key builders mirroring the P42 namespace. Two layers compose: Next.js's unstable_cache is per-process (each Next.js instance has its own); Redis is shared across the Core API fleet. At 10k concurrent the layering goes 0 / 1 / 1+ Postgres queries (Next.js hit / Redis hit / Redis miss).
  • URL ≡ scope guard (url_org_scope.go). RequireURLOrgMatchesScope("id") mounted on /v1/organizations/{id} route group. Closes a latent gap surfaced by the cache work: RLS protects the FIRST request from a URL/header mismatch, but caching propagates the response — turning a one-time silent 404 into a recurring data leak. Apply preemptively on every per-org route group whether or not the endpoint caches today; cost is one UUID parse, benefit is that adding caching later is mechanical.
  • P46 — Portal hybrid architecture (patterns.md). Decision matrix for server-render vs client-side data per type. Documents-only at this phase; SWR install + first wired client-cache example land with the first Portal F-feature that needs them.

Wired examples (the references future domain work copies from):

  • Canonical: getOrganization(id) (P42 + P45) + updateOrganizationAction (updateTag invalidation). Org summary read on every Console org-detail render and on future Clinic / Portal "my clinic" pages.
  • Hot proxy path: organization.Service.ResolveBySlug / ResolveByDomain (P45). Hit by every Next.js app's proxy.ts cold-load — the original hand-rolled cache, refactored onto the helper.
  • Portal-critical: consents.Service.ListPurposesWithLatestForOrg (P45) + listConsentPurposes (P42). Read by every patient on every consent gate check.
  • Platform-scope reference: listLegalDocumentTemplates + publishLegalDocumentTemplateAction (P42). Console superadmin scope; kept as the simplest platform-scope example.

Why a foundation item. All five patterns + the URL≡scope guard sit in the load-bearing layer every feature rests on. Retrofitting them after Portal F-features ship means revisiting every endpoint that was authored without the cache contract in mind, plus a security audit pass to confirm no per-org route mounts caching without the URL guard. Each individual pattern is small; the cost of bundling them now is one PR; the cost of doing them piecemeal post-launch is one PR per feature plus the integration audit between them. Same calculus as 1A.15 audit partitioning and 1A.16 postgres extensions.

The URL≡scope guard is also a security improvement independent of caching — RLS-only protection at the DB layer left the API tolerating mismatched header/URL combinations that future agents could trip into a real leak. Foundation discipline catches these now.

1A.18 Notifications & Email Transport

Durable email + multi-channel notification primitive. Every foundation consumer that needs to reach a human outside the request path goes through this — no ad-hoc ses.SendEmail calls. The calling contract, outbox shape, recipient-preference model, audit/classification integration, and timezone-aware scheduling are all expensive to retrofit once 50+ features write to the primitive.

Status: shipped except the SES infra closes (production identity verification + suppression list, both gated by 1E AWS staging) and the recipient-timezone profile UI (closes alongside 1D.3 patient self-service). Schema, dispatcher, EmailChannel + FakeChannel, foundation templates (MemberInvite + BreakGlassOpened × email × en+ro), and the integration acceptance test are in place. Wire-in calls into 1B.11 / 1B.12 light up when those features ship — the primitive is ready. F-tier features (1D.3 export-ready / account-deletion-confirmed in F11; F2/F5 appointment reminders; F8 automations engine) layer additional categories onto the same primitive without schema change.

Deltas from the original spec, locked at implementation:

  • Audit row removed from notify.Send. The calling handler audits the originating event (e.g., 1B.12 invite endpoint audits organization_membership CREATE); the notification row IS the body-of-record (RLS-scoped to recipient). The original spec's redundant notification.enqueue audit row would land on the request's tx while the notification row lands on an admin-pool tx — a rollback inconsistency the simpler design avoids. Forensic story is unchanged: "what did we send?" = SELECT on notifications.
  • Per-recipient rate limit deferred. The safety-net cap was design intent for misbehaving F-tier producers, not foundation-tier consumers (which are transactional and bypass it anyway). Lands when the first operational producer (appointment reminders) ships and the rate-limit math is concrete. Code path is not stubbed — adding to Send is a one-line guard against notification_preferences.global_cap_* columns when those land.
  • organization_settings.default_timezone (TEXT NULL, IANA) added in 000003 to close the resolution chain (humans.timezone → org default → 'Europe/Bucharest'). Same column anchors P23's scheduling-timezone chain (location → specialist → here → platform default), so it serves both 1A.18 (recipient-where-they-read) and 1B.14 (slot-where-it-happens).
  • CategoryDefinition.BillingScope added 2026-05-12 to distinguish platform→user from tenant→customer mail. All four foundation categories (owner_welcome, break_glass_opened, member_invite, webhook_subscription_paused) are RestartiX talking to a human about their RestartiX account — they send from a platform-owned SES identity and are not metered against any org's plan. Tenant-scope categories (F-tier; none today) route through WrapMeteredProvider and resolve to the clinic's per-org SES identity when one is configured. The split was originally implicit — the dispatcher wrapped every email with MeteredChannel, which crashed every platform-scope send with ErrUnauthenticated because the metering soft-quota gate required a request-scoped principal.Subject that the background dispatcher does not carry. Discovered live when owner-welcome stopped sending. Closed by notify.email.ScopedChannel (routes by BillingScope).
  • Quota soft-gate works on system paths too (2026-05-12). The enforceQuota middleware used to require a principal.Subject and bailed with ErrUnauthenticated on the notify dispatcher's system-driven path. Closed by capabilities.SetLimitLookup + metering.Repository.LookupLimit — when no Subject is in context but a metering org IS attached via ContextWithMeteringOrg, the gate reads the same organization_subscription_limits + usage_quotas source LoadLimits reads for request paths. Request and system paths now see identical soft-gate behavior; the inner atomic Reserve in meterAroundCall remains the load-bearing cap enforcement either way.
  • notify.Send rejects principal-id-only recipients on email-routed categories (2026-05-12). The email channel does not resolve principal → humans.email by design (callers resolve at the call site, the rendered email address is the body-of-record). Previously this convention was unenforced — a producer that called notify.Send(notify.To(principalID), CategoryOwnerWelcome, ...) would queue successfully and dead-letter at dispatch time with recipient_email NULL. The guard in Service.Send now fails fast at queue time with a message pointing the producer at the right helper. categoryNeedsAddress covers email / sms / whatsapp; in-app / push channels are address-free and still accept principal-id recipients.
  • Address-recipient producers carry locale + timezone responsibility too (2026-05-12). The locale + timezone resolution chain (humans.preferred_language → org default → "en", humans.timezone → organization_settings.default_timezone → "Europe/Bucharest") reads the humans row ONLY for principal-id recipients. Address-based recipients (notify.ToAddress) skip the humans read and fall straight through to org default → platform default, losing any humans.preferred_language / humans.timezone preference. Producers that have a principal_id at the call site (the common shape — break-glass, owner-welcome, webhook auto-pause) must look up the recipient's locale + timezone alongside their email and pass all three through (notify.ToAddress(...) + notify.WithLocale(...) + notify.WithTimezone(...)). Foundation acceptance test in setup_clinic_test.go demonstrates the canonical pattern. Today's foundation producers (break-glass, owner-welcome, webhook auto-pause) pass email only and fall through to org defaults — acceptable for now because Romanian orgs default to ro / Europe/Bucharest anyway, but the convention is documented so F-tier producers don't repeat the partial fix. A future notify.ResolveForPrincipal(principalID) (email, locale, timezone) helper would collapse the three lookups; not built today (premature against one consumer).

Gaps deferred to first consumer (no foundation category exercises them):

  • Operational-category preferences read path. Spec says the dispatcher should consult notification_preferences for explicit opt-outs before fan-out. Foundation categories are transactional and skip preferences; the actual lookup code does not yet exist. CategoryDefinition.Classification carries the discriminator so the branch slots in cleanly when the first operational category (appointment reminder) ships. Writing the lookup now would be speculation against an unknown shape (does opt-out apply pre-render or per-channel? does the dispatcher fan back in if all channels are opt-out?); deferring keeps that decision concrete-driven.
  • Marketing-category consents read path. Same shape — spec says the dispatcher should consult the consents ledger for marketing_{channel} purpose at the recipient's org. Lands when the first marketing category surfaces. Foundation has no marketing producer.

Calling contract. notify.Send(ctx, recipient, category, data, opts...). Caller never names a channel — the dispatcher picks channels from the category default × recipient preferences. Opt-outs become impossible to forget because the dispatcher is the only code that maps category → channels. Transactional categories (MemberInvite, BreakGlassOpened, future AccountDeleted) carry their own legal basis and bypass preference filters; operational categories (future appointment reminders, treatment nudges) respect them.

Schema — three new tables + one column on humans.

notifications (parent — one row per logical send). Columns: id, organization_id UUID NULL (mirrors patient-portable nullable scoping — NULL for cross-org sends like account-deletion-confirmed), recipient_principal_id UUID NULL + recipient_email TEXT NULL (CHECK exactly one set; address-based supports invite where there is no principal yet), category VARCHAR(64), idempotency_key TEXT NULL (partial unique on (category, idempotency_key) WHERE idempotency_key IS NOT NULL — producer dedup), locale VARCHAR(8) + timezone VARCHAR(64) (snapshotted at enqueue), subject TEXT + body_text TEXT + body_html TEXT NULL (rendered at enqueue, immutable thereafter), scheduled_at TIMESTAMPTZ NULL (NULL = ASAP; non-NULL = worker holds dispatch until NOW() >= scheduled_at; caller computes via notify.AtRecipientLocal() so wall-clock-to-UTC conversion happens once when the offset is known — DST-correct), created_at. RLS: recipient reads own; org members read organization_id = current_app_org_id(); platform-scope (NULL org_id) admin-only via AdminPool / superadmin (existing pattern; no new permission for foundation).

notification_deliveries (child — one row per (notification × channel)). Columns: id, notification_id FK, channel VARCHAR(16) (email | sms | whatsapp | push | in_app), status VARCHAR(16) (pending → claimed → sent | failed | dead_letter), attempts SMALLINT, claimed_at TIMESTAMPTZ NULL + claimed_by_worker_id TEXT NULL (SKIP LOCKED claim), next_attempt_at TIMESTAMPTZ NULL (exponential backoff: 1m / 5m / 30m / 1h / 6h, dead-letter at 5 attempts), sent_at TIMESTAMPTZ NULL + provider_message_id TEXT NULL + last_error TEXT NULL, read_at TIMESTAMPTZ NULL (meaningful only for channel='in_app'; ignored by other adapters — read state is per-recipient across apps, so a clinic admin who reads in Clinic sees it as read in Console). RLS inherits via notification_id join.

notification_preferences (sparse override — only rows that DIFFER from category defaults exist). Columns: recipient_principal_id, category, channel, enabled BOOLEAN, updated_at. PK (recipient_principal_id, category, channel). No org_id — preferences are cross-org per the patient-portable model. RLS: recipient reads/writes own. At 20k humans × ~1% explicit opt-outs across ~10 future categories × 4 opt-outable channels, table sits in low thousands of rows for the foreseeable future.
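The sparse-override resolution can be sketched as follows; the types are hypothetical (the real discriminator is CategoryDefinition.Classification, and in-app is additionally always-on):

```go
package main

import "fmt"

// override mirrors one notification_preferences row: only rows that
// DIFFER from the category default exist.
type override struct {
	Channel string
	Enabled bool
}

// resolveChannels starts from the category's default channels, drops
// any the recipient explicitly disabled, and ignores overrides
// entirely for transactional categories (legal-basis sends bypass
// preference filters).
func resolveChannels(defaults []string, transactional bool, overrides []override) []string {
	if transactional {
		return defaults
	}
	disabled := map[string]bool{}
	for _, o := range overrides {
		if !o.Enabled {
			disabled[o.Channel] = true
		}
	}
	var out []string
	for _, ch := range defaults {
		if !disabled[ch] {
			out = append(out, ch)
		}
	}
	return out
}

func main() {
	reminderDefaults := []string{"email", "sms", "in_app"}
	optOuts := []override{{Channel: "sms", Enabled: false}}
	fmt.Println(resolveChannels(reminderDefaults, false, optOuts)) // operational: sms dropped
	fmt.Println(resolveChannels([]string{"email"}, true, optOuts)) // transactional: bypass
}
```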

Edit-in-place in 000002_tenancy_rbac.up.sql: add humans.timezone VARCHAR(64) NULL next to humans.preferred_language. Resolution chain at enqueue: humans.timezone → organization_settings.default_timezone → 'Europe/Bucharest'. Distinct from P23's scheduling-timezone chain (recipient-where-they-read vs. slot-where-it-happens — both true simultaneously). Per CLAUDE.md "Migrations are editable pre-production."

Worker model. In-process polling goroutine, one per Core API instance, polls every 1–2s with ... WHERE status='pending' AND (scheduled_at IS NULL OR scheduled_at <= NOW()) ... FOR UPDATE SKIP LOCKED LIMIT N. SKIP LOCKED handles cross-instance coordination naturally — no advisory locks (P44), no LISTEN/NOTIFY (P44). Migration to a separate cmd/notification-dispatch binary is mechanical when volume warrants independent scaling: same dispatcher.Run() function, hosted by a different process.

Channel adapters. Channel interface: Send(ctx, *Notification, *Delivery) error. Foundation registers EmailChannel (AWS SDK v2 SES). Tests inject FakeChannel capturing sends to an in-memory slice. F-tier additions (SMSChannel, WhatsAppChannel, PushChannel) register against the same interface. In-app is a channel — dispatching channel='in_app' writes the delivery row with status='sent' immediately (no transport call); the future inbox bell queries JOIN notifications ... WHERE channel='in_app' AND recipient = me. In-app is always-on, no opt-out (a recipient who doesn't want to see in-app messages can simply not look at the bell — opting out at delivery would lose the audit trail of "we tried to tell you").

Categories + templates. Go-source enum + map in internal/core/notify/categories.go. Each entry: {default_channels, classification, template_key}. Foundation defines MemberInvite and BreakGlassOpened (both transactional, email-only). The transactional/operational classification is GDPR-significant (legitimate-interest vs. consent-based processing) — never a runtime config; always code. Templates: //go:embed templates/*.tmpl, one file per {category, channel, locale} combo (~4 files for foundation: 2 categories × email × en+ro). Engine: stdlib text/template + html/template. All timestamp rendering goes through inTZ / inLocale helpers; raw UTC dumps in template bodies are a code-review violation. F8's clinic-customizable templates layer a DB overlay table on top of the embedded foundation defaults.

Render + resolve at enqueue. Send() resolves recipient locale + timezone (one humans row read), renders the template, stores the rendered subject + body in the parent row. Worker is dumb — ships stored bytes to a stored address. Audit story is "what did we send?" = SELECT on the row. Rendered PII (subject, body_text, body_html, recipient_email) registers as pii_basic in data-classification.md — same shape as forms.submitted_data.

Idempotency. Caller-provided key (opts.IdempotencyKey(eventID)) enforced via partial unique. Foundation callers all have a natural key (invite_id, break_glass_session_id). Worker double-delivery (worker crashes between SES call and status update) is accepted as the failure mode — patient gets two copies of a transactional email, recoverable. Real exactly-once via two-phase commit not justified at foundation consumer scale.

Preferences vs consents — two distinct concepts dispatched separately. Marketing categories (future): consult consents ledger for marketing_{channel} purpose at the recipient's org. Required for legal basis. Operational categories (future appointment reminders): consult notification_preferences for explicit opt-outs. Transactional categories (foundation's two): skip both — these are part of the service contract. Foundation's two consumers are transactional, so neither read path is exercised at gate close; per the deferred gaps above, the lookup code itself lands with its first operational or marketing consumer.

Rate limiting (spec; the per-recipient cap is deferred, see the Deltas above). Per-recipient global cap, checked at enqueue, transactional bypass. Reuses 1A.13's Redis ratelimit. Default: 50 notifications/recipient/hour for operational categories. Catches stuck retry loops and misconfigured automation rules before they flood an inbox. Producer-level rate-limiting (one rule firing 1000×/sec) is a Layer 8 concern — this is the safety net.

Audit + classification. Per CLAUDE.md operational-metadata-exempt rule: enqueue is a state-changing mutation → audit_log row written (action notification.enqueue; recipient + category + idempotency_key recorded; body content NOT recorded — the row itself stores the body and is RLS-scoped to recipient). Per-delivery transitions (claimed, sent, failed, read) are operational metadata, no audit row — saves ~10–50× audit volume; forensic reconstruction available from the delivery row's columns directly. Classification entries for every new column ship in the same PR as the migration (CI gate enforces).

F7 webhook boundary. F7's webhook dispatcher subscribes to events.Bus (1A.9) for outbound deliveries to clinic systems. 1A.18's notification dispatcher reads from notification_deliveries for first-party email/SMS/in-app to humans. Both can fire from the same domain event (e.g., appointment.completed → webhook to clinic CRM AND notification email to patient) but are independent code paths writing to independent stores. Two transports, one event — neither cascades into the other.

Test strategy. Existing testcontainers Postgres + new FakeChannel. Acceptance test: enqueue → polling loop runs → fake captures one rendered email with the right subject / body / locale / timezone snapshots. Wired into setup_clinic_test.go once 1B.12's invite flow consumes the primitive.

Deferred to F-tier. Per-app inbox bell UI (no foundation consumer needs it; storage + RLS in place so F-tier just adds a GET /v1/me/notifications endpoint via P42's me:{principalId}:notifications tag + a shared packages/ui component). Quiet hours (no foundation consumer needs "don't email after 10pm recipient-local"; lands later as a recipient-preference column + dispatch-time check, additive). SMS / WhatsApp / push adapters (slot in via the same Channel interface when their first consumer ships). Bounce / complaint webhooks + suppression list automation (lands when first SES bounce noise warrants — manual SES suppression list config ships at 1E as the floor).

Foundation email-presentation polish (deferred — does not block 1E). Two items shipped functional-but-rough in foundation; both are visible to the first real owner that signs up:

  1. Shared HTML shell + retrofit four foundation templates. Today every template ships plaintext-only (subject + body_text blocks). HTML clients (Gmail / Outlook / Apple Mail — i.e., everyone) render this as monospace walls with raw 600-char URLs, which feels 1990s and degrades the brand. The fix is a shared templates/_layout.email.tmpl defining the HTML shell (logo block, type scale, footer with platform contact, "you got this because..." trust line) that each per-category template extends with Go-template define "body_text" / define "body_html" blocks. Notify already supports body_html end-to-end (column on notifications, sent by email.go:177-182 when present); the work is template-side. Retrofit all four foundation categories in the same PR (owner_welcome, break_glass_opened, member_invite, webhook_subscription_paused) — the shell only pays off when adoption is consistent, and four templates is small enough that piecemeal would mean two visual styles in the wild during the rollout window. Cross-template considerations to settle in that PR: the "address the recipient by name" data shape (which producers gain the responsibility to load humans.full_name alongside humans.email + locale + timezone — same ResolveForPrincipal helper noted under the address-recipient delta above), the From-name display string (today raw email; HTML lets us render "RestartiX <[email protected]>"), and whether to load any imagery (recommend NO — images block in many clients, leak read-tracking pixels, and tank deliverability if remotely hosted).

  2. Owner-welcome magic link should land on our Clinic app, not Clerk's hosted sign-in page. Today auth/clerk/provisioning.go:80-99 returns the URL signintoken.Create produces (a Clerk-hosted *.accounts.dev / auth.<clerk-domain> URL), which is then dropped into the email body. The new owner clicks "Welcome to Demo Clinic on RestartiX" and lands on a domain they don't recognize (true-sawfly-92.accounts.dev in dev; some auth.clerk.com-shaped string in prod) with no RestartiX branding, no "Welcome to Demo Clinic" context, and a generic Clerk sign-in form. Phishy-looking and brand-discontinuous. Proper fix: CreateSignInLink returns a URL on our domain (e.g. https://{slug}.clinic.restartix.pro/welcome?ticket=<clerk_ticket>); the Clinic app's /welcome route consumes the ticket server-side via Clerk's API (SignInToken.Verify or equivalent), renders a RestartiX-branded "Welcome to {org_name}" page, has the user set their password inline, drops the session cookie, redirects to the dashboard. Foundation work touches auth.Provisioning interface (the return becomes "our URL", not "Clerk's URL"); the Clinic-app /welcome route is 1D.2 territory. Splittable across PRs: backend swaps URL construction first (the route can land as a stub that just bounces to Clerk-hosted as a transition), then Clinic-app implements the real ticket-consuming route, then the stub is removed. Until this lands, the owner-welcome email functionally works but is one of the weakest first impressions in the product.

Why a foundation item. Multiple foundation consumers (1B.11 break-glass alert, 1B.12 invite) and every F-tier feature that ever notifies a human all write through this same primitive. Retrofitting the outbox shape, recipient model, preferences/consents boundary, idempotency contract, audit treatment, or timezone-aware scheduling once 50+ producers exist is exactly the cross-cutting cost foundation discipline exists to prevent. Adding humans.timezone after first prod deploy means a backfill across the whole humans table; doing it now is one column in an editable migration. Same calculus as 1A.15 audit partitioning, 1A.16 postgres extensions, 1A.17 frontend performance.

The 1B.11 spec previously punted the break-glass email to "alongside F7 OR a foundation-tier direct mail send (decide at implementation)." This section closes that punt — break-glass emails go through notify.Send, not a TODO comment.

  • [x] Migration: create notifications, notification_deliveries, notification_preferences tables with RLS policies (recipient reads own; org members with organizations.view_directory read org-scoped; sparse-prefs writable by recipient). Shipped in 000010_notifications.
  • [x] Edit-in-place: add humans.timezone VARCHAR(64) NULL to 000002_tenancy_rbac.up.sql next to humans.preferred_language.
  • [x] Edit-in-place: add organization_settings.default_timezone TEXT NULL to 000003_org_settings.up.sql — closes the resolution chain (humans → org → platform).
  • [x] Data classification entries for every new column in data-classification.md — same PR as the migration (CI gate enforces).
  • [x] internal/core/notify/ package: Send, AtRecipientLocal, Channel interface, dispatcher loop with SKIP LOCKED claim, exponential-backoff retry (1m/5m/30m/1h/6h), dead-letter cap at 5 attempts, transactional-category bypass for preferences. Per-recipient rate limit deferred (see Deltas above).
  • [x] EmailChannel adapter using aws-sdk-go-v2/service/sesv2 in internal/core/notify/email/. FakeChannel for tests in the notify package itself (so the integration suite can compose it without a build-tag dance).
  • [x] Embed MemberInvite + BreakGlassOpened templates × email × en+ro (4 .tmpl files).
  • [x] Template helpers: inTZ, inLocale, locale-aware date/time formatting (folded into template.go).
  • [x] Wire 1B.11 elevation endpoint to call notify.Send(adminPrincipal, BreakGlassOpened, data, opts.IdempotencyKey(sessionID)) after the session row commits. Wired in breakglass/service.go — fan-out lookup by admin system role + per-(session × admin) idempotency keys.
  • [x] Wire 1B.12 invite endpoint to call notify.Send(notify.ToAddress(email), MemberInvite, data, opts.IdempotencyKey(membershipID)). Deferred to BYO ESP migration — 1B.12 chose Clerk's Invitations API for invite delivery while we're on Clerk for auth emails (see 1B.12 decisions). The MemberInvite template + notify.Send plumbing are shipped and ready; the wire flips when Clerk auth emails migrate to our SES at the BYO ESP cutover. Foundation acceptance is satisfied by the break-glass consumer above + the setup_clinic_test.go acceptance test.
  • [x] SOUP row for aws-sdk-go-v2/service/sesv2 in reference/soup.md — same PR as the adapter (CI gate enforces).
  • [x] Acceptance test in setup_clinic_test.go: enqueue → dispatcher.RunOnce → fake captures rendered email with correct subject + body + locale + timezone. Plus a retry/dead-letter test exercising the failure path with MaxAttempts=2.
  • [x] CategoryDefinition.BillingScope + notify.email.ScopedChannel routing — platform-scope categories bypass metering and send from the platform-owned SES identity; tenant-scope categories (F-tier; none today) route through WrapMeteredProvider.
  • [x] capabilities.SetLimitLookup + metering.Repository.LookupLimit — system-driven dispatch paths (notify dispatcher, background jobs) get the same soft pre-check as request paths. Wired in cmd/api/main.go alongside SetMeterStore. Tests: TestWrapMeteredProvider_SystemDriven_QuotaExceeded / _QuotaUnderCap_RunsInner / _NoLookup_PassesThrough in capabilities_test.go; TestMetering_LookupLimit_* in metering_test.go.
  • [x] notify.Send guard rejects principal-id-only recipients on email-routed categories at queue time. The convention (callers resolve principal → humans.email at the call site; the email channel does not resolve) is now mechanically enforced for any future producer. Tests: TestCategoryNeedsAddress_* in category_test.go.
  • [x] AWS SES production identity verification + DKIM for the platform sender domain. Shipped 2026-05-13. restartix.pro domain identity verified in eu-central-1; DKIM CNAMEs + SPF (v=spf1 include:amazonses.com -all) + DMARC (v=DMARC1; p=none;) published in Cloudflare. Sandbox exited account-wide (211k/day quota). Configuration set + bounce/complaint webhook handler still gated on F-tier app-layer work (per production-launch-readiness.md L65-70).
  • [x] AWS SES suppression list initial config (auto-add hard bounces + complaints, per-account suppression). Enabled in console 2026-05-12.
  • [ ] Foundation email-presentation polish (deferred). Shared HTML shell + retrofit four foundation templates AND owner-welcome magic link lands on our Clinic app's /welcome route instead of Clerk's hosted sign-in page. Both described in the "Foundation email-presentation polish" paragraph above. Functional but rough; the first real owner that signs up sees both rough edges. Closed once those two land — neither blocks 1E.
  • [ ] Recipient-timezone profile UI in 1D.3 patient self-service — IANA picker, default to browser-detected at first set. Closed alongside 1D.3.

1B. Identity & Tenancy

The actor-identity model + tenancy economics. This is where patient identity now lives — patients are not memberships, patient tiers are not roles. See decisions.md → Why patients are not memberships, and patient tiers are not roles.

Build order inside 1B. Each item depends on the items in the groups above it. Items listed within the same group are independent of each other and can run in parallel (explicit parenthetical notes override).

1B.1 Principal Model            (foundational — everything else builds on this)

1B.2 Org Settings / Billing / Entitlements
1B.3 Plans / Subscriptions / Overrides                       (parallel with 1B.2)
1B.4 Patient Tiers Catalog                                   (depends on 1B.3 catalog tables)
1B.5 Plan-Entitlement / Org-Entitlement / Limit Middleware   (depends on 1B.2 + 1B.3 + 1B.4)

1B.6 Patient Identity                          (depends on 1B.1 principals + 1B.4 default tier reference)

1B.7 Patient Subscriptions                     (depends on 1B.6 patients + 1B.4 tiers)
1B.8 Portal Onboarding                         (depends on 1B.6 + 1B.7 — provisions all three in one txn)

1B.9 Consents Ledger                           (depends on 1B.6 patient_profile_id FK)

1B.10 Privacy Notice Templates                 (depends on 1B.9 consent_purpose_versions)

1B.11 Platform Break-Glass Access              (independent of 1B.9/1B.10 except for auditing of consent reads — can run parallel to 1B.10)
1B.12 Member Invite Flow                       (independent — can run any time after 1B.1)
1B.13 Patient Impersonation Sessions           (independent — can run any time after 1B.6; mirrors 1B.11)
1B.14 Locations & Multi-Site Support           (independent — can run any time after 1B.2; blocks F4 Scheduling and any specialist/appointment work)

1B.1 through 1B.14 are shipped backend end-to-end. 1B.1–1B.10 carry their UI consumers as well (Console template management, clinic-admin legal-documents editor, portal re-consent modal); 1B.11–1B.14 ship backend + RLS integration tests, with their UI consumers (Console + Clinic admin surfaces for break-glass / invites / impersonation oversight / locations CRUD) deferred to the unified UI pass once the foundation backend is fully locked. 1B.14 closing before any Layer 2 scheduling / specialist / appointment work was the load-bearing constraint — once those tables exist, retrofitting location_id is a cross-cutting backfill the foundation discipline rule exists to prevent.

What moved out of 1B. The original 1B.15 "Per-Org Integrations Catalog" was retired into the new § 1C. Capabilities, Integrations & Metering — its scope expanded into a dedicated sub-phase covering the full integration architecture (Curated Providers, Connected Accounts, Outbound + Inbound Webhooks, Internal Events, Metering, AI Hooks, Entitlements rename). See glossary.md for the canonical taxonomy that drove the move.

1B.1 Principal Model as Root Identity (was 1.24)

Status: shipped. principals is the actor-identity root; humans, agents, service_accounts are siblings sharing the principal_id PK; the singleton 'system' principal is the attributed actor for trigger fan-out and unauthenticated paths; audit_log.actor_id + actor_type carry the actor; AI provenance lives in a sibling audit_ai_provenance table. Detail: decisions.md → Why principals as the root identity, data-model.md § Area 1.

1B.2 Org-Level Settings, Billing & Entitlements (was 1.19)

Status: shipped. Three typed companion tables (organization_settings, organization_billing, organization_entitlements) auto-created on org INSERT via trigger; entitlement flags are AdminPool-only (regulated trust boundary); organizations.update_settings and organizations.manage_billing permissions seeded. Detail: architecture/org-settings.md.

1B.3 Plans, Subscriptions & Sales Overrides (was 1.20)

Status: shipped. Catalog tables (plans, plan_versions, entitlements, limit_definitions, plan_entitlements, plan_limits); per-org tables (organization_subscriptions, organization_subscription_entitlements, organization_subscription_limits, organization_subscription_overrides); snapshot-on-subscribe (P14b); entitlement projection from regulated entitlements onto organization_entitlements. Detail: architecture/plans-and-subscriptions.md.

1B.4 Patient Tiers Catalog (was 1.21, restructured by 1.26)

Status: shipped. Per-org tier catalog with versioning columns, parallel patient_tier_entitlements / patient_tier_limits mirroring the org-side billing engine, default-tier invariants (single-default partial unique index + atomic flip). No tier→role binding. Detail: architecture/plans-and-subscriptions.md § Patient tiers.

1B.5 Plan-Entitlement / Org-Entitlement / Limit Middleware (was 1.22)

Status: shipped. Four-gate model (Permission / PlanEntitlement / OrgEntitlement / Limit) with distinct error codes (403 / 402 / 403 / 402); RequirePlanEntitlement, RequireOrgEntitlement, EnforceLimit middlewares; Subject.{OrgEntitlements, PlanEntitlements, Limits} extensions; behaviour-aware Limit semantics (hard_block / soft_meter / informational). Detail: architecture/middleware-composition.md.

1B.6 Patient Identity (was 2.1, 2.2)

Status: shipped. patient_profiles (portable, no org_id) + patient_caregivers + patients (per-org link with profile_shared, consumer_id, last_used_at); patient-scoped RLS helper current_human_patient_profile_ids(); patients table grants portal access by row existence (no app.access_portal permission); per-org admin endpoints under patients/. Detail: data-model.md § Area 2, decisions.md → Why patients are not memberships.

  • [x] patients.profile_shared flips when the profile_sharing consent (org-scope, Tier A toggle in 1B.9) is granted, and back to FALSE when it's withdrawn. Wire as a trigger on consents insert/update. (Shipped in 000008 — trigger_consent_profile_sharing_flip.)

1B.7 Patient Subscriptions (was 2.5)

Status: shipped. patient_subscriptions + snapshot tables (patient_subscription_entitlements, patient_subscription_limits, patient_subscription_overrides); every onboarded patient has a defined subscription state from day one (default tier, status active, snapshots frozen at subscribe time). Detail: data-model.md § Area 16.

  • [ ] Patient self-service tier change (PATCH /v1/me/patient-subscription) — defer until external billing wires up.
  • [ ] Override grant/revoke admin endpoints — table + RLS shipped, CRUD pending a concrete use case.

1B.8 Portal Onboarding (was 2.4 portal endpoint)

Status: shipped. POST /v1/portal/onboard provisions patient_profiles + patients + patient_subscriptions (snapshots from the org's default tier) in one AdminPool transaction; idempotent on re-onboard; mounted outside OrganizationContext because that gate would 403 a fresh human with no patients row. Detail: portalonboarding/.

  • [x] patient.onboarded event publish — wired in portalonboarding/handler.go via the 1A.9 event bus. Fires only when result.PatientCreated is true (re-onboarding at a clinic where the patient already has a row stays silent). Payload carries patient_profile_id, tier_id / tier_version, and a source discriminator (invite / share_link / self_signup) plus the corresponding source-id when relevant. F7 webhook consumers subscribe by Type = "patient.onboarded".
  • [ ] Caregiver onboarding for account-less patients — table + RLS shipped; admin endpoint deferred until a real product use case.
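
A hedged sketch of the publish-only-when-created guard and payload shape the patient.onboarded item describes — patientOnboarded and publishIfCreated are hypothetical names, not the real portalonboarding types:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Assumed payload shape, mirroring the prose above: profile id, tier
// snapshot identifiers, and a source discriminator plus optional source id.
type patientOnboarded struct {
	PatientProfileID string  `json:"patient_profile_id"`
	TierID           string  `json:"tier_id"`
	TierVersion      int     `json:"tier_version"`
	Source           string  `json:"source"` // invite | share_link | self_signup
	SourceID         *string `json:"source_id,omitempty"`
}

// publishIfCreated mirrors the guard: re-onboarding at a clinic where the
// patient already has a row stays silent (no event fired).
func publishIfCreated(created bool, ev patientOnboarded, publish func([]byte)) error {
	if !created {
		return nil
	}
	b, err := json.Marshal(ev)
	if err != nil {
		return err
	}
	publish(b)
	return nil
}

func main() {
	var published [][]byte
	_ = publishIfCreated(true,
		patientOnboarded{PatientProfileID: "pp_1", TierID: "tier_basic", TierVersion: 1, Source: "self_signup"},
		func(b []byte) { published = append(published, b) })
	_ = publishIfCreated(false, patientOnboarded{},
		func(b []byte) { published = append(published, b) })
	fmt.Println(len(published))
}
```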

1B.9 Consents Ledger

Single append-on-grant table that records every consent event across both platform-scope and org-scope purposes. The "trail" UX (granted, withdrawn, re-granted, current state) falls out of WHERE patient_profile_id = $1 ORDER BY granted_at DESC. Substantive design rationale in decisions.md → Why clinic is controller, platform is processor.

Status: shipped. Schema + RLS + cascade trigger + permission seeds + initial purpose/version seeds in 000008. Consents domain (model/repository/service/handler) + grant + withdraw + trail-view + catalog endpoints. Two-step onboarding (decisions.md → Why two-step onboarding): step 1 (POST /v1/me/patient-profile) creates the portable patient_profiles row + writes platform-scope consents in one admin tx; step 2 (POST /v1/portal/onboard) requires the profile to exist (409 profile_missing otherwise) and writes the per-clinic patients + patient_subscriptions chain + org-scope consents. Re-consent middleware (412 consent_required with missing list), RequireConsent(purpose) foundation-tier stub, version-supersession path on Service.Grant (so a v2 republish unblocks the gate via re-grant), and patient trail UI at /(patient)/consents consuming /v1/me/consents + /v1/consent-purposes. The current_required_consent_versions helper filters to non-consent legal-basis purposes — optional toggles never block. User.has_patient_profile exposed on /v1/me so the portal /onboard page routes step 1 vs step 2 without an extra round-trip. Withdrawing org_terms from the trail UI fires the cascade with an explicit "Leave clinic" confirmation dialog (the action is per-clinic relationship-ending, not a regular toggle).

Catalog tables:

  • [x] consent_purposes(code, scope, name, description, legal_basis, withdrawable, created_at). scope IN ('platform', 'org'). legal_basis IN ('contract', 'legitimate_interest', 'consent', 'legal_obligation', 'vital_interest'). withdrawable is derived in code (TRUE only when legal_basis='consent') but stored as a column for query speed.
  • [x] consent_purpose_versions(id, purpose_code, organization_id NULL, version, body_translations JSONB, published_at, published_by_principal_id). NULL organization_id = platform-default text. Set organization_id = org override (only valid for org-scope purposes).
  • [x] Initial purpose seeds:
    • Platform scope: platform_terms (contract, non-withdrawable), platform_privacy_notice (legitimate_interest, non-withdrawable, informational acceptance).
    • Org scope: org_terms (contract), org_privacy_notice (legal_obligation + legitimate_interest), profile_sharing (consent — patient lets the clinic see DOB, allergies, insurance instead of name only), marketing_email (consent), marketing_sms (consent), analytics (consent), ai_processing (consent).
    • Reserved for F3.5: telemedicine, video_recording, biometric_capture, treatment_specific_* (all consent-basis, registered when F3 ships).

Ledger:

  • [x] consents(id, organization_id NULL, patient_profile_id, purpose_code, purpose_version, source, source_form_id NULL, granted_at, granted_by_principal_id, granted_via_ip, withdrawn_at, withdrawn_by_principal_id, withdrawal_reason). NULL organization_id = platform-scope grant. source IN ('signup_checkbox', 'self_toggle', 'form', 'staff_action', 'api'). source_form_id is NULL except when source='form' (FK to F3's forms table). Append-only on grant — re-grant after withdrawal = new row. Withdrawal = UPDATE that sets withdrawn_at + withdrawn_by_principal_id (the only mutation allowed; rest is INSERT).

Hooks + middleware:

  • [x] Sign-up flow (two-step):
    • Step 1 — POST /v1/me/patient-profile (idempotent on the profile): creates the portable patient_profiles row keyed by the calling human + writes platform-scope consents (platform_terms, platform_privacy_notice) in one admin tx. Org-scope codes here fail 400 scope_mismatch (those belong on step 2). Mounted in the principal-RLS group outside RequireCurrentConsents so a fresh human can clear the gate by completing this step.
    • Step 2 — POST /v1/portal/onboard: requires the portable profile to exist (409 profile_missing otherwise); writes org-scope consents (org_terms, org_privacy_notice, plus any optional toggles the patient ticked) and provisions the per-clinic patients + patient_subscriptions chain.
    • Failure to accept any required purpose at either step = whole admin tx rolls back with 400 consents_required and the missing-purpose list. Each fresh consent INSERT audits as a CREATE row sourced as signup_checkbox. Implemented in portalonboarding.Service.SetupProfile (step 1) and portalonboarding.Service.Onboard (step 2).
    • Why split: platform identity (profile + platform terms) is processor-side; per-clinic membership (patients row + clinic terms) is controller-side. A patient who leaves every clinic keeps their portable profile + platform consents; the next clinic they join skips step 1 entirely. The portal /onboard page reads User.has_patient_profile to route between the two screens.
  • [x] Self-toggle endpoints in patient settings (1D.3): patient flips marketing_email / marketing_sms / analytics / ai_processing from portal settings; subject and grantor are the same principal. Each toggle is a new ledger row (grant) or an UPDATE on the active row (withdrawal). (Shipped: POST /v1/me/consents — idempotent grant with source=self_toggle; POST /v1/me/consents/{id}/withdraw.)
  • [x] Staff-action endpoints (1D.2): gated by new consents.manage permission; records the staff principal as grantor and the patient as subject (CS rep flips marketing on patient's behalf when the patient calls in). (Shipped: POST /v1/organizations/{id}/patients/{patientId}/consents with source=staff_action; POST .../consents/{consentId}/withdraw.)
  • [x] current_required_consent_versions(principal_id, organization_id) — RLS-helper-style function returns the set of (purpose, version) the user hasn't accepted yet, restricted to non-consent legal-basis purposes (contract / legitimate_interest / legal_obligation / vital_interest). Optional consent-basis toggles never appear here. Platform purposes always apply; org purposes apply when organization_id is non-NULL.
  • [x] Re-consent middleware (middleware.RequireCurrentConsents): wraps /me/* routes after RequirePrincipalRLS. Calls current_required_consent_versions; if non-empty, returns 412 Precondition Failed with {"error": {"code": "consent_required", "missing": [{"purpose_code": ..., "version": ...}, ...]}}. The consent endpoints (/v1/me/consents, /v1/me/consents/{id}/withdraw) are mounted in a sibling group without this gate so the 412 → re-grant loop can close. Service.Grant now supersedes a stale older-version active grant (writes withdrawal_reason='superseded_by_v{N}') before the re-insert, so non-withdrawable purposes can clear version drift.
  • [x] Withdrawal endpoints — patient self-withdraw (POST /v1/me/consents/{id}/withdraw); staff-on-behalf-of-patient withdraw (gated by consents.manage). (Both shipped; cascade trigger fires on org_terms withdrawal at either path.)
  • [x] Cascade rule: withdrawing platform platform_terms is not a patient-initiated path — UI says "to revoke these, delete your account" and triggers the GDPR erasure flow (handled in F11.1). Withdrawing org_terms at clinic A is a single transaction that (1) sets patients.deleted_at = NOW() at clinic A, (2) transitions the active patient_subscriptions row to status='canceled' with canceled_at = NOW() — the row stays in place for billing / audit history, never hard-deleted, and (3) cascades withdrawal of all org-scope consents at that org via trigger. (Shipped in 000008 as trigger_consent_org_terms_cascade — DB-enforced; defense-in-depth for any code path.) Supersession-guard (added 2026-05-02): the trigger skips when withdrawal_reason LIKE 'superseded_by_v%' — the consents service uses that withdrawal-reason convention when a re-grant supersedes a stale older-version row, and without the guard re-acceptance of org_terms would auto-soft-delete the patient. The cascade fires only on real "leave clinic" intent (no superseded_by_* reason); locked by TestConsents_VersionSupersessionDoesNotFireOrgTermsCascade.
  • [x] Re-onboarding semantics (foundation invariant, locked in 2026-05-01): the partial unique index patients_profile_org_active_uniq (000006) allows a returning patient to sign up at the same clinic again — they get a brand-new patients row + brand-new patient_subscriptions row. The previous (soft-deleted) patients row + (canceled) subscription row stay in place as historical record. Each onboarding chapter has its own audit trail via per-row entity_id. The portable patient_profiles row is reused (one identity, many processing chapters). When 1B.9 ships the consents withdrawal flow, the cancel-subscription step above is what makes the historical chapter stay queryable as "patient was at clinic A from time X to time Y." (Mechanically enforced by the cascade trigger from 000008.)
  • [x] RequireConsent(purpose) middleware — analogous to RequireOrgEntitlement / RequirePlanEntitlement. Used by features that require a specific consent (telemedicine flow, AI inference, biometric capture). Returns 403 consent_required with missing_purpose. Foundation-tier stub: no production route consumes it today; F3 / F5 / F9 wire it up when those features land. Resolves scope from the catalog (platform-scope = no org context; org-scope = current org).

RLS + permissions:

  • [x] RLS on consents: org staff with new consents.view_org see consents in their org; patients see their own across all orgs (via current_human_patient_profile_ids()); platform-scope rows (NULL organization_id) visible to the patient and to break-glass-elevated staff via 1B.11.
  • [x] RLS on consent_purposes + consent_purpose_versions: SELECT for everyone (these are catalog rows; the text is by definition public). Mutations via AdminPool only (catalog edits are migrations or admin-tool actions).
  • [x] Permission seeding: consents.view_org (granted to specialist + customer_support + admin), consents.manage (admin + customer_support — staff-action grantor path).

Trail view:

  • [x] Patient-side: GET /v1/me/consents returns the patient's full consent history grouped by (organization_id, purpose_code) with current state + history rows. Optional ?organization_id= filter narrows to one clinic.
  • [x] Staff-side: GET /v1/organizations/{id}/patients/{patient_id}/consents returns the same shape scoped to that patient at that clinic (gated by consents.view_org).
  • [x] Catalog: GET /v1/consent-purposes?organization_id={id} returns each purpose paired with the latest applicable version body (org override wins, platform-default fallback). Used by the sign-up consent block + patient settings UI.
  • [x] Patient settings UI surface in 1D.3: /(patient)/consents/page.tsx renders the trail grouped by platform-scope and org-scope. Withdrawable purposes (legal_basis = 'consent') get inline Grant / Withdraw buttons; non-withdrawable purposes show "delete account to revoke" (platform) or "leave clinic to revoke" (org) copy. Server actions hit /v1/me/consents and /v1/me/consents/{id}/withdraw. Sign-up consent block ships in /onboard (OnboardForm + JoinClinicButton) — required purposes block submit until checked; server enforces regardless. Sidebar entry under nav.consents (en + ro).
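
The org-override-wins resolution rule from the catalog endpoint can be sketched as a pure function — an illustration with hypothetical names, modeling platform-default rows (NULL organization_id) as an empty OrgID:

```go
package main

import "fmt"

type purposeVersion struct {
	PurposeCode string
	OrgID       string // "" = platform-default row
	Version     int
	Body        string
}

// latestApplicable picks the latest org override when one exists, otherwise
// the latest platform-default version for the purpose.
func latestApplicable(rows []purposeVersion, purpose, orgID string) (purposeVersion, bool) {
	var best purposeVersion
	found := false
	for _, r := range rows {
		if r.PurposeCode != purpose {
			continue
		}
		if r.OrgID != "" && r.OrgID != orgID {
			continue // another org's override never applies
		}
		better := !found ||
			(r.OrgID == orgID && best.OrgID == "") || // override beats default
			(r.OrgID == best.OrgID && r.Version > best.Version)
		if better {
			best, found = r, true
		}
	}
	return best, found
}

func main() {
	rows := []purposeVersion{
		{"org_privacy_notice", "", 1, "platform v1"},
		{"org_privacy_notice", "", 2, "platform v2"},
		{"org_privacy_notice", "org_a", 1, "org_a v1"},
	}
	v, _ := latestApplicable(rows, "org_privacy_notice", "org_a")
	fmt.Println(v.Body) // override wins even at a lower version number
	v, _ = latestApplicable(rows, "org_privacy_notice", "org_b")
	fmt.Println(v.Body) // no override: latest platform default
}
```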

1B.10 Privacy Notice Templates

Platform provides a versioned template; clinic fills placeholders + selects toggleable sections; the assembled markdown becomes the org_privacy_notice (and org_terms) text the patient sees and accepts. Maintains the controller/processor split — the clinic owns the legal artefact, the platform provides the scaffolding. Substantive design rationale in decisions.md → Why clinic legal documents are templated, not forms.

Scope decision (locked): the same machinery covers both org_terms and org_privacy_notice — both are clinic-authored, versioned, template-assembled, and gate onboarding. Discriminator column document_type IN ('terms', 'privacy_notice'). One editor surface, one publish path.

Schema (shipped in 000009):

  • [x] legal_document_templates — platform-level versioned templates: (id, document_type, version, locale, body_with_placeholders, required_placeholders TEXT[], toggleable_sections JSONB, published_at, published_by_principal_id, created_at). UNIQUE (document_type, version, locale). One row per locale; placeholder values + section toggles are global, locale-agnostic. required_placeholders is the contract the publish path enforces.
  • [x] organization_legal_documents — per-org editor state: (id, organization_id, document_type, source_template_version, placeholder_values JSONB, included_sections JSONB, published_version, last_reviewed_by_principal_id, last_reviewed_at, created_at, updated_at). UNIQUE (organization_id, document_type). Mutable: clinic admin saves drafts repeatedly. published_version corresponds to the consent_purpose_versions row minted at publish time.
  • [x] Org-create trigger extended: create_organization_companion_rows (000003) updated via CREATE OR REPLACE to insert two organization_legal_documents rows (one per type) with published_version = NULL pointing at the latest available template version. Backfill DO block in 000009 covers pre-existing orgs.
  • [x] RLS: legal_document_templates SELECT for everyone (catalog text is public-by-design). organization_legal_documents SELECT for org members (editor state names DPO email + registered address); UPDATE gated by organizations.manage_privacy_notice. INSERT via AdminPool only (trigger + backfill); DELETE never (cascade with organizations).
  • [x] Permission seeded: organizations.manage_privacy_notice granted to the admin system role template.
  • [x] Seed templates v1 (en + ro) for both terms and privacy_notice with placeholder bodies + the three foundation toggleable sections (video_recording, biometric_capture, cross_border_transfer). Real ANSPDCP-compliant text + lawyer review lands as part of the Romanian compliance pass (Deferred Foundation Extensions).

Backend domain (internal/core/domain/legaldocument/):

  • [x] Repo: CRUD on organization_legal_documents (RLS-gated UPDATE on ctx tx) + read on legal_document_templates. CallPublish invokes the SECURITY DEFINER publish_legal_document function from the request's ctx tx so both writes stay atomic and the function's permission/scope checks see the calling principal.
  • [x] Service:
    • [x] Assemble(template, placeholderValues, includedSections) (string, error) — pure function (testable; no DB). Replaces placeholders, validates required_placeholders, appends toggleable sections in catalog order with explicit-toggle-wins-default-fallback semantics. Unit tests cover the four key cases (placeholder substitution, missing required, blank required, default-section fallback).
    • [x] SaveDraft(orgID, docType, placeholderValues, includedSections, principalID) — UPDATE on organization_legal_documents via ctx tx; stamps last_reviewed_*. RLS denial → 403 forbidden.
    • [x] Publish(orgID, docType, principalID) — full validation chain (document_type valid → editor row exists → templates exist → required placeholders satisfied → assemble per locale → call publish_legal_document(...)). Returns (before, PublishResult); handler audits consent_purpose_version CREATE + organization_legal_document UPDATE in two rows.
    • [x] Preview(orgID, docType, locale) — assembles one locale without persisting. Same validation as Publish minus the required-placeholder pre-check (a partial draft can still preview).
    • [x] ListConsoleTemplates() — latest per (document_type, locale) for the Console template-management surface.
  • [x] Handlers (clinic admin / /v1/organizations/{id}/legal-documents/...):
    • [x] GET / — list both editor rows (terms + privacy_notice). Used by dashboard task card + onboarding gate.
    • [x] GET /{type} — current draft + latest templates per locale.
    • [x] PUT /{type} — save draft (no version bump).
    • [x] POST /{type}/preview — assemble without persisting.
    • [x] POST /{type}/publish — version-bump path.
  • [x] Handlers (Console / /v1/admin/legal-document-templates):
    • [x] GET / — superadmin-gated list of latest platform templates per (document_type, locale).
    • [x] POST / — superadmin publishes a new platform template version. Atomic per-locale write at MAX(version)+1; server validates document_type, locale uniqueness, non-blank bodies, and that every required placeholder has a marker in every locale's body. One audit legal_document_template CREATE per locale row.
  • [x] Permission organizations.manage_privacy_notice exposed as auth.PermOrganizationsManagePrivacyNotice; route layer gates every clinic-admin endpoint, RLS UPDATE policy + publish_legal_document defense-in-depth re-check enforce it at the DB.
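
A hedged sketch of the Assemble contract — assuming {{key}} placeholder markers and a per-section default-inclusion flag, both assumptions of this sketch; the real function and its section semantics live in internal/core/domain/legaldocument/:

```go
package main

import (
	"fmt"
	"strings"
)

type section struct {
	Key               string
	Body              string
	IncludedByDefault bool
}

// assemble validates required placeholders, substitutes values, then appends
// sections in catalog order with explicit-toggle-wins-default-fallback
// semantics. Pure function: no DB, trivially unit-testable.
func assemble(body string, required []string, values map[string]string, sections []section, included map[string]bool) (string, error) {
	for _, key := range required {
		if strings.TrimSpace(values[key]) == "" {
			return "", fmt.Errorf("missing required placeholder %q", key)
		}
	}
	out := body
	for key, v := range values {
		out = strings.ReplaceAll(out, "{{"+key+"}}", v)
	}
	for _, s := range sections { // catalog order
		on, explicit := included[s.Key]
		if (explicit && on) || (!explicit && s.IncludedByDefault) {
			out += "\n\n" + s.Body
		}
	}
	return out, nil
}

func main() {
	got, err := assemble(
		"Controller: {{clinic_name}}, DPO: {{dpo_email}}.",
		[]string{"clinic_name", "dpo_email"},
		map[string]string{"clinic_name": "Demo Clinic", "dpo_email": "dpo@demo.example"},
		[]section{{Key: "video_recording", Body: "## Video recording", IncludedByDefault: false}},
		map[string]bool{"video_recording": true},
	)
	fmt.Println(got, err)
}
```

Treating blank-but-present values as missing (the TrimSpace check) matches the "blank required" unit-test case listed above.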

Onboarding integration:

  • [x] POST /v1/portal/onboard returns 409 org_setup_incomplete when either document's organization_legal_documents.published_version IS NULL for the target org. Gate runs pre-tx (after the self_signup_disabled check, before profile resolution) so a failure leaves zero rows behind. Error context lists the unpublished document_types so the portal can surface a useful "this clinic is finalising" message. Defense-in-depth: without this gate a fresh-but-unfinished clinic would have its patients implicitly accepting the platform-default consent_purpose_versions rows for purposes the clinic is supposed to author. Test harness Harness.PublishLegalDocuments short-circuits the editor for tests that drive /portal/onboard. Integration tests in internal/test/rlstest/legal_document_gate_test.go cover the rejected-when-unpublished, rejected-when-half-published, and passes-when-both-published paths.

Frontend (Clinic Admin / 1D.2):

  • [x] /legal-documents/page.tsx (list) + /legal-documents/[type]/page.tsx (editor). Editor renders one input per required_placeholders key + one checkbox per toggleable_sections[].key, seeded from the existing draft (or template defaults). Save Draft / Publish buttons; Publish opens a confirmation modal explaining "every existing patient re-consents on next login." Calls API client methods through server actions (saveDraftAction, publishAction); publishAction runs saveDraft then publish so the version is minted against the latest values. Gated by organizations.manage_privacy_notice at the route layer; the sidebar item is suppressed for non-permission-holders.
  • [x] Dashboard task card on (dashboard)/page.tsx: probes listOrganizationLegalDocuments and renders one of three states: hidden (no permission, or all docs published + on the latest template), "Complete your legal documents" (any unpublished — onboarding blocked; priority over the stale prompt), or "Review template update" (all published but at least one editor row is on a stale platform-template version). Suppressed for principals without the permission. Probe failures are swallowed.

Frontend (Console / 1D.1):

  • [x] Templates list page + "Publish new template version" form (raw markdown body + sections JSON per-locale tabs, comma-separated required-placeholders input). Both audit-logged. Foundation-level scope: not a real markdown editor — platform team pastes from a lawyer-reviewed Word doc. Polish (markdown preview, structured section editor) is deliberately deferred since this surface is used rarely by a few people. See decisions.md → Why clinic legal documents are templated, not forms for the editor-vs-form-builder boundary.
  • [x] Cross-tenant read-only view of which orgs are stale on the latest platform template. GET /v1/admin/legal-document-templates/stale-orgs (superadmin) returns every (org, document_type) where source_template_version < MAX(legal_document_templates.version), sorted document_type → source ASC → org name. Surfaced on the Console templates page as a per-doc-type table (org, source v, latest v, published v).
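As a sketch of the staleness computation behind this view (Go with illustrative field names; the shipped query does this in SQL against legal_document_templates):

```go
package main

import "fmt"

// editorRow is a pared-down organization_legal_documents editor row (sketch).
type editorRow struct {
	Org, DocType          string
	SourceTemplateVersion int
}

// staleOrgs returns every (org, docType) whose editor row trails the latest
// platform template version for that docType, mirroring the stale-orgs view.
func staleOrgs(rows []editorRow, latest map[string]int) []editorRow {
	var out []editorRow
	for _, r := range rows {
		if r.SourceTemplateVersion < latest[r.DocType] {
			out = append(out, r)
		}
	}
	return out
}

func main() {
	rows := []editorRow{
		{"acme", "org_terms", 1},          // stale: template moved to v2
		{"acme", "org_privacy_notice", 2}, // current
	}
	latest := map[string]int{"org_terms": 2, "org_privacy_notice": 2}
	fmt.Println(len(staleOrgs(rows, latest))) // 1
}
```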

Re-consent semantics (1B.9 backend, surfaced by 1D.3 portal modal):

  • [x] Clinic re-publishes → new consent_purpose_versions row at version N+1 → re-consent middleware (RequireCurrentConsents) catches existing patients on next request → 412 with {code: "consent_required", missing: [...]} → portal (patient) layout probes GET /v1/me/required-consents → blocking ReconsentModal renders missing purposes' bodies → patient accepts → acceptRequiredConsents server action grants every missing purpose; consents service supersedes the v1 active grant with withdrawal_reason='superseded_by_v{N}' and inserts the v_n row. The org_terms cascade trigger skips supersession-driven withdrawals (withdrawal_reason LIKE 'superseded_by_v%') so re-acceptance does NOT trigger leave-clinic; locked by TestConsents_VersionSupersessionDoesNotFireOrgTermsCascade (subtests cover both org_terms and org_privacy_notice).
  • [x] Platform template version bump → clinics with source_template_version < latest see a "Review template update" prompt. Closed via three pieces: (1) GET /v1/organizations/{id}/legal-documents and GET .../{type} now return latest_template_version per row so every consumer can compute stale-ness without extra round-trips; (2) clinic dashboard task card switches to "Review template update" copy + CTA when the org is published-but-stale (priority over "Complete your legal documents" only when the org is fully published); (3) editor page renders an amber banner with a "Refresh to v_n" button that calls the new POST /v1/organizations/{id}/legal-documents/{type}/refresh-source-template endpoint (audited UPDATE on the editor row, returns 409 already_current for idempotent UI guard). Refresh bumps source_template_version and preserves existing draft values for unchanged keys; new keys in v_n surface as empty inputs the admin must fill before re-publishing.
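The supersession guard in the re-consent path can be sketched as a predicate (a hedged Go sketch with illustrative names; the shipped guard lives in the org_terms cascade trigger's SQL):

```go
package main

import (
	"fmt"
	"strings"
)

// shouldFireOrgTermsCascade mirrors the trigger's guard: withdrawals whose
// reason marks them as version supersessions must NOT trigger leave-clinic.
// Purpose codes and the prefix are taken from the text; the function itself
// is illustrative, not the shipped trigger.
func shouldFireOrgTermsCascade(purpose, withdrawalReason string) bool {
	if purpose != "org_terms" && purpose != "org_privacy_notice" {
		return false // cascade only watches the two org-scoped purposes
	}
	// LIKE 'superseded_by_v%' in the SQL guard
	return !strings.HasPrefix(withdrawalReason, "superseded_by_v")
}

func main() {
	fmt.Println(shouldFireOrgTermsCascade("org_terms", "superseded_by_v2")) // false: skip
	fmt.Println(shouldFireOrgTermsCascade("org_terms", "user_requested"))   // true: fire
}
```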

1B.11 Platform Break-Glass Access

Controlled, audited, transparent access for platform staff to identifiable cross-tenant patient data. The processor boundary is the default; break-glass is the documented exception path. Lives in foundation because every Console surface that touches patient data has to know whether it's always-on or break-glass-gated.

Status: primitive shipped. Schema, middleware, elevation endpoints, audit attribution, notification fan-out via 1A.18, and integration tests all closed. Console surface classification + Clinic admin banner light up as 1D.1 / 1D.2 surfaces ship — the foundation primitive is the gate they consume.

Decisions locked during implementation:

  • Platform-permission model — pure Go, not data-driven. Per-org RBAC (permissions / roles / role_permissions) is for tenant authorization checked by RLS; platform permissions are checked in Go middleware only. Encoding break_glass.* codes + the role → permission map in services/api/internal/core/principal/platform_permissions.go (with Subject.HasPlatformPermission(code) as the call site) keeps the two models cleanly separated. New platform permission OR support_engineer scope adjustment is a single-file Go change. The platform_role_permissions table reserved at 000002:416 lands when a real consumer needs DB-side platform-permission joins; until then the speculation cost is zero.
  • Schema mutations on AdminPool, not AppPool. The original spec wording said "AppPool-INSERT via the elevation endpoint"; we landed on AdminPool-only writes with REVOKE INSERT, UPDATE, DELETE, TRUNCATE FROM restartix_app, mirroring audit_log and notifications. The service-layer Subject.HasPlatformPermission is the load-bearing authorization gate; the REVOKE is the DB-layer floor. AppPool-with-WITH-CHECK would gratuitously couple session inserts to the request-tx GUC binding lifecycle, which the test path doesn't always have.
  • Active-session uniqueness via partial unique index. (principal_id, organization_id, scope) WHERE closed_at IS NULL. Same-principal double-clicked elevation modal hits the constraint; service catches the violation and returns the existing session. No duplicate audit rows, no duplicate notification fan-out (per-admin idempotency keys dedup at the notify layer).
  • Lazy expiry finalization. A row with closed_at IS NULL AND expires_at < NOW() is closed on the admin pool the first time Service.ActiveFor reads it. closed_at = expires_at (system-finalized at the natural-end moment), closed_by_principal_id = NULL (system close). Keeps the unique index honest without requiring a sweeper cron.
  • No redundant audit row from Service.Open/Close. The session row IS the artifact; the open + close events ride on the calling handler's audit row (same shape as 1A.18 notifications). audit_log carries break_glass_id linking back via the GUC bound by set_app_break_glass_session_id + the redefined audit_log_insert (this migration's CREATE OR REPLACE).

Schema (shipped in 000011):

  • [x] break_glass_sessions(id, principal_id, organization_id, scope, reason_category, reason_text, reason_ref NULL, opened_at, expires_at, closed_at, closed_by_principal_id NULL). scope IN ('patient_list', 'patient_detail', 'audit_full', 'cross_org_lookup', 'org_management') (the last added in 1B.11.x). reason_category IN ('support_ticket', 'security_incident', 'dsar_routing', 'fraud_investigation', 'platform_engineering'). CHECK length(btrim(reason_text)) >= 10 + CHECK expires_at > opened_at AND expires_at <= opened_at + INTERVAL '4 hours'. RLS-enabled (SELECT for own + org members with audit_log.view_org); DML REVOKE'd from restartix_app so writes go through admin pool only. Partial unique (principal_id, organization_id, scope) WHERE closed_at IS NULL for active-session uniqueness; covering indexes on (organization_id, opened_at DESC), (principal_id, opened_at DESC), partial (opened_at DESC) WHERE closed_at IS NULL for the Console "all currently active" surface.
  • [x] audit_log.break_glass_id UUID NULL was already reserved in 000001; audit_log_insert redefined in 000011 to populate it from current_app_break_glass_id() (a session GUC bound by set_app_break_glass_session_id). Every audit row written inside an elevated session carries action_context = 'break_glass' + break_glass_id = <session.id> automatically — no per-handler plumbing.
  • [x] Data classification entries for every column registered in data-classification.md → Break-glass sessions. Reason fields ship pii_basic + support_export-only (operator free-text may carry support context).

Permissions + middleware (shipped):

  • [x] Platform permission constants + role → perms map in platform_permissions.go: RoleSupportEngineer, PlatformPermBreakGlass{PatientList,PatientDetail,AuditFull,CrossOrgLookup,Manage}. Superadmin holds everything via IsSuperadmin == true; support_engineer holds PatientList + PatientDetail + AuditFull (CrossOrgLookup + Manage stay superadmin-only). Subject.HasPlatformPermission(code) is the Go call site.
  • [x] RequireBreakGlass(scope) middleware factory: reads URL {paramName} for the target org id, looks up the active session, returns 403 break_glass_required (no session) or 410 break_glass_expired (session expired but not closed). Active match binds the session GUC + attaches the session id to context via BreakGlassSessionIDFromContext.
  • [x] Elevation endpoint POST /v1/break-glass/sessions — body {organization_id, scope, reason_category, reason_text, reason_ref?, expires_in_minutes}. Service validates platform permission via HasPlatformPermission(scopeToPermission(scope)); partial-unique-index conflict returns existing session instead of 409. Per-principal rate-limited via RATELIMIT_BREAK_GLASS_OPEN_LIMIT (default 5/min).
  • [x] Close endpoint POST /v1/break-glass/sessions/{id}/close. Self-close path checks row.principal_id == subject.principal_id; manage-close path checks subject.HasPlatformPermission(PlatformPermBreakGlassManage). Both audit-logged with entity_type = 'break_glass_session'.
  • [x] GET /v1/break-glass/sessions[?org_id=&principal_id=&only_active=&limit=&offset=] — RLS-scoped read on the request tx for the elevating principal + org members; admin-pool read for superadmin / break_glass.manage holders (cross-org Console "all active sessions" surface). GET /v1/break-glass/sessions/{id} for detail.

Notification (shipped):

  • [x] Always-on email to the clinic admin(s) when a break-glass session opens against their org. Sent via 1A.18: Service.Open queries organization_memberships for principals with the system admin role and calls notify.Send(notify.To(admin), CategoryBreakGlassOpened, data, IdempotencyKey(<session>:<admin>), Org(org)) per recipient. Per-admin idempotency keys dedup retries at the notification layer. Template fields (org_name, staff_name, staff_email, scope, reason_*, opened_at/expires_at as time.Time) loaded via FindOrgName + FindHumanIdentity repository helpers.
  • [x] Audit attribution end-to-end: every audit row written inside an elevated session carries action_context = 'break_glass' + break_glass_id = <session.id>. Locked by TestBreakGlass_Middleware_GatesProtectedRouteAndStampsAudit.

Open follow-ups (consumed by 1D.1 / 1D.2 / 1E):

  • [x] Expired-session sweeper for break_glass_sessions + patient_impersonation_sessions — cmd/expired-sessions-sweep (EventBridge Scheduler → ECS RunTask, every 15 min, smallest Fargate sizing). Per-domain SweepExpired(ctx, repo, now, batchSize) free functions in internal/core/domain/{breakglass,impersonation}/sweep.go find rows with closed_at IS NULL AND expires_at < now, call the existing CloseAdmin path (stamps closed_at = expires_at, closed_by_principal_id = NULL), and emit a system-attributed audit_log UPDATE row with action_context = 'break_glass' / 'impersonation'. Race-safe via the underlying WHERE closed_at IS NULL. 3-test rlstest acceptance at expired_sessions_sweep_test.go covers close-stamping, audit attribution, skip-active, skip-already-closed, idempotency-on-rerun. Partial-scope follow-up (not load-bearing): the middleware lazy-finalize path (Service.ActiveFor / Service.Open) still closes expired rows on next-request WITHOUT writing an audit row — sessions touched within ~15 min of expiry have no close audit row. Extending that path to mirror the sweep's system-attributed audit shape is a separate cleanup; the sweeper alone closes the originally-stated gap ("sessions opened and never re-touched stay closed_at=NULL forever").
  • [ ] In-app banner in Clinic admin UI (1D.2) showing active and recent break-glass sessions against the org. Reads GET /v1/break-glass/sessions?org_id={current_org_id} (RLS-scoped via audit_log.view_org SELECT policy). Lights up as 1D.2 admin surfaces ship.
  • [ ] Console surface classification (1D.1). Foundation ships the RequireBreakGlass(scope) primitive; mounting it on each Console route is 1D.1 work alongside the actual surfaces. Specific classifications:
    • Org list, org detail (profile, billing, plan, entitlements) → aggregate
    • Patient counter per org → aggregate
    • Audit log metadata cross-tenant (timestamps, actions, status codes — diff content masked) → aggregate
    • Audit log full content cross-tenant (with diffs, IPs, request bodies) → break_glass:audit_full
    • Patient list per org → break_glass:patient_list
    • Patient detail per org (profile, subscriptions, consents) → break_glass:patient_detail
    • Cross-org patient lookup (rare, narrow) → break_glass:cross_org_lookup
    • Humans/users page filtered to staff principals only → aggregate
    • Humans/users page including patient principals → break_glass:patient_list

Cross-tenant guardrail (locked at the foundation level):

  • [x] Foundation principle codified in implementation-plan.md: cross-tenant features operate on anonymised data only. Any feature that needs identifiable cross-tenant data must either go through the break-glass pattern (one-shot, audited, narrow) or surface as an explicit ADR-worthy request to break the rule. Joint controllership avoidance is the underlying reason — see decisions.md → Why clinic is controller, platform is processor.
  • [x] DSAR routing flows through the clinic, not the platform. [email protected] auto-responder + portal self-service ("Your clinics" list at /v1/me.patient_org_ids) handle 99% of misdirected requests without break-glass. Genuinely orphaned requests (ex-patient, no active account) are break-glass with reason_category='dsar_routing'.

1B.11.x Console-Side Break-Glass Primitive

Sub-phase opened 2026-05-10. The 1B.11 backend is complete; this slice closes the Console-side wiring that turns the gate from theatre into reality. Reusable primitive, not per-feature one-offs — see decisions.md → Why one Console-side break-glass primitive and patterns.md → P55.

Status: shipped. Backend: smarter middleware + new scope. Console: session context + hook + modal + banner + gate wrapper + action wrapper. Four routes mounted (PATCH /:id, POST/DELETE /members, POST /staff-invitations); acceptance tests landed; ADR + pattern entry documented.

Decisions locked during implementation:

  • One scope (org_management), not per-resource. Covers staff/role/member/settings writes. Single elevation session covers a related task without re-prompting per click; the audit_log.action row records the specific action so blast radius is reconstructable.
  • Smarter middleware on the same route, not duplicate Console routes. RequirePerOrgPermissionOrBreakGlass(permission, scope, svc, "id") admits tenant principals via per-org permission and platform principals via active break-glass session. Same staff-invitations route serves the Clinic app (tenant) and the Console (platform-with-elevation). Avoids the doubled-routes-that-drift failure mode.
  • Reads stay always-on; only writes need elevation. Mirrors the patient surfaces — aggregate view always-on, identifiable lists / writes elevation-gated. Staff data isn't PHI; the controller-vs-processor risk is on writes, not list reads.
  • Platform principals don't bypass even when org members. The whole point is every cross-tenant write links to an open session — incidental membership doesn't bypass.
  • Console-side fetch is React.cache-shared, not P42-tagged. Session timing is load-bearing — a session expired 30s ago must not be cached as active. React.cache keeps layout + descendant page on one round-trip without crossing the staleness threshold.

Backend (shipped):

  • [x] Add org_management to the chk_break_glass_sessions_scope CHECK in 000011_break_glass.up.sql (pre-prod, edited in place per CLAUDE.md "migrations editable pre-production"). Add ScopeOrgManagement to breakglass/model.go + IsValid + scopeToPermission mapping.
  • [x] PlatformPermBreakGlassOrgManagement in platform_permissions.go; granted to support_engineer. Superadmin holds via IsSuperadmin == true.
  • [x] middleware.RequirePerOrgPermissionOrBreakGlass(permission, scope, svc, paramName) — splits by principal type; tenant via Subject.HasPermission, platform via active session lookup + audit-GUC binding.
  • [x] Mount on PATCH /v1/organizations/{id}, POST /v1/organizations/{id}/members, DELETE /v1/organizations/{id}/members/{principalId}, POST /v1/organizations/{id}/staff-invitations in routes.go. Read endpoints (GET /members, GET /staff-invitations, GET /) stay on RequirePermission only — always-on for operational support.
  • [x] OpenAPI BreakGlassScope enum updated to include org_management; packages/api-client/src/generated.ts + services/api/internal/core/server/openapi/spec.gen.go regenerated.
  • [x] Acceptance tests in per_org_permission_or_break_glass_test.go: tenant admin → 200; superadmin without session → 403 break_glass_required; superadmin with session → 200 + bg=<session_id> in handler context (proves the GUC bind path); tenant without permission → 403; platform expired session → 410 break_glass_expired.
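The split-by-principal-type logic can be sketched with simplified types (illustrative; the real middleware also binds the audit GUC on the platform path):

```go
package main

import "fmt"

// Subject is a simplified stand-in for the real principal type.
type Subject struct {
	IsPlatform     bool
	OrgPermissions map[string]bool // per-org RBAC (tenant path)
	ActiveBGScopes map[string]bool // open break-glass sessions (platform path)
}

// admit sketches RequirePerOrgPermissionOrBreakGlass: tenant principals pass
// on the per-org permission, platform principals pass only with an active
// session for the required scope. Incidental org membership never lets a
// platform principal bypass elevation.
func admit(s Subject, permission, scope string) bool {
	if s.IsPlatform {
		return s.ActiveBGScopes[scope]
	}
	return s.OrgPermissions[permission]
}

func main() {
	tenant := Subject{OrgPermissions: map[string]bool{"organizations.manage_members": true}}
	platform := Subject{IsPlatform: true, ActiveBGScopes: map[string]bool{"org_management": true}}
	fmt.Println(admit(tenant, "organizations.manage_members", "org_management"))   // true
	fmt.Println(admit(platform, "organizations.manage_members", "org_management")) // true
}
```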

Console (shipped):

  • [x] apps/console/lib/break-glass.ts — getActiveBreakGlassSessionsForOrg(orgId) (server-only, React.cache-wrapped) + findActiveSession(sessions, scope).
  • [x] apps/console/lib/break-glass-actions.ts — openBreakGlassSessionAction + closeBreakGlassSessionAction server actions calling the api-client; standard revalidatePath + refresh() chain.
  • [x] apps/console/lib/with-break-glass.ts — withBreakGlass(fn) server-action wrapper; surfaces break_glass_required / break_glass_expired as a typed sentinel for useActionState consumers.
  • [x] apps/console/components/break-glass/break-glass-session-provider.tsx — context provider + useBreakGlassSession(scope) + useAllActiveBreakGlassSessions() hooks. Honours P48 via useServerSyncedState.
  • [x] apps/console/components/break-glass/elevation-modal.tsx — real backend-connected modal (replaces the URL-stub ?bg=active modal that lived under components/organizations/).
  • [x] apps/console/components/break-glass/active-session-banner.tsx — pinned in the clinic-detail layout; per-session row with reason + minutes-left + close button (calls real backend closeBreakGlassSession).
  • [x] apps/console/components/break-glass/require-break-glass.tsx — <RequireBreakGlass scope=…> client wrapper. Defense-in-depth UI gate.
  • [x] Clinic-detail layout wraps children in <BreakGlassSessionProvider> + renders <ActiveBreakGlassBanner> above content; old URL-state stub clinic-detail-banners.tsx deleted.
  • [x] Patients page rewired from searchParams.bg === "active" URL state to findActiveSession(sessions, "patient_list") against the server-fetched session list. Old stub components/organizations/elevation-modal.tsx deleted; new modal under components/break-glass/.
  • [x] InviteStaffDialog — first integration consumer of the primitive end-to-end. Dialog wraps its content in <RequireBreakGlass scope="org_management"> so the body switches between elevation prompt (no session) and the invite form (session active). inviteStaffAction wraps the api call in withBreakGlass(...); if the session expires between dialog-open and submit, the result's needsElevation field flips the body back to the elevation prompt without dropping the user out. Validates the full flow: tenant principals would consume the same route via per-org permission from the Clinic app; Console superadmins consume it via elevation.

Open follow-ups (next consumers, not blocking 1B.11.x close):

  • [ ] Mount RequirePerOrgPermissionOrBreakGlass(perm, ScopeOrgManagement, …) on org-settings + designations + webhooks + integrations + privacy-notice + domains write routes when those Console surfaces light up. Mechanical wraps; no design left.
  • [ ] Member role-change + remove flows on the Console members page mount the RequireBreakGlass UI gate + send actions through withBreakGlass. Backend already gated.

1B.12 Invitations & Share-Links

Inverted onboarding paths: clinic admins invite staff and patients by email (personal invitations); clinics also mint multi-use share-links for QR codes and intake forms (patient-only). One mechanism (Clerk Invitations API) covers both invitation kinds; share-link is a separate code-anchored primitive. Platform-engineer invites are deferred to a non-foundation feature — they will live in a sibling platform_invites table (purely additive; foundation contract preserved).

Status: backend complete. Personal-invitation primitive (staff + patient) + share-links primitive (patient-only) + portal-onboarding integration (tier resolution from invite/share-link, atomic redeem, mark-consumed) + resend endpoint + 12 RLS integration tests all shipped. Only the 1D.2 / 1D.3 admin + portal UI surfaces remain — those belong in the 1D slice on top.

Decisions locked during implementation:

  • Mechanism: Clerk's Invitations API (invitation.Create) — not custom-token-via-our-SES. While we're on Clerk for auth emails (Layer 1+2 customisation: dashboard template + custom from-domain), the invite email rides the same pipeline. When we flip to BYO ESP (Layer 3) before the dedicated-mode launch, both Clerk auth emails and invite emails migrate together. Avoiding the mixed "our-SES for invites + Clerk for auth" middle state. The shipped 1A.18 MemberInvite template stays in repo as the visual source-of-truth; copy into Clerk's dashboard "Invitation" template until BYO ESP lands.
  • No Svix webhook receiver. Acceptance is detected on every authenticated request: the auth middleware's OnAuthHook calls invites.Service.BindForPrincipal(principalID, email), which finds every open invite matching the email and binds them. No webhook signature verification, no emails.created subscription. The hook fires on EVERY authenticated request — not just first-sight — because cross-clinic invites for principals that already exist in our system never trigger created=true (a specialist hired by a second clinic, a patient referred to a second clinic). The bind is idempotent by design: zero open invites = single index hit, zero rows, return; pending invites = one bind, subsequent calls find consumed_at set and return zero. Re-evaluate Clerk webhooks when a real event use case (e.g. email.bounced cleanup, user.deleted cascade) justifies the webhook surface area.
  • Single mechanism, two binding paths. The webhook-free bind step dispatches by kind:
    • staff → organization_memberships row created in one admin tx (accepted_at + consumed_at + the row's invited_at/invited_by reserved in 1A.12).
    • patient → accepted_at set only; portal onboarding step 2 sets consumed_at when the patients + patient_subscriptions chain commits. Consents still grant explicitly on the consent gates — invite acceptance never bypasses them.
  • Schema co-locates both invite kinds in organization_invites. Org-scoped, named for what it is. CHECK constraint pins (kind=staffrole_id set, patient_tier_id NULL) and (kind=patientrole_id NULL). Partial unique index (organization_id, lower(email), kind) WHERE consumed_at IS NULL AND revoked_at IS NULL blocks duplicate open invites; the service catches the violation and returns ErrPendingInviteExists instead of a 500.
  • No audit_log_insert redefinition. Invitations are normal CRUD events; entity_type='organization_invite' rows attribute correctly via the existing actor + envelope chain. Break-glass and impersonation redefine audit_log_insert because they need action_context overrides; nothing here does.
  • organization_memberships.principal_id stays NOT NULL. The original spec wording said "pending row with no principal_id yet" — we landed on a separate organization_invites shadow table that binds to a freshly created principal at the auth-middleware step, keeping the membership table coherent.
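The every-request bind's idempotency can be sketched as follows (illustrative types; the shipped code matches open invites by email and runs the staff path inside an admin tx):

```go
package main

import "fmt"

// Invite is a pared-down organization_invites row (field names illustrative).
type Invite struct {
	Kind     string // "staff" | "patient"
	Accepted bool
	Consumed bool
}

// bindForPrincipal sketches the webhook-free bind step: safe on every
// authenticated request because rows that already reached their terminal
// bind state are skipped, so repeated calls converge to a no-op.
func bindForPrincipal(open []*Invite) (bound int) {
	for _, inv := range open {
		switch {
		case inv.Consumed:
			continue // staff invites and redeemed patient invites end here
		case inv.Kind == "staff":
			// real path: organization_memberships row + stamps in one admin tx
			inv.Accepted, inv.Consumed = true, true
			bound++
		case inv.Kind == "patient" && !inv.Accepted:
			// real path: accepted_at only; portal onboarding sets consumed_at
			inv.Accepted = true
			bound++
		}
	}
	return bound
}

func main() {
	invites := []*Invite{{Kind: "staff"}, {Kind: "patient"}}
	fmt.Println(bindForPrincipal(invites)) // 2 (first call binds both)
	fmt.Println(bindForPrincipal(invites)) // 0 (second call is a no-op)
}
```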

Schema (shipped in 000012):

  • [x] organization_invites(id, organization_id, provider_invitation_id, email, kind, role_id NULL, patient_tier_id NULL, invited_by_principal_id, invited_at, expires_at, accepted_at NULL, accepted_principal_id NULL, consumed_at NULL, revoked_at NULL, revoked_by_principal_id NULL). RLS policies split by kind: manage_members for staff invites, patients.manage for patient invites. AppPool DML revoked; service writes through admin pool.
  • [x] Partial unique index for "one open invite per (org, email, kind)" + supporting indexes for cross-org email scan (the bind path) and clinic admin oversight (org_recent).
  • [x] No audit_log column added — see decisions block above.

Clerk SDK + service (shipped):

  • [x] internal/core/auth/clerk/invitations.go wraps the Clerk Backend SDK's invitation package. Create issues a magic-link invite with our public_metadata (invite_kind + organization_id) and the kind-specific redirect_url (clinic.* for staff, portal.* for patient). Revoke is best-effort — the local row is the source of truth for our bind path. Notify=true until BYO ESP lands.
  • [x] internal/core/domain/invites/ — model, errors, repository, service, handler. CreateStaff / CreatePatient / Resend / Revoke / List / Get plus the foundation hook BindForPrincipal consumed by the auth middleware, and MarkPatientConsumed + FindAcceptedPatientInviteForPrincipalOrg consumed by portal onboarding step 2.
  • [x] internal/core/domain/sharelinks/ — model, errors, repository, service, handler. Mint / List / Get / Revoke for clinic admins; ResolvePublic for the portal landing page (admin-pool, no auth); RedeemForPatientOnTx runs inside the portal-onboarding admin tx so the use_count increment commits/rolls-back together with the patients chain.
  • [x] internal/core/middleware/auth.go extended with OnAuthHook + WithOnAuthHook option. Auth middleware fires registered hooks on every authenticated request after audit.SetActor, inside a panic recover so a hook crash doesn't lock anyone out. Wired in routes.go to call invitesService.BindForPrincipal. Every-request semantics (vs. first-sight only) is what makes cross-clinic invites for existing principals work — the bind is idempotent by design and returns immediately on a single index hit when there are no open invites.
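The panic isolation around hooks can be sketched like this (illustrative signature; the real hook receives the principal id and email, and failures are logged rather than printed):

```go
package main

import "fmt"

// runHooks sketches the auth middleware's hook dispatch: each hook runs
// inside its own recover so a crashing hook cannot turn into an auth
// outage, and an error from one hook never blocks the request or the rest.
func runHooks(hooks []func() error) {
	for _, h := range hooks {
		func() {
			defer func() {
				if r := recover(); r != nil {
					fmt.Println("hook panicked:", r) // logged, never fatal
				}
			}()
			if err := h(); err != nil {
				fmt.Println("hook error:", err) // logged, never fatal
			}
		}()
	}
}

func main() {
	runHooks([]func() error{
		func() error { panic("boom") },
		func() error { fmt.Println("bind ran"); return nil },
	})
}
```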

Endpoints (shipped):

  • [x] POST /v1/organizations/{id}/staff-invitations — gated by organizations.manage_members. Body {email, role_code, expires_in_days?}. Defaults: 7 days, max 30.
  • [x] POST /v1/organizations/{id}/patient-invitations — gated by patients.manage. Body {email, patient_tier_id?, expires_in_days?}. patient_tier_id NULL = use org default tier at consume time.
  • [x] GET /v1/organizations/{id}/staff-invitations[?status=&limit=&offset=] and GET /.../patient-invitations[...] — RLS-gated lists. status filter: pending / accepted / consumed / revoked / expired.
  • [x] GET /v1/organizations/{id}/invitations/{inviteId} — detail. RLS-gated; the row's kind decides which permission admits.
  • [x] POST /v1/organizations/{id}/invitations/{inviteId}/revoke — service-layer permission gate branches on row.kind. Revokes the Clerk-side invitation too (best-effort).
  • [x] POST /v1/organizations/{id}/invitations/{inviteId}/resend — atomically rotates the provider-side invitation: mints a new one with a fresh magic-link URL + expiry window, revokes the old one. Local row's provider_invitation_id is updated in place; rejects on already-accepted / already-revoked rows.
  • [x] Share-link endpoints: POST/GET /v1/organizations/{id}/share-links + GET /v1/organizations/{id}/share-links/{shareLinkId} + POST /v1/organizations/{id}/share-links/{shareLinkId}/revoke (all gated by organizations.manage_share_links); public GET /v1/public/share-links/{code} returns {org_name, slug, tier_name, valid} for the portal landing page (per-IP rate-limited under public_resolve).
  • [x] POST /v1/portal/onboard accepts optional share_link_code in body, resolves pending patient invite by (accepted_principal_id = me, organization_id = current), picks tier with precedence invite > share-link > org default, atomically increments share_links.use_count under "still active" predicate, marks the patient invite consumed — all in the same admin tx. Per-IP rate-limited via the new share_link_redeem policy.
  • [x] Audit-logged everywhere: invite create / revoke / resend / consume / membership create on bind (with explicit actor override since the bind hook fires before audit.SetActor); share-link mint / revoke / redeem; patient invite acceptance.
  • [x] principal.PermOrganizationsManageShareLinks constant + Clerk webhook-free architecture documented inline.
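The tier-precedence rule in the onboarding endpoint reduces to a first-non-null resolution. A sketch (empty string standing in for NULL; ids illustrative):

```go
package main

import "fmt"

// resolveTier sketches the onboarding precedence: invite > share-link >
// org default. The shipped code resolves this at consume time inside the
// portal-onboarding admin tx.
func resolveTier(inviteTier, shareLinkTier, orgDefaultTier string) string {
	if inviteTier != "" {
		return inviteTier
	}
	if shareLinkTier != "" {
		return shareLinkTier
	}
	return orgDefaultTier
}

func main() {
	fmt.Println(resolveTier("tier-invite", "tier-link", "tier-default")) // tier-invite
	fmt.Println(resolveTier("", "tier-link", "tier-default"))            // tier-link
	fmt.Println(resolveTier("", "", "tier-default"))                     // tier-default
}
```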

Tests (shipped):

  • [x] invites_test.go — 5 cases covering staff bind creates membership + marks accepted+consumed; patient bind sets accepted only (consumed_at NULL); revoked invite skipped on bind; expired invite skipped on bind; partial unique index blocks duplicate open invites per (org, email, kind).
  • [x] sharelinks_test.go — 7 cases covering mint persistence + generated code; permission gate denies non-admin; atomic use_count increment; max_uses cap enforced under race; cross-org code rejection; public resolve returns org metadata; revoked code surfaces as 410.
  • [x] auth.InvitationProvider test stub in api_harness.go — consumption paths (bind + redeem) don't reach the identity provider; create-side helpers get a synthetic test_provider_* id.

Open follow-ups (UI):

  • [ ] 1D.3 Portal (next-active). /join/{code} share-link landing page → calls GET /v1/public/share-links/{code} → branded "Join Acme Clinic" pre-auth CTA → Clerk sign-up → onboarding step 2 with share_link_code in body. /onboard page surfaces a "you've been invited to Acme Clinic" banner when a pending patient invite exists for (principalID, currentOrgID). No new backend endpoints required — this is pure portal-app work.
  • [ ] 1D.2 Clinic admin (deferred until the clinic-app refresh — see 1D.2 below). Staff invite list + form + revoke + resend; patient invite list + form + revoke + resend; share-link mint form (tier picker, max_uses, expires_at, note) + list + revoke + copy-code + QR; /welcome landing page (staff invite magic-link redirect target). Backend contract is stable — no Go changes needed when this lands.

1B.13 Patient Impersonation Sessions

Clinic-internal access pattern for staff acting on a patient's behalf — assisted form fill, accessibility help, language barriers, troubleshooting. Lives entirely within a clinic's controllership scope (this is not a controller/processor concern; it's a "make staff actions on patient data reviewable" concern). Lands in foundation so the primitive is locked before any feature consumes it.

Design note. This is a deliberately minimal primitive. The audit + transparency mechanism is the load-bearing part; per-action-type scopes, granular permissions, real-time notification toggles can all be added later if a real product need surfaces. Foundation discipline argues against speculation here — clinics trust their staff (they hired them and granted patients.manage); finer-grain controls add no security on top of that.

Status: backend complete. Schema, RLS WITH-CHECK policies (staff self + patients.manage org-member + target-patient self), set_app_impersonation_session_id GUC + current_app_impersonation_id reader, redefined audit_log_insert carrying impersonation_id, patients.impersonate permission seeded with admin + customer_support grants, full domain (model/errors/repo/service/handler), RequireImpersonation middleware, open/close/list/get endpoints, /v1/me/patient-impersonation-sessions patient self-read, cross-context exclusion guard (one elevated session at a time per principal × org, bidirectional with break-glass), per-principal rate limit, and 14 RLS integration tests all shipped. Only the 1D.2 / 1D.3 UI surfaces remain — those belong in the 1D slice on top.

Decisions locked during implementation:

  • Simple authorship semantics — no data-layer rebind. The original spec's split-author model (forms appearing patient-authored at the data layer + staff actor in audit log) was considered and dropped at design time: every Layer 2+ feature with an "author" column would have to remember coalesce(acting_as_patient_id, current_principal_id), and one missed call site would leak staff names into patient-facing records. Foundation discipline argues against the cross-cutting invariant. Instead: actor = current_app_principal_id() always; the audit row carries impersonation_id linking back; consumers that want "who really did this" follow the link. Clean foundation invariant, zero per-feature glue.
  • Active-session uniqueness (staff_principal, organization). One impersonation at a time per staff member per clinic. Mental model: "I'm currently helping Alice; close that before starting Bob." Partial unique index closed_at IS NULL.
  • Patient access history reads patient_impersonation_sessions, not audit_log. No patient SELECT policy on audit_log (kept staff/forensic-only). Patient sees session metadata via the table's self-read RLS; per-action drill-down deferred to the future patient_account_activity projection (see Deferred Foundation Extensions).
  • AppPool + RLS WITH CHECK (not AdminPool). The opening principal is an authenticated org member with patients.impersonate and full RLS context. Same write-side pattern as consents and organization_invites, NOT the audit_log/notifications/break_glass AdminPool pattern (which exist because their writers don't have an org-scoped principal context). Break-glass's AdminPool design was driven by platform staff lacking tenant membership; impersonation doesn't have that issue.
  • Cross-context exclusion bidirectional. impersonation.Service.Open rejects when the principal already has an active break-glass session for the same (principal × org); breakglass.Service.Open rejects symmetrically. The redefined audit_log_insert reads BOTH GUCs unconditionally (so a future legitimate compounding case writes both columns correctly without another schema change), but the runtime guards prevent the case from arising today.
  • No acting_as_patient_id GUC plumbing. Locked design: simple authorship means no rebind helper. The foundation primitive is bounded by what consumers will actually see — no speculation against an F3/F5 model that we've explicitly chosen not to build.
  • closed_at uses clock_timestamp(), not NOW(). NOW() returns transaction-start; same-tx Open+CloseSelf paths (some test/retry scenarios) would make closed_at < opened_at and violate the CHECK. clock_timestamp() returns wall-clock at statement time.

Schema (shipped in 000013):

  • [x] patient_impersonation_sessions(id, staff_principal_id, target_patient_id, organization_id, reason, opened_at, expires_at, closed_at, closed_by_principal_id). target_patient_id FKs patients(id) (per-org row); organization_id denormalized for RLS efficiency, mirroring patient_subscriptions. reason is free-text with 10-char trimmed-length floor. expires_at enforced ≤ opened_at + 4h by CHECK. Partial unique on (staff_principal_id, organization_id) WHERE closed_at IS NULL; supporting indexes for org-recent / staff-recent / target-recent / active-set / FK-target.
  • [x] audit_log.impersonation_id UUID NULL — reserved in 1A.12 (000001). 000013 redefines audit_log_insert to populate it from the GUC (idempotent CREATE OR REPLACE on top of the 000011 break-glass redefinition; signature unchanged so all existing callers remain valid).
  • [x] Data classification entries for every column registered in data-classification.md → Patient impersonation sessions. Reason ships pii_basic + support_export-only (operator free-text may carry clinical context).

Permissions + middleware (shipped):

  • [x] patients.impersonate permission seeded inline in 000013; granted by default to admin and customer_support system role templates (specialist deliberately excluded — clinical role, not service role). Custom roles can be granted via the role editor (1D.2) once that ships. Go constant: principal.PermPatientsImpersonate.
  • [x] RequireImpersonation(svc, paramName) middleware factory: reads URL {paramName} for the target org id, looks up the active session via Service.ActiveFor, returns 403 impersonation_required (no session) or 410 impersonation_expired (session expired but not closed). Active match binds the session GUC + attaches the session id to context via ImpersonationSessionIDFromContext. No scope argument — single-permission primitive.
  • [x] Open endpoint: POST /v1/organizations/{org_id}/patient-impersonation-sessions — body {patient_id, reason, expires_in_minutes}. Service validates Subject.HasPermission('patients.impersonate'); cross-context exclusion check; partial-unique-index conflict returns existing session. Per-principal rate-limited via RATELIMIT_PATIENT_IMPERSONATION_OPEN_LIMIT (default 5/min). Audit-logged with action_context = 'impersonation' + impersonation_id.
  • [x] Close endpoint: POST /v1/organizations/{org_id}/patient-impersonation-sessions/{id}/close. Self-close path runs through the request tx with the patient_impersonation_update_self RLS policy; manage-close path checks subj.HasPermission('patients.manage') and runs through patient_impersonation_update_manage. Both audit-logged with entity_type = 'patient_impersonation_session'.
  • [x] GET /v1/organizations/{org_id}/patient-impersonation-sessions[?staff_principal_id=&patient_id=&only_active=&limit=&offset=] — RLS-scoped read (staff sees own; org members with patients.manage see all org sessions). GET .../{sessionId} for detail.
  • [x] GET /v1/me/patient-impersonation-sessions[?organization_id=&only_active=&limit=&offset=] — patient access history. RLS self-read on patient_impersonation_sessions cascades through current_human_patient_profile_ids() to span every clinic the patient is at; the cross-org account surface (1D.5) consumes the unfiltered shape.

Audit attribution (shipped):

  • [x] Every audit row written while a session is active carries impersonation_id, populated from the session GUC by the redefined audit_log_insert (000013); open/close events log with action_context = 'impersonation'. Actor remains current_app_principal_id(); consumers that want "who really did this" follow the impersonation_id link.

Tests (shipped, 14 cases):

  • [x] impersonation_test.go — happy-path open + audit attribution; validation errors (5 sub-cases); permission denied for specialist; patient-must-be-at-org boundary; idempotent same-(staff, org) returns existing; "one thing at a time" (second-patient open returns existing session, doesn't create new); lazy expiry finalization; CloseSelf attribution; CloseSelf re-close (already_closed); CloseManaged by org admin holding patients.manage; RLS patient-self-read; RLS other-staff-cannot-see; RLS deny-other-org-patient; cross-context exclusion (impersonation → break-glass blocked, and break-glass → impersonation blocked); end-to-end middleware gating + audit-attribution validation.

Open follow-ups (UI):

  • [ ] 1D.2 Clinic admin oversight — per-org "Staff impersonation oversight" view — list of all sessions across the clinic, filterable by staff member, patient, date range. Same DataTable foundation as the audit log viewer (1D.4). Gated by patients.manage (already exists). Per-patient impersonation history shown alongside the per-patient consents view. Backend contract is stable — no Go changes needed when this lands.
  • [ ] 1D.3 Patient access history — per-clinic list of staff impersonation sessions on this patient, served via GET /v1/me/patient-impersonation-sessions. Shows who opened it, when, the reason, duration. Foundation-tier scope is session metadata only; per-action drill-down ("what entities they touched") is deferred to the patient_account_activity projection (Deferred Foundation Extensions).

Differences from break-glass (deliberate):

| | Break-glass (1B.11) | Patient impersonation (1B.13) |
| --- | --- | --- |
| Who initiates | Platform staff | Clinic staff |
| Whose data they access | Cross-tenant patient data (any clinic) | One specific patient at their own clinic |
| Audit action_context | 'break_glass' | 'impersonation' |
| Linking column on audit_log | break_glass_id | impersonation_id |
| Who gets notified | Clinic admin (always-on email) | Patient (always-recorded in access history; no real-time email in v1) |
| Permission grants | Per-scope (break_glass.patient_list, audit_full, etc.) | Single permission (patients.impersonate) |
| Cross-tenant? | Yes — explicit cross-tenant primitive | No — single-clinic primitive |
| Controllership concern? | Yes — processor boundary | No — within clinic's controllership |
| Writer pool | AdminPool (REVOKE on AppPool) | AppPool + RLS WITH CHECK |
| Authorship semantics | N/A (platform staff only access; doesn't write tenant data routinely) | Simple — actor = current_principal_id() always |

Why break-glass kept granular permissions but impersonation didn't. Break-glass crosses the platform↔clinic trust boundary; clinics genuinely care about "your support engineers can ONLY view audit logs, never patient detail" and procurement reviewers ask about it. Impersonation lives inside one clinic's trust boundary — the clinic already gave the staff member patients.manage; layering finer impersonation-scope permissions on top of that adds no security.

1B.14 Locations & Multi-Site Support

A clinic (organization) may operate at one or more physical locations. Locations are a logistics layer on top of org-scoped tenancy — they partition appointments, schedules, and availability operationally without fragmenting permissions, consents, or patient identity. Lands in foundation because every clinical entity that ships in Layer 2 (specialists, calendars, appointments) needs location_id from day one; retrofitting is a cross-cutting backfill the foundation discipline rule exists to prevent.

Design note. This is intentionally minimal. Org stays the trust boundary, entitlements stay org-wide, RBAC stays org-wide ("all staff see all locations"). The only thing locations partition is physical operations — where a specialist physically is at a given moment, where an appointment happens. See patterns.md P40 for the full pattern and the deliberate non-goals.

Status: backend complete. Schema (000014), RLS policies (SELECT for org members + INSERT/UPDATE/DELETE WITH CHECK gated by current_app_has_permission('locations','manage')), locations.manage permission seeded with admin-only grant, full domain (model/errors/repo/service/handler), routes mounted under per-org group with RequireURLOrgMatchesScope("id") (P47) + RequirePermission on mutations, and 13 RLS integration tests all shipped. Only the 1D.1 / 1D.2 UI surfaces remain — those belong in the unified UI pass per CLAUDE.md "UI deferred until foundation locked" stance.

Decisions locked during implementation:

  • closed is terminal. Service rejects any transition out of closed (active/inactive → 409 closed_terminal). Re-opening "the same place" later means creating a new row — preserves audit trail clarity. The inactive status covers the temporary "renovation, lease pending, seasonal closure" case where the site will resume operations. CHECK constraint chk_locations_closed_at_consistency pins closed_at non-NULL iff status='closed' as a structural backstop.
  • Slug is mutable, normalised at the service layer. Unlike organizations.slug (which lives in DNS hostnames and is immutable), location slugs only appear in deep paths like /locations/main-floor. Renaming costs at most a 404 on a stale bookmark; FKs use UUIDs. The validator auto-lowercases + trims (matching the org-domain normalisation pattern at organization/service.go:96); the regex enforces ^[a-z0-9]+(-[a-z0-9]+)*$ (no leading/trailing/double hyphens, no underscores, no dots).
  • Country as free TEXT, no ISO 3166-1 enforcement at this layer. Romania-launch context makes ISO-2 tempting, but the constraint can be added later non-breakingly when a UI form renders a country picker. Service-layer normalisation only trims whitespace.
  • AppPool only — no admin-pool surface. Locations are entirely org-scoped: there is no cross-tenant or pre-membership write path comparable to patients.CreateAdmin (portal onboarding) or breakglass. The repository skips the dual-pool plumbing.
  • DELETE exposed but discouraged. The route + RLS allow hard-delete for the rare "created in error, never used" case. The canonical retire-a-location flow is PATCH ... {status: "closed"}. Once Layer 2 ships and FKs reference locations(id) (specialist_locations / calendars / appointments), DELETE will RESTRICT naturally on dependent rows — that's the intended steady-state behavior.

Schema (shipped in 000014):

  • [x] locations(id, organization_id, slug, name, timezone NULL, phone NULL, email NULL, address_line1 NULL, address_line2 NULL, city NULL, county NULL, postal_code NULL, country NULL, status DEFAULT 'active', closed_at NULL, created_at, updated_at). status IN ('active', 'inactive', 'closed') enforced by CHECK. Address fields structured (never freeform). Unique (organization_id, slug) via uq_locations_org_slug. Standard set_updated_at trigger.
  • [x] organization_settings.default_timezone — already shipped in 000003 with explicit reference to 1B.14's P23 chain. Resolution chain (location.timezone → specialist.scheduling_timezone → org.default_timezone → platform default) closes layer 3 here; layer 2 lights up with F4.
  • [x] Address class registration. locations.address_line1/2/city/county/postal_code/country registered as pii_basic with support_export egress in data-classification.md. locations.timezone/phone/email/status registered as org_internal. Public-face fields (name, slug) registered as public. No bulk_export egress — locations are clinic operational data, not patient-export data.
  • [x] Indexes. idx_locations_org ON locations(organization_id); partial idx_locations_active ON locations(organization_id) WHERE status = 'active' (booking flows + specialist availability pickers only ever care about active).

Permissions + middleware (shipped):

  • [x] locations.manage permission seeded inline in 000014; granted by default to admin system role template only (specialist + customer_support deliberately excluded). Custom roles can be granted via the role editor (1D.2) once that ships. Go constant: principal.PermLocationsManage. TS mirror: PERM_LOCATIONS_MANAGE in packages/api-client/src/permissions.ts.
  • [x] Routes: GET /v1/organizations/{id}/locations[?status=&page=&limit=], POST .../locations, GET .../locations/{locationId}, PATCH .../locations/{locationId}, DELETE .../locations/{locationId}. All mounted under the per-org route group with RequireURLOrgMatchesScope("id") (P47) + RequirePermission(PermLocationsManage) on mutations. List + Get inherit the org membership SELECT policy.

RLS (shipped):

  • [x] locations policy: SELECT for org members (organization_id = current_app_org_id()). INSERT/UPDATE/DELETE gated by current_app_has_permission('locations', 'manage') in WITH CHECK. No per-location RLS helper (current_app_location_ids() deliberately not added) — staff see all locations within their org. Per-location scoping is a future ADR if a customer requires it.

Tests (shipped, 13 cases):

  • [x] locations_test.go — happy-path Create + audit-attribution; permission-denied for specialist (service-layer); RLS deny INSERT for specialist (repo-direct, defense in depth at DB); slug uniqueness per org; slug collision across orgs allowed; closed is terminal (active→closed→active blocked); active↔inactive roundtrip with closed_at staying NULL; RLS cross-org SELECT denied; List scoped to org + status filter (cross-org bleed prevented); invalid slug shapes rejected (8 sub-cases); Delete by admin happy path; PATCH no-op detected as before == after (handler skips audit row); PATCH explicitly clearing optional column ({phone: null}) sets the DB column to NULL.

Forward-binding (no schema yet — locks the contract for Layer 2):

  • [ ] When specialists ships (F4), it adds specialists.scheduling_timezone TEXT NULL (IANA fallback layer 2 in P23).
  • [ ] When specialists ships, it adds the join table specialist_locations(specialist_id, location_id, created_at) — many-to-many with composite PK.
  • [ ] When specialist_weekly_hours ships (F4), it adds location_id UUID NULL FK → locations(id) — NULL means remote/telerehab availability.
  • [ ] When specialist_schedule_overrides ships (F4), same convention — location_id UUID NULL.
  • [ ] When calendars ships (F4), it adds location_id UUID NULL FK → locations(id) — NULL means org-level virtual (telerehab) calendar.
  • [ ] When appointments ships (F5), it adds location_id UUID NULL FK → locations(id) — NULL means remote / telerehab session.
  • [ ] Single-true-availability invariant (enforced when specialist_weekly_hours ships): DB-level EXCLUDE USING gist constraint on (specialist_id, day_of_week, time-range) so no two windows for the same specialist may overlap, regardless of location_id. A specialist physically cannot be in two places at once. Same constraint shape on specialist_schedule_overrides. Locations partition the labels on availability, never the availability itself.
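
The single-true-availability invariant can be modeled in plain Go to show what the future EXCLUDE constraint would reject. Half-open minute ranges and struct names here are hypothetical; the real enforcement is DB-level.

```go
package main

import "fmt"

// window is a weekly availability slot: minutes since midnight, half-open
// [start, end). location is a label only; it never partitions availability.
type window struct {
	specialist string
	day        int
	start, end int
	location   string // "" = remote/telerehab
}

// overlaps mirrors the intended constraint semantics: two windows for the
// same specialist on the same day may not overlap, regardless of location,
// because a specialist physically cannot be in two places at once.
func overlaps(a, b window) bool {
	return a.specialist == b.specialist &&
		a.day == b.day &&
		a.start < b.end && b.start < a.end
}

func main() {
	onSite := window{"sp-1", 1, 9 * 60, 12 * 60, "main-floor"}
	remote := window{"sp-1", 1, 11 * 60, 13 * 60, ""} // different "location", still a conflict
	later := window{"sp-1", 1, 12 * 60, 14 * 60, ""}
	fmt.Println(overlaps(onSite, remote)) // true: would be rejected by the constraint
	fmt.Println(overlaps(onSite, later))  // false: back-to-back windows are fine
}
```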

Open follow-ups (UI):

  • [ ] 1D.1 Console org-detail page — locations list per org with CRUD (gated by locations.manage). Same DataTable foundation as members / domains. Backend contract is stable — no Go changes needed when this lands. End-to-end wiring is verified by the 13 RLS integration tests above; the UI consumer was deferred to the unified UI pass per CLAUDE.md "UI deferred until foundation locked" stance.
  • [ ] 1D.2 Clinic admin "Locations" page — same CRUD scoped to the admin's own org. Same DataTable foundation as members / domains / roles.
  • [ ] Patient Portal does not surface locations until F4 booking ships — there is nothing useful to show in foundation (no calendars, no appointments).

Org with zero locations — a pure-telerehab clinic operates with no locations rows. UI skips the location picker; appointment creation accepts location_id = NULL. No "Virtual" placeholder row is ever auto-created.

Org with one location — UI auto-picks the only active location at booking time; no picker shown. Schema-wise indistinguishable from the multi-location case.
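
The zero/one/many behavior above reduces to a tiny decision function; a hedged sketch with a hypothetical helper name:

```go
package main

import "fmt"

// pickerMode sketches the UI decision: zero active locations means no picker
// (pure telerehab, appointment location_id stays NULL); exactly one means
// auto-pick with no picker shown; more than one shows the picker.
func pickerMode(activeLocationIDs []string) (mode string, autoPick string) {
	switch len(activeLocationIDs) {
	case 0:
		return "none", "" // appointment created with location_id = NULL
	case 1:
		return "auto", activeLocationIDs[0]
	default:
		return "picker", ""
	}
}

func main() {
	fmt.Println(pickerMode(nil))                        // none
	fmt.Println(pickerMode([]string{"loc-1"}))          // auto loc-1
	fmt.Println(pickerMode([]string{"loc-1", "loc-2"})) // picker
}
```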

Deliberate non-goals (recorded so they don't creep in):

  • No per-location entitlements (entitlements stay org-wide on organization_entitlements).
  • No per-location billing or pricing.
  • No services_per_location table — service catalog stays org-wide.
  • No inter-location transfer workflow — point the next appointment at the other location.
  • No room / facility booking. Telerehab is mostly remote; rooms are a future concern if a clinic asks.
  • No current_app_location_ids() RLS helper — org-scoping is the only RLS dimension.

1C. Capabilities, Integrations & Metering

Cross-cutting infrastructure for how the platform composes capabilities, reaches external systems, and measures consumption. Locks the conventions every feature consumes from Layer 2 onwards. The taxonomy used here lives in glossary.md.

Why a new sub-phase. 1A laid the runtime foundations. 1B locked the identity, tenancy, and entitlement primitives. 1C connects them outward: the capability convention so every feature builds on stable interfaces, the four integration categories so every external touchpoint has a designated home, the metering seam so every metered call is captured before billing exists, and the AI hookup so the platform's stated agent direction has a load-bearing path. Without 1C, Layer 2 features would each invent their own external-call shape and we'd retrofit through every one — exactly what foundation discipline exists to prevent.

Scope discipline. Each sub-phase ships the primitive and (where relevant) one Day-1 consumer. Subsequent consumers ride the same primitive without schema or contract change. Foundation discipline applies — primitive design must be correct now; mechanical extension is fine later. See glossary.md for category definitions (Cat A Curated Provider, Cat B Connected Account, Cat C Outbound Webhook Subscription, Cat D Inbound Webhook, Cat E Internal Event, Cat F External API Access via service-account auth — operational flow deferred to its own sub-phase, schema lives in service_accounts from 1B.1).

Build order inside 1C.

1C.1 Capability Framework               (foundational — every other 1C sub-phase consumes the convention)

1C.2 Curated Providers (Cat A)          (extends 1A.18's Channel pattern; provider-resolution table for per-tenant brand-isolation readiness)
1C.3 Internal Events Registry (Cat E)   (parallel — typed registry over 1A.9 events.Bus)

1C.4 Outbound Webhook Subscriptions     (Cat C — Day-1 Make.com consumer)
1C.5 Connected Accounts (Cat B)         (parallel — table + framework; OAuth deferred to first OAuth consumer)
1C.6 Inbound Webhook Framework (Cat D)  (parallel — convention; Daily.co recording handler is the first impl)

1C.7 Metering & Quotas                  (depends on 1C.2 — the seam where capability calls get measured + capped)
1C.8 AI Capability Hooks                (depends on 1C.2 + 1C.7 — special-case Curated Provider with mandatory metering + provenance)

1C.9 Entitlements Rename                (mechanical sweep — independent but cleans the vocabulary used by all of 1C)

1C.1 Capability Framework

Convention for declaring internal capabilities (named for what they do, not for the provider that implements them) and composing the standard cross-cutting concerns (permission, quota, provider resolution, audit, metering, error classification) around them via wrappers. Every capability ships with the right concerns for free, picked from a small menu of templates.

Status: design locked 2026-05-06; skeleton shipped 2026-05-06 — internal/core/capabilities with the four wrap helpers, sentinel error taxonomy, README + tests, billing capability skeletons (payment / invoicing / patient_payment / payout), cmd/check-capabilities CI guard wired into make check, P50 docs, notify async-outbox exception documented. Resolve / meter / audit hooks remain pluggable seams that 1C.2, 1C.7, and 1C.8 fill in; second real-world consumer rides on 1C.4 (webhook.Deliverer).

What "capability" means here. See glossary.md → Capability. One Go interface, one bounded responsibility, switchable implementation. Examples: email.Channel (shipped, 1A.18), video.Provider (Daily.co today), pdf.Renderer (internal library), ai.LLM (future), webhook.Deliverer (1C.4), calendar.Sync (Cat B / future), sms.Channel (future), payment.Provider + invoicing.Provider (declared at foundation; implementations in F12 — Stripe + FGO for Romania), patient_payment.Provider + clinic_payout.Provider (declared at foundation; implementations in the future marketplace mediation feature — Stripe Connect or equivalent).

Locked decisions:

  • Composition shape: functional with one helper per implementation-strategy category. Four templates: capabilities.WrapMeteredProvider(impl, name, perm) for metered Curated Provider / Cat A (email, SMS, video, AI text gen — every call costs the platform money and the org is metered for it); capabilities.WrapProvider(impl, name, perm) for unmetered Curated Provider / Cat A (auth, storage — usage cost is bundled in platform fees); capabilities.WrapOutbound(impl, name) for Outbound Webhook / Cat C delivery (no quota meter, no provider resolution, permission lives at subscription create time not delivery time); capabilities.WrapInternal(impl) for Internal Library (mostly a no-op forwarder; permission and audit happen at the calling layer above). Functional composition (vs. struct decorators) chosen because the stack is fixed per category — no runtime customization needed; decorators speculate against unknown future flexibility.

  • Provider selection: per-call resolver for Cat A, wiring-time for everything else. Cat A capabilities go through 1C.2's platform_service_providers resolver at every call (logically per-call, physically cached aggressively per (org_id, capability) with ~5min TTL invalidating on the row's updated_at). The cache-aside pattern from P45 applies — first call per (org, capability) per Core API instance pays the lookup; subsequent calls within TTL pay nothing. Per-call (not startup-only) is required for multi-tenant + per-tenant brand-isolation readiness — startup-only would force one org's provider for the whole process. Non-Cat A capabilities (Internal Library, Cat C outbound) wire their impl at startup in cmd/api/main.go.

  • Test-double convention: Fake{Capability} in same package as interface. Standardized hand-written fake (1A.18's FakeChannel is the canonical reference). Tests inject the fake at construction time; assertions read accumulated state via accessor methods (f.Sends() etc.). Mocks (gomock) explicitly rejected — verbose, brittle, refactor-fragile. Real-service test layer (smoke / E2E against real SES / Daily.co / Anthropic) is per-capability strategy decided when that capability ships; the foundation pattern is the FakeChannel-based integration test from 1A.18, and the real-service smoke layer lands as part of 1E AWS staging.

  • notify.Channel (1A.18) audit + dispatcher exception. notify.Channel interface and notify.FakeChannel already conform to the codified convention — keep as-is. The notification dispatcher itself is a documented EXCEPTION because it's an async outbox pattern, not a synchronous capability call (worker polls deliveries, ships at its own pace, dispatches across multiple channel adapters). Document the exception in patterns.md alongside the convention. Real rewires for notify.email (env-config → resolver migration in 1C.2; metering hookup in 1C.7) land in those sub-phases' PRs, not in 1C.1.

  • Wrapper ordering (locked once for every Cat A metered capability): permission → quota → resolve → audit (before provider call) → provider call → meter (after success only) → error classification wraps the whole chain. Audit-before means failed calls are auditable with status code; meter-after-success means failures don't burn quota. Matches the existing 1B.5 four-gate model + 1A.1 audit + 1B.5 quota semantics.

  • Principal-type-agnostic. The wrapper stack treats all principal types the same — humans, agents, service_accounts (Cat F), and the system principal all flow through identical wrappers. Audit attribution carries actor_id + actor_type; permissions / quotas / metering operate on org-scope without consulting actor type. This property is essential — accidentally hardcoding "human" assumptions in any wrapper would break Cat F service-account API calls and autonomous-agent calls when those flows light up. See glossary → Principal-type-agnostic primitive.

Wrapper-content matrix by category:

| Category | Permission | Quota | Resolve | Provider call | Meter | Audit | Errors |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Cat A metered (email, SMS, video, AI text gen) | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| Cat A unmetered (auth via Clerk, storage via S3) | ✓ | | ✓ | ✓ | | ✓ | ✓ |
| Cat C outbound (webhook delivery) | (at subscription create) | | | ✓ | | ✓ | ✓ |
| Internal Library (pdf, signing, encryption) | (calling layer) | | | ✓ | | (calling layer) | ✓ |

Implementation order inside 1C.1:

  • [x] internal/core/capabilities/ package with the four Wrap* helpers as functional composition. Skeleton at 1C.1 close — permission and quota gates wire against existing principal.Subject helpers; resolveProvider is a no-op forwarder until 1C.2 registers a resolver via capabilities.SetResolveFunc; meterAfterSuccess is a no-op forwarder until 1C.7 registers a meter via capabilities.SetMeterFunc; auditCall fires per-capability AuditFunc hook (nil = no-op). Sentinel error taxonomy (ErrUnauthenticated, ErrPermissionDenied, ErrQuotaExceeded, ErrProviderUnavailable, ErrTransient, ErrPermanent) is fully wired at 1C.1.
  • [x] Fake{Capability} documentation as internal/core/capabilities/README.md with notify.FakeChannel as the canonical reference shape, plus a Cat A wiring example (ai.text shape) and the multi-method capability section (payment.Provider shape).
  • [x] Audited notify.Channel against the codified convention. Notify keeps its current shape; dispatcher async-outbox exception documented in notify/doc.go with rationale (producer-side gates, consumer-side state machine, no per-call audit row, future meter hooks the dispatcher's success path directly).
  • [x] patterns.md P50 — Capability Convention documenting: the four implementation-strategy categories with examples, the four wrap helpers, the locked wrapper-stack ordering (permission → quota → resolve → audit → provider → meter → errors), the principal-type-agnostic property, the Fake{Capability} test-double convention, and the load-bearing notify carve-out.
  • [x] CI guard cmd/check-capabilities wired into make check. Verifies (1) the four wrap helpers exist in internal/core/capabilities/capabilities.go; (2) every package under internal/core/{name}/ that looks like a capability (one-method interface + Fake* struct) is either wired through one of the wrap helpers OR allow-listed with a documented rationale (notify is the only allow-list entry today). Loose by design at 1C.1; 1C.2 tightens to per-package matching as Cat A capabilities migrate to the resolver.
  • [x] Billing capability skeletons declared at foundation: internal/core/billing/payment/ (Provider interface with CreateCustomer, CreateSubscription, Charge, Refund, HandleWebhook); internal/core/billing/invoicing/ (Provider interface with IssueInvoice, DeliverInvoice, RegisterWithAuthority); internal/core/billing/patient_payment/ (Provider interface — Charge, Refund, HandleWebhook for marketplace mediation); internal/core/billing/payout/ (Provider interface — ConnectAccount, InitiatePayout, GetBalance). All skeletons; no implementations. F12 ships payment.Provider + invoicing.Provider impls; marketplace mediation feature ships patient_payment.Provider + payout.Provider impls.
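
The pluggable-seam shape, where later sub-phases fill in no-op defaults, can be sketched as follows. Function names SetResolveFunc / SetMeterFunc match the list above; the signatures here are illustrative:

```go
package main

import "fmt"

// Seams default to harmless no-op forwarders at 1C.1; 1C.2 and 1C.7 register
// real implementations without any change to the wrapper call sites.
type resolveFunc func(orgID, capability string) (provider string, err error)
type meterFunc func(orgID, capability string, units int)

var (
	resolve resolveFunc = func(_, _ string) (string, error) { return "", nil } // no-op default
	meter   meterFunc   = func(_, _ string, _ int) {}                          // no-op default
)

// SetResolveFunc registers the provider resolver (filled in by 1C.2).
func SetResolveFunc(f resolveFunc) { resolve = f }

// SetMeterFunc registers the usage meter (filled in by 1C.7).
func SetMeterFunc(f meterFunc) { meter = f }

func main() {
	// Before registration, the seam forwards harmlessly.
	p, _ := resolve("org-1", "email")
	fmt.Printf("default provider: %q\n", p)

	// A later sub-phase registers a real resolver; call sites are unchanged.
	SetResolveFunc(func(orgID, capability string) (string, error) {
		return "ses", nil
	})
	p, _ = resolve("org-1", "email")
	fmt.Println("resolved:", p) // ses
}
```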

Acceptance:

  • [x] Composition shape locked: functional with four category helpers.
  • [x] Provider selection locked: per-call resolver for Cat A; wiring-time for non-Cat A.
  • [x] Test-double convention locked: Fake{Capability} in same package as interface.
  • [x] Wrapper ordering locked: permission → quota → resolve → audit → provider → meter → errors.
  • [x] notify.Channel audit conclusion: keeps current shape; dispatcher documented as async-outbox exception.
  • [x] internal/core/capabilities/ package with four Wrap* helpers (skeleton form — fills out as 1C.2 / 1C.7 ship).
  • [x] patterns.md P50 documentation.
  • [x] cmd/check-capabilities CI guard.
  • [ ] One additional capability (likely webhook.Deliverer from 1C.4) ships against the convention to prove it with a second consumer. Deferred to 1C.4 — synthetic Cat A consumer in capabilities_test.go exercises the wrap stack today; the second real-world consumer lands when webhook.Deliverer ships.

1C.2 Curated Providers (Cat A) + Provider Resolution

Cat A is the implementation strategy where a capability calls an external API with platform-owned credentials. The provider name (Daily.co, SES, Twilio, Anthropic, Stripe) is implementation detail; the capability interface is what callers see. The platform_service_providers resolution table holds platform-default credentials and per-org brand-isolation overrides; per-call resolver with aggressive caching is the runtime path.

Status: design locked 2026-05-06; shipped 2026-05-06 — migration 000015_platform_service_providers + internal/core/providers (resolver, register, bootstrap, healthcheck) + cmd/check-providers (cron) + cmd/check-cata-resolution (CI guard, enforcing) + Console superadmin endpoints under /v1/admin/platform-service-providers + apps/docs/reference/credential-rotation.md + acceptance test in internal/test/rlstest/provider_resolver_test.go (full lifecycle: default→override→fail-loud→healthcheck→repair). All three current Cat A capabilities (notify.email, S3, auth.clerk) wired through the resolver: email per-call, storage + auth startup-only-resolved + singleton-installed per package SDK constraints.

Why a provider-resolution table at foundation, not when the first clinic asks for per-tenant brand isolation. Foundation work designs for stated platform direction. Per-tenant brand isolation IS platform direction (CLAUDE.md "two tenancy modes" + per-tenant Cat A overrides on either mode). The first paying clinic may demand brand isolation, requiring isolated platform-managed accounts (separate sender domain, separate video account, possibly separate AI account). Without the resolution table at foundation, every Cat A call site needs retrofit later. With it, the override path exists and the table seeds with platform-default rows; per-org override rows get added later without schema change.

Locked decisions:

  • Initial scope: migrate all three current Cat A capabilities at 1C.2 closenotify.email (1A.18 SES), internal/integration/s3/ (1A.8 storage), and auth.clerk (auth verifier abstraction is already provider-agnostic, just swap credentials source from env to resolver). Same migration shape for each (~10–50 lines per capability). All future Cat A capabilities (Twilio SMS, Daily.co video, Anthropic AI, Stripe payments, future providers) wire through the resolver from Day 1 — never read env directly.
  • Failure mode: fail loud on broken per-org override. If a clinic's brand-isolation override row exists but credentials don't decrypt or are rejected by the provider, the call fails with 502 provider_unavailable rather than silently falling back to the platform default. Silent fallback would break the brand-isolation contract (clinic thinks they're sending from their domain; actually goes from platform default). The healthcheck cron is the early-warning system that catches broken rows before traffic hits.
  • Healthcheck: cmd/check-providers cron. Runs at deploy time + every 5 min in staging / every 1 min in prod (configurable). Walks every row in platform_service_providers, decrypts credentials via 1A.3, optionally pings the provider with a no-op (SES GetSendQuota, S3 HeadBucket, etc.). Marks broken rows as status='error' with last_error_at + last_error. The transition active → error is audit-logged (state change attributed to system principal).
  • Audit treatment. State changes to platform_service_providers (CREATE / UPDATE / DELETE via Console superadmin endpoints) audit at the app layer with full diff; healthcheck transitions audit; runtime resolution lookups DO NOT audit (operational metadata, would generate millions of rows/day per CLAUDE.md exempt rule). Runtime failures slog at ERROR level with org_id, capability, provider_name, error_class, request_id for tracing via the telemetry sink. App-layer audit only — no DB-trigger audit backstop at foundation (AdminPool REVOKE + superadmin gating is the access control floor; defense-in-depth via DB triggers can be added later if compliance requires).
  • Rotation runbook. Documented per-provider in apps/docs/reference/credential-rotation.md (new doc). Standard flow: (1) generate new credentials at provider; (2) superadmin updates the row in Console (server encrypts, bumps updated_at); (3) wait ~10 min for cache TTL to expire across the fleet; (4) revoke old credentials at provider. Zero downtime achievable when the provider supports two valid credential sets simultaneously (AWS IAM, Stripe, Twilio, SES — all do). Foundation 1C.2 ships the runbook for SES/email; subsequent providers add their section as they ship.

Schema (locked):

  • [x] platform_service_providers(id UUID PK, capability TEXT NOT NULL, organization_id UUID NULL FK → organizations(id), provider_name TEXT NOT NULL, credentials_encrypted BYTEA NOT NULL, config JSONB NOT NULL DEFAULT '{}', status TEXT NOT NULL DEFAULT 'active' CHECK (status IN ('active', 'inactive', 'error')), last_error_at TIMESTAMPTZ NULL, last_error TEXT NULL, last_health_check_at TIMESTAMPTZ NULL, created_at, updated_at). CHECK constraint on provider_name per capability (today: email→ses, storage→aws_s3, auth→clerk; extends per migration as new providers ship). No is_default column — derived from organization_id IS NULL. Partial unique (capability) WHERE organization_id IS NULL for one platform default per capability. Partial unique (capability, organization_id) WHERE organization_id IS NOT NULL for one override per (capability, org). CHECK enforcing (status='error') ⇔ (last_error_at + last_error populated).
  • [x] RLS: AdminPool only — RLS enabled with no SELECT policy + REVOKE SELECT/INSERT/UPDATE/DELETE/TRUNCATE FROM restartix_app (double-deny mirroring audit_log, notifications). Console superadmin endpoints write through AdminPool.
  • [x] Data classification entries shipped: credentials_encrypted → auth_secret (no egress); provider_name / capability / status / last_error* / last_health_check_at / config → org_internal with support_export; id / organization_id / created_at / updated_at → system_metadata with support_export.
  • [x] Platform permission principal.PlatformPermProvidersManage ("providers.manage") shipped — Go-only constant, superadmin-only by default (no per-org RBAC row); the platform-permissions layer is the gate.

Note on organization_billing.payment_provider. The plan also called for dropping an enum constraint on this column. Inspection of 000003_org_settings.up.sql showed the column is already plain TEXT with no CHECK constraint (the comment lists 'manual' | 'stripe' | 'chargebee' as forward-compat hints, not as an enforced enum). No migration change needed; F12 will replace this whole shape when the billing engine ships.

Capability-resolution flow (runtime):

Cat A capability call (e.g., email.Channel.Send)

WrapMeteredProvider / WrapProvider helper (1C.1)

Resolver.Resolve(ctx, capability='email', org=ctx.OrgID)

Cache lookup (key: (capability, org_id), TTL ~5min)
  ├─ Cache hit → return cached impl  (~µs)
  └─ Cache miss

       SELECT FROM platform_service_providers
         WHERE capability = $1
           AND (organization_id = $2 OR organization_id IS NULL)
           AND status = 'active'
         ORDER BY organization_id NULLS LAST  -- prefer org-specific
         LIMIT 1

       Row found?
         No  → 502 provider_unavailable (no platform default!)
         Yes → If org-specific row's status='error' → 502 (fail loud per locked decision)
              → Decrypt credentials_encrypted via 1A.3
              → Instantiate provider impl (e.g., ses.NewClient(creds, config))
              → Cache (impl, snapshot of updated_at)
              → Return impl

Cache invalidation:
  Pull-based — at lookup, query SELECT updated_at FROM platform_service_providers WHERE id = $cached_id;
  if updated_at != cached snapshot → cache miss path.
  Cheap (single indexed column read).
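The flow above can be condensed into a Go sketch. Everything here is illustrative, not the shipped internal/core/providers API: the SQL lookup is abstracted behind a function so the sketch runs without a database, and the pull-based updated_at invalidation is collapsed into a plain TTL check.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// Row mirrors the columns the resolver needs from platform_service_providers.
type Row struct {
	ID     string
	OrgID  *string // nil = platform-default row
	Status string  // 'active' | 'inactive' | 'error'
}

var ErrProviderUnavailable = errors.New("provider_unavailable") // surfaced as HTTP 502

type cacheEntry struct {
	row      Row
	cachedAt time.Time
}

// Resolver caches resolved rows per (capability, org) with a short TTL.
// lookup stands in for the real "ORDER BY organization_id NULLS LAST LIMIT 1"
// query: it returns the org-specific row when one exists, else the platform
// default, or false when neither exists.
type Resolver struct {
	lookup func(capability, orgID string) (Row, bool)
	cache  map[string]cacheEntry
	ttl    time.Duration
}

func NewResolver(lookup func(capability, orgID string) (Row, bool)) *Resolver {
	return &Resolver{lookup: lookup, cache: map[string]cacheEntry{}, ttl: 5 * time.Minute}
}

func (r *Resolver) Resolve(capability, orgID string) (Row, error) {
	key := capability + "|" + orgID
	if e, ok := r.cache[key]; ok && time.Since(e.cachedAt) < r.ttl {
		return e.row, nil // cache hit: ~µs, no DB round-trip
	}
	row, found := r.lookup(capability, orgID)
	if !found {
		return Row{}, ErrProviderUnavailable // no platform default either
	}
	// Fail loud: a broken org-specific override must NOT silently fall back
	// to the platform default (that would break the brand-isolation contract).
	if row.OrgID != nil && row.Status != "active" {
		return Row{}, ErrProviderUnavailable
	}
	r.cache[key] = cacheEntry{row: row, cachedAt: time.Now()}
	return row, nil
}

func main() {
	r := NewResolver(func(capability, orgID string) (Row, bool) {
		return Row{ID: "platform-default", Status: "active"}, true
	})
	row, err := r.Resolve("email", "org-123")
	fmt.Println(row.ID, err)
}
```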

Implementation order inside 1C.2:

  • [x] Migration creating platform_service_providers with RLS + data-classification entries (000015_platform_service_providers). The platform permission lives in Go (principal.PlatformPermProvidersManage), not in the migration. The organization_billing.payment_provider "drop the enum constraint" item turned out to be a no-op — the column was already plain TEXT (see the schema note above).
  • [x] internal/core/providers/ package: Resolver + generic Register[T] typed lookup + per-instance TTL cache + Bootstrap env-seed helper + Healthcheck / HealthcheckAll primitives. cmd/check-providers cron binary uses the same primitives.
  • [x] Migrate three existing Cat A capabilities to use the resolver: notify.email (per-call lookup via email.Lookup returned from providers.Register[*EmailProvider]); internal/integration/s3/ (factory + s3.UseProvider singleton install at startup; per-call refactor deferred to first S3 consumer's PR); auth.clerk (factory + clerk.UseProvider calls SDK's process-global SetKey; per-call non-applicable — auth verification runs before org context). cmd/api/main.go bootstraps platform-default rows from env via providers.Bootstrap (idempotent ON CONFLICT DO NOTHING) so behavior is identical post-migration; env vars become non-load-bearing once the row exists.
  • [x] Console superadmin endpoints (mounted under /v1/admin/platform-service-providers): GET / (list, optional ?capability=), POST / (create), PATCH /{id} (update), DELETE /{id} (hard delete), POST /{id}/test (on-demand healthcheck). Service holds *providers.Resolver and calls Invalidate after every mutation. Audit shipped with credentials field redacted. Console UI deferred to 1D.
  • [x] P31 — already scoped to Cat B with a Cat A carve-out pointing at 1C.2.
  • [x] CI guard cmd/check-cata-resolution: walks the three Cat A package dirs, flags references to credential-bearing env-config field names. Wired into make check; enforcing at 1C.2 close (allow-list empty).
  • [x] apps/docs/reference/credential-rotation.md shipped — full SES/email runbook, stub sections for storage/auth + future providers.
  • [x] Acceptance test extending the setup-clinic suite: internal/test/rlstest/provider_resolver_test.go exercises full lifecycle (default → override → fail-loud on inactive/corrupt override → healthcheck flips to error → repair → healthy).

Acceptance:

  • [x] Initial scope locked: all three current Cat A capabilities (email, storage, auth) migrate at 1C.2 close; future Cat A wires through resolver from Day 1.
  • [x] Failure mode locked: fail loud on broken per-org override; healthcheck cron is the early-warning safety net.
  • [x] Audit treatment locked: state changes audit, runtime lookups don't, healthcheck transitions audit, runtime failures slog ERROR with full attribution.
  • [x] Rotation runbook locked: generate → update → ~10min cache window → revoke; documented in credential-rotation.md.
  • [x] Schema locked: platform_service_providers with NULL org for default + specific org for override; partial uniques; status enum; provider_name TEXT with per-capability CHECK.
  • [x] Migration + RLS shipped.
  • [x] Resolver package + cache + healthcheck primitives + cmd/check-providers binary shipped.
  • [x] Three existing Cat A capabilities migrated; platform-default rows bootstrap from env at startup.
  • [x] Console superadmin endpoints shipped (UI deferred).
  • [x] CI guard (enforcing) + credential-rotation doc + acceptance test shipped.

1C.3 Internal Events Registry (Cat E)

1A.9 already ships the in-process events.Bus. 1C.3 adds a typed registry that is the single source of truth for event types and their payload schemas. Webhook subscriptions, automation triggers, and notification dispatcher all reference one registry — preventing the drift the audit found (three documents describing the same stream).

Status: design locked 2026-05-06; shipped 2026-05-07 — internal/core/events/registry.go with EventDef (Name + ResourceType + Description + Layer + typed Payload + DeprecatedAt + ReplacedBy), Register / Lookup / All, JSONSchemaOf reflection-based schema generator, and PublishWith typed-payload publish helper that validates payload type against the registry. Per-domain events.go files for organization (8 events) and portalonboarding (1 event) declare typed payload structs and register via init(); the existing 9 publish sites migrated to PublishWith. cmd/dump-events-registry emits JSON or Markdown; cmd/check-events-registry replaces the older cmd/check-events and validates (a) every events.Type constant has a matching Register call and (b) the committed _generated/events-catalog.md is in sync with the registry. The catalog is auto-generated at apps/docs/architecture/_generated/events-catalog.md, included into the architecture events doc via VitePress <!--@include: -->. P51 documented in patterns.md. Hand-edited rows for Layer 2+ events that have no publisher yet were dropped — the registry is the source of truth and grows feature-by-feature.

Locked decisions:

  • Payload schema location: Go struct as source of truth + JSON schema generated. Each domain package declares its events as Go structs (already done implicitly today). cmd/gen-event-schemas derives JSON schemas via reflection + struct tags for: (a) the future automation engine UI (F8) which needs JSON schema to render trigger configurators, (b) the webhook docs which auto-render payload shape per event, (c) any external consumer that needs a typed contract. One source of truth + codegen mirrors the existing OpenAPI pattern.
  • Retired events: keep in registry with deprecated_at + replaced_by. Events are public contract for clinic webhook subscribers and automation triggers — can't silently break. A deprecated event keeps publishing during a grace period; the registry entry surfaces "deprecated" in webhook UI and docs; clinics with subscriptions on the old event get a notice to migrate. After grace period, the registry entry stays (history) but the publish call is removed and the event no longer fires.
  • Registry has NO per-event fan-out controls. Earlier draft proposed per-event audit/webhook/notification toggles; over-engineered against actual needs. Subscribers decide what they consume, not events. Audit subscriber consumes everything (universal sink). Notification dispatcher consumes events with a registered notification handler (1A.18's notify.categories map is already the subscription mechanism). Webhook dispatcher (1C.4) consumes events matching each subscription's event_filters array (per-subscription filtering). Automations engine (F8) consumes events its rules subscribe to. Each subscriber owns its own consumption logic. Registry per-event is just {Name, PayloadType, Classification (doc/UX hint only), DeprecatedAt, ReplacedBy}.
  • Distributed ownership with central registration. Each domain package (appointments, patients, consents, invites, breakglass, impersonation, legaldocument, ...) declares its events in events.go and registers them via init(). The events package becomes a thin coordinator that auto-discovers via init-time registration. Single source of truth for "what events exist" = grep all domain packages OR run cmd/dump-events-registry. Matches the existing platform pattern (each domain owns its model.go, repository.go, errors.go).
  • Docs auto-generate from the registry. cmd/dump-events-registry emits JSON; VitePress build pipeline calls it; webhook events docs + automation trigger docs render from the dump (no hand-edited lists). The "registry IS the spec" delivery — drift between code and docs becomes mechanically impossible.
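The locked shape reduces to a minimal sketch: init-time registration plus typed-payload validation in PublishWith. The names (Register, PublishWith, EventDef, OrgCreatedPayload) mirror the text above, but the bodies are illustrative rather than the shipped internal/core/events code; the bus publish and JSON-schema generation are elided.

```go
package main

import (
	"fmt"
	"reflect"
	"sort"
)

// EventDef is an illustrative mirror of the registry entry shape:
// {Name, PayloadType, DeprecatedAt, ReplacedBy}; timestamps elided here.
type EventDef struct {
	Name        string
	PayloadType reflect.Type
	ReplacedBy  string // set when the event is deprecated
}

var registry = map[string]EventDef{}

// Register is called from each domain package's init(); duplicates panic
// at startup so drift is caught immediately.
func Register(name string, payload any) {
	if _, dup := registry[name]; dup {
		panic("duplicate event registration: " + name)
	}
	registry[name] = EventDef{Name: name, PayloadType: reflect.TypeOf(payload)}
}

// PublishWith validates the typed payload against the registry; the real
// implementation then round-trips it to map[string]any and publishes on the bus.
func PublishWith(name string, payload any) error {
	def, ok := registry[name]
	if !ok {
		return fmt.Errorf("unregistered event %q", name)
	}
	if reflect.TypeOf(payload) != def.PayloadType {
		return fmt.Errorf("event %q expects payload %s, got %T", name, def.PayloadType, payload)
	}
	return nil
}

// All returns the catalog in stable order -- what a dump tool like
// cmd/dump-events-registry would walk to emit JSON or Markdown.
func All() []string {
	names := make([]string, 0, len(registry))
	for n := range registry {
		names = append(names, n)
	}
	sort.Strings(names)
	return names
}

// A domain package declares typed payloads and registers in init().
type OrgCreatedPayload struct {
	OrgID string `json:"org_id"`
	Name  string `json:"name"`
}

func init() { Register("organization.created", OrgCreatedPayload{}) }

func main() {
	fmt.Println(All())
	fmt.Println(PublishWith("organization.created", OrgCreatedPayload{OrgID: "o1", Name: "Acme"}))
}
```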

Reusable pattern for future code-first registries:

This pattern (Go-side init-registration + cmd/dump-{name}-registry + docs auto-gen) is documented in patterns.md as P51 — Code-first registries with generated documentation. Adopt for future cases where a small set of values has multiple documentation consumers and is naturally defined in code. Likely future adopters: per-org permissions catalog (currently hand-written, drift risk). Don't preemptively extend to permissions in this PR — that's foundation discipline (don't speculate; adopt when drift surfaces).

Implementation order inside 1C.3:

  • [x] internal/core/events/registry.go + schema.goEventDef struct + Register + Lookup + All + PublishWith (typed-payload publish that validates against the registry then round-trips to map[string]any) + JSONSchemaOf (reflection-based schema gen with uuid / date-time format hints). Tests cover registration, lookup, sort order, duplicate / conflict semantics, publish round-trip, type-mismatch rejection, deprecation metadata, and schema generation across mixed-type structs.
  • [x] Per-domain events.go files for the two domains that publish today: organization (8 events — created / updated / member_added / member_role_changed / member_removed / domain.added / domain.verified / domain.removed) and portalonboarding (1 event — patient.onboarded). Each event has a typed payload struct (OrgCreatedPayload, OrgMemberAddedPayload, …) registered via init(). The 9 existing publish sites migrated to PublishWith(eventXxx, orgID, resourceID, typed payload). Domains with no publisher today (consents, invites, breakglass, impersonation, legaldocument, patient_subscriptions, service_plans) get events.go when they wire up publishing — foundation discipline (no speculation; the registry catalogs intent + reality, not aspiration).
  • [x] cmd/dump-events-registry binary — -format=json emits the full catalog with payload JSON schemas; -format=md emits a layered Markdown table for embedding via VitePress include. Folds the optional cmd/gen-event-schemas into JSONSchemaOf per the design's "could be folded" caveat.
  • [x] cmd/check-events-registry — replaces the older cmd/check-events (deleted). Validates (1) every events.Type constant has a matching events.Register call, (2) the committed _generated/events-catalog.md matches the freshly-generated form, (3) no events.Publish(...) / events.PublishWith(...) call uses a string-literal name. Wired into make check.
  • [x] VitePress integration: apps/docs/architecture/_generated/events-catalog.md is the auto-generated artifact, included from apps/docs/architecture/events.md via <!--@include: -->. Regenerated by make events-docs from the repo root (or services/api). Hand-edited rows for Layer 2+ events without publishers were deleted — registry is the source of truth, grows feature-by-feature. Webhook docs (1C.4) and automation trigger docs (F8) will consume the same dump when they ship; nothing to wire there at 1C.3 close.
  • [x] apps/docs/architecture/patterns.mdP51: Code-First Registries with Generated Documentation added under a new "Capability & Integration Architecture" group, with index entry. Cross-references P28 (events) and P39 (column classification — same registry-with-CI-guard discipline that predated this pattern).
  • [x] Acceptance test: internal/core/events/acceptance_test.go exercises the full end-to-end flow — synthetic payload struct → Register → Lookup round-trip → JSONSchemaOf reflects the typed shape → PublishWith validates type + round-trips through bus → subscriber sees Data with omitempty respected → All() includes the entry.

Acceptance:

  • [x] Payload schema location locked: Go-truth + JSON-schema generated.
  • [x] Retired event handling locked: deprecated_at + replaced_by in registry.
  • [x] Fan-out control locked: NO per-event controls; subscribers own their consumption logic.
  • [x] Registry physical location locked: distributed per-domain with central init-registration.
  • [x] Docs generation locked: auto-gen from registry; pattern documented as P51 for future adopters.
  • [x] internal/core/events/registry.go + per-domain events.go files shipped.
  • [x] cmd/dump-events-registry shipped (JSON + Markdown formats; gen-event-schemas folded in via JSONSchemaOf).
  • [x] cmd/check-events-registry CI guard shipped, replaces cmd/check-events, wired into make check.
  • [x] Generated catalog at apps/docs/architecture/_generated/events-catalog.md included into the architecture events doc via VitePress <!--@include: -->. Webhook events docs / automation trigger docs land in 1C.4 / F8 against the same dump.
  • [x] P51 documented in patterns.md.

1C.4 Outbound Webhook Subscriptions (Cat C)

Clinic-configurable URL + signing secret + event-type filter. We POST signed payloads to the URL when matching events fire on events.Bus. Make.com, Zapier, n8n, custom clinic backends, Slack incoming webhooks — all the same row type from our side. The foundation marketplace primitive for outbound integrations.

Day-1 consumer. First paying clinic uses Make.com for CRM sync. 1C.4 ships the primitive AND the Make.com flow end-to-end at foundation close. No "framework only" — the framework is exercised by a real consumer.

Status: design locked 2026-05-06; shipped 2026-05-07 in two commits — Part 1 (foundation primitives) e4390da (schema 000016 + edits to 000004 + permission + classification + signing helper + domain package); Part 2 (runtime) wires the events.Bus subscriber + dispatcher (internal/core/webhooks/dispatcher) mirroring notify.Dispatcher one-to-one, the auto-pause notify category + en/ro templates, the partition-runner extraction (internal/core/partitions) used by both audit and webhooks, the route group at /v1/organizations/{id}/outbound-webhook-subscriptions/* with EnforceLimit("max_webhook_subscriptions", 1), the cmd/api bootstrap, the integration guide at apps/docs/reference/webhook-integration-guide.md, and a 3-test acceptance suite in internal/test/rlstest/webhooks_test.go. The Make.com end-to-end smoke against the integration guide closes in 1E staging.

Locked decisions:

  • Signing scheme: HMAC-SHA256 over timestamp.body (Stripe convention). Headers X-RestartiX-Signature: sha256=<hex>, X-RestartiX-Timestamp: <unix>, X-RestartiX-Event: <event_name>. Receiver validates signature AND that timestamp is within ±5 min of "now" (rejects stale/future-dated payloads). Signing secret is shown ONCE at subscription create time + on regenerate; never readable thereafter.
  • Dual-secret rotation window. Schema carries signing_secret_encrypted (current) + signing_secret_previous_encrypted NULL (previous, valid for 24h after rotation). On POST /{id}/regenerate-secret: write current → previous, generate new → current, return new secret to clinic (one-time response). Dispatcher signs with current; receiver verifies with EITHER. After 24h, previous cleared by background sweep (or on next mutation; sweep is more reliable). Mirrors Stripe rotation experience.
  • Retry policy. Retry on 5xx + network errors + timeouts. Don't retry 4xx (clinic's URL configuration is wrong; won't get better). Exponential backoff 1m / 5m / 30m / 1h / 6h. Dead-letter at 5 attempts. Mirrors 1A.18 notification dispatcher exactly — same SKIP LOCKED claim pattern, same backoff intervals, same cap. Engineers learn one outbox pattern.
  • Auto-pause on consecutive failures. After 10 consecutive dead-lettered deliveries for the same subscription, auto-pause (status='paused') + send a 1A.18 notification to the clinic admin ("your Make.com webhook subscription is paused; URL appears to be down at $url"). Clinic admin re-enables (PATCH .../{id} with status='active') when fixed. Protects us from indefinitely retrying broken endpoints.
  • Per-subscription rate limit. Configurable per subscription, default 100 deliveries/min. Excess events QUEUE with FIFO (writes a pending row in deliveries; worker picks up at next tick). Cap value driven by entitlement quota (max_webhook_deliveries_per_minute); dedicated tier could have higher caps.
  • Payload envelope shape (locked):
    ```json
    {
      "event": "appointment.scheduled",
      "event_id": "evt_01hzg5...",
      "occurred_at": "2026-05-06T12:34:56Z",
      "organization_id": "01hzg5...",
      "data": { "...": "..." }
    }
    ```
    event_id enables clinic-side dedup of retries (we send the same event_id for retried deliveries of the same event). data carries the typed payload from the events registry (1C.3) — schema generated from Go struct.
  • Actor info OMITTED from envelope (locked). The envelope intentionally does NOT include actor_id or actor_type. Receivers that need actor attribution (e.g., "was this triggered by a human admin or an agent or a service_account?") query our audit log via Cat F service-account API access. Keeps the envelope minimal, avoids leaking internal principal model across the boundary, and prevents versioning churn if the actor model evolves. Decision can be revisited if a customer specifically asks; not worth speculating on now.
  • Replay endpoint deferred. POST /{id}/replay-deliveries (re-send a window of past deliveries) is NOT in foundation. Add when first customer asks. Schema accommodates it (outbound_webhook_deliveries.payload is the source for replays).
  • Worker model. Mirror 1A.18 exactly: in-process polling goroutine, one per Core API instance, polls every 1–2s with ... WHERE status IN ('pending', 'retry') AND next_attempt_at <= NOW() FOR UPDATE SKIP LOCKED LIMIT N. Migration to a separate cmd/webhook-dispatcher binary is mechanical when volume warrants — same pattern.
  • Wildcard event filters deferred. Foundation: explicit event names only in event_filters. Wildcards (patient.*) discourage explicit allow-listing and complicate the registry-driven UI. Add later if real customer ask.

Schema (locked):

  • [x] outbound_webhook_subscriptions(id UUID PK, organization_id UUID NOT NULL FK, target_url TEXT NOT NULL, signing_secret_encrypted BYTEA NOT NULL, signing_secret_previous_encrypted BYTEA NULL, signing_secret_rotated_at TIMESTAMPTZ NULL, event_filters TEXT[] NOT NULL, status TEXT NOT NULL DEFAULT 'active' CHECK (status IN ('active', 'paused', 'revoked')), failure_count INT NOT NULL DEFAULT 0, last_success_at TIMESTAMPTZ NULL, last_failure_at TIMESTAMPTZ NULL, created_by_principal_id UUID NOT NULL FK → principals(id), created_at, updated_at). RLS: org members with organizations.manage_webhooks SELECT; INSERT/UPDATE/DELETE WITH CHECK same permission. AppPool DML.
  • [x] outbound_webhook_deliveries(id UUID, subscription_id UUID FK, event_id UUID NOT NULL, event_name TEXT NOT NULL, payload JSONB NOT NULL, status TEXT CHECK (status IN ('pending', 'retry', 'success', 'failed', 'dead_lettered')), attempt_count SMALLINT NOT NULL DEFAULT 0, next_attempt_at TIMESTAMPTZ NULL, claimed_at TIMESTAMPTZ NULL, claimed_by_worker_id TEXT NULL, last_attempt_at TIMESTAMPTZ NULL, last_response_status_code INT NULL, last_response_body TEXT NULL, dead_lettered_at TIMESTAMPTZ NULL, created_at). Range-partitioned monthly on created_at per P41 — one row per attempt, append-only, time-ordered, multi-month retention. PK (id, created_at). RLS: org members with organizations.manage_webhooks SELECT (joined via subscription_id). REVOKE INSERT/UPDATE/DELETE from restartix_app — dispatcher writes via AdminPool. Migration seeds the current month only; the partition runner (cmd/audit-partition-roll, scope expanded in 1C.4 via internal/core/partitions) extends the runway.
  • [x] Data classification: signing_secret_encrypted + signing_secret_previous_encrypted = auth_secret; payload registers as variable-class — the field carries event payloads which already have classifications via the events registry, so the deliveries table inherits the most-permissive class of any included event payload. Practical effect: deliveries table has support_export egress for ops debugging; no bulk_export. Document per CLAUDE.md data-classification rules.
  • [x] Permission seed: new organizations.manage_webhooks permission, granted to admin system role template only. Subscription count gated by entitlement max_webhook_subscriptions quota (default_behavior=hard_block, period_kind=lifetime); per-minute delivery cap via max_webhook_deliveries_per_minute quota (default_behavior=soft_meter, period_kind=per_minute, new value added to chk_limit_def_period_kind). Tier defaults: pro = 10 subs / 100 deliveries-per-minute; dedicated = unlimited (NULL caps).

Endpoints (locked):

  • [x] GET /v1/organizations/{id}/outbound-webhook-subscriptions[?status=&limit=&offset=] — list. RLS-gated.
  • [x] POST /v1/organizations/{id}/outbound-webhook-subscriptions — create. Body {target_url, event_filters: [...]}. Server generates signing_secret, returns {id, target_url, event_filters, status, signing_secret} (secret one-time read). Validates event_filters against the events registry (1C.3) — unknown event names rejected with 400 unknown_event_name listing the offending names. Quota enforced via EnforceLimit("max_webhook_subscriptions", 1).
  • [x] GET /v1/organizations/{id}/outbound-webhook-subscriptions/{id} — read. Returns row WITHOUT signing_secret.
  • [x] PATCH /v1/organizations/{id}/outbound-webhook-subscriptions/{id} — update target_url, event_filters, or status. Same registry validation on event_filters. Status only transitions active ↔ paused; revoke goes through DELETE.
  • [x] DELETE /v1/organizations/{id}/outbound-webhook-subscriptions/{id} — soft-delete (status = revoked); preserves history.
  • [x] POST /v1/organizations/{id}/outbound-webhook-subscriptions/{id}/regenerate-secret — rotates signing secret per the dual-secret pattern. Returns new secret one-time.
  • [x] POST /v1/organizations/{id}/outbound-webhook-subscriptions/{id}/test — fires synthetic test payload (event subscription.test) inline (signs + POSTs via the handler's HTTPClient — no persistence). Returns receiver status code + body truncated to 4 KiB for the clinic admin to debug their Make scenario in real-time.
  • [x] GET /v1/organizations/{id}/outbound-webhook-subscriptions/{id}/deliveries[?status=&limit=&offset=] — list recent deliveries with status. RLS-gated. Useful for "did event X fire successfully?" debugging.
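The registry validation applied on create and PATCH can be sketched as follows. knownEvents stands in for the real events registry lookup from 1C.3, and the event names are examples only; the function name ValidateEventFilters is an assumption, not the shipped API.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// knownEvents stands in for the 1C.3 registry; real code would consult the
// registered EventDefs rather than a hand-built map.
var knownEvents = map[string]bool{
	"patient.onboarded":    true,
	"organization.created": true,
}

// ValidateEventFilters returns nil when every filter names a registered
// event, else an error suitable for a 400 unknown_event_name response that
// lists the offending names. Wildcards like "patient.*" are deliberately
// rejected as unknown (wildcard filters are deferred per the locked decision).
func ValidateEventFilters(filters []string) error {
	var unknown []string
	for _, f := range filters {
		if !knownEvents[f] {
			unknown = append(unknown, f)
		}
	}
	if len(unknown) > 0 {
		sort.Strings(unknown)
		return fmt.Errorf("unknown_event_name: %s", strings.Join(unknown, ", "))
	}
	return nil
}

func main() {
	fmt.Println(ValidateEventFilters([]string{"patient.onboarded"}))
	fmt.Println(ValidateEventFilters([]string{"patient.*", "appointment.deleted"}))
}
```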

Day-1 Make.com flow:

  1. Clinic admin creates a Make scenario with "Webhooks" trigger, gets a URL.
  2. Clinic admin creates a webhook subscription via the UI (1D, deferred): pastes URL, picks events from a registry-driven dropdown (e.g., patient.onboarded, appointment.completed), receives signing secret one-time.
  3. Clinic configures Make scenario to verify our X-RestartiX-Signature header.
  4. On first matching event, Make scenario fires; clinic's CRM is synced.
  5. Clinic uses the test endpoint to validate the round-trip without waiting for a real event.

UI placement (deferred to 1D, locked here for clarity):

  • Marketplace page lists "Webhook Subscriptions" as a single card (with logos: Make.com, Zapier, n8n, "custom") that links to the management section. NOT one card per third-party tool — they're all the same row type.
  • Webhook Subscriptions section (under Settings → Integrations or similar) — list / create / edit / delete / test / view deliveries. Backend contract is stable per the endpoints above.
  • Connected Accounts (Cat B) get distinct cards on the Marketplace; that's a different surface (1C.5).

Implementation order inside 1C.4:

  • [x] Migration 000016_outbound_webhooks creating both tables with RLS + permissions + data-classification + audit triggers (per-delivery transitions exempt from audit per CLAUDE.md operational-metadata rule; subscription state changes audit at the application layer with full diff).
  • [x] internal/core/domain/webhooks/ package: model / errors / repository / service / handler.
  • [x] internal/core/webhooks/dispatcher/ package: events.Bus subscriber + outbox worker (SKIP LOCKED claim, exponential backoff, dead-letter, auto-pause logic). Mirror 1A.18's notify.dispatcher shape one-to-one.
  • [x] Endpoints mounted under per-org route group with RequireURLOrgMatchesScope("id") (P47) + RequirePermission(PermOrganizationsManageWebhooks). Create additionally gates on EnforceLimit("max_webhook_subscriptions", 1).
  • [x] HMAC signing helper in internal/core/webhooks/signing/ with Sign(secret, timestamp, body) string + Verify + VerifyWithRotation (dual-secret) + ParseTimestampHeader + 9 unit tests covering round-trip, tampered body / timestamp / dual-secret mismatch, stale-window, drift acceptance, header parse.
  • [x] Integration guide apps/docs/reference/webhook-integration-guide.md with code samples for verifying signatures (Node/Python/Go) + Make.com recipe + Test endpoint usage. Event payload schemas auto-rendered from the 1C.3 registry catalog at /architecture/events.
  • [x] Entitlement quotas (max_webhook_subscriptions, max_webhook_deliveries_per_minute) added to the entitlements catalog (1C.9 rename — wire via the new naming). The chk_limit_def_period_kind CHECK in 000004 expanded to include 'per_minute'.
  • [x] Auto-pause notification: new notify.CategoryWebhookSubscriptionPaused + en/ro email templates, plus a dispatcher-side AdminAutoPauseNotifier that resolves clinic admins via role_permissions @> manage_webhooks and fans out per-recipient. Idempotency keyed on (subscription_id, failure_count, principal_id) so re-firing the same auto-pause condition deduplicates.
  • [x] Partition runner: internal/core/partitions package with shared EnsureMonthly helper. audit.EnsurePartitions and webhookdispatcher.EnsurePartitions both delegate; cmd/audit-partition-roll rolls both sets each tick (binary name retained for scheduler back-compat).
  • [x] cmd/api bootstrap: webhook events.Bus subscriber + dispatcher constructed at startup, dispatcher runs in its own goroutine, both stop gracefully on SIGTERM before events.Shutdown.
  • [x] Acceptance test in internal/test/rlstest/webhooks_test.go (3 tests): (1) subscriber → fire event → fake server captures signed POST → verify signature + envelope shape + delivery row transitioned to success; (2) auto-pause path: 10 dead-lettered deliveries → subscription transitions active → paused exactly once + recording notifier fires once; (3) end-to-end notifier writes a notifications row with category webhook_subscription_paused to the qualifying admin.

Acceptance:

  • [x] Signing scheme + replay window + dual-secret rotation locked.
  • [x] Retry policy + auto-pause + per-subscription rate limit locked.
  • [x] Payload envelope shape locked.
  • [x] API surface locked (replay endpoint deferred).
  • [x] Worker model locked (mirror 1A.18).
  • [x] UI placement clarified (single marketplace card; dedicated management section; deferred to 1D).
  • [x] Schema + RLS + permissions + entitlement quotas shipped (commit e4390da).
  • [x] Domain package + dispatcher + signing helper shipped.
  • [x] Endpoints shipped (UI deferred to 1D).
  • [x] Integration guide shipped; Make.com end-to-end smoke test closes in 1E staging.
  • [x] Acceptance test extension to internal/test/rlstest/webhooks_test.go (split out of setup_clinic_test.go for focus).

1C.5 Connected Accounts (Cat B)

Per-org table where clinic admins connect external services they own (Google Calendar, Slack, HubSpot, future EHRs). Foundation ships the table + framework. OAuth callback infrastructure deferred to first OAuth-using consumer (likely F-tier scheduling for Google Calendar).

Status: design locked 2026-05-06; shipped 2026-05-07 — migration 000017_connected_accounts (integration_services + organization_integrations with RLS, permission seed, app-layer audit), internal/core/domain/integrations/ (model / errors / repository / service / handler), internal/core/integrations/ (Connector interface + init-time Register / Lookup / Reset), per-org route group at /v1/organizations/{id}/integrations/* with RequireURLOrgMatchesScope("id") + RequirePermission(PermOrganizationsManageIntegrations), public catalog endpoint at /v1/integration-services (rate-limited under public_resolve), data-classification entries (auth_secret on credentials_encrypted; no egress), integration guide at apps/docs/reference/connected-account-integration-guide.md, 3-test rlstest acceptance suite (integrations_test.go). No real Cat B catalog rows seeded at foundation — first F-tier consumer (likely Google Calendar at F4 Scheduling) ships the first row + connector implementation + OAuth callback handler in its own PR.

Locked decisions:

  • Hybrid auth shape: typed universal columns + encrypted credentials blob + plaintext per-service config JSONB. Mirrors existing platform pattern (audit_log has typed fields + JSONB metadata; notification_deliveries similar). Single table shape accommodates all auth types (OAuth, API key, signing-secret-only) without per-pattern migrations.
    • Typed columns (queryable, indexable): id, organization_id, integration_service_id, auth_type, external_account_id, title, status, oauth_expires_at (so we can run "expiring soon" sweeps without decrypting), last_used_at, last_error_at, last_error, created_by_principal_id, created_at, updated_at.
    • credentials_encrypted BYTEA for secrets — contents vary by auth_type (OAuth: {access_token, refresh_token, scopes}; API key: {api_key}; webhook-in-only: {signing_secret}). AES-GCM via 1A.3 helper, version-stamped.
    • config JSONB plaintext for non-secret per-service config (calendar IDs, scope subsets, custom field mappings, webhook event filters) — queryable for ops + UI without decryption.
  • Status lifecycle: five values. connected (auth working, healthy); expired (OAuth refresh-token rejected, requires re-OAuth — distinct user-facing recovery); revoked (clinic admin disconnected via UI); error (provider returned 401/403 repeatedly via healthcheck or runtime); pending (created but OAuth flow not yet completed; auto-deleted if not transitioned within 30 min). The expired vs. error split matters because the user-facing recovery is different — expired shows "Reconnect" button; error shows "Provider unavailable, retrying."
  • OAuth client ownership: platform-level per provider (Cat A inside Cat B). RestartiX registers ONE Google Cloud project, ONE Slack app, ONE HubSpot OAuth app, etc. Per-org clients (Option B from design discussion) explicitly rejected — each clinic would have to create their own Google Cloud project, configure OAuth consent screen, get verified by Google, paste credentials into our UI. Brutal UX especially for non-technical clinic admins. The OAuth client itself (client_id + client_secret per provider) is Cat A — lives in platform_service_providers keyed by capability oauth_client_google / oauth_client_slack / etc. The resulting per-clinic access+refresh tokens are Cat B — live in organization_integrations. The two layers compose: Cat A holds the keys to mint tokens; Cat B holds the tokens themselves.
  • Catalog seeded EMPTY at foundation. No Cat B integration ships in 1C.5. Each F-tier consumer adds an integration_services row + first organization_integrations consumer in its own PR. First likely consumer: Google Calendar at F4 Scheduling. Foundation discipline — don't speculate against unknown future config shapes.
  • Per-service config validation deferred to per-service connectors. Foundation just provides the config JSONB column. First connector adds Go-side validation (internal/core/integrations/connectors/google_calendar/validate.go) when it ships. The catalog row carries config_schema JSONB placeholder column for future JSON-schema-based validation if a UI generic config-form needs it; foundation leaves it NULL.
  • Cross-tenancy: catalog is platform-scoped, no per-org overrides. integration_services rows are seeded by platform team via migrations. Clinics consume; clinics don't write. No use case for org-customized catalog ever surfaced.
  • OAuth callback infrastructure deferred. First OAuth-using consumer adds: /oauth/callback/{provider} route, state-token CSRF protection (signed JWT carrying (org_id, principal_id, integration_service_id, return_url)), auth code → token exchange via the platform OAuth client (resolved via 1C.2 Cat A resolver), refresh-token rotation worker. Foundation 1C.5 just lays the table + service skeleton.
  • OAuth requires interactive human consent. OAuth connections (auth_type='oauth2') can ONLY be created by human principals — the OAuth dance requires a browser redirect + provider consent UI. Service_accounts (Cat F) and agents cannot trigger OAuth flows. They CAN create API-key-auth connections (auth_type='api_key') programmatically. The created_by_principal_id column accepts any principal type; the auth-flow-vs-principal-type compatibility is enforced at the handler/connector level, not the schema.

Schema (locked):

  • [x] integration_services — platform catalog of supported integrations. (id UUID PK, slug TEXT NOT NULL UNIQUE, name TEXT NOT NULL, description TEXT, auth_type TEXT NOT NULL CHECK (auth_type IN ('oauth2', 'api_key', 'webhook_in_only')), oauth_scopes TEXT[], oauth_client_capability TEXT NULL, icon_url TEXT, status TEXT NOT NULL DEFAULT 'available' CHECK (status IN ('available', 'beta', 'deprecated')), config_schema JSONB, created_at, updated_at). oauth_client_capability is the FK-by-string to the Cat A capability holding the OAuth client_id/secret (e.g., 'oauth_client_google'). Schema CHECK enforces oauth_client_capability IS NOT NULL exactly when auth_type = 'oauth2'. RLS: SELECT for everyone (catalog is public-by-design so the marketplace UI works); mutations via AdminPool only (catalog edits = migrations or superadmin admin tool).
  • [x] organization_integrations — per-org connections. (id UUID PK, organization_id UUID NOT NULL FK, integration_service_id UUID NOT NULL FK → integration_services(id), auth_type TEXT NOT NULL, external_account_id TEXT NOT NULL, title TEXT NOT NULL, status TEXT NOT NULL DEFAULT 'pending' CHECK (status IN ('pending', 'connected', 'expired', 'revoked', 'error')), oauth_expires_at TIMESTAMPTZ NULL, credentials_encrypted BYTEA NOT NULL, config JSONB NOT NULL DEFAULT '{}', last_used_at TIMESTAMPTZ NULL, last_error_at TIMESTAMPTZ NULL, last_error TEXT NULL, created_by_principal_id UUID NOT NULL FK → principals(id), created_at, updated_at). UNIQUE (organization_id, integration_service_id, external_account_id) — multiple Google Calendars per org (per specialist) is fine; same external account twice is not. Partial indexes for OAuth-expiring sweep + pending-GC sweep. RLS: org members with organizations.manage_integrations SELECT; INSERT/UPDATE/DELETE WITH CHECK same permission. AppPool DML for the per-org happy path; AdminPool for OAuth callback handler (which writes outside the request's tx since the OAuth dance flows through a redirect).
  • [x] Data classification: credentials_encrypted = auth_secret (no egress targets); external_account_id + title + auth_type + status + last_error = org_internal with support_export; config registers as org_internal with support_export (per-service review obligation when each connector ships). Catalog (integration_services) is public for marketplace pre-auth render.
  • [x] Permission seed: new organizations.manage_integrations permission, granted to admin system role template only.

Framework (locked):

  • [x] internal/core/domain/integrations/ — Connected-Account domain package: model / errors / repository / service / handler. Reads/writes per-org rows; orchestrates connector validation + credential encryption + healthcheck.
  • [x] internal/core/integrations/Connector interface + init-time registry (Register / Lookup / RegisteredSlugs / Reset). Per-connector packages register impls via init(), mirroring the events registry pattern from 1C.3. Final shape (creds passed as opaque []byte so the framework stays per-impl-type-erased):
    go
    type Connector interface {
        Slug() string
        ValidateConfig(ctx context.Context, config map[string]any) error
        ValidateCredentials(ctx context.Context, creds map[string]any) error
        Healthcheck(ctx context.Context, creds []byte, config map[string]any) error
        RefreshOAuthToken(ctx context.Context, creds []byte) ([]byte, error)
    }
  • [x] OAuth callback infrastructure NOT shipped at foundation. The Service.Create path explicitly rejects auth_type='oauth2' with 400 oauth_requires_callback_flow. The first OAuth-using consumer ships /oauth/callback/{provider} route + state-token CSRF + auth-code → token exchange + refresh-token rotation worker.

Endpoints (locked):

  • [x] GET /v1/organizations/{id}/integrations[?status=&service_slug=&limit=&offset=] — list connections. RLS-gated.
  • [x] GET /v1/organizations/{id}/integrations/{id} — read. Returns row WITHOUT credentials_encrypted (decrypted credentials NEVER leave the API surface).
  • [x] POST /v1/organizations/{id}/integrations — create. Body {integration_service_id, auth_type, credentials, config}. Used today by API-key auth_type; OAuth auth_type goes through /oauth/callback/{provider} handler instead (deferred). Server validates against the catalog row + per-connector validator.
  • [x] PATCH /v1/organizations/{id}/integrations/{id} — update title or config (NOT credentials — those rotate via separate flow).
  • [x] DELETE /v1/organizations/{id}/integrations/{id} — soft-delete (status='revoked'). Subsequent connector calls fail; clinic re-creates if they want to reconnect.
  • [x] POST /v1/organizations/{id}/integrations/{id}/test — runs the connector's Healthcheck method on demand. Returns success/failure + error context for the clinic admin to debug.
  • [x] Public catalog endpoint: GET /v1/integration-services — lists available integrations from the catalog. Public-resolve style (no auth required, rate-limited per IP under public_resolve) so the marketplace landing page works pre-login, similar to org resolve.

UI placement (1D, locked here for clarity):

  • Marketplace page lists each integration_services row as a discrete card (Google Calendar, Slack, HubSpot, etc., one card per service). Each card shows status if connected, "Connect" button if not. Clicking "Connect" kicks off OAuth (deferred infra) or opens the API-key form depending on auth_type.
  • Connected Accounts management section — list of active organization_integrations rows for this org, with status, last used, disconnect, test, edit-config actions.
  • Distinct from 1C.4's webhook subscriptions UI placement — Connected Accounts has one card per service; Outbound Webhooks has one card total. Both surface from the marketplace.

Implementation order inside 1C.5:

  • [x] Migration 000017_connected_accounts creating integration_services + organization_integrations with RLS + permission + data-classification entries.
  • [x] internal/core/domain/integrations/ package with model / errors / repository / service / handler.
  • [x] Connector interface declaration + registration mechanism (init-time, per package).
  • [x] Endpoints mounted under per-org route group with RequireURLOrgMatchesScope("id") (P47) + RequirePermission(PermOrganizationsManageIntegrations) for mutations. Public catalog endpoint mounted under /v1/integration-services (no auth, rate-limited).
  • [x] P31 verified — organization_integrations.credentials_encrypted is the canonical Cat B per-org credential store the pattern describes.
  • [x] Acceptance test in internal/test/rlstest/integrations_test.go (3 tests): (1) framework round-trip — fixture catalog row → org creates connection (API-key auth_type) → connector validate hooks fire → service.RunHealthcheck dispatches → revoke + idempotent re-revoke; (2) OAuth-rejected-from-direct-create — auth_type='oauth2' Create returns ErrOAuthRequiresCallbackFlow; (3) RLS defense-in-depth — specialist role's repo.Insert blocked at the DB layer despite bypassing the service permission check.

Acceptance:

  • [x] Hybrid auth shape locked.
  • [x] Five-status lifecycle locked.
  • [x] OAuth client ownership: platform-level (Cat A holds clients; Cat B holds per-clinic tokens). The two-layer composition is the architectural insight.
  • [x] Catalog empty at foundation; first F-tier consumer seeds first row + OAuth infra.
  • [x] Per-service config validation deferred to per-service connectors.
  • [x] Cross-tenancy: catalog platform-scoped, no per-org overrides.
  • [x] Schema + RLS + permission seed shipped.
  • [x] internal/core/domain/integrations/ package + Connector interface + registration mechanism shipped.
  • [x] Endpoints shipped; public catalog endpoint live.
  • [x] Acceptance test (framework-only — no real Cat B integration at foundation).

1C.6 Inbound Webhook Framework (Cat D)

Convention for /webhooks/{provider} route mounting + per-provider signature verification + once-and-only-once dedup + state update + Internal Event emission (Cat E). Mostly a documented convention plus a small dedup table and a CI guard.

Status: design locked 2026-05-06; shipped 2026-05-07 framework-only — migration 000018_inbound_webhook_dedup (monthly-partitioned dedup table with REVOKE on restartix_app, current-month seed), internal/core/inboundwebhooks/dedup/ repo helpers (WasProcessed + MarkProcessed, AdminPool only) + EnsurePartitions registered in cmd/audit-partition-roll, P52 documented in patterns.md, cmd/check-inbound-webhooks CI guard (AST-based; scans internal/integration/*/inbound/ packages for the four required call sites; passes with zero handlers today + four unit tests verify the guard's correctness on synthetic inputs), integration guide at apps/docs/reference/inbound-webhook-guide.md, 3-test rlstest acceptance suite (inbound_dedup_test.go). The original spec premise of a Daily.co retrofit was stale (no Daily.co handler exists in the codebase) — first F-tier consumer ships the first per-provider verifier + handler + Cat E event registration in their own PR.

Locked decisions:

  • Per-provider verification helpers (no generic abstraction). Each provider's signature scheme lives in its own package — internal/integration/stripe/inbound/verify.go, internal/integration/dailyco/inbound/verify.go, internal/integration/clerk/svix/verify.go, internal/integration/ses/sns/verify.go. Each package exports Verify(req *http.Request, secret []byte) error (or equivalent). Signature schemes don't share enough structure for a shared abstraction to be worth the indirection — Stripe is t=...,v1=... HMAC of timestamp+body; SES SNS uses X.509 cert chain validation; Svix has its own three-header format; Google echoes back our opaque token. False abstraction over them would be brittle and obscure. Convention is the framework; per-provider helpers are the work.
  • Dedup table at foundation: inbound_webhook_dedup. Range-partitioned monthly per P41. (provider TEXT, event_id TEXT, processed_at TIMESTAMPTZ) with PK (provider, event_id, processed_at) (composite for partition compatibility). Repo helpers WasProcessed(ctx, provider, eventID) (bool, error) and MarkProcessed(ctx, provider, eventID) enforce once-and-only-once across every provider from Day 1. Retention bounded by max provider retry windows (~30 days for Stripe, less for others — drop partitions older than 60 days). Worth shipping at foundation rather than deferring per-provider because the cost is one table + two helpers, and ALL inbound webhook handlers benefit immediately.
  • Per-connection inbound tokens stored in organization_integrations.config (Option A). For Cat B providers that push notifications (Google Calendar, Microsoft 365), each per-org connection registers a push channel; the provider echoes back a token we generated at registration. Token lives in the connection's config JSONB (e.g., config.push_channel.token). Lookup via GIN index on config. NOT clinic-facing — purely internal routing. Distinct from Cat C signing secrets (which are clinic-managed and client-visible at create time). Co-locates token with the connection it belongs to; deletion is automatic via cascade. At expected scale (~9k Cat B connections at the high end), JSONB-scan with GIN is microseconds; if a future provider's push volume changes that, migrating to a dedicated table is a denormalization, not a model change.
  • Standard inbound flow (locked): every inbound webhook handler runs in this order: (1) verify signature using the per-provider helper, return 401 on mismatch; (2) call dedup.WasProcessed(provider, event_id), return 200 early if already seen (provider treats this as ack); (3) call domain service to update state via the request's tx; (4) call dedup.MarkProcessed(provider, event_id) in the same tx; (5) commit; (6) on success, emit a Cat E Internal Event via 1C.3 registry describing the inbound effect (e.g., appointment.recording_available, payment.received, auth.user_synced). The standard fan-out (audit, notification dispatcher, outbound webhook dispatcher, automations engine) consumes the event downstream.
  • Audit existing inbound handlers at 1C.6 close. The decision originally targeted a Daily.co recording-webhook retrofit (add dedup, emit appointment.recording_available on success), but the premise proved stale at implementation — no Daily.co handler exists in the codebase (see status above). The audit obligation carries forward to the first F-tier consumer: verify its handler follows the convention end-to-end, catch any other inbound webhook that drifts, and document any unavoidable exceptions explicitly.

Schema (locked):

  • [x] inbound_webhook_dedup(provider TEXT NOT NULL, event_id TEXT NOT NULL, processed_at TIMESTAMPTZ NOT NULL DEFAULT NOW()). Range-partitioned monthly on processed_at. PK (provider, event_id, processed_at). RLS enabled with no policies; REVOKE INSERT/UPDATE/DELETE/SELECT from restartix_app — table is invisible to the app role for both reads and writes; AdminPool (owner) bypasses RLS. Retention: drop partitions older than 60 days (configurable; covers max provider retry windows; sweep is a future operational concern).
  • [x] No new permission seed — inbound webhook handlers run on a sibling router group with their own auth shape (signature verification, not session auth). Dedup table is operational infrastructure, not user-facing.

Per-provider package shape (locked):

internal/integration/{provider}/inbound/
  ├── verify.go    // Verify(req *http.Request, secret []byte) error
  ├── parse.go     // Parse(body []byte) (Event, error)  -- typed event extraction
  └── handler.go   // mount on /webhooks/{provider}; runs the standard flow

The router group (mounted at /webhooks/) is auth-naked from the JWT side — verification happens via the per-provider helper. CSRF doesn't apply because there's no session. Rate limiting per-provider via 1A.13's existing infrastructure (e.g., reject if a provider sends >100 requests/sec to its endpoint, which would indicate a runaway loop or attack).

Implementation order inside 1C.6:

  • [x] Migration 000018_inbound_webhook_dedup (monthly-partitioned) with RLS + REVOKE on restartix_app (read + write) + current-month seed (mirrors 1A.15 audit_log partition shape).
  • [x] internal/core/inboundwebhooks/dedup/ package with WasProcessed + MarkProcessed repo helpers (AdminPool path only) + EnsurePartitions registered in cmd/audit-partition-roll.
  • [-] Per-provider verify.go packages — DEFERRED. No inbound provider handler exists at foundation; original spec premise (Daily.co handler) was stale. First F-tier consumer ships the first per-provider package alongside its handler.
  • [-] Audit Daily.co handler at 1C.6 close — DEFERRED for the same reason; nothing to retrofit.
  • [x] Documentation as P52 — Inbound Webhook Convention in patterns.md covering: route mount path, signature verification, dedup, state update, internal event emission, error handling.
  • [x] CI guard cmd/check-inbound-webhooks walks every internal/integration/*/inbound/ package (the convention's home for handlers) and asserts the four required call sites: *Verify*, dedup.WasProcessed, dedup.MarkProcessed, events.Publish/PublishWith/NewEvent. AST-based, with four unit tests proving the guard rejects non-compliant fixtures and accepts compliant ones. Wired into make check. Passes with zero handlers today (no inbound/ packages); first F-tier consumer triggers the first non-trivial run.
  • [x] Integration guide apps/docs/reference/inbound-webhook-guide.md — for engineers adding new inbound webhooks; references the per-provider package shape and the standard flow.
  • [x] Acceptance test internal/test/rlstest/inbound_dedup_test.go (3 tests): WasProcessed/MarkProcessed round-trip; AppPool blocked from SELECT + INSERT (REVOKE proof); repeated MarkProcessed remains idempotent at the protocol surface. Per-provider replay-path tests land with the first F-tier consumer.
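The AST-based guard's core idea can be sketched compactly: parse a handler file and assert the required call sites appear. This is a simplification of cmd/check-inbound-webhooks (the real guard walks whole packages and matches more carefully); the matching-by-selector-name here is an assumption:

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"strings"
)

// required mirrors the guard's four call-site checks, matched loosely on the
// selector name so Publish also covers PublishWith.
var required = []string{"Verify", "WasProcessed", "MarkProcessed", "Publish"}

// missingCalls parses one handler source and reports which required call
// sites are absent.
func missingCalls(src string) ([]string, error) {
	fset := token.NewFileSet()
	f, err := parser.ParseFile(fset, "handler.go", src, 0)
	if err != nil {
		return nil, err
	}
	found := map[string]bool{}
	ast.Inspect(f, func(n ast.Node) bool {
		call, ok := n.(*ast.CallExpr)
		if !ok {
			return true
		}
		sel, ok := call.Fun.(*ast.SelectorExpr)
		if !ok {
			return true
		}
		for _, want := range required {
			if strings.Contains(sel.Sel.Name, want) {
				found[want] = true
			}
		}
		return true
	})
	var missing []string
	for _, want := range required {
		if !found[want] {
			missing = append(missing, want)
		}
	}
	return missing, nil
}

func main() {
	src := `package inbound
func handle() {
	stripe.Verify(nil, nil)
	dedup.WasProcessed(nil, "stripe", "evt")
	dedup.MarkProcessed(nil, "stripe", "evt")
	events.Publish(nil, "payment.received")
}`
	m, err := missingCalls(src)
	fmt.Println(m, err)
}
```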

Acceptance:

  • [x] Verification helpers locked: per-provider, no generic abstraction.
  • [x] Dedup table locked: inbound_webhook_dedup ships at foundation, monthly-partitioned, used by every provider.
  • [x] Per-org inbound tokens locked: stored in organization_integrations.config (Option A); GIN-indexed JSONB scan; not clinic-facing.
  • [x] Standard flow locked: verify → dedup → state update → mark processed → emit Cat E event.
  • [-] Daily.co handler retrofit scope locked: add dedup + internal event emission. DEFERRED — no Daily.co handler exists in the codebase; spec premise was stale. Retrofit will land with the first F-tier consumer that adds the first per-provider handler.
  • [x] Schema (dedup table) + repo helpers shipped.
  • [-] Per-provider verify packages aligned with convention. DEFERRED — no per-provider handlers at foundation; convention is documented in P52 + integration guide for the first F-tier consumer.
  • [-] Daily.co handler refactored. DEFERRED — same as above.
  • [x] P52 documented in patterns.md.
  • [x] CI guard cmd/check-inbound-webhooks shipped.
  • [x] Integration guide published.

1C.7 Metering & Quotas

Per-capability usage records captured at the capability seam. Per-org quotas enforced as hard limits at the seam — exceeding fails the call. Pricing engine + invoicing deferred to a later subject; this layer just measures + caps. Critical foundation work because AI cost-per-call is high and runaway-cost protection is non-optional from Day 1.

Status: shipped 2026-05-07. Three-table schema landed in migration 000019 with the usage.view_org permission and full data-classification entries; internal/core/metering/ exposes the AdminPool-backed Repository implementing capabilities.MeterStore plus LoadLimits, EnsureQuotaRow, AdvanceExpiredQuotas, RollupClosedPeriod, SyncOrgLimits, and a TelemetryEmitter hook. The capabilities wrap stack moved meterAfterSuccess → meterAroundCall (innermost; Reserve before inner, Refund on failure, Record on success) and now resolves the metering org via principal.Subject (request paths) or ContextWithMeteringOrg (dispatcher / system paths). notify.email.NewMeteredChannel wraps the SES adapter through WrapMeteredProvider; the new cmd/usage-quota-reset and cmd/usage-summary-rollup crons handle period boundaries and closed-period rollups; cmd/audit-partition-roll rolls the usage_records monthly partitions alongside the existing tables. Subscription mutations call subscriptions.Service.SetLimitSyncer to project plan-derived caps to usage_quotas.limit_units. Acceptance suite at internal/test/rlstest/metering_test.go covers atomic gate, refund, period reset, summary rollup, sync, and AppPool write blocks.

Why foundation, not feature. AI feature cost can blow up an org's monthly bill in hours if a runaway agent loop hits an unmetered LLM endpoint. Without per-org quotas, a misconfigured automation could rack up $10K of LLM cost on one clinic before anyone notices. Metering + caps are mandatory before any AI feature ships, which means foundation. Pricing / invoicing / billing UI is later — separate subject built on top of this.

Locked decisions:

  • Three-table model. Three distinct tables, three distinct purposes:

    • usage_records — append-only event log. One row per metered call. Source of truth.
    • usage_quotas — running counter for the CURRENT period only. Resets at period boundary. Real-time counter that gates runtime calls.
    • usage_summaries — closed-period historical totals. Created by end-of-period cron. Survives quota resets; provides historical record for billing/analytics.
  • Retention. usage_records 12-month hot retention; older monthly partitions archive to S3 (mirrors 1A.15 audit_log archive pattern). usage_summaries keeps longer (closed-period historical record for billing). usage_quotas is one row per (org, capability, period) — no retention question; it's small and lives forever.

  • Quota enforcement: atomic-increment-with-refund-on-failure (Option B). Single SQL on every metered call:

    sql
    UPDATE usage_quotas
       SET current_units = current_units + $units
     WHERE organization_id = $org AND capability = $cap AND period_end_at > NOW()
       AND (limit_units IS NULL OR current_units + $units <= limit_units)
     RETURNING current_units;

    If no row updated → quota exceeded; fail with 402 quota_exceeded BEFORE the provider call. Race-free at the DB level — concurrent calls can't both pass when only one slot remains. On provider-call failure, decrement (refund) so failed calls don't burn quota. Standard cloud-API pattern.

  • Period boundaries: calendar UTC. period IN ('day', 'week', 'month') — three granularities. Each capability picks at registration time (e.g., AI tokens daily for cost protection; emails weekly or monthly; storage monthly). Reset cron runs at boundary (UTC midnight daily, Monday UTC weekly, first-of-month UTC monthly); sets current_units = 0 and bumps period_start_at / period_end_at. Document timezone explicitly in clinic-facing usage UI so admins aren't surprised.

  • Day-1 metered capability: email only. 1C.7 wires notify.email through metering at foundation as the exercise consumer. Capability email, unit_type emails_sent, default monthly quota (configurable per plan via plan_limits). Real metering data accumulates by 1E staging. Storage / AI / video / SMS metering land at their respective consumers (each one decides its unit semantics — bytes vs. ops for storage; input/output tokens for AI; minutes for video). Foundation discipline — don't speculate on units we don't yet have a consumer for.

  • Aggregation cadence: end-of-period cron. Atomic-increment already keeps usage_quotas.current_units real-time-accurate (which is what runtime gating needs). The cron rolls usage_records → usage_summaries at period close (one bulk INSERT per period boundary). Continuous summary updates explicitly rejected — doubles every write path; speculation against an unknown future need.

  • Quota source — two-family entitlements (see glossary.md → Entitlement). usage_quotas.limit_units syncs from organization_subscription_limits (the QUOTA family — limit_definitions / plan_limits / organization_subscription_limits, distinct from the BOOLEAN family renamed in 1C.9). Sync is one-way: subscription_limits drive quotas; never reverse. On subscription create → snapshot from plan_limits → write quota row. On override → write through to quota row. On period reset → no-op for limit (limit doesn't change at boundary; current_units does).

  • Quotas are org-scoped, not principal-scoped. A single usage_quotas row per (organization_id, capability, period) — every actor in the org (humans, agents, service_accounts) shares the same counter. Clinic's plan governs total monthly usage regardless of who triggers a call. Per-actor-type sub-quotas (e.g., "agents can use up to 30% of the org's AI quota") are NOT in the design — speculation against an unknown future need; quota schema doesn't accommodate it (no principal_type discriminator) and adding later is a non-trivial migration if real demand surfaces. Document the choice explicitly so a future reviewer doesn't accidentally add per-principal sub-quotas without ADR-level discussion.

  • Live vs. historical clinic visibility:

    • "How much have I used THIS period?" → usage_quotas.current_units (real-time, gated UI surface in 1D — clinic admin sees live counters against their plan).
    • "What's my history?" → usage_summaries (after period close; per-month rollup).
    • "Show me every email sent on date X for billing dispute?" → usage_records (event log, available for support within retention window).
  • Telemetry forwarding. Usage records forward to the telemetry sibling service for analytics — per-tier usage patterns, capacity planning, AI cost trends across orgs. PII pseudonymized at forwarding (org_id hashed; capability/units/cost are non-PII so they pass through). Same pipeline as audit forwarding (1A's pattern).
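The pseudonymization step at forwarding can be sketched as a keyed hash over org_id. Keyed HMAC (rather than a bare hash) and the truncation length are assumptions; the payload shape shows which fields pass through un-touched as non-PII:

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// pseudonymizeOrg derives a stable, non-reversible org identifier before
// forwarding. A keyed hash prevents the sink from brute-forcing UUIDs; the
// forwarding pipeline would own the key.
func pseudonymizeOrg(key []byte, orgID string) string {
	mac := hmac.New(sha256.New, key)
	mac.Write([]byte(orgID))
	return hex.EncodeToString(mac.Sum(nil))[:16] // truncated for analytics ergonomics
}

// usageEvent is the non-PII payload shape that passes through unchanged.
type usageEvent struct {
	OrgHash    string `json:"org_hash"`
	Capability string `json:"capability"`
	Units      int64  `json:"units"`
	CostCents  int64  `json:"cost_cents"`
}

func main() {
	e := usageEvent{
		OrgHash:    pseudonymizeOrg([]byte("telemetry-key"), "00000000-0000-0000-0000-000000000001"), // dummy org UUID
		Capability: "email",
		Units:      1,
	}
	fmt.Println(len(e.OrgHash), e.Capability)
}
```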

Schema (locked):

  • [x] usage_records(id UUID, organization_id UUID NOT NULL FK, capability TEXT NOT NULL, units BIGINT NOT NULL, unit_type TEXT NOT NULL, cost_cents INT NULL, principal_id UUID NULL FK, occurred_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), metadata JSONB NOT NULL DEFAULT '{}'). Range-partitioned monthly on occurred_at per P41. PK (id, occurred_at). RLS: org members with new usage.view_org permission SELECT; AdminPool writes only. REVOKE INSERT/UPDATE/DELETE from restartix_app.
  • [x] usage_quotas(id UUID PK, organization_id UUID NOT NULL FK, capability TEXT NOT NULL, period TEXT NOT NULL CHECK (period IN ('day', 'week', 'month')), limit_units BIGINT NULL, current_units BIGINT NOT NULL DEFAULT 0, period_start_at TIMESTAMPTZ NOT NULL, period_end_at TIMESTAMPTZ NOT NULL, last_reset_at TIMESTAMPTZ NULL, updated_at). UNIQUE (organization_id, capability, period). NULL limit_units = unlimited (e.g., enterprise tier; usage_quotas row still exists for tracking). RLS: org members with usage.view_org SELECT; AdminPool writes only.
  • [x] usage_summaries(id UUID PK, organization_id UUID NOT NULL FK, capability TEXT NOT NULL, period TEXT NOT NULL, period_start_at TIMESTAMPTZ NOT NULL, period_end_at TIMESTAMPTZ NOT NULL, total_units BIGINT NOT NULL, total_cost_cents BIGINT NOT NULL DEFAULT 0, calls_count INT NOT NULL, created_at). UNIQUE (organization_id, capability, period, period_start_at). RLS: org members with usage.view_org SELECT; cron writes via AdminPool.
  • [x] Data classification: all three tables register their columns in data-classification.md. metadata JSONB on usage_records registers as variable-class (per-capability metadata may carry pii_basic if, e.g., recipient_email is included; reviewed per consumer).
  • [x] Permission seed: new usage.view_org permission, granted to admin and customer_support system role templates (operations need to debug; specialists don't).

Implementation order inside 1C.7:

  • [x] Migration creating three tables + RLS + permission + data classification entries.
  • [x] internal/core/metering/ package: atomic-increment query helper (Reserve(ctx, org, capability, period, units)), refund helper (Refund(...)), record-write helper (Record(...)), LoadLimits for org-scope middleware, EnsureQuotaRow + SyncOrgLimits for the subscription-mutation path, AdvanceExpiredQuotas + RollupClosedPeriod for the crons, EnsurePartitions registered with cmd/audit-partition-roll. The capabilities wrap stack ships meterAroundCall (replacing the old meterAfterSuccess skeleton) so Reserve/Record/Refund compose innermost.
  • [x] Period-reset cron cmd/usage-quota-reset — runs at calendar boundaries; loops AdvanceExpiredQuotas until no expired rows remain.
  • [x] Period-close cron cmd/usage-summary-rollup — rolls day every run, week on UTC Monday, month on the UTC 1st (-force overrides for backfill).
  • [x] notify.email.SESChannel enters WrapMeteredProvider via notifyemail.NewMeteredChannel. Capability email, period month, unit_type emails_sent, units 1. Dispatcher delivers under the system principal so the metered adapter attaches the notification's organization_id via capabilities.ContextWithMeteringOrg.
  • [x] Quota source sync: subscriptions.Service.syncLimits (wired via SetLimitSyncer) calls metering.Repository.SyncOrgLimits after every create / update / override / revoke. Aggregates caps across active subscriptions (SUM, NULL = unlimited propagates) and writes through to usage_quotas.limit_units for every registered (limit_code → capability/period) entry.
  • [x] Telemetry forwarding hook: metering.TelemetryEmitter interface + Repository.SetTelemetryEmitter. Foundation 1C.7 leaves the emitter unset; the hook fires immediately after a successful usage_records insert when 1A's telemetry sink (or whichever later phase ships first) wires it.
  • [x] Acceptance suite at internal/test/rlstest/metering_test.go: atomic Reserve gate (cap=3 → fourth Reserve hits capabilities.ErrQuotaExceeded); Refund-decrements-counter; AdvanceExpiredQuotas zeroes a stale row and bumps the window forward; RollupClosedPeriod aggregates seeded usage_records into one usage_summaries row and is idempotent on a second run; SyncOrgLimits writes through subscription_limits.cap_value=250 to usage_quotas.limit_units=250; AppPool tx is rejected with permission denied on INSERT to all three tables.

Acceptance:

  • [x] Three-table model locked: usage_records (event log) + usage_quotas (live counter) + usage_summaries (closed-period historical).
  • [x] Retention locked: 12-month hot for records; older partitions archive to S3.
  • [x] Atomic-increment-with-refund enforcement model locked.
  • [x] Calendar UTC boundaries with day / week / month granularities locked.
  • [x] Day-1 metered capability locked: email at foundation; storage/AI/video deferred to consumers.
  • [x] Aggregation cadence locked: end-of-period cron only.
  • [x] Quota source locked: reads from organization_subscription_limits (quota family from glossary's two-family entitlement structure).
  • [x] Live + historical clinic visibility model locked.
  • [x] Telemetry forwarding locked.
  • [x] Schema + RLS + permission seed shipped.
  • [x] internal/core/metering/ package + reset cron + rollup cron shipped.
  • [x] notify.email wired through metering middleware.
  • [x] Telemetry pipe wired (hook only — TelemetryEmitter interface; concrete sink lands when 1A's telemetry pipeline ships).
  • [x] Acceptance test covering quota gating + refund + period reset + summary rollup.

1C.8 AI Capability Hooks

AI is Cat A in shape (curated provider, switchable, platform credentials by default) plus extra observability: provenance audit, model registry with pricing history, per-call cost capture, streaming support. 1C.8 lays the foundation hooks; the first AI feature consumer wires the actual LLM / embedding / transcription / vision / classification providers.

Status: shipped 2026-05-07. Schema landed in migration 000020 (ai_models + ai_model_pricing_history + FK from audit_ai_provenance.model_id); audit_log_insert extended with OUT params (audit_log_id, audit_log_created_at) so audit.RecordWithProvenance writes both rows in the same transaction. Five AI capability skeleton packages under internal/core/ai/ (LLM with streaming + tools; embeddings / transcription / vision / classification with simpler shapes), each with Fake test doubles. capabilities.WrapMeteredAI + the meterDeferred middleware + the Reservation / SettleResult / SettleEntry types implement the variable-cost flow described in P53; metering.Repository.BeginReservation returns the production handle, and RecordWithCost writes per-direction usage_records with cost_cents snapshots. internal/core/domain/aimodels/ backs the Console superadmin endpoints under /v1/admin/ai-models (list / get / create-with-initial-pricing / patch / price-change), gated on PlatformPermAIModelsManage = "ai_models.manage" (superadmin-only by default). cmd/check-ai-models is the CI guard wired into make check; data classification entries cover the new tables. The acceptance suite at internal/test/rlstest/ai_provenance_test.go covers the same-tx audit+provenance write, deferred reservation + per-direction settle, Cancel-refunds-full + idempotency, and pricing lookup at-time + price change. UI deferred to 1D per the unified UI pass rule. OpenAPI for the admin endpoints deferred to 1D alongside 1C.2's platform-service-providers admin endpoints (same pattern: admin OpenAPI lands when the Console UI does).

Why foundation work before any AI feature ships. AI is the stated platform direction (CLAUDE.md: "this platform is built around AI agents as first-class actors"; apps/docs/product/ai-agents.md). Audit provenance + model registry + cost capture + streaming are cross-cutting — every AI-using feature needs them. Retrofitting after F-tier AI features land is exactly the cross-cutting retrofit cost that foundation discipline prevents.

Locked decisions:

  • One interface per AI task. Separate Go packages: internal/core/ai/llm/, internal/core/ai/embeddings/, internal/core/ai/transcription/, internal/core/ai/vision/, internal/core/ai/classification/. Each has its own interface with task-specific methods. Reasons: tasks have genuinely different shapes (LLM has tools/streaming; embeddings doesn't; transcription has audio I/O; vision has images); different providers specialize per task (Anthropic LLM, Voyage embeddings, Deepgram transcription, Google Vision OCR); validation status is per-(model, task); type safety catches "trying to embed a string with the LLM provider." False unification rejected. Matches the Cat A capability pattern (email.Channel, video.Provider, pdf.Renderer are each their own package).
  • Model registry with pricing history. Two tables — current state on ai_models, historical pricing changes in ai_model_pricing_history. Pricing changes for AI providers ARE inevitable (Anthropic and OpenAI both adjusted prices multiple times in 2024-2025); historical pricing is necessary for accurate billing reconstruction (closed-period invoice generation, customer disputes, cost-calculation bug recovery). Worth shipping at foundation; small schema cost vs. retrofit later.
  • AI credentials in platform_service_providers (Cat A resolver from 1C.2). Same pattern as email / storage / video / payments. AI capabilities (ai_text_generation, ai_embedding, ai_transcription, ai_vision, ai_classification) seed platform-default rows in platform_service_providers AS THEIR FIRST CONSUMER SHIPS. Foundation 1C.8 doesn't seed any rows — first AI feature does. Per-tenant brand-isolation overrides work the same way as any other Cat A capability.
  • Provenance wiring via metering middleware extension. The metering wrapper from 1C.7 (WrapMeteredProvider) is extended with optional provenance config. When WithProvenance(...) option is passed, the wrapper writes an audit_ai_provenance row in the same tx as the audit row, with (audit_log_id, model_id, inputs_hash, confidence). inputs_hash is SHA-256 of canonicalized prompt — lets compliance auditors verify "was this prompt the one we ran?" without storing the actual prompt content (PII for clinical features). Confidence is provider-supplied where available (Anthropic Claude returns it; some providers don't — NULL is allowed).
  • Streaming support in the LLM interface from Day 1. LLM.Generate(ctx, ...) (Stream, error) returns a stream interface with Next() (Token, bool, error) + Close() (Usage, error). Caller iterates tokens; Close() returns final usage info (input/output token counts) which the metering layer writes as one usage_record at stream completion. Without streaming-aware design from Day 1, every AI feature has to bolt streaming on after the fact — cross-cutting retrofit cost. ~30 lines of interface design.
  • BYO-LLM via Cat B deferred until first clinic asks. Foundation accommodates: organization_integrations schema already takes AI provider tokens via the same shape as other Cat B connections (auth_type='api_key', credentials_encrypted={api_key}, config={model_preferences}). When BYO-LLM lands, it composes cleanly with the Cat A resolver — clinic's per-org override row in platform_service_providers references the Cat B credentials by relation, OR the resolver falls through to the Cat B side. Mechanics decided when the first BYO consumer surfaces.
  • AI agent identity in audit. When an AI feature call runs on behalf of an AI agent (per the principals model from 1B.1), the call's principal context is the agent (actor_type='agent'), and the audit_log row attributes correctly via existing infrastructure. The agent's parent_principal_id (delegation column reserved in 1B.1, currently semantically undefined) carries the human who delegated to the agent. No new plumbing at 1C.8 — principals model already supports this.
  • Agent provisioning shape deferred to first AI feature. agents table schema exists (1B.1) but no flow creates agent rows today. When the first AI feature ships, it decides: (a) lazy creation — create an agent row inline on first AI call delegated by a human; (b) eager creation — create the agent row when the human enables an AI feature; or (c) per-task agents — one agent per (human, AI feature) combo. Foundation 1C.8 ships skeleton interfaces + provenance hooks that work with ANY of these provisioning shapes; the choice falls to the first consumer. The audit + metering chain attributes to whatever principal context the call carries, so the framework is provisioning-agnostic.

Schema (locked):

  • [x] ai_models(id UUID PK, model_provider TEXT NOT NULL, model_name TEXT NOT NULL, model_version TEXT NOT NULL, capability TEXT NOT NULL CHECK (capability IN ('text_generation', 'embedding', 'transcription', 'vision', 'classification')), unit_type TEXT NOT NULL, validation_status TEXT NOT NULL CHECK (validation_status IN ('experimental', 'validated', 'deprecated', 'retired')), validation_notes TEXT, status TEXT NOT NULL DEFAULT 'active' CHECK (status IN ('active', 'deprecated', 'retired')), introduced_at TIMESTAMPTZ NOT NULL, retired_at TIMESTAMPTZ NULL, created_at, updated_at). UNIQUE (model_provider, model_name, model_version). Foundation seeds empty. RLS: SELECT for everyone (registry is public-by-design — surfaced in patient-facing AI transparency UIs); mutations via AdminPool only (superadmin actions, audited).
  • [x] ai_model_pricing_history(id UUID PK, model_id UUID NOT NULL FK → ai_models(id), cost_per_input_unit_cents NUMERIC NOT NULL, cost_per_output_unit_cents NUMERIC NOT NULL, effective_from TIMESTAMPTZ NOT NULL, effective_to TIMESTAMPTZ NULL, changed_by_principal_id UUID NULL FK → principals(id), notes TEXT, created_at). Partial unique (model_id) WHERE effective_to IS NULL — at most one current pricing row per model. Historical pricing for date X: WHERE effective_from <= X AND (effective_to IS NULL OR effective_to > X). RLS: SELECT for AdminPool only (pricing detail is platform-confidential); mutations via AdminPool with audit. usage_records.cost_cents snapshots pricing AT CALL TIME so historical reconstruction stays accurate even if pricing rows are amended later.
  • [x] Permission seeded inline in 1C.8 migration: ai_models.manage granted to superadmin only via the platform-permissions Go layer (no per-org RBAC row). Constants: principal.PermAIModelsManage.
  • [x] audit_ai_provenance already shipped in 1A.5 / 1A.15. 1C.8 verifies the FK target (model_id references new ai_models.id); migration adds the FK constraint at this point.
  • [x] Data classification entries: ai_models columns mostly org_internal with support_export egress (provider/model names are not patient-PII); validation_notes may carry medical-device-readiness context (org_internal + support_export). ai_model_pricing_history rows are org_internal only (pricing is platform-confidential). audit_ai_provenance.inputs_hash is org_internal (hash, not raw prompt).

Implementation order inside 1C.8:

  • [x] Migration creating ai_models + ai_model_pricing_history with RLS + permissions + data classification.
  • [x] internal/core/ai/llm/ package with LLM interface (streaming-first), Stream type, Usage type. Skeleton — no provider impls.
  • [x] internal/core/ai/embeddings/ package with Embeddings interface. Skeleton.
  • [x] internal/core/ai/transcription/ package with Transcription interface. Skeleton.
  • [x] internal/core/ai/vision/ package with Vision interface. Skeleton.
  • [x] internal/core/ai/classification/ package with Classification interface. Skeleton.
  • [x] Fake{Capability} per package — test-double convention from 1C.1.
  • [x] AI metering primitive shipped — capabilities.WrapMeteredAI + meterDeferred middleware + MeterStore.BeginReservation / Reservation / SettleResult / SettleEntry types + metering.Repository.RecordWithCost for per-direction cost_cents snapshots. Provenance lands via audit.RecordWithProvenance (extension on audit.Recorder); the original WrapMeteredProvider.WithProvenance framing was superseded by this cleaner separation — see the acceptance note below.
  • [x] Console superadmin endpoints: GET /v1/admin/ai-models, POST /v1/admin/ai-models (creates model + initial pricing-history row in one tx), PATCH /v1/admin/ai-models/{id} (updates non-pricing fields), POST /v1/admin/ai-models/{id}/price-change (closes current pricing-history row, inserts new). Console UI deferred to 1D per the unified UI pass rule.
  • [x] CI guard cmd/check-ai-models walks every AI capability call site, asserts the provider impl references a model_id that exists in the registry. Foundation discipline — no AI call without a registered model.
  • [x] Documentation: apps/docs/reference/ai-models.md (new) describing how to add a model row, validation status meanings, pricing-change procedure. Cross-references the SOUP entries from 1A.13 (every AI/ML model is a SOUP entry too — model_provider / model_version / validation_status fields per CLAUDE.md medical-device-readiness rule).
  • [x] Acceptance test at internal/test/rlstest/ai_provenance_test.go: same-tx audit+provenance write, deferred reservation + per-direction settle, Cancel-refunds-full + idempotency, pricing lookup at-time + price change.
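The inputs_hash mechanic (SHA-256 of a canonicalized prompt, per the provenance decision above) can be sketched as follows. The canonicalization rule here (trim + collapse whitespace) is an assumption for illustration only; the shipped canonicalization rules are not specified in this plan.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// canonicalize is one plausible normalization — trim and collapse internal
// whitespace — so equivalent prompts hash identically. ASSUMPTION: the real
// canonicalization may differ; this only illustrates the inputs_hash idea.
func canonicalize(prompt string) string {
	return strings.Join(strings.Fields(prompt), " ")
}

// inputsHash returns the hex SHA-256 of the canonicalized prompt, letting
// auditors verify "was this prompt the one we ran?" without storing the
// raw content (PII for clinical features).
func inputsHash(prompt string) string {
	sum := sha256.Sum256([]byte(canonicalize(prompt)))
	return hex.EncodeToString(sum[:])
}

func main() {
	a := inputsHash("Summarize  the visit\nnotes")
	b := inputsHash("Summarize the visit notes")
	fmt.Println(a == b)       // true — whitespace-equivalent prompts match
	fmt.Println(len(a) == 64) // SHA-256 hex digest length
}
```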

Acceptance:

  • [x] One interface per AI task locked: ai/llm, ai/embeddings, ai/transcription, ai/vision, ai/classification.
  • [x] Model registry shape locked: ai_models + ai_model_pricing_history (history shipped at foundation, not deferred).
  • [x] AI credentials in platform_service_providers (Cat A) locked.
  • [x] Provenance wiring via WrapMeteredProvider.WithProvenance extension locked.
  • [x] Streaming support in LLM interface from Day 1 locked.
  • [x] BYO-LLM via Cat B deferred until first clinic asks.
  • [x] AI agent identity in audit: existing principals + parent_principal_id model is sufficient; no new plumbing.
  • [x] Schema + RLS + permission seeds shipped (migration 000020 + PlatformPermAIModelsManage constant + classification entries for ai_models / ai_model_pricing_history / renamed audit_ai_provenance.model_id).
  • [x] Five AI capability interfaces shipped (skeletons + Fake doubles in internal/core/ai/{llm,embeddings,transcription,vision,classification}; no provider impls at foundation).
  • [x] AI metering primitive shipped: capabilities.WrapMeteredAI + meterDeferred middleware, MeterStore.BeginReservation + Reservation / SettleResult / SettleEntry types, metering.Repository.RecordWithCost for per-direction cost_cents snapshots. Provenance lands via audit.RecordWithProvenance (extension on audit.Recorder) — wrap-layer agnostic to provenance shape, AuditFunc closure decides per-capability. (The original "WrapMeteredProvider.WithProvenance extension" framing in the design notes was superseded by this cleaner separation: metering gets a generic post-call reconciliation primitive that works for streaming-late-output AND non-streaming-late-usage; provenance gets its own audit-recorder API; capability impls compose the two as needed.)
  • [x] Console superadmin endpoints for model registry management shipped at /v1/admin/ai-models (UI deferred to 1D; OpenAPI also deferred to match 1C.2 platform-service-providers pattern).
  • [x] CI guard + documentation + acceptance test (cmd/check-ai-models wired into make check + apps/docs/reference/ai-models.md + internal/test/rlstest/ai_provenance_test.go covering same-tx audit+provenance, split-direction settle, Cancel-refund + idempotency, pricing lookup at-time + price change).

1C.9 Entitlements Rename

Mechanical rename of features / plan_features / organization_capabilities → entitlements / plan_entitlements / organization_entitlements (plus the snapshot family organization_subscription_entitlements, patient_tier_entitlements, patient_subscription_entitlements, the entitlement_code / entitlement_enabled / entitlement_column columns, and the current_app_has_org_entitlement SQL helper). Resolves the architectural-vs-billing-vocabulary collision settled in glossary.md.

Status: shipped 2026-05-06. Existing functionality unchanged. See glossary.md → Entitlement and Forbidden terms for the rationale — architectural "Capability" means an internal Go interface, architectural "Feature" means user-facing functionality; the DB tables collided with both and were renamed in this sub-phase before any new 1C code references them.

What changed:

  • [x] DB migrations (early-dev, edit-in-place per CLAUDE.md): renamed features → entitlements, plan_features → plan_entitlements, organization_capabilities → organization_entitlements, plus the snapshot tables (organization_subscription_features → organization_subscription_entitlements, patient_tier_features → patient_tier_entitlements, patient_subscription_features → patient_subscription_entitlements). Updated FK names, indexes, triggers, RLS policies, the create_organization_companion_rows trigger function, and audit log entity_type strings.
  • [x] Go domain code: internal/core/domain/orgcapabilities/ → internal/core/domain/orgentitlements/ (full directory rename). Type renames (Capabilities → Entitlements, Feature → Entitlement, PlanFeature → PlanEntitlement, SubscriptionFeature → SubscriptionEntitlement, OrganizationCapability → OrganizationEntitlement). Repository / service / handler method renames (ListFeatures → ListEntitlements, ListPlanFeatures → ListPlanEntitlements, etc.). Override-kind enum value 'feature' → 'entitlement'. Audit context constant ContextCapabilityChange → ContextOrgEntitlementChange (string 'capability_change' → 'org_entitlement_change').
  • [x] Go middleware + principal: RequireFeature → RequirePlanEntitlement, RequireCapability → RequireOrgEntitlement. Subject.HasFeature / Subject.HasCapability → Subject.HasPlanEntitlement / Subject.HasOrgEntitlement. Subject.Features / Subject.Capabilities fields → Subject.PlanEntitlements / Subject.OrgEntitlements. SQL helper current_app_has_capability(cap_code) → current_app_has_org_entitlement(entitlement_code). Service projection method subscriptions.Service.RecomputeCapabilities → RecomputeOrgEntitlements.
  • [x] Wire-protocol error codes: feature_unavailable → plan_entitlement_unavailable; capability_disabled → org_entitlement_disabled; missing_feature / missing_capability → missing_entitlement.
  • [x] OpenAPI spec: schema names (OrganizationCapabilities → OrganizationEntitlements, Feature → Entitlement, PlanFeature → PlanEntitlement, SubscriptionFeature → SubscriptionEntitlement), endpoint paths (/v1/features → /v1/entitlements, /v1/organizations/{id}/capabilities → /v1/organizations/{id}/entitlements), operationIds, and prose updated.
  • [x] API client (packages/api-client/): generated types regenerated via pnpm openapi; Go DTOs regenerated via make openapi.
  • [x] UI labels (Console superadmin only — clinic / portal carry no entitlement-context strings yet): "Capabilities" card title → "Entitlements"; activity-feed mock action strings, audit-log filter enum, palette stat card label all updated.
  • [x] Documentation: data-model.md, plans-and-subscriptions.md, org-settings.md, patterns.md, middleware-composition.md, decisions.md, dependency-map.md, data-classification.md, error-envelope.md, gdpr-compliance.md, implementation-plan top-level + foundation.md cross-references all updated. Glossary's Forbidden Terms table marks the three renames as completed 2026-05-06.
  • [ ] Romanian translations: no entitlement-context strings exist in clinic / portal i18n bundles yet; no work needed in this sub-phase. Will land alongside whichever Layer 2 feature first surfaces an entitlement label to clinic / patient users.

Resolved design questions:

  • User-facing label in Clinic admin UI — settled: friendly labels for clinic-facing surfaces ("Plan benefits" / "What's included") when those surfaces ship; strict "Entitlements" for Console superadmin (already in place).
  • Naming for the two shapes of entitlement (boolean gates vs. quota limits) — kept the existing split: entitlements catalog (boolean gates) + limit_definitions catalog (quotas). 1C.9 only renamed the boolean side.

Acceptance:

  • [x] All six table renames land in migrations (3 originally scoped + 3 snapshot tables for consistency).
  • [x] All Go code migrated; grep -rn confirms zero features\b|plan_features|organization_capabilit|feature_code|capability_column|cap_code|current_app_has_capability|RequireFeature|RequireCapability|HasFeature|HasCapability|orgcapabilities|capability_change matches in entitlement contexts under services/api/.
  • [x] OpenAPI + API client regenerated; Go build passes; pnpm typecheck passes.
  • [x] UI labels updated (Console). en + ro for clinic / portal not applicable yet — no entitlement-context strings exist in those bundles.
  • [x] Glossary's Forbidden Terms table reflects the rename completion.
  • [ ] CI guard: a check that the forbidden words don't appear in new code in entitlement contexts. Deferred — defaulting to manual review at PR time until a forbidden-words sweeper script lands alongside the broader CI guard buildout in 1C.

Clinical "services" rename — deferred

The clinical-domain rename (services → offerings, service_plans → enrollments) is locked in the glossary as canonical taxonomy but the actual file/code rename is deferred until that area is built. No preemptive sweep. See glossary.md → Offering and the Forbidden terms deferral note.


1D. Admin Surfaces

Three apps with end-to-end admin functionality so each audience can run their own house.

Status (2026-05-07): Inventory audit produced at apps/docs/implementation-plan/1d-ui-inventory.md — 124 mounted routes, 24 domain handlers, 73 in-scope UI surfaces post-decision. OpenAPI catch-up + api-client regen shipped at commit 58dc1c4 (1D-prep — 71 missing operations added, D-9 naming alignment, GET /v1/me/clinics for D-8). All 12 decisions (D-1 through D-12) settled in the inventory doc; outcomes baked into the subsection scopes below. Open: the 1D.0 prerequisite gap-fillers + the 1D.4 primitives PR must close before per-app surfaces start.

1D.0 Prerequisite Backend Gap-Fillers

Small endpoints surfaced by the 2026-05-07 inventory that block specific per-app surfaces. Bundled here so 1D.1 / 1D.2 / 1D.3 / 1D.5 can ship without each one carrying its own backend slice. Foundation-tier endpoints, all small (read-only or single-row mutation), all RBAC + RLS + audit per the standard rules.

Open:

  • [x] GET /v1/admin/permissions — read-only catalog over permissions, ordered by code (COLLATE "C" for stable ASCII order). Superadmin-gated at route layer. Drops "migration introduced" column from the original surface — not load-bearing for C5, and the permissions table doesn't carry it. Add later if Console needs it.
  • [x] GET /v1/admin/role-templates — read-only viewer over system role templates (organization_id IS NULL AND is_system = TRUE) with permission grants resolved per-row. Foundation seeds 3 templates (admin / specialist / customer_support). Superadmin-gated. Propagation editor remains deferred per D-2.
  • [x] POST /v1/admin/platform-memberships + GET /v1/admin/platform-memberships + DELETE /v1/admin/platform-memberships/{principalId} — grant / list / revoke-all. Role enum app-layer-validated against {superadmin, support_engineer}. Every grant + revoke audit-logged with the new audit.ContextPlatformMembershipChange constant. Self-revoke guard NOT shipped — caller can revoke themselves; recovery is direct SQL per the table's migration comment. Add a guard in Console UI if needed.
  • [x] Decision settled — aggregator shipped: GET /v1/admin/subscriptions[?org_id=&status=&tier_id=&page=&limit=] — paginated cross-org subscriptions aggregator with org_name + org_slug + active_overrides_count enrichment per row. Same apiquery pagination as other list endpoints (default 50, hard cap 500). The iterate-orgs workaround was dead on arrival at production scale (5k+ active subscriptions migrating in on launch day per CLAUDE.md → Production Scale).
  • [ ] Decision settled — endpoint deferred: Per-org aggregate-stats endpoint (GET /v1/organizations/{id}/stats). Storage tracker doesn't exist; MRR shape isn't settled; no production-blocking gap. Console C8 cards ship mock at 1D close and the endpoint lands later (1E observability or production-launch-readiness) when storage tracking and MRR semantics are settled.
  • [ ] Deferred per D-2: System role template editor (mutation endpoints + propagation handler — grants propagate to all org clones; revocations do not). Cross-tenant propagation semantics need design before this ships. C6 stays read-only viewer in 1D.
  • [ ] Deferred (gap #7): Custom roles editor for clinics (POST/PATCH/DELETE /v1/organizations/{id}/roles + role_permissions mutations). Clinic L7 ships as system-roles-only in 1D; custom roles per org wait until cross-tenant propagation and per-org permission-catalog UX are designed.

1D.1 Console UI (platform operator)

Console manages the platform: orgs, users, plans, overrides, entitlement flags, platform-wide audit. Layer-1/2 of the four-layer authorisation model lives here; layer-3/4 lives in the Clinic admin UI.

Status: partially shipped. Foundation pieces below are open; the cross-tenant audit log viewer + DataTable foundation already shipped. Inventory rows referenced as Cn map to 1d-ui-inventory.md.

Already shipped:

  • [x] Organisations CRUD with profile + atomic owner provisioning (C7) — owner provisioned synchronously through auth.PrincipalProvisioner (Clerk createUser), magic-link welcome email via OwnerWelcome notify category. New staff onboard via the staff-invitations primitive (item below), not provisioning. See the ADR "Why owner uses provisioning, staff and patients use invitation" in decisions.md.
  • [x] Org detail: profile edit, members section, custom domains section (C9, C10, C14) — members section currently lists confirmed members; staff-invite affordance now writes through the staff-invitations primitive (Clerk Invitations API + bind-on-first-auth) per the ADR above; pending-invitations surface (list / revoke / resend) follows below.
  • [x] Cross-tenant audit log viewer at /audit-logs with server-side pagination + sort + filters + detail Sheet (C28; see 1D.4)

Open (always-on / aggregate scope):

  • [ ] Console staff-invitation surface (1B.12 — backend shipped, dialog landed): add a "Pending invitations" sub-section to the org detail members card: list pending/accepted/revoked (calls GET /v1/organizations/{id}/staff-invitations?status=), inline revoke + resend per row (POST .../invitations/{inviteId}/revoke|resend). The "Add staff" dialog already writes through POST /v1/organizations/{id}/staff-invitations; this grows the visibility side. Mirrors the Clinic-app surface defined in 1D.2.
  • [ ] Users page (C1): list staff humans across the platform — search by email/name, view memberships across orgs (uses organization_memberships.last_used_at from 1A.11), block/unblock, view per-user audit trail. Listing patient principals requires elevation (see break-glass below).
  • [ ] Platform memberships management (C4): grant/revoke superadmin + future support_engineer. Consumes the shipped 1D.0 endpoints: GET /v1/admin/platform-memberships[?role=&principal_id=], POST /v1/admin/platform-memberships, DELETE /v1/admin/platform-memberships/{principalId}. Every grant + revoke audit-logged with action_context = 'platform_membership_change' (the backend writes this — UI does not need to set it). Reminder: OpenAPI + packages/api-client typed wrappers bundle with this UI per project_ui_deferred_until_foundation — add entries to openapi.yaml when wiring the page.
  • [ ] Permission catalog viewer (C5, read-only): every registered permission with (code, resource, action, description). Consumes the shipped 1D.0 endpoint: GET /v1/admin/permissions. Endpoint returns rows sorted by code with COLLATE "C" for stable ASCII order (no client-side sort needed). "Migration introduced" column was dropped from the original surface — not present in the table or the response. OpenAPI + api-client bundle with this UI.
  • [ ] System role templates viewer (C6, read-only per D-2): list system role templates (admin, specialist, customer_support) and the permissions they grant. Consumes the shipped 1D.0 endpoint: GET /v1/admin/role-templates (returns each template with its permissions: [{code, resource, action}] array resolved server-side). Editor (mutation endpoints + propagation handler) deferred — see 1D.0 deferred items. OpenAPI + api-client bundle with this UI.
  • [ ] Plans / entitlements / limits catalog viewers (C18, read-only): every plan version, every entitlement code (regulated highlighted), every limit definition.
  • [x] Platform-scope consent purpose editor (C32, from 1B.9): list every scope='platform' purpose paired with its current platform-default consent_purpose_versions row + per-locale body editor for publishing a new version (POST /v1/admin/platform-consent-purpose-versions). Inserts at MAX(version)+1 for (purpose_code, organization_id IS NULL); triggers re-consent across the entire platform via 1B.9's current_required_consent_versions helper. Confirmation modal explains the cross-tenant blast radius before publish. Audit-logged.
  • [ ] Consent purpose catalog viewer (read-only, from 1B.9): every consent_purposes row with its scope, legal_basis, withdrawable. Org-scope purposes' platform-default fallback bodies are visible through the platform editor above. Edits to consent_purposes itself remain migration-only.
  • [x] Privacy notice template management (C31, from 1B.10): list legal_document_templates; create new version; publish writes one row per locale at MAX(version)+1. Audit-logged.
  • [ ] Org subscription management (C12, C16): list orgs with current base plan + active add-ons + active overrides; change base plan; attach / cancel add-ons; grant usage packs. Cross-org aggregator (C16) consumes the shipped 1D.0 endpoint: GET /v1/admin/subscriptions[?org_id=&status=&tier_id=&page=&limit=]. Each row carries organization_name + organization_slug + active_overrides_count (server-side enrichment — no per-row lookup). Pagination via the standard apiquery envelope. Per-org mutations (change plan / attach add-on / cancel) hit existing per-org subscription routes. OpenAPI + api-client bundle with this UI.
  • [ ] Sales overrides (C12, C17): grant a per-subscription override with required reason and optional expiry; list active overrides; revoke. Cross-org list (C17) reads from the same aggregator (GET /v1/admin/subscriptions: active_overrides_count per row points at orgs with active overrides; per-row detail comes from existing per-org override endpoints). Grant / revoke endpoints already exist at POST /v1/organizations/{id}/subscriptions/{subId}/overrides + POST .../overrides/{ovId}/revoke. OpenAPI + api-client bundle with this UI.
  • [ ] Org entitlement flags (C12): per-org page showing all organization_entitlements flags; toggle (audit-logged with action_context = 'org_entitlement_change').
  • [ ] Org billing editor (C12): write access for billing email, address, encrypted tax ID, payment_provider config.
  • [ ] Console superadmin's own preferences page (C35, "Settings"): consumes GET /v1/me + PATCH /v1/me. Per D-6, this is NOT a place to edit organization_settings.feature_flags — that's engineering-internal and not a UI surface anywhere.
  • [ ] Clinic overview cards (C8): patient counter, MRR, storage-used per org. Aggregate-stats endpoint deferred at 1D.0 (storage tracker doesn't exist; MRR shape unsettled; no production-blocking gap). Cards ship mock at 1D close; the endpoint lands later in 1E observability or production-launch-readiness when storage tracking + MRR semantics are settled.
  • [ ] Audit log metadata viewer cross-tenant (C13, C28 already built — extends per-clinic slice): timestamps, actions, status codes; diff content masked unless break-glass-elevated.
  • [ ] Platform service providers (C26 — confirmed in 1D.1 scope per D-5): Console superadmin CRUD over platform_service_providers (1C.2). List / create / get / update / delete platform-default + per-org-override providers (email/ses, storage/aws_s3, auth/clerk). Sidebar entry /platform-providers added at 1D close. Load-bearing for dedicated-tier rollout (Cat A per-org overrides). Gated by providers.manage or superadmin.
  • [ ] AI models registry (C27 — confirmed in 1D.1 scope per D-5): Console superadmin CRUD over the 1C.8 AI model registry. List / create / get / update / pricing. Sidebar entry /ai-models added at 1D close. Load-bearing for AI cost configuration. Gated by ai_models.manage or superadmin.
  • [ ] Break-glass sessions list (C29): list active + recent sessions; close session; view session detail. Foundational — required before C11's elevation modal can wire real flow. Sidebar entry /break-glass.
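
The C16 aggregator's "no per-row lookup" constraint comes down to one grouped count query merged into the page of rows. A minimal sketch under stated assumptions: the row shape, the table name subscription_overrides, and its column names are hypothetical stand-ins for whatever the subscription domain actually ships.

```go
package main

import "fmt"

// One grouped query (illustrative SQL; table and column names are assumptions)
// feeds the counts map for the whole page:
const overrideCountsSQL = `
  SELECT organization_id, COUNT(*) AS active_overrides
  FROM subscription_overrides
  WHERE revoked_at IS NULL AND (expires_at IS NULL OR expires_at > now())
  GROUP BY organization_id`

// Hypothetical row shape for the C16 aggregator response.
type subscriptionRow struct {
	OrgID, OrgName, OrgSlug string
	ActiveOverridesCount    int
}

// enrichOverrideCounts merges the single grouped-count result into the page,
// so the list endpoint never issues a per-row lookup.
func enrichOverrideCounts(rows []subscriptionRow, counts map[string]int) []subscriptionRow {
	for i := range rows {
		rows[i].ActiveOverridesCount = counts[rows[i].OrgID]
	}
	return rows
}

func main() {
	rows := []subscriptionRow{{OrgID: "org-a"}, {OrgID: "org-b"}}
	counts := map[string]int{"org-a": 2} // result of the one GROUP BY query
	rows = enrichOverrideCounts(rows, counts)
	fmt.Println(rows[0].ActiveOverridesCount, rows[1].ActiveOverridesCount) // 2 0
}
```

Orgs absent from the counts map fall out as zero, so the merge needs no outer-join handling.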

Removed from Console scope (decisions 2026-05-07):

  • Patient tiers cross-tenant (C19) — D-1, stays clinic-side only.
  • Notification templates editor (C21) — D-3, foundation templates stay migration-managed; F-tier features add per-template editing if needed.
  • Webhooks cross-tenant (C24) — D-1, Cat C subscriptions stay clinic-side.
  • Connectors cross-tenant (C25) — D-1, Cat B integrations stay clinic-side.
  • Feature flags page (C33) — D-6, organization_settings.feature_flags JSONB is engineering-internal, not a UI surface.
  • Locales catalog (C34) — D-7, i18n config is system-only (next-intl messages).
  • System health (C30) — D-11, moved to 1E (consumes staging KMS / S3 / RDS / SES that exist after 1E.3). Page stays mock at 1D close.
  • F-tier sidebar stubs (/specialties, /services, /exercises, /forms, /announcements, /sales, /marketing, /compliance, /onboarding) — D-12, removed from sidebar at 1D close. They reappear when their backends ship.

Open (break-glass / elevated scope, gated by RequireBreakGlass):

  • [ ] Patient list per org (break_glass:patient_list).
  • [ ] Patient detail per org — profile, subscriptions, consent trail (break_glass:patient_detail).
  • [ ] Audit log full content cross-tenant — diffs, IPs, request bodies (break_glass:audit_full).
  • [ ] Cross-org patient lookup — narrow surface for DSAR routing of orphaned ex-patients (break_glass:cross_org_lookup).
  • [ ] Active break-glass sessions list — show all currently-open sessions across the platform; close / extend / audit (gated by break_glass.manage).
  • [ ] Elevation modal — common UI pattern that wraps any restricted route. Captures reason_category, reason_text, reason_ref, expires_in_minutes. Posts to POST /v1/break-glass/sessions.
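
The elevation modal's payload can be sketched as a request struct with server-side validation. This is a sketch under stated assumptions: the category set, the expiry bounds, and the validate method are illustrative, not the shipped POST /v1/break-glass/sessions contract.

```go
package main

import (
	"errors"
	"fmt"
)

// Request body the elevation modal posts, mirroring the captured fields.
// The JSON field names come from the plan; everything else is assumed.
type breakGlassRequest struct {
	ReasonCategory   string `json:"reason_category"`
	ReasonText       string `json:"reason_text"`
	ReasonRef        string `json:"reason_ref,omitempty"`
	ExpiresInMinutes int    `json:"expires_in_minutes"`
}

// Hypothetical category enum; the real set lives with the backend.
var validCategories = map[string]bool{
	"support_ticket": true, "dsar": true, "incident": true, "legal": true,
}

func (r breakGlassRequest) validate() error {
	if !validCategories[r.ReasonCategory] {
		return errors.New("unknown reason_category")
	}
	if r.ReasonText == "" {
		return errors.New("reason_text is required")
	}
	if r.ExpiresInMinutes < 1 || r.ExpiresInMinutes > 240 {
		return errors.New("expires_in_minutes out of range")
	}
	return nil
}

func main() {
	req := breakGlassRequest{ReasonCategory: "dsar", ReasonText: "route export", ExpiresInMinutes: 30}
	fmt.Println(req.validate() == nil) // true
}
```

Requiring a non-empty reason_text at the API boundary keeps every session audit-attributable even if a client skips the modal.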

1D.2 Clinic Admin UI (org self-service)

Clinic admin manages their own org without depending on a superadmin.

Status: parked behind the clinic-app refresh. The refresh comes first; every item below lights up against backends that are already stable in master. Inventory rows referenced as Ln map to 1d-ui-inventory.md. One backend gap remains (custom roles editor — see 1D.0 deferred items).

Open:

  • [ ] Org profile edit page (L1, gated by organizations.update).
  • [ ] Members section (L6): list with staff/patient split, inline role-change dropdown, remove (gated by organizations.manage_members).
  • [ ] Personal invitations surface (1B.12 — backend shipped):
    • [ ] Staff-invite tab under Members: list pending/accepted/revoked (calls GET /v1/organizations/{id}/staff-invitations?status=), invite-by-email form (POST /v1/organizations/{id}/staff-invitations body {email, role_code, expires_in_days?}, gated by organizations.manage_members), inline revoke + resend actions per row (POST .../invitations/{inviteId}/revoke|resend).
    • [ ] Patient-invite list page under Patients: same shape but gated by patients.manage, body uses {email, patient_tier_id?, expires_in_days?}, endpoint pair is .../patient-invitations.
    • [ ] Both surfaces should show "pending" / "accepted" / "consumed" / "revoked" / "expired" filter chips driven by the status query param.
  • [ ] Patient share-links surface (1B.12 — backend shipped): mint form with optional tier picker + max_uses + expires_at + note (POST /v1/organizations/{id}/share-links, gated by organizations.manage_share_links); list (GET .../share-links) with copy-code button + QR-code rendering (the public landing URL is https://{slug}.portal.restartix.pro/join/{code}); revoke per row (POST .../share-links/{id}/revoke); audit trail filterable to share-link redemptions in the per-org audit view.
  • [ ] /welcome landing page (1B.12 — backend shipped): the redirect target after auth-provider sign-up for staff who accepted an invite. The auth middleware's OnAuthHook has already created the membership row by the time the page loads (works for both new-user and existing-user flows); the page just shows "Welcome to Acme Clinic — you're now a Specialist" with links to relevant org surfaces.
  • [ ] Custom domains section (L3): list, add, verify, remove (gated by organizations.manage_domains).
  • [ ] Roles section (L7, system-roles-only at 1D close): list cloned system roles + their permissions (read-only); permissions catalog rendered as grouped checklist. Custom-role CRUD deferred per gap #7 (1D.0 deferred items) — no roles mutation surface exists in any handler. Adds organizations.manage_roles permission seeding now so the UI can be wired against the future mutations without a permission migration later.
  • [ ] Per-org audit log viewer (L21, read-only, filterable) — consumes 1A.1 writes; surfaces break-glass reads against the org with action_context='break_glass' filter.
  • [x] Legal documents editor (from 1B.10) — handles BOTH org_terms and org_privacy_notice through one editor surface. List page at /legal-documents; per-document editor at /legal-documents/[type] with structured form (one input per required_placeholders key, one checkbox per toggleable_sections[].key, defaults from template). Save Draft / Publish-with-confirmation; publish triggers re-consent for every existing patient. Dashboard task card surfaces unpublished documents to admins. Gated by organizations.manage_privacy_notice.
  • [ ] Settings page (L2, from 1B.2): marketing prefs, retention override, support locale, telerehab toggle (entitlement-mirrored, read-only) (gated by organizations.update_settings). Per D-6, feature_flags JSONB is engineering-internal and NOT exposed in the form — the PATCH .../settings endpoint may accept it for engineering use, but the UI does not surface or edit it.
  • [ ] Billing page (L4, from 1B.2, read-only): current plan, period dates, billing contact, upcoming renewal — write access lives in Console.
  • [ ] Locations section (L5, from 1B.5): list, create, update, close (status), delete (gated by locations.manage).
  • [ ] Patient Tiers section (L17, from 1B.4): list, add, edit, archive tiers; default-tier toggle (gated by patient_tiers.manage).
  • [ ] Patients list + detail (L11, L12): search, paginate, sort, archive; per-patient detail composing consents (L13), subscription (L14), impersonation history (L15, L16).
  • [ ] Per-patient consents view (L13, from 1B.9): list a patient's consent history at this clinic, current state per purpose, withdrawal as staff-action when needed (gated by consents.view_org / consents.manage).
  • [ ] Outbound webhook subscriptions (L18, Cat C — 1C.4): create / list / get / update / revoke / rotate-secret / list-deliveries / fire-test. Gated by organizations.manage_webhooks + EnforceLimit(max_webhook_subscriptions). Sidebar entry /webhooks.
  • [ ] Connected accounts (L19, Cat B — 1C.5 framework-only): per-org create / list / get / update / delete / test against the integration_services catalog. Gated by organizations.manage_integrations. Likely stub-only at 1D close — first Cat B catalog row + Connector impl + OAuth callback handler ships with first F-tier consumer (per 1C.5). Sidebar entry /connectors ships with stub copy until then.
  • [ ] Break-glass session banner (L22): when a platform staff break-glass session is open against this org, show an in-app banner with who/when/scope/reason; recent (closed within 30d) sessions listed in a "Platform support access" section.

  • [ ] Staff impersonation oversight (1B.13 — backend shipped): list of all patient_impersonation_sessions across the clinic — filterable by staff member, patient, date range. Reads GET /v1/organizations/{id}/patient-impersonation-sessions[?staff_principal_id=&patient_id=&only_active=&limit=&offset=] (RLS-gated by patients.manage). DataTable foundation (1D.4). Per-patient impersonation history shown alongside the per-patient consents view. Backend contract is stable — no Go changes needed when this lands.

Removed from Clinic admin scope (decisions 2026-05-07):

  • F-tier sidebar stubs (/calendar, /treatment-plans, /exercises, /forms, /specialists, /services, /specialties, /segments, /custom-fields, /pdf-templates, /automations, /reports, /billing-invoicing, /video-calls) — D-12, removed at 1D close. They reappear when their backends ship.

1D.3 Patient Self-Service (Portal)

Portal must be non-empty for a logged-in patient with no medical features. Mirror of "manage my org" for the patient audience.

Open:

  • [ ] Sign-up consent block (consumes 1B.9): clean checkbox UX for platform_terms + platform_privacy_notice (required); plus the org-scope required purposes (org_terms if the clinic published one, org_privacy_notice); plus optional marketing_email / marketing_sms / analytics / ai_processing / profile_sharing toggles. Submission writes the consents rows in the same transaction as POST /v1/portal/onboard (1B.8).
  • [ ] Onboarding form ("needs onboarding" UX): when is_patient_at_current_org=false, show form that posts to POST /v1/portal/onboard (already includes the consent block). Redirect to dashboard on success.
  • [x] Patient-invite banner on /onboard (1B.12): when the auth middleware's OnAuthHook has bound a patient invite for the current org, the page surfaces a "you've been invited to Acme Clinic" banner above the onboarding form. Discovery uses GET /v1/me/pending-invitations — admin-pool projection narrowly scoped to the calling principal's accepted-but-not-consumed invites. Works for both new-user (bind fires on first sign-in) and cross-clinic existing-user flows (bind fires on every authenticated request).
  • [ ] /join/{code} share-link landing page (1B.12 — backend shipped): anonymous landing page that reads code from the URL, calls GET /v1/public/share-links/{code} (per-IP rate-limited, no auth) to render branded "Join {org_name} — {tier_name}" CTA. Click → Clerk sign-up flow. After sign-up, the portal stashes share_link_code in a cookie (or URL state) and the /onboard page submits it on POST /v1/portal/onboard. 410 from public resolve renders "this link is no longer active"; 404 renders "link not found." Needs branding tokens from the org (logo, name) — fetch via the resolve response.
  • [x] Re-consent modal: blocking dialog when a consent_purpose_versions bump means the patient hasn't accepted the latest version. The portal (patient) layout probes GET /v1/me/required-consents (discovery endpoint mounted outside the RequireCurrentConsents 412 gate); non-empty result renders ReconsentModal alongside the page. Modal is genuinely blocking — no escape, outside-click, or close button — only Accept dismisses. Accept calls acceptRequiredConsents server action which re-grants every missing purpose; consents service supersedes the v1 active grant with withdrawal_reason='superseded_by_v{N}' (the org_terms cascade trigger correctly skips this path so re-acceptance does NOT trigger leave-clinic).
  • [ ] My profile page: view + edit patient_profiles fields (encrypted phone via 1A.3); read-only fields surface for what the org-side admin owns.
  • [ ] My subscription page: view active patient_subscriptions + tier features/limits; status; period dates.
  • [ ] My consents page (consumes 1B.9 trail view): full per-org and platform-level history with current state per purpose; toggles for withdrawable purposes (marketing_*, analytics, ai_processing, profile_sharing); "delete account to revoke" affordance for non-withdrawable platform purposes; "leave clinic" affordance for non-withdrawable org purposes (org_terms, org_privacy_notice — sets patients.deleted_at at that clinic).
  • [ ] My clinics page (P9 / overlaps with A4): consumes the shipped GET /v1/me/clinics endpoint (D-8, commit 58dc1c4) — returns one row per clinic the patient is at with {org_id, name, slug, primary_contact, dpo_email, ...} for DSAR routing without crossing the processor boundary. Same handler as 1D.5's A4; rendered with Portal chrome here, with platform chrome at 1D.5.
  • Out of 1D scope: Data export request (GDPR Art. 15/20) — F11 backend, listed for completeness only. UI work waits for F11.
  • Out of 1D scope: Account deletion request (full GDPR erasure across all orgs) — F11.1 backend, listed for completeness only. UI work waits for F11.1.
  • [ ] Access history view (1B.13 — backend shipped): per-clinic list of staff impersonation sessions on this patient — who opened it, when, the reason text, duration. Reads GET /v1/me/patient-impersonation-sessions[?organization_id=&only_active=&limit=&offset=] (RLS self-read on patient_impersonation_sessions cascades through current_human_patient_profile_ids() to span every clinic the patient is at; cross-org account surface 1D.5 consumes the unfiltered shape). Foundation-tier scope is session metadata only; per-action drill-down ("what entities were touched") is deferred to the future patient_account_activity projection (see Deferred Foundation Extensions) — patients never get SELECT on audit_log directly.
  • [x] Sign-out + locale selector + theme (P5).

Removed from Patient Portal scope (decisions 2026-05-07):

  • F-tier sidebar stubs (/appointments, /exercises, /treatment-plan, /forms, plus any others) — D-12, removed at 1D close. They reappear when their backends ship.

1D.4 Shared UI Patterns

Per D-4: this subsection ships FIRST as one packages/ui PR before any per-app 1D.1 / 1D.2 / 1D.3 / 1D.5 surface starts. Every per-app consumer composes against the same versioned primitives. This is stricter than a parallel-prototype approach because a prototype hardened against one app's first surface diverges from the version a second app later starts against.

Status: shipped. The once-open primitives at the bottom all landed together in one PR, per the rule above.

Already shipped:

  • [x] DataTable (TanStack-backed, server-driven sort + filter + pagination + Sheet detail), MultiSelectFilter, AsyncMultiSelectFilter, DateRangeFilter.
  • [x] App shell + brand theme (sidebar/inset, OKLCH brand tokens, Poppins, light/dark sidebar, min-w-0 boundary).
  • [x] Listing-page pattern (fill mode, sticky toolbar, edge-to-edge tables).
  • [x] Branded 404 pages with i18n.
  • [x] Re-consent modal (1B.10) — first instance of the persistent-blocking-banner pattern.

Shipped (one PR, landed before any per-app surface):

  • [x] Empty / loading / error states standardised in packages/ui/patterns/EmptyState (icon + title + description + action; card / bare variants), LoadingState (Skeleton-based; table / card / page variants with role="status" + aria-live), ErrorState (matches EmptyState shape so list pages swap one for the other on load result).
  • [x] Toast / notification system for action results — sonner-backed. <Toaster /> mounts once at root layout; toast.success() / toast.error() from @workspace/ui/components/sonner. Ephemeral by design — never the canonical surface for form errors (<FormError />) or persistent state (<PersistentBanner />).
  • [x] Server-validation rendering (422 with field errors → form-level error display) — FormErrorState shape {error?, field_errors?} extends the existing useActionState {error} pattern. <FormError state /> renders form-level; fieldError(state, name) returns the per-field message for FormField's error prop. No form-library lock-in.
  • [x] Permission-aware UI helper: <RequirePermission code="..." /> wrapper for routes, buttons, table actions + useHasPermission hook + <PermissionsProvider> (Context, fed from /v1/me's current_permissions + is_superadmin). Superadmins bypass every per-org gate. Authoritative server-side gate is unchanged — this is UX, not security.
  • [x] Generalize the persistent banner pattern beyond re-consent — <PersistentBanner variant="info|warning|destructive|security" title description action? icon?> in patterns/persistent-banner.tsx. Consumers: C11 / L22 break-glass active banner, future impersonation banners, future compliance-review banners.
  • [x] QR-code component in packages/ui/components: <QRCode value size? level?>, a thin wrapper over the qrcode.react SVG renderer. Consumed by the L10 share-link mint UI.
  • [x] Edit locks primitive (backend + frontend; see Edit Locks below) — shipped 2026-05-10. internal/core/locks/ package (Store + Service + Handler + RequireLockHeld middleware + ResourceDef registry), 4 HTTP endpoints under /v1/organizations/{id}/locks/{resource}/{resourceId}, useEditLock hook + <EditLockBanner /> in @workspace/ui, integration tests in internal/core/locks/store_integration_test.go, P54 documented in patterns.md. Foundation ships the framework — the registry starts empty; F-tier consumers register their resource types in their own init() functions.
  • [x] Documented as canonical patterns in packages/ui/README.md — toast usage, form validation rendering, permission-gated UI, empty/loading/error states, persistent banners, edit locks, QR codes, plus the "adding a new primitive" workflow.

1D.4 Edit Locks (Design)

Pessimistic edit locks prevent two staff from concurrently editing the same record. WooCommerce-style: the first staff to open a detail page acquires a TTL'd Redis lock; subsequent openers see a read-only banner ("Maria is editing — since 14:32") and a "Take over" button. Mutations to lockable resources are guarded server-side: if the caller doesn't hold the lock, the write returns 409 Conflict. version columns on mutable tables are kept as defense-in-depth — locks are UX, version columns are correctness.

Generalises the appointment-slot Redis hold pattern: same primitive (TTL'd Redis key, owner-bound, atomic acquire), broader scope (any lockable resource type).

Decisions settled (2026-05-08):

  • Granularity: per-record. Each domain registers its lockable resource type with the lock middleware. Patient detail's sub-tabs (appointments / forms / treatment plan) lock independently — Maria can edit a consent while Andrei edits an appointment on the same patient.
  • Lock identity: per-principal. Same staff member with two tabs open does NOT lock themselves out. Self-races between tabs are caught by the version column on the underlying table, not the lock.
  • TTL: 120s lock / 45s heartbeat. Lower API chatter than WooCommerce's 150/15; recovers within ~2 min of tab close.
  • Takeover: allowed, audited. Second user clicks "Take over" → Redis key is overwritten with the new holder, the original session gets booted on next heartbeat (which now fails with lock_lost). Audit row: lock.takeover with (resource_type, resource_id, prior_holder, new_holder).
  • Read-only banner for non-holders — page loads but inputs are disabled with a banner showing holder + acquired-at + "Take over" button. Friendlier than a hard refusal.
  • Audit scope: takeover + write-blocked only. lock.takeover and lock.write_blocked (guard rejection) are security-significant. Acquire / heartbeat / release are operational metadata, exempt per CLAUDE.md "operational-metadata bumps are exempt".
  • Defense-in-depth: keep version columns on every mutable table the lock protects. Lock prevents the common case; version catches Redis hiccups, expired-mid-save races, and self-races between tabs.
  • No bypass. Org admins don't get a force-release shortcut — takeover is the escape hatch and it's audited. Keeps the model simple.

Backend (internal/core/locks/) — shipped 2026-05-10:

  • [x] Redis-backed lock store with atomic Acquire (SET NX + holder-returning conflict shape), Heartbeat (Lua-script EXPIRE-only-if-still-mine), Release (Lua-script DEL-only-if-still-mine), Takeover (unconditional SET, returns prior holder), Get, HeldBy. Key shape lock:{org_id}:{resource_type}:{resource_id}; lock value JSON {principal_id, acquired_at}. Default TTL 120s.
  • [x] HTTP endpoints POST/PATCH/DELETE/GET /v1/organizations/{id}/locks/{resource}/{resourceId}. Mounted under the per-org route group so P47 URL ≡ scope guard inherits.
  • [x] Mutate-guard middleware locks.RequireLockHeld(svc, resourceType, paramName) — extracts URL param, checks Redis, returns 409 with {holder_principal_id, acquired_at} in error envelope context. Applied via r.With(...) chain on PATCH/DELETE routes after the permission gate.
  • [x] Resource-type registry: locks.RegisterResource(ResourceDef{Type, Permission, Description}) at init time. Foundation registry starts empty; F-tier consumers register their resource types in their own init() functions. Handler validates URL resource segment against the registry; unknown types → 400.
  • [x] Audit on takeover: LOCK_TAKEOVER action verb (extending the open-ended audit.Action const), EntityType: "edit_lock", Before/After carrying both holders. Emitted via the standard audit.Record path inside the request tx (the takeover IS a write that commits, so the audit row commits with it).
  • [x] Audit on write-blocked — audit.RecordOutOfTx(ctx, event) shipped (internal/core/audit/recorder.go). Opens an isolated AdminPool tx so the audit row commits independently of the request tx (which rolls back on 409). The locks service's CheckHeld write-blocked path now emits a LOCK_WRITE_BLOCKED audit row with entity_type = "edit_lock", entity_id = resource_id, Before/After carrying holder + caller principal. Slog stays as the operational debug signal; audit is the forensic source of truth.
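
The store's atomicity rules can be modelled in memory. This is a stand-in for the Redis/Lua implementation, not it: acquire is first-writer-wins (SET NX), heartbeat extends the TTL only if the caller is still the holder (the Lua only-if-still-mine script), and takeover is unconditional and reports the prior holder for the audit row. Release (DEL-only-if-still-mine) follows the same shape and is omitted for brevity.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

type lockValue struct {
	Principal  string
	AcquiredAt time.Time
	ExpiresAt  time.Time
}

// lockStore is an in-memory sketch of the Redis lock semantics.
type lockStore struct {
	mu    sync.Mutex
	locks map[string]lockValue
	ttl   time.Duration
}

func newLockStore(ttl time.Duration) *lockStore {
	return &lockStore{locks: map[string]lockValue{}, ttl: ttl}
}

// acquire returns (ok, currentHolder). Same-principal re-acquire succeeds,
// matching the per-principal identity decision (tabs never lock each other out).
func (s *lockStore) acquire(key, principal string) (bool, string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if v, held := s.locks[key]; held && v.ExpiresAt.After(time.Now()) && v.Principal != principal {
		return false, v.Principal
	}
	s.locks[key] = lockValue{Principal: principal, AcquiredAt: time.Now(), ExpiresAt: time.Now().Add(s.ttl)}
	return true, principal
}

// heartbeat extends the TTL only while the caller still holds a live lock;
// false is the lock_lost signal the frontend hook switches on.
func (s *lockStore) heartbeat(key, principal string) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	v, held := s.locks[key]
	if !held || v.Principal != principal || !v.ExpiresAt.After(time.Now()) {
		return false
	}
	v.ExpiresAt = time.Now().Add(s.ttl)
	s.locks[key] = v
	return true
}

// takeover overwrites unconditionally and returns the prior holder for audit.
func (s *lockStore) takeover(key, principal string) (prior string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	prior = s.locks[key].Principal
	s.locks[key] = lockValue{Principal: principal, AcquiredAt: time.Now(), ExpiresAt: time.Now().Add(s.ttl)}
	return prior
}

func main() {
	s := newLockStore(120 * time.Second)
	s.acquire("lock:org:patient:42", "maria")
	ok, holder := s.acquire("lock:org:patient:42", "andrei")
	fmt.Println(ok, holder)                                  // false maria
	fmt.Println(s.takeover("lock:org:patient:42", "andrei")) // maria
	fmt.Println(s.heartbeat("lock:org:patient:42", "maria")) // false: booted on next heartbeat
}
```

The booted-on-next-heartbeat behaviour falls out naturally: after a takeover the prior holder's heartbeat fails the still-mine check, which is exactly the lock_lost transition the design describes.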

Frontend (@workspace/ui/hooks/use-edit-lock) — shipped 2026-05-10:

  • [x] useEditLock hook — acquires on mount, heartbeats every 45s, releases on unmount, polls every 10s while held_by_other, best-effort navigator.sendBeacon on pagehide (consumer-supplied URL). Returns a typed status (acquiring | held_by_self | held_by_other | lost | error) the form switches on. Action-agnostic — accepts caller-supplied acquire / heartbeat / release / get callbacks so the hook stays in @workspace/ui (no Next.js coupling).
  • [x] <EditLockBanner /> component in @workspace/ui/components/edit-lock-banner — two variants: amber held_by_other (with optional "Take over" button) + destructive lost (with "Try to acquire again" button). Caller supplies localised label strings (next-intl in scope at the consumer).
  • [ ] Form-level integration patterns — first F-tier consumer wires the hook + banner into its detail page (deferred until F1+).
  • [x] Server-action 409 handling — the api-client throws ApiError with error.status === 409 and error.code === "lock_held_by_other" | "lock_lost"; the consumer's server-action wrapper catches and surfaces via the action's typed result so useEditLock can transition state cleanly.

Migration touch:

  • [ ] Add version INTEGER NOT NULL DEFAULT 1 column to mutable tables that will be edit-locked. Pre-prod, so edit the original CREATE TABLE migration in place. List of affected tables determined per-feature when the lock subscribes — foundation lands the primitive; each per-app surface in 1D.1/1D.2/1D.3 wires its detail page through the hook and adds the column to the table it edits.
  • [x] Pattern entry P54 (Edit Locks) added to patterns.md.
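
The version column's job, catching the races the lock misses, is a rows-affected check on a guarded UPDATE. A sketch of those semantics with an in-memory record; the real guard is the single SQL statement in the comment (table and column placeholders assumed), where zero affected rows means a concurrent writer won.

```go
package main

import (
	"errors"
	"fmt"
)

// The real guard is roughly:
//   UPDATE <table> SET ..., version = version + 1 WHERE id = $1 AND version = $2
// with rows-affected == 0 reported as a conflict. The in-memory record below
// models only that check.
type record struct {
	ID      string
	Body    string
	Version int
}

var errVersionConflict = errors.New("version_conflict")

// save applies the edit only if the caller read the version it is replacing,
// mirroring WHERE version = $2; on success the version bumps by one.
func save(r *record, newBody string, readVersion int) error {
	if r.Version != readVersion {
		return errVersionConflict // expired-mid-save race, or self-race between tabs
	}
	r.Body, r.Version = newBody, r.Version+1
	return nil
}

func main() {
	r := &record{ID: "p1", Version: 1}
	fmt.Println(save(r, "tab A edit", 1), r.Version) // <nil> 2
	fmt.Println(save(r, "tab B edit", 1))            // version_conflict
}
```

This is why the lock and the column compose cleanly: the lock prevents the common case before the write starts, and the version check rejects the rare write that slips past it.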

Out of scope here:

  • Realtime presence ("Maria is editing this" surfacing without page-level acquire). HTTP polling on GET /v1/locks/{resource}/{id} covers the read-only banner refresh; WebSocket-driven presence is a Layer 6+ concern if it ever ships.
  • Collaborative-edit merge (Figma-style). Hard non-goal — the platform is staff-blocking-staff, not concurrent-edit.
  • Locks across browser sessions for the same principal — per-principal identity means the same staff can edit from multiple tabs without locking themselves out; self-races are version-column-only.

1D.5 Cross-Org Account Surface (Patient Platform-Level View)

Each clinic portal scopes the patient to that clinic's view by RLS — demo.portal.restartix.pro shows Demo's data, acme.portal.restartix.pro shows Acme's, never blended. That posture is correct (the platform is processor for each clinic separately; blending is joint controllership per Art. 26). But a patient enrolled at multiple clinics still needs one place to see the union of their own data, manage cross-org actions (account deletion, DSAR routing per clinic), and discover which clinics they're at.

This surface is the answer. It runs on a platform-owned hostname (account.restartix.pro or similar) with no clinic branding, no X-Organization-ID header, and no per-org RLS context. The session is patient-portable: set_app_principal (no org), so existing RLS policies (consents_select_self, patients_select_self, patient_subscriptions_select_self, etc.) return the cross-org union via the current_app_org_id() IS NULL branch they already carry.

Why this is a foundation item, not a feature: the Portal-per-clinic policies were tightened during 1B's RLS hardening to prevent Demo→Acme bleed; the cross-org "see everything you have across all your clinics" UX is the corresponding patient-side affordance. Without it, a patient at multiple clinics has no single place to manage account-wide concerns. Must ship before staging cuts over (1E.3) so 1E.2's setup-a-clinic acceptance test can validate the full multi-clinic flow against real hostnames.

Open:

  • [ ] Hostname provisioning: account.restartix.pro (or final name) — DNS, ACM cert, Route53 entry; identical infra shape to console.restartix.pro.
  • [ ] Frontend app: new Next.js app under apps/account/ (or extend an existing one with a new layout). Auth via Clerk same as portals. No org-resolver — the proxy does NOT set X-Organization-ID.
  • [ ] Middleware composition: the account.* host hits /v1/me/* routes through RequirePrincipalRLS only, with CurrentOrganizationID = uuid.Nil enforced. attachRLSConn dispatches to set_app_principal(P) (no org context) — patient-side RLS policies then return the cross-org union.
  • [ ] My consents (cross-org view): consume the same GET /v1/me/consents endpoint with no org header. Returns every active + withdrawn consent across every clinic the patient has ever been at, plus platform-scope rows. Toggles for self-withdrawable purposes work per-row (the row carries organization_id; the withdraw call is org-attributed correctly because the row's column drives the cascade).
  • [ ] My clinics page (A4): consumes the shipped GET /v1/me/clinics endpoint (D-8, commit 58dc1c4) — returns each clinic's name, primary contact, and DPO email (field-filtered subset of organization_billing; no other billing data leaks) for DSAR routing without crossing the processor boundary. Same handler as Portal's P9; rendered with platform chrome here.
  • [ ] My profile (portable): view + edit patient_profiles (the portable identity, no organization_id). Same handler as Portal's /v1/me/patient-profile — read returns the portable row; edit propagates to every clinic the patient is at without per-org duplication.
  • [ ] Account deletion request: full-account erasure entry-point (consumes F11.1's job pipeline when it ships; trigger + UI + queued job ship in 1B's account-deletion subset). Distinct from "leave clinic X" (which lives in the clinic's own portal under org_terms withdraw).
  • [ ] Cross-org data export request: GDPR Art. 15/20 pull spanning every clinic the patient is at. Each clinic's slice routes to that clinic for fulfilment via the same per-clinic queue used by the portal-side export request.
  • [ ] Access history view, cross-org (1B.13 — backend shipped): every staff impersonation session against this patient across every clinic. Reads GET /v1/me/patient-impersonation-sessions (no organization_id filter — RLS self-read returns the union); patient sees who at which clinic accessed their record, when, and why. Foundation-tier scope is session metadata only; per-action drill-down via the future patient_account_activity projection if/when patient UX needs more — patients do NOT get SELECT on audit_log.
  • [ ] Locale + theme + sign-out: same pattern as Portal.
  • [ ] Acceptance test: a patient enrolled at two clinics signs in to account.restartix.pro, sees both clinics in their list, sees consents from both clinics in one view, withdraws a marketing consent at clinic A and verifies it stays granted at clinic B (per-clinic scope preserved despite the unified view), triggers a cross-org export, and signs out. Add to 1E.2's setup-a-clinic acceptance test before staging cuts over.
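
The org-less principal setup can be sketched as a two-branch dispatch. The set_app_principal signatures below are assumptions (the plan only names the no-org form); the point is the branch: with an org context the per-org scope is set, and without one the session runs org-less so self-read policies take their current_app_org_id() IS NULL branch and return the cross-org union.

```go
package main

import "fmt"

// principalSetupSQL sketches the attachRLSConn dispatch. An empty orgID models
// the account.* host, where the proxy sets no X-Organization-ID and
// CurrentOrganizationID is enforced to nil. Both SQL forms are illustrative.
func principalSetupSQL(principalID, orgID string) (query string, args []any) {
	if orgID == "" { // account.restartix.pro: no org scope, cross-org union reads
		return "SELECT set_app_principal($1)", []any{principalID}
	}
	return "SELECT set_app_principal($1, $2)", []any{principalID, orgID}
}

func main() {
	q, args := principalSetupSQL("prin-123", "")
	fmt.Println(q, len(args)) // SELECT set_app_principal($1) 1
}
```

Keeping the branch in one helper means the per-clinic portals and the account surface share the identical connection-setup path, differing only in whether the resolver supplied an org.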

Out of scope here: clinical data (treatment plans, exercises, telerehab) — those are per-clinic by design and the patient sees them at the clinic's portal. Cross-org medical surfaces are a Layer 5+ concern if they ever ship.


1E. Foundation Gate

Three closes happen here: documentation, acceptance test, staging deployment.

1E.1 Foundation Gate Documentation

Status: shipped. STOP callout, Phase Discipline section in CLAUDE.md, gate referenced from /new-domain + /new-migration skills, four CI gates wired into make check (check-classification, check-soup, check-migrations, check-events).

1E.2 Setup-a-Clinic Acceptance Test (was 1.23)

Status: shipped end-to-end against local. Scenarios cover org provisioning + companion fan-out, plan management (Free → Pro, add-on stacking, override grant/revoke), capability flag flip, tier management (add/flip default/inactive), patient signup + portal onboarding, four-gate middleware paths, audit attribution (human actor, system actor). Detail: setup_clinic_test.go.

  • [ ] Re-run against staging once 1E.3 ships.

1E.3 AWS Staging Deployment

Foundation isn't done until it works in the target environment. Custom-domain TLS via Cloudflare for SaaS, two-pool RLS against managed Postgres, multi-subdomain cookies, Clerk in production mode, KMS-backed encryption, S3 round-trips — none of this validates on *.localhost. Single-AZ staging only; full prod hardening (Multi-AZ, autoscaling ceilings tuned to real traffic, alerting fan-out) is the production deploy in F11. 1E.3 is the foundation gate — staging only, no real patients. Real-clinic launch is the separate operational gate at production-launch-readiness.md.

The full topology, sizing, and cost shape are in aws-infrastructure.md; the deploy mechanics in deployment.md; the Terraform module layout in iac-layout.md; the architectural rationale (why ECS Fargate over App Runner, why Aurora Serverless v2 for staging, why Cloudflare for SaaS for custom domains, why Terraform) in decisions.md. 1E.3 is the work that takes those documents from "specified" to "running."

Stack settled (2026-05-07). ECS Fargate everywhere · Aurora Serverless v2 (single-AZ, scale-to-zero) for staging Postgres · ElastiCache Redis (single-node) · S3 + KMS + Secrets Manager + ECR · SES (production identity verification opens here) · Cloudflare for DNS, CDN, WAF, and per-tenant custom-domain TLS via Cloudflare for SaaS · Terraform as the IaC tool with state in S3 and native conditional-write locking (use_lockfile = true; no DynamoDB).
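For orientation, the "no DynamoDB" point reduces to one backend stanza — a sketch of what the per-env backend block looks like on recent Terraform (the `key` path is illustrative; bucket name per the state-backend item below):

```hcl
terraform {
  backend "s3" {
    bucket       = "restartix-tfstate"
    key          = "staging/terraform.tfstate" # illustrative path
    region       = "eu-central-1"
    encrypt      = true
    use_lockfile = true # S3 conditional-write locking; no DynamoDB lock table
  }
}
```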

Provisioning (Terraform-first, no Console clicks for anything reproducible):

  • [ ] Create Terraform module skeleton per iac-layout.md: infra/modules/network, infra/modules/database, infra/modules/ecs-service, infra/modules/cache, infra/modules/storage, infra/modules/observability; infra/envs/staging and infra/envs/production consume them
  • [x] State backend: S3 bucket restartix-tfstate (encrypted, versioned, public-blocked) with native conditional-write locking (use_lockfile = true per env). No DynamoDB table needed. Shipped 2026-05-13 via infra/envs/bootstrap apply; S3 native locking migration landed same day (replacing the initial DynamoDB-lock-table setup from 7f9470b).
  • [x] GitHub Actions OIDC provider + deploy IAM role with least-privilege Terraform-apply policy. Shipped 2026-05-13. Two roles: restartix-deploy-staging (trusts repo:RestartiX/restartix-platform:ref:refs/heads/master) and restartix-deploy-production (trusts repo:RestartiX/restartix-platform:environment:production, gated by GitHub Environment approval). ARNs in infra/envs/bootstrap outputs.
  • [ ] Provision staging VPC: 2 public + 2 private subnets, t4g.nano NAT instance (single AZ — staging accepts the SPOF), VPC endpoints for S3 / ECR / Secrets Manager / KMS / CloudWatch Logs
  • [ ] Provision Aurora Serverless v2 cluster in staging: 0.5–2 ACU range, scale-to-zero enabled, single-AZ, 1-day backup retention, parameter group with rds.force_ssl=1 + shared_preload_libraries=pg_stat_statements, extensions per 1A.16
  • [ ] Provision ElastiCache Redis (cache.t4g.micro single node, encryption in transit + at rest)
  • [ ] Provision ECR repos with lifecycle policy (last 20 tagged, untagged > 7d deleted)
  • [ ] Provision S3 buckets: restartix-uploads-staging, restartix-audit-archive-staging with versioning + block public access + lifecycle policies (audit archive: Standard → Glacier IA at 90d → Deep Archive at 365d)
  • [ ] Provision customer-managed KMS key in eu-central-1, used as Secrets Manager envelope key for restartix/{env}/encryption (column-encryption keyring + pg_dump envelope key). Key policy: Fargate task role gets kms:Decrypt against the SM context; operations role gets full lifecycle. Direct kmsKeyring (per-data-key KMS calls) is Phase 2 — not wired here.
  • [ ] Provision Secrets Manager secrets per aws-infrastructure.md → Secrets management
  • [ ] Provision ALB with ACM wildcard cert for *.restartix.pro (DNS validation via Cloudflare TXT records) + listener rules for host-based routing across all services
  • [ ] Provision ECS cluster + task definitions + services for: Core API, Telemetry API (sizing per aws-infrastructure.md → Telemetry sub-stack once those decisions land), clinic, portal, console, pgbouncer
  • [ ] Provision EventBridge Scheduler rules for audit-partition-roll, usage-quota-reset, usage-summary-rollup, check-providers, expired-sessions-sweep
  • [x] Provision SES production identity for the staging sender domain — DKIM + SPF + DMARC records added in Cloudflare. Shipped 2026-05-13 for restartix.pro (platform sender domain; same identity serves staging + production). The sandbox was already exited at the account level, so no AWS Support sandbox-exit ticket was needed.
  • [ ] Provision baseline CloudWatch alarms per monitoring.md → ECS Fargate & CloudWatch Monitoring
  • [ ] Cloudflare for SaaS configured: zone settings, custom-hostname API token, SSL/TLS edge certificate origin pointing at ALB

Application-layer prereqs (Core API code, lands before or with the IaC):

  • [x] Cloudflare for SaaS Custom Hostnames Go client — new package services/api/internal/integration/cloudflare-saas/ wrapping the Custom Hostnames API (POST /custom_hostnames to register, GET /custom_hostnames/{id} to poll provisioning status, DELETE /custom_hostnames/{id} to deregister). Auth via API token from Secrets Manager (restartix/{env}/cloudflare). Shipped 2026-05-13. Hand-rolled net/http + httptest-based tests; 16 test cases covering happy paths + each sentinel error + the structured Cloudflare error envelope. Config env vars (CLOUDFLARE_SAAS_API_TOKEN, CLOUDFLARE_ZONE_ID, optional CLOUDFLARE_API_BASE_URL) added to config.go. Console handler at POST /v1/admin/organizations/{id}/custom-domain not in scope here — that consumer ships when the custom-domain admin UI lands. Status polling strategy (inline UI vs scheduled task) is the consumer's call; the client exposes the primitives.

Deploy + validation:

  • [ ] CI/CD pipeline operational per deployment.md: branch protection on master, GitHub Actions builds + pushes to ECR + runs migrations as one-shot ECS task + triggers ECS rolling deploys, with a manual approval gate before production deploys (production environment isn't built yet at 1E.3 — the gate exists in the workflow definition for the F11 production rollout)
  • [ ] Deploy Core API + Telemetry API + Clinic + Portal + Console to staging
  • [ ] Re-run 1E.2's setup-a-clinic test list against staging end-to-end
  • [ ] Custom-domain end-to-end via Cloudflare for SaaS: org adds real custom domain in Console → backend calls Cloudflare for SaaS Custom Hostnames API → returns CNAME target → record set in clinic-side DNS → Cloudflare provisions Let's Encrypt cert → app renders at the custom domain → cookies and Clerk both work
  • [ ] Two-pool RLS validated against Aurora Serverless v2: admin + restricted-role pool both acquire connections cleanly under synthetic load; restricted role's default privileges enforce RLS as designed
  • [ ] Multi-subdomain cookies: org-id set on clinic.restartix.pro is read on {slug}.clinic.restartix.pro
  • [ ] Clerk in production mode (not test keys); sign-in / sign-up / JWT verification / blocked-user 403 all work
  • [ ] HTTPS / HSTS verified end-to-end (Cloudflare → ALB → Fargate); security headers present in responses
  • [ ] Customer-managed CMK envelope-encrypts restartix/{env}/encryption; Core API boots successfully and round-trips ciphertext through the in-memory keyring loaded from the KMS-protected SM secret (verifies the SM → keyring → AES-GCM path end-to-end against real KMS)
  • [ ] S3 bucket from 1A.8 wired (org-scoped uploads work in real env via signed URLs)
  • [ ] SES production identity verified; FakeChannel replaced with real EmailChannel pointing at SES; foundation MemberInvite + BreakGlassOpened templates send through to a real inbox
  • [ ] Scheduled tasks (audit-partition-roll etc.) firing on schedule, audited correctly
  • [ ] Make.com end-to-end smoke test for outbound webhook subscriptions (1C.4 closing item)
  • [ ] Documentation reflects the deployed staging shape: aws-infrastructure.md, deployment.md, iac-layout.md, scaling-architecture.md, monitoring.md, backup-disaster-recovery.md

Out of scope here (closes in F11 production deploy):

  • Multi-AZ posture (RDS Multi-AZ, NAT Gateway with HA, two pgbouncer tasks across AZs)
  • Production RDS Postgres instance — staging stays on Aurora Serverless v2; production is a separate infra/envs/production apply
  • Read replicas
  • Production-grade alarming + dashboards beyond the staging baseline
  • Cloudflare WAF rule tuning beyond the managed ruleset
  • Auto-scaling ceilings tuned to real traffic
  • Sentry production project (staging Sentry project is enough for 1E.3)
  • Automated cross-region replication for backups (Layer 3 of backup-disaster-recovery.md)

Cost target: under $100/mo idle (currently estimated ~$97/mo). Telemetry is not part of the 1E.3 staging gate — it ships as a Layer 2 service after foundation closes (~+$7/mo when added). See aws-infrastructure.md → Cost: staging and Telemetry sub-stack.


Consents, Controllership & Break-Glass

The substantive design for consents, controllership, and break-glass access is recorded in decisions.md → Why clinic is controller, platform is processor. Summary of what landed:

  • Single consents ledger spanning platform-scope and org-scope purposes, with a legal_basis discriminator (contract / legitimate_interest / consent / legal_obligation / vital_interest) and a withdrawable derived flag. Ships in 1B.9.
  • Privacy notice template + clinic fill-in instead of a fixed platform notice. Ships in 1B.10.
  • Break-glass access for any identifiable cross-tenant patient data in Console — per-org scope, time-bound, justification-required, always-on clinic notification, audited. Ships in 1B.11.
  • Cross-tenant features anonymise by default — codified as a foundation principle. Joint controllership (Art. 26) is the failure mode this rule prevents.
  • DSAR routing flows through the clinic, never the platform. The platform's role is auto-respond + portal self-service ("your clinics" list); break-glass is the last-resort path for orphaned requests.
  • Tier B medical consents (telemedicine, video recording, biometric capture, treatment-specific) layer on top of 1B.9 in F3.5 — same table, source='form' rows, multi-modal signature capture (in-portal click, drawn-on-tablet, sent-to-phone).

Deferred Foundation Extensions

Designed but not in any sub-phase's checklist — explicit so they don't get forgotten when the trigger arrives. Each entry names: what, current status, the trigger that lights it up, where the design lives.

  • Platform-level non-human actors (observability agents, cross-org metric aggregators, cross-org audit aggregators)

    • Status: principals.organization_id is nullable so a platform-level agent / service-account row CAN exist; what's missing is the grant mechanism — platform_memberships is human-only by CHECK constraint.
    • Trigger: first observability or cross-org operational feature.
    • Decision when triggered: drop the human-only CHECK on platform_memberships and add non-superadmin platform role codes (e.g. metrics_observer). One table, expanded to non-humans when needed — no separate grant table.
    • Design ref: data-model.md → Area 1 future-sibling note; the 1B.1 ADR's discussion of platform-level actors.
  • Service account authentication flow (the entire integration-auth surface — not just per-key scoping)

    • Status: schema ships in 1B.1 (service_accounts table with api_key_hash, api_key_prefix, lifecycle columns). Operational flow does not exist:
      • No endpoint to create a service account or generate an API key
      • No middleware that resolves an inbound API key to a principal_id
      • No revocation / rotation endpoints
      • No Console / Clinic admin UI surface
      • No per-key scoping (one key per service account, scope = principal's role)
      • No per-key rate-limit knobs
    • Trigger: first concrete external integration that needs service-account auth (Zapier, EHR sync, custom backend integration).
    • Decision when triggered: design the full lifecycle — key creation endpoint that returns the secret once + stores the hash, auth middleware that resolves Authorization: Bearer sa_live_* keys to a principal_id, revoke/rotate endpoints, admin UI. Per-key narrowing (allowed_scopes TEXT[] or a junction table) and per-key rate-limits land in the same wave. Pin the scoping shape against the integration's actual needs rather than designing speculatively.
    • Design ref: data-model.md → service_accounts.
  • Delegation feature using parent_principal_id (the column ships today, semantics defined per-feature)

    • Status: column exists on principals; nothing reads or writes it.
    • Trigger: first feature requiring "principal acts on behalf of human X" attribution — most likely an AI-agent feature in F-tier.
    • Decision when triggered: define delegation semantics (parent's permissions cap the child's? time-bound? per-resource?), surface them in audit reads, add UI for granting/revoking delegation.
    • Design ref: principals ADR ("Scope kept tight"); data-model.md → principals row.
  • Patient-facing account-activity projection (patient_account_activity or similar) — curated activity feed surfaced in Portal + cross-org account surface (1D.5). Foundation principle codified by this entry: patient transparency surfaces are projections, never raw audit_log exposure. audit_log stays staff/forensic-only; patients see purpose-built feeds with privacy-appropriate framing.

    • Status: not built. Today, patients see their own data via per-table self-read RLS — consents (consent trail at /me/consents), patient_impersonation_sessions (access history at /me/patient-impersonation-sessions when 1B.13 lands), break_glass_sessions indirectly via the clinic admin banner. This covers foundation-tier transparency; a unified "things that happened on my account" feed crossing all of these is not built.
    • Trigger: first concrete patient UX that needs more than a single source-of-truth table self-read — e.g., a unified activity dashboard, or a per-action drill-down on impersonation sessions ("what entities did the staff touch during this session"). Patient-side audit-row drill-downs land here, not via direct audit_log SELECT.
    • Decision when triggered: design the projection table (likely patient_account_activity partitioned monthly per the events-partitioned/state-not rule, populated by triggers from audit_log + sessions + consents + login events). Filter at the trigger to patient-relevant rows only; skip operational-metadata bumps. Keep the surface tight — no raw request_id, ip_address, user_agent, or technical audit columns; only "X happened on Y date" framing.
    • Design ref: this section; the 1B.13 design discussion that surfaced the principle ("audit_log is technical and raw, patient-facing wants account-activity framing").
  • Session permission-revocation sweep (covers 1B.11 break-glass + 1B.13 impersonation)

    • Status: gap intentionally accepted in foundation. When a principal who has an open session (break-glass or impersonation) loses the permission that authorised it (membership removed, role demoted, custom role edited to drop the permission), open sessions are NOT auto-closed — they live until expires_at (max 4h). The middleware re-checks the session row, not the permission, on each request. Same gap exists for both primitives.
    • Trigger: compliance review (Romanian DPA, clinic procurement) flags the residual window, or a real incident makes the gap concrete.
    • Decision when triggered: hook on organization_memberships UPDATE/DELETE and role_permissions UPDATE → close any open sessions for the affected (principal × org) with closed_at = NOW() and a system-close reason. Cheap to bolt on later; the partial unique index on active sessions and the close-path service-layer logic are already in place.
    • Why deferred: max 4h cap bounds the residual window; product impact is low; foundation discipline argues against speculation. The gap is documented here so future incident review or a deliberate hardening pass can find it.
    • Design ref: 1B.11 break-glass middleware + 1B.13 impersonation middleware; this entry.
  • Marketplace Mediation (Patient → Clinic Payments via Platform) — strategic future product offering where the platform mediates patient-to-clinic payments end-to-end (patient pays platform; platform pays clinic minus a fee). Distinct from Option A (clinic uses their own payment provider for patient billing; platform never touches the money — supported today via Cat C webhooks). Both options coexist: clinics with existing payment infrastructure stay on Option A; clinics that want a turnkey solution opt into marketplace mediation when it ships.

    • Status: strategic design pending. Foundation accommodates via four Cat A capability skeletons declared in 1C.1 (payment.Provider, invoicing.Provider, patient_payment.Provider, clinic_payout.Provider) — no implementations yet. Foundation does NOT lock the engine.
    • Trigger: company is ready to make the strategic move (legal review of payment institution licensing, fee model decision, capacity to onboard each clinic with KYB/KYC, dedicated payment-ops engineering work). Likely 12+ months post-launch; pace driven by demand from clinics that lack their own payment infrastructure.
    • Decision when triggered: pick mediation provider per market (Stripe Connect for international + EU; Romanian alternative TBD — Netopia has marketplace features, may need legal review; PSD2 considerations for EU). Define fee model (per-transaction percentage vs. flat vs. hybrid; per-tier differentiation). Build patient-facing payment UI in portal (Cat A patient_payment.Provider impl). Build clinic onboarding flow for KYB/KYC (Stripe Connect Express or Custom). Build payout management (clinic_payout.Provider impl + payout scheduling). Define refund/dispute handling (platform reverses; clinic balance debited). Romanian-specific: VAT + e-Factura split between platform's fees and clinic's revenue (separate ADR with accounting/legal review).
    • What foundation reserves to keep this option open (no schema or code change today; just principles to honor):
      • patient_subscriptions (1B.7) stays informational at foundation but the schema accommodates extension with payment metadata when marketplace ships.
      • The four billing capability interfaces declared at foundation (1C.1 list) cover the marketplace use cases — patient_payment.Provider for patient-facing payment, clinic_payout.Provider for sending money to clinics. Real impls slot in via the same Cat A pattern as everything else.
      • organization_integrations.config (Cat B, 1C.5) accommodates clinic-side integrations like a clinic's own Stripe/Netopia account FOR Option A — webhook subscriptions push subscription state into our system from the clinic's payment provider (Cat C / Cat D mix depending on direction).
    • Design ref: glossary → Marketplace mediation; patterns.md eventual P-entry when first impl ships; this entry.
  • Tenant Isolation Foundation (tenancy_mode reservation) — foundation establishes the tenancy-topology discriminator and the draft-state lifecycle reservation. The only structural axis foundation locks in is identity (per-tenant Clerk org); per-tenant storage and per-tenant encryption are deferred entitlements, not foundation work, and ship later in one PR each alongside the operational mechanism they depend on. Both modes (shared and dedicated) target SMB clinics; hospital networks and dedicated-infrastructure tiers (per-tenant RDS/Redis/CloudHSM) are permanently out of scope (see CLAUDE.md → Project Overview).

    • Status: architectural commitment settled. Today only shared mode is sellable end-to-end; dedicated is a schema reservation — no creation flow accepts it and no provisioning code provisions per-tenant Clerk orgs. Full spec at features/platform/tenant-isolation.md. Runtime dedicated-mode build deferred until a paying contract funds the operational setup.
    • Foundation pre-work in 1B (shipped) — humans partition column ready for per-tenant identity namespace:
      • [x] Add humans.provider_org_id TEXT NULL column to migrations/core/000002_tenancy_rbac.up.sql (no FK initially — the FK target lands with the dedicated-mode runtime feature).
      • [x] Replace humans.email NOT NULL UNIQUE with UNIQUE (email, provider_org_id) NULLS NOT DISTINCT in the same migration. Functionally identical for shared-mode tenants today (all have NULL provider_org_id); future-proofs for dedicated mode where the same email can exist once per auth-provider tenant.
    • Foundation pre-work in 1E (shipped) — organizations.tenancy_mode + draft-state reservation:
      • [x] Add organizations.tenancy_mode TEXT NOT NULL DEFAULT 'shared' CHECK (tenancy_mode IN ('shared', 'dedicated')) to migrations/core/000002_tenancy_rbac.up.sql. Single-enum topology discriminator; see decisions.md → Why tenancy_mode is a single enum, not multi-axis.
      • [x] Add organizations.activated_at TIMESTAMPTZ NULL to the same migration. NULL = draft (org row exists but unroutable from public endpoints); non-NULL = active. Every creation path today sets activated_at = NOW() in the same transaction as the INSERT; the NULL state is a reservation for the future dedicated-mode async provisioner. See decisions.md → Why activated_at as the org draft-state mechanism.
      • [x] Add partial index CREATE INDEX idx_organizations_draft ON organizations(id) WHERE activated_at IS NULL for cheap draft-org lookups when the future provisioner needs them.
      • [x] Column-classification registry entries for tenancy_mode and activated_at (data-classification.md → organizations).
      • [x] Column entries in data-model.md → organizations.
      • [x] Public org resolve handler (GET /v1/public/organizations/resolve) gates on activated_at IS NOT NULL — returns 404 for draft orgs.
      • [x] Owner-welcome dispatch gated on activated_at IS NOT NULL — welcome email is fired by activation transition, not by raw INSERT.
    • Deferred (out of foundation scope; ships when the first paying dedicated contract closes — see features/platform/tenant-isolation.md → Deferred design surface for the canonical narrative):
      • Per-tenant Clerk org provisioner (Clerk Backend API integration; writes the dedicated provider_org_id per tenant).
      • Re-introduced finalize-provisioning endpoint with proper preconditions (Clerk org exists, platform_service_providers overrides written) that flips activated_at = NOW() and queues the welcome email.
      • Addons via entitlements catalog: own_s3_bucket (ships with the exit / portability tool) and own_cmk (ships with the documented crypto-shred runbook). Available on either tenancy mode; not coupled to tenancy_mode = 'dedicated'.
      • Terraform module for per-tenant infrastructure (S3 bucket + CMK + IAM bindings + Clerk org wiring).
      • Operational templating for dedicated mode (custom DNS / ACM cert / per-tenant SES / SMS / Daily.co domain — the universal branding pieces work on shared mode too).
      • Dedicated-mode DPA template (legal-counsel work; can run in parallel).
      • Pricing model: one-time setup fee + premium MRR uplift + termination service fee.
    • Design ref: features/platform/tenant-isolation.md; decisions.md → Why tenancy_mode is a single enum, not multi-axis; decisions.md → Why tenant-isolation has its own controllership story; decisions.md → Why activated_at as the org draft-state mechanism.