Foundation
The actor-identity stack, cross-cutting runtime, capability + integration architecture, and admin surfaces — across all three audiences (superadmin, clinic staff, patient). Built once, before any feature lands. See the implementation plan index for the framing.
Acceptance. A superadmin manages orgs and platform-level templates (privacy notices, plans, entitlement flags) from Console; a clinic admin manages their org's settings, billing, members, roles, domains, tiers, privacy notice, integrations, and audit log from Clinic; a patient signs up, accepts the platform + clinic consents, onboards, manages their own profile, subscription, consent toggles, and data export from Portal. All three flows pass on AWS staging with real Clerk, real KMS, real S3.
Sub-phases. 1A (cross-cutting runtime), 1B (identity & tenancy), 1C (capabilities, integrations & metering — new), 1D (admin surfaces), 1E (foundation gate). Taxonomy used here is canonical per glossary.md.
Discipline. Foundation gaps that surface after a sub-phase ships live as new sub-sections under that sub-phase. Layer 2 (Features) does not start until 1E closes.
Phase 0 -- Boot (DONE)
Project scaffolding, DB foundation, three Next.js apps with Clerk auth, multi-tenant orgs, per-org RBAC, RLS, custom domain routing, role-gated dashboards. Everything that shipped in commit 3ade31c. Detail in the git history and in the architecture docs.
Status: shipped. Do not re-implement.
1A. Cross-Cutting Runtime
Every layer above uses these. Built once, hardened in 1E against staging.
1A.1 Audit Logging Infrastructure
Status: shipped. Centralised audit recorder writes one row per mutation with redaction; failed-request logging covers 401/403/5xx; append-only at the DB layer. Detail: patterns.md P10/P11, internal/core/audit/.
1A.2 RLS Integration Test Harness
Status: shipped. testcontainers-based harness exercises the full identity stack under the renamed RLS helpers. make test-integration runs 33+ test cases against real Postgres in ~2s warm. Detail: internal/test/rlstest/.
1A.3 Encryption Helper
Status: shipped. AES-256-GCM helper with versioned ciphertext, multi-version dual-decrypt rotation, in-memory keyring loaded from ENCRYPTION_KEYS (Phase 1 production path: keys live in a KMS-envelope-protected Secrets Manager secret). The kmsKeyring stub is reserved for Phase 2 (direct per-data-key KMS calls + BYOK) — see aws-infrastructure.md → Direct-KMS keyring + BYOK (Phase 2+). Detail: reference/encryption.md, internal/core/crypto/.
- [ ] Customer-managed CMK provisioned + restartix/{env}/encryption SM secret created under that envelope (closes in 1E.3)
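The versioned-ciphertext and dual-decrypt idea can be sketched as follows. This is a minimal illustration, not the code in internal/core/crypto/: the Keyring type, the one-byte version prefix, and the blob layout are all assumptions made for the example.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"errors"
	"fmt"
)

// Keyring maps a key version to a 32-byte AES-256 key.
// Hypothetical type; the shipped keyring may look different.
type Keyring struct {
	Current byte
	Keys    map[byte][]byte
}

// Encrypt seals plaintext under the current key.
// Assumed layout: 1 version byte || nonce || GCM ciphertext+tag.
func (k Keyring) Encrypt(plaintext []byte) ([]byte, error) {
	gcm, err := k.gcm(k.Current)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	out := append([]byte{k.Current}, nonce...)
	return gcm.Seal(out, nonce, plaintext, nil), nil
}

// Decrypt selects the key by the leading version byte, so rows written
// under an older key stay readable while new writes use Current.
func (k Keyring) Decrypt(blob []byte) ([]byte, error) {
	if len(blob) < 1 {
		return nil, errors.New("ciphertext too short")
	}
	gcm, err := k.gcm(blob[0])
	if err != nil {
		return nil, err
	}
	if len(blob) < 1+gcm.NonceSize() {
		return nil, errors.New("ciphertext too short")
	}
	nonce, ct := blob[1:1+gcm.NonceSize()], blob[1+gcm.NonceSize():]
	return gcm.Open(nil, nonce, ct, nil)
}

func (k Keyring) gcm(version byte) (cipher.AEAD, error) {
	key, ok := k.Keys[version]
	if !ok {
		return nil, fmt.Errorf("unknown key version %d", version)
	}
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

func main() {
	k1, k2 := make([]byte, 32), make([]byte, 32)
	k2[0] = 1 // demo keys only; real keys come from the secret store
	v1 := Keyring{Current: 1, Keys: map[byte][]byte{1: k1}}
	blob, err := v1.Encrypt([]byte("hello"))
	if err != nil {
		panic(err)
	}
	// After rotation both versions are loaded; old rows still decrypt.
	rotated := Keyring{Current: 2, Keys: map[byte][]byte{1: k1, 2: k2}}
	pt, err := rotated.Decrypt(blob)
	fmt.Println(string(pt), err)
}
```

Carrying the version inside the ciphertext means rotation needs no schema change: load the new key, switch Current, and re-encrypt old rows lazily or in a batch.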
1A.4 Soft-Delete Pattern
Status: shipped. deleted_at TIMESTAMPTZ NULL convention, partial-index pattern, repo helpers, RLS exclusion of deleted rows unless caller has data.view_deleted, GDPR anonymisation primitive. Detail: patterns.md P13, internal/shared/softdelete/.
1A.5 PII Redaction in Logs
Status: shipped. slog redaction of sensitive keys (password, secret, token, etc.), audit JSONB walker shares the same predicate, every log call site audited for raw PII, telemetry pseudonymisation helper ready for F10. Detail: patterns.md P11, internal/shared/redact/.
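As an illustration of key-based redaction, the sketch below wires a predicate into slog through the standard HandlerOptions.ReplaceAttr hook. The key list, the substring match, and the names IsSensitiveKey and Redact are assumptions for the example; the shipped predicate lives in internal/shared/redact/ and may differ.

```go
package main

import (
	"log/slog"
	"os"
	"strings"
)

// sensitive lists key fragments to redact. Illustrative only.
var sensitive = []string{"password", "secret", "token"}

// IsSensitiveKey is the single predicate a log handler and an audit
// JSONB walker could both call, so the two cannot drift apart.
func IsSensitiveKey(key string) bool {
	k := strings.ToLower(key)
	for _, s := range sensitive {
		if strings.Contains(k, s) {
			return true
		}
	}
	return false
}

// Redact is a slog ReplaceAttr hook: it swaps the value of any
// sensitive attribute for a fixed marker and passes the rest through.
func Redact(_ []string, a slog.Attr) slog.Attr {
	if IsSensitiveKey(a.Key) {
		return slog.String(a.Key, "[REDACTED]")
	}
	return a
}

func main() {
	h := slog.NewJSONHandler(os.Stdout, &slog.HandlerOptions{ReplaceAttr: Redact})
	slog.New(h).Info("login", "user_id", "u_1", "password", "hunter2")
}
```

Key-based redaction only catches values logged under a recognisable key, which is why the per-call-site audit for raw PII is still needed.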
1A.6 Error-Response Surface Audit
Status: shipped. *AppError envelope everywhere, recovery middleware → generic 500 with request_id, validation 422 with field-level reasons, no leaky DB internals. Detail: reference/error-envelope.md.
1A.7 API Contract Conventions
Status: shipped. Pagination + sort + filter conventions, idempotency-key middleware, OpenAPI spec-first (oapi-codegen Go + openapi-typescript frontends) with three-way drift test, picker + date-range URL conventions. Detail: reference/api-conventions.md, apps/docs/openapi.yaml.
1A.8 File Storage (S3)
Status: shipped (bucket provisioning deferred to 1E). Org-scoped key prefixes, signed URLs, MIME validation, file-size caps per surface, bucket-policy authored, LocalStack integration test for cross-org isolation. Detail: reference/file-storage.md, internal/integration/s3/.
- [ ] AWS S3 bucket provisioned in staging (closed by 1E)
1A.9 Internal Event Bus
Status: shipped. In-process pub/sub with envelope, panic recovery, backpressure policy, schedule-stub interface, code → catalog drift check. Org lifecycle publishers wired. Detail: patterns.md P28, events.md, internal/core/events/.
1A.10 Translation Infrastructure (UI)
Status: shipped. next-intl wired in all three apps, en + ro seeded, org language_code drives default locale, convention documented. Detail: reference/i18n.md.
1A.11 Activity Tracking Columns
Status: shipped. Middleware bumps organization_memberships.last_used_at and patients.last_used_at with throttled in-process cache. Convention: best-effort, ~minutes precision, never a substitute for audit. Detail: reference/activity-tracking.md.
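A throttled in-process cache of this kind might look like the sketch below. The Throttle type, the key format, and the five-minute interval are assumptions for illustration, not the shipped middleware.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// Throttle decides whether a last_used_at bump is due for a given key.
// Hypothetical type; the shipped middleware may be structured differently.
type Throttle struct {
	mu       sync.Mutex
	interval time.Duration
	now      func() time.Time
	last     map[string]time.Time
}

func NewThrottle(interval time.Duration) *Throttle {
	return &Throttle{interval: interval, now: time.Now, last: map[string]time.Time{}}
}

// Touch returns true when the caller should issue the UPDATE, at most
// once per interval per key. State is per process, so a fleet of N
// instances may write up to N times per interval.
func (t *Throttle) Touch(key string) bool {
	t.mu.Lock()
	defer t.mu.Unlock()
	now := t.now()
	if prev, ok := t.last[key]; ok && now.Sub(prev) < t.interval {
		return false
	}
	t.last[key] = now
	return true
}

func main() {
	th := NewThrottle(5 * time.Minute) // interval is an assumed value
	fmt.Println(th.Touch("membership:m1"), th.Touch("membership:m1"), th.Touch("patient:p1"))
}
```

The map in this sketch grows without bound; a real implementation would evict old keys. Dropped or duplicated writes are acceptable here precisely because the column is best-effort and never a substitute for audit.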
1A.12 Reserved Column Inventory
Status: shipped. audit_log reserved columns verified, organization_memberships.invited_at / invited_by / accepted_at reserved for 1B's invite flow. Detail: architecture/reserved-columns.md.
1A.13 Sensitive-Endpoint Rate Limiting + SOUP List
Status: shipped. Redis-based rate limiter with per-IP and per-principal extractors, auth_verify and public_resolve policies wired, SOUP inventory for backend + frontend deps with AI/ML model schema, cmd/check-soup enforces append-on-add at PR time. Detail: reference/soup.md, internal/core/ratelimit/.
1A.14 Column-Level Data Classification
Status: shipped. Every column carries class + egress allow-list in data-classification.md; cmd/check-classification fails the build when a column lacks a registry entry. Default is block. Detail: patterns.md P39.
1A.15 Audit Log Partitioning
Status: shipped (staging cron wiring deferred to 1E). audit_log and audit_ai_provenance are range-partitioned monthly on created_at / audit_log_created_at. PK on audit_log is (id, created_at); the AI-provenance FK is composite (audit_log_id, audit_log_created_at) → audit_log(id, created_at) so both tables hand off the same monthly window together when archived. No DEFAULT partition — missed rollover is loud, not silent. The migration seeds only the current month so the rollover cron is exercised in real environments rather than masked by a long pre-seeded runway; audit.EnsurePartitions + cmd/audit-partition-roll (default -ahead=3) maintain the forward window. Detail: internal/core/audit/partitions.go, cmd/audit-partition-roll/main.go.
Why a foundation item. audit_log carries the longest retention on the platform (≥ 6 years), grows monotonically with every mutation × every tenant, and the hot/warm/cold tier hand-off in CLAUDE.md is exactly what partitioning is built for. Converting an unpartitioned table to a partitioned one after launch is a multi-day operation on a billion-row append-only table with no allowable write gap (audit_log gaps are themselves a compliance finding). Foundation discipline says fix it once, in pre-prod, where the cost is a 60-line migration edit.
- [ ] Wire audit-partition-roll into the staging scheduler (k8s CronJob or GitHub Actions, monthly cadence; default -ahead=3). Closed by 1E.
- [ ] Add a staging alert when the newest existing partition is less than 1 month ahead, as an early warning that the rollover stopped firing.
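To make the forward window concrete, here is a sketch of the DDL a rollover run would need to produce. It is not audit.EnsurePartitions: the function name, the audit_log_YYYY_MM naming scheme, and the reading of -ahead=3 as "current month plus three" are assumptions.

```go
package main

import (
	"fmt"
	"time"
)

// PartitionDDL returns one CREATE TABLE statement per month, covering
// the given month plus `ahead` months after it. Partition names follow
// an assumed <table>_YYYY_MM convention.
func PartitionDDL(table string, year int, month time.Month, ahead int) []string {
	start := time.Date(year, month, 1, 0, 0, 0, 0, time.UTC)
	var out []string
	for i := 0; i <= ahead; i++ {
		lo := start.AddDate(0, i, 0)   // inclusive lower bound
		hi := start.AddDate(0, i+1, 0) // exclusive upper bound
		out = append(out, fmt.Sprintf(
			"CREATE TABLE IF NOT EXISTS %s_%s PARTITION OF %s FOR VALUES FROM ('%s') TO ('%s');",
			table, lo.Format("2006_01"), table,
			lo.Format("2006-01-02"), hi.Format("2006-01-02")))
	}
	return out
}

func main() {
	for _, stmt := range PartitionDDL("audit_log", 2025, time.November, 3) {
		fmt.Println(stmt)
	}
}
```

IF NOT EXISTS keeps the job idempotent, so a monthly run and a manual re-run converge on the same set of partitions. Because there is no DEFAULT partition, an insert into a month with no partition fails outright, which is what makes a missed rollover loud.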
1A.16 Postgres Extension Preinstall
Status: shipped (RDS parameter group deferred to 1E). Enables unaccent + pg_trgm for diacritic-folded fuzzy search (Romanian picker UIs need unaccent("Stefan") = unaccent("Ștefan")), vector for AI-feature embedding columns, and pg_stat_statements for top-N slow-query observability. Extensions live in 000001_init.up.sql; local docker-compose.yml switched to pgvector/pgvector:pg17 and configured shared_preload_libraries=pg_stat_statements.
Why a foundation item. Each of these costs more to retrofit than to enable preemptively.
unaccent + pg_trgm need to be present before the first picker migration adds a GIN trigram index; vector needs the right Postgres image at the infrastructure layer, not a per-feature decision; pg_stat_statements needs shared_preload_libraries configured at server start, which means a Postgres restart in production. Doing all three in pre-prod is a 6-line migration edit; doing them piecemeal post-launch is three separate operational coordinations.
Conventions for feature migrations:
- Picker-queryable text columns: CREATE INDEX ... ON tbl USING GIN (col gin_trgm_ops); and query with unaccent(col) ILIKE unaccent('%' || $1 || '%'). The AsyncMultiSelectFilter typeahead path expects this shape.
- Embedding columns: embedding vector(N) with N matching the model's output dimensionality. Add an HNSW or IVFFlat index when the column is queried at scale; small tables can scan.
- Slow-query inspection (locally): SELECT query, calls, mean_exec_time FROM pg_stat_statements ORDER BY mean_exec_time DESC LIMIT 20;
- [ ] AWS RDS parameter group sets shared_preload_libraries=pg_stat_statements. Closed by 1E.
1A.17 Frontend Performance Foundation
Status: shipped. Five composable patterns + one security middleware, all live-verified end-to-end against a production Console build. The bundle's load-bearing pieces:
- P44: Connection pooling via pgbouncer (patterns.md). Local docker-compose service mirrors the AWS ECS Fargate setup; same pgbouncer.ini. DATABASE_URL / DATABASE_APP_URL route through :6432; migrations bypass via DATABASE_DIRECT_URL because golang-migrate uses session-scoped pg_advisory_lock. min_pool_size=5 keeps backends warm, eliminating the 1.3s cold-start spike we observed live.
- P43: Tuned undici dispatcher (patterns.md). Each Next.js app's instrumentation.ts installs a global Agent with 30s keep-alive + 64-conn pool per origin. Default 4s keep-alive cycles TCP under bursty traffic; this kills the TIME_WAIT pile-up at fleet scale.
- P42: Server-side response caching with scope-keyed tags (patterns.md). Tagged GETs in packages/api-client go through unstable_cache from next/cache; server actions invalidate via updateTag() (Next.js 16+). Tag taxonomy in packages/api-client/src/cache-tags.ts. Live-discovery: Next.js's built-in fetch tags hash Authorization into the cache key, so rotating Clerk JWTs make every request a miss; unstable_cache is the only mechanism that works with auth-walled APIs.
- P45: Redis-backed query cache (cache-aside in repo layer) (patterns.md). services/api/internal/core/cache/ provides Aside + Invalidate + key builders mirroring the P42 namespace. Two layers compose: Next.js's unstable_cache is per-process (each Next.js instance has its own); Redis is shared across the Core API fleet. At 10k concurrent the layering goes 0 / 1 / 1+ Postgres queries (Next.js hit / Redis hit / Redis miss).
- URL ≡ scope guard (url_org_scope.go). RequireURLOrgMatchesScope("id") mounted on the /v1/organizations/{id} route group. Closes a latent gap surfaced by the cache work: RLS protects the FIRST request from a URL/header mismatch, but caching propagates the response, turning a one-time silent 404 into a recurring data leak. Apply preemptively on every per-org route group whether or not the endpoint caches today; cost is one UUID parse, benefit is that adding caching later is mechanical.
- P46: Portal hybrid architecture (patterns.md). Decision matrix for server-render vs client-side data per type. Documents-only at this phase; SWR install + first wired client-cache example land with the first Portal F-feature that needs them.
Wired examples (the references future domain work copies from):
- Canonical: getOrganization(id) (P42 + P45) + updateOrganizationAction (updateTag invalidation). Org summary read on every Console org-detail render and on future Clinic / Portal "my clinic" pages.
- Hot proxy path: organization.Service.ResolveBySlug / ResolveByDomain (P45). Hit by every Next.js app's proxy.ts cold-load; the original hand-rolled cache, refactored onto the helper.
- Portal-critical: consents.Service.ListPurposesWithLatestForOrg (P45) + listConsentPurposes (P42). Read by every patient on every consent gate check.
- Platform-scope reference: listLegalDocumentTemplates + publishLegalDocumentTemplateAction (P42). Console superadmin scope; kept as the simplest platform-scope example.
Why a foundation item. All five patterns + the URL≡scope guard sit in the load-bearing layer every feature rests on. Retrofitting them after Portal F-features ship means revisiting every endpoint that was authored without the cache contract in mind, plus a security audit pass to confirm no per-org route mounts caching without the URL guard. Each individual pattern is small; the cost of bundling them now is one PR; the cost of doing them piecemeal post-launch is one PR per feature plus the integration audit between them. Same calculus as 1A.15 audit partitioning and 1A.16 postgres extensions.
The URL≡scope guard is also a security improvement independent of caching — RLS-only protection at the DB layer left the API tolerating mismatched header/URL combinations that future agents could trip into a real leak. Foundation discipline catches these now.
- [ ] AWS ECS Fargate pgbouncer service in front of RDS — Dockerfile + IaC. Closed by 1E (aws-infrastructure.md → Connection pooling).
- [ ] Cloudflare cache rules per the aws-infrastructure.md → CDN table (/_next/static/* cached forever, HTML never cached). Activate when traffic warrants; documentation captures the rules so they're not invented per-deploy.
1A.18 Notifications & Email Transport
Durable email + multi-channel notification primitive. Every foundation consumer that needs to reach a human outside the request path goes through this, with no ad-hoc ses.SendEmail calls. The calling contract, outbox shape, recipient-preference model, audit/classification integration, and timezone-aware scheduling are all expensive to retrofit once 50+ features write to the primitive.
Status: shipped except the SES infra closes (production identity verification + suppression list, both gated by 1E AWS staging) and the recipient-timezone profile UI (closes alongside 1D.3 patient self-service). Schema, dispatcher, EmailChannel + FakeChannel, foundation templates (MemberInvite + BreakGlassOpened × email × en+ro), and the integration acceptance test are in place. Wire-in calls into 1B.11 / 1B.12 light up when those features ship — the primitive is ready. F-tier features (1D.3 export-ready / account-deletion-confirmed in F11; F2/F5 appointment reminders; F8 automations engine) layer additional categories onto the same primitive without schema change.
Deltas from the original spec, locked at implementation:
- Audit row removed from notify.Send. The calling handler audits the originating event (e.g., 1B.12 invite endpoint audits organization_membership CREATE); the notification row IS the body-of-record (RLS-scoped to recipient). The original spec's redundant notification.enqueue audit row would land on the request's tx while the notification row lands on an admin-pool tx, a rollback inconsistency the simpler design avoids. Forensic story is unchanged: "what did we send?" = SELECT on notifications.
- Per-recipient rate limit deferred. The safety-net cap was design intent for misbehaving F-tier producers, not foundation-tier consumers (which are transactional and bypass it anyway). Lands when the first operational producer (appointment reminders) ships and the rate-limit math is concrete. Code path is not stubbed; adding to Send is a one-line guard against notification_preferences.global_cap_* columns when those land.
- organization_settings.default_timezone (TEXT NULL, IANA) added in 000003 to close the resolution chain (humans.timezone → org default → 'Europe/Bucharest'). Same column anchors P23's scheduling-timezone chain (location → specialist → here → platform default), so it serves both 1A.18 (recipient-where-they-read) and 1B.14 (slot-where-it-happens).
- CategoryDefinition.BillingScope added 2026-05-12 to distinguish platform→user from tenant→customer mail. All four foundation categories (owner_welcome, break_glass_opened, member_invite, webhook_subscription_paused) are RestartiX talking to a human about their RestartiX account: they send from a platform-owned SES identity and are not metered against any org's plan. Tenant-scope categories (F-tier; none today) route through WrapMeteredProvider and resolve to the clinic's per-org SES identity when one is configured. The split was originally implicit: the dispatcher wrapped every email with MeteredChannel, which crashed every platform-scope send with ErrUnauthenticated because the metering soft-quota gate required a request-scoped principal.Subject that the background dispatcher does not carry. Discovered live when owner-welcome stopped sending. Closed by notify.email.ScopedChannel (routes by BillingScope).
- Quota soft-gate works on system paths too (2026-05-12). The enforceQuota middleware used to require a principal.Subject and bailed with ErrUnauthenticated on the notify dispatcher's system-driven path. Closed by capabilities.SetLimitLookup + metering.Repository.LookupLimit: when no Subject is in context but a metering org IS attached via ContextWithMeteringOrg, the gate reads the same organization_subscription_limits + usage_quotas source LoadLimits reads for request paths. Request and system paths now see identical soft-gate behavior; the inner atomic Reserve in meterAroundCall remains the load-bearing cap enforcement either way.
- notify.Send rejects principal-id-only recipients on email-routed categories (2026-05-12). The email channel does not resolve principal → humans.email by design (callers resolve at the call site, the rendered email address is the body-of-record). Previously this convention was unenforced: a producer that called notify.Send(notify.To(principalID), CategoryOwnerWelcome, ...) would queue successfully and dead-letter at dispatch time with recipient_email NULL. The guard in Service.Send now fails fast at queue time with a message pointing the producer at the right helper. categoryNeedsAddress covers email / sms / whatsapp; in-app / push channels are address-free and still accept principal-id recipients.
- Address-recipient producers carry locale + timezone responsibility too (2026-05-12). The locale + timezone resolution chain (humans.preferred_language → org default → "en", humans.timezone → organization_settings.default_timezone → "Europe/Bucharest") reads the humans row ONLY for principal-id recipients. Address-based recipients (notify.ToAddress) skip the humans read and fall straight through to org default → platform default, losing any humans.preferred_language / humans.timezone preference. Producers that have a principal_id at the call site (the common shape: break-glass, owner-welcome, webhook auto-pause) must look up the recipient's locale + timezone alongside their email and pass all three through (notify.ToAddress(...) + notify.WithLocale(...) + notify.WithTimezone(...)). Foundation acceptance test in setup_clinic_test.go demonstrates the canonical pattern. Today's foundation producers (break-glass, owner-welcome, webhook auto-pause) pass email only and fall through to org defaults. That is acceptable for now because Romanian orgs default to ro / Europe/Bucharest anyway, but the convention is documented so F-tier producers don't repeat the partial fix. A future notify.ResolveForPrincipal(principalID) (email, locale, timezone) helper would collapse the three lookups; not built today (premature against one consumer).
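The resolution chain from the last delta can be sketched as a pure function. The struct shapes and the Resolve name are hypothetical; the locale's org-level field is assumed to be the org language_code mentioned in 1A.10. The point is the fall-through: a nil human models an address-based recipient, for whom the humans row is never read.

```go
package main

import "fmt"

// Human and Org are hypothetical projections of the humans and
// organization_settings rows; only the fields the chain reads.
type Human struct{ PreferredLanguage, Timezone string }
type Org struct{ LanguageCode, DefaultTimezone string }

// firstNonEmpty returns the first non-empty candidate.
func firstNonEmpty(vals ...string) string {
	for _, v := range vals {
		if v != "" {
			return v
		}
	}
	return ""
}

// Resolve walks human → org → platform default for both values.
// Pass h == nil for address-based recipients (no humans row read).
func Resolve(h *Human, o *Org) (locale, tz string) {
	var hl, ht, ol, ot string
	if h != nil {
		hl, ht = h.PreferredLanguage, h.Timezone
	}
	if o != nil {
		ol, ot = o.LanguageCode, o.DefaultTimezone
	}
	return firstNonEmpty(hl, ol, "en"), firstNonEmpty(ht, ot, "Europe/Bucharest")
}

func main() {
	org := &Org{LanguageCode: "ro"}
	fmt.Println(Resolve(&Human{PreferredLanguage: "en", Timezone: "Europe/London"}, org))
	fmt.Println(Resolve(nil, org)) // address-based: the human's preferences are lost
}
```

The second call shows the gap the delta describes: without the humans row, an English-speaking recipient in a Romanian org receives the org default unless the producer passes locale and timezone explicitly.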
Gaps deferred to first consumer (no foundation category exercises them):
- Operational-category preferences read path. Spec says the dispatcher should consult notification_preferences for explicit opt-outs before fan-out. Foundation categories are transactional and skip preferences; the actual lookup code does not yet exist. CategoryDefinition.Classification carries the discriminator so the branch slots in cleanly when the first operational category (appointment reminder) ships. Writing the lookup now would be speculation against an unknown shape (does opt-out apply pre-render or per-channel? does the dispatcher fan back in if all channels are opt-out?); deferring keeps that decision concrete-driven.
- Marketing-category consents read path. Same shape: spec says the dispatcher should consult the consents ledger for marketing_{channel} purpose at the recipient's org. Lands when the first marketing category surfaces. Foundation has no marketing producer.
Calling contract. notify.Send(ctx, recipient, category, data, opts...). Caller never names a channel — the dispatcher picks channels from the category default × recipient preferences. Opt-outs become impossible to forget because the dispatcher is the only code that maps category → channels. Transactional categories (MemberInvite, BreakGlassOpened, future AccountDeleted) carry their own legal basis and bypass preference filters; operational categories (future appointment reminders, treatment nudges) respect them.
Schema — three new tables + one column on humans.
notifications (parent — one row per logical send). Columns: id, organization_id UUID NULL (mirrors patient-portable nullable scoping — NULL for cross-org sends like account-deletion-confirmed), recipient_principal_id UUID NULL + recipient_email TEXT NULL (CHECK exactly one set; address-based supports invite where there is no principal yet), category VARCHAR(64), idempotency_key TEXT NULL (partial unique on (category, idempotency_key) WHERE idempotency_key IS NOT NULL — producer dedup), locale VARCHAR(8) + timezone VARCHAR(64) (snapshotted at enqueue), subject TEXT + body_text TEXT + body_html TEXT NULL (rendered at enqueue, immutable thereafter), scheduled_at TIMESTAMPTZ NULL (NULL = ASAP; non-NULL = worker holds dispatch until NOW() >= scheduled_at; caller computes via notify.AtRecipientLocal() so wall-clock-to-UTC conversion happens once when the offset is known — DST-correct), created_at. RLS: recipient reads own; org members read organization_id = current_app_org_id(); platform-scope (NULL org_id) admin-only via AdminPool / superadmin (existing pattern; no new permission for foundation).
notification_deliveries (child — one row per (notification × channel)). Columns: id, notification_id FK, channel VARCHAR(16) (email | sms | whatsapp | push | in_app), status VARCHAR(16) (pending → claimed → sent | failed | dead_letter), attempts SMALLINT, claimed_at TIMESTAMPTZ NULL + claimed_by_worker_id TEXT NULL (SKIP LOCKED claim), next_attempt_at TIMESTAMPTZ NULL (exponential backoff: 1m / 5m / 30m / 1h / 6h, dead-letter at 5 attempts), sent_at TIMESTAMPTZ NULL + provider_message_id TEXT NULL + last_error TEXT NULL, read_at TIMESTAMPTZ NULL (meaningful only for channel='in_app'; ignored by other adapters — read state is per-recipient across apps, so a clinic admin who reads in Clinic sees it as read in Console). RLS inherits via notification_id join.
notification_preferences (sparse override — only rows that DIFFER from category defaults exist). Columns: recipient_principal_id, category, channel, enabled BOOLEAN, updated_at. PK (recipient_principal_id, category, channel). No org_id — preferences are cross-org per the patient-portable model. RLS: recipient reads/writes own. At 20k humans × ~1% explicit opt-outs across ~10 future categories × 4 opt-outable channels, table sits in low thousands of rows for the foreseeable future.
Edit-in-place in 000002_tenancy_rbac.up.sql: add humans.timezone VARCHAR(64) NULL next to humans.locale. Resolution chain at enqueue: humans.timezone → organization_settings.default_timezone → 'Europe/Bucharest'. Distinct from P23's scheduling-timezone chain (recipient-where-they-read vs. slot-where-it-happens — both true simultaneously). Per CLAUDE.md "Migrations are editable pre-production."
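The wall-clock-to-UTC conversion behind scheduled_at can be illustrated with the standard library alone. The atLocal function below is a hypothetical stand-in, not the signature of notify.AtRecipientLocal; it shows why converting once, in the recipient's zone, gives a DST-correct instant.

```go
package main

import (
	"fmt"
	"time"
	_ "time/tzdata" // embed the IANA database so the sketch does not depend on system zoneinfo
)

// atLocal converts a recipient-local wall-clock time into the UTC
// instant to store in scheduled_at. Hypothetical helper.
func atLocal(tz string, year int, month time.Month, day, hour, min int) (time.Time, error) {
	loc, err := time.LoadLocation(tz)
	if err != nil {
		return time.Time{}, err
	}
	// time.Date applies the zone's offset for that specific date,
	// so winter and summer dates resolve to different UTC hours.
	return time.Date(year, month, day, hour, min, 0, 0, loc).UTC(), nil
}

func main() {
	winter, _ := atLocal("Europe/Bucharest", 2025, time.January, 15, 9, 0)
	summer, _ := atLocal("Europe/Bucharest", 2025, time.July, 15, 9, 0)
	fmt.Println(winter.Format(time.RFC3339)) // 2025-01-15T07:00:00Z
	fmt.Println(summer.Format(time.RFC3339)) // 2025-07-15T06:00:00Z
}
```

Storing the UTC instant means the worker only ever compares scheduled_at against NOW() and needs no timezone logic of its own.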
Worker model. In-process polling goroutine, one per Core API instance, polls every 1–2s with ... WHERE status='pending' AND (scheduled_at IS NULL OR scheduled_at <= NOW()) ... FOR UPDATE SKIP LOCKED LIMIT N. SKIP LOCKED handles cross-instance coordination naturally — no advisory locks (P44), no LISTEN/NOTIFY (P44). Migration to a separate cmd/notification-dispatch binary is mechanical when volume warrants independent scaling: same dispatcher.Run() function, hosted by a different process.
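When a claimed delivery fails, the worker has to compute next_attempt_at from the backoff schedule given for notification_deliveries. The sketch below is one possible reading, stated as an assumption: the source lists five delays and a cap of five attempts, and here the cap is taken to count retries, so each delay is used exactly once before dead-lettering. NextDelay is a hypothetical name.

```go
package main

import (
	"fmt"
	"time"
)

// schedule is the retry ladder from the delivery table description.
var schedule = []time.Duration{
	time.Minute, 5 * time.Minute, 30 * time.Minute, time.Hour, 6 * time.Hour,
}

// NextDelay returns how long to wait before the next retry, given how
// many retries have already failed (0 = only the initial send failed).
// ok == false means the delivery should move to dead_letter.
// Assumption: the cap counts retries, not the initial attempt.
func NextDelay(retriesFailed int) (delay time.Duration, ok bool) {
	if retriesFailed < 0 || retriesFailed >= len(schedule) {
		return 0, false
	}
	return schedule[retriesFailed], true
}

func main() {
	for n := 0; n <= len(schedule); n++ {
		d, ok := NextDelay(n)
		fmt.Println(n, d, ok)
	}
}
```

A fixed ladder, in place of a computed exponential, keeps the schedule readable in one place and makes the dead-letter point explicit.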
Channel adapters. Channel interface: Send(ctx, *Notification, *Delivery) error. Foundation registers EmailChannel (AWS SDK v2 SES). Tests inject FakeChannel capturing sends to an in-memory slice. F-tier additions (SMSChannel, WhatsAppChannel, PushChannel) register against the same interface. In-app is a channel — dispatching channel='in_app' writes the delivery row with status='sent' immediately (no transport call); the future inbox bell queries JOIN notifications ... WHERE channel='in_app' AND recipient = me. In-app is always-on, no opt-out (a recipient who doesn't want to see in-app messages can simply not look at the bell — opting out at delivery would lose the audit trail of "we tried to tell you").
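A minimal sketch of the adapter seam, assuming trimmed-down Notification and Delivery structs (the real types carry more fields). Only the Channel method signature comes from the text above; the deliver helper and the field names are hypothetical.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// Notification and Delivery are reduced to the fields this sketch needs.
type Notification struct{ RecipientEmail, Subject, BodyText string }
type Delivery struct{ Channel string }

// Channel is the adapter seam: one implementation per transport.
type Channel interface {
	Send(ctx context.Context, n *Notification, d *Delivery) error
}

// FakeChannel records sends in memory so tests can assert on what
// would have been transmitted without calling a provider.
type FakeChannel struct {
	mu   sync.Mutex
	Sent []Notification
}

func (f *FakeChannel) Send(_ context.Context, n *Notification, _ *Delivery) error {
	f.mu.Lock()
	defer f.mu.Unlock()
	f.Sent = append(f.Sent, *n) // copy, so later mutation cannot alter the record
	return nil
}

// deliver stands in for the dispatcher handing one stored row to a channel.
func deliver(ch Channel, n *Notification) error {
	return ch.Send(context.Background(), n, &Delivery{Channel: "email"})
}

func main() {
	fake := &FakeChannel{}
	_ = deliver(fake, &Notification{RecipientEmail: "owner@example.test", Subject: "Welcome"})
	fmt.Println(len(fake.Sent), fake.Sent[0].Subject)
}
```

Because the dispatcher depends only on the interface, swapping the fake for a real adapter (or adding SMS later) does not touch the claim or retry logic.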
Categories + templates. Go-source enum + map in internal/core/notify/categories.go. Each entry: {default_channels, classification, template_key}. Foundation defines MemberInvite and BreakGlassOpened (both transactional, email-only). The transactional/operational classification is GDPR-significant (legitimate-interest vs. consent-based processing) — never a runtime config; always code. Templates: //go:embed templates/*.tmpl, one file per {category, channel, locale} combo (~4 files for foundation: 2 categories × email × en+ro). Engine: stdlib text/template + html/template. All timestamp rendering goes through inTZ / inLocale helpers; raw UTC dumps in template bodies are a code-review violation. F8's clinic-customizable templates layer a DB overlay table on top of the embedded foundation defaults.
Render + resolve at enqueue. Send() resolves recipient locale + timezone (one humans row read), renders the template, stores the rendered subject + body in the parent row. Worker is dumb — ships stored bytes to a stored address. Audit story is "what did we send?" = SELECT on the row. Rendered PII (subject, body_text, body_html, recipient_email) registers as pii_basic in data-classification.md — same shape as forms.submitted_data.
Idempotency. Caller-provided key (opts.IdempotencyKey(eventID)) enforced via partial unique. Foundation callers all have a natural key (invite_id, break_glass_session_id). Worker double-delivery (worker crashes between SES call and status update) is accepted as the failure mode — patient gets two copies of a transactional email, recoverable. Real exactly-once via two-phase commit not justified at foundation consumer scale.
Preferences vs consents — two distinct concepts dispatched separately. Marketing categories (future): consult consents ledger for marketing_{channel} purpose at the recipient's org. Required for legal basis. Operational categories (future appointment reminders): consult notification_preferences for explicit opt-outs. Transactional categories (foundation's two): skip both — these are part of the service contract. Foundation's two consumers are transactional, so the preferences/consents read paths exist as code but are not exercised at gate close.
Rate limiting. Per-recipient global cap, checked at enqueue, transactional bypass. Reuses 1A.13's Redis ratelimit. Default: 50 notifications/recipient/hour for operational categories. Catches stuck retry loops and misconfigured automation rules before they flood an inbox. Producer-level rate-limiting (one rule firing 1000×/sec) is a Layer 8 concern — this is the safety net.
Audit + classification. Per CLAUDE.md operational-metadata-exempt rule: enqueue is a state-changing mutation → audit_log row written (action notification.enqueue; recipient + category + idempotency_key recorded; body content NOT recorded — the row itself stores the body and is RLS-scoped to recipient). Per-delivery transitions (claimed, sent, failed, read) are operational metadata, no audit row — saves ~10–50× audit volume; forensic reconstruction available from the delivery row's columns directly. Classification entries for every new column ship in the same PR as the migration (CI gate enforces).
F7 webhook boundary. F7's webhook dispatcher subscribes to events.Bus (1A.9) for outbound deliveries to clinic systems. 1A.18's notification dispatcher reads from notification_deliveries for first-party email/SMS/in-app to humans. Both can fire from the same domain event (e.g., appointment.completed → webhook to clinic CRM AND notification email to patient) but are independent code paths writing to independent stores. Two transports, one event — neither cascades into the other.
Test strategy. Existing testcontainers Postgres + new FakeChannel. Acceptance test: enqueue → polling loop runs → fake captures one rendered email with the right subject / body / locale / timezone snapshots. Wired into setup_clinic_test.go once 1B.12's invite flow consumes the primitive.
Deferred to F-tier. Per-app inbox bell UI (no foundation consumer needs it; storage + RLS in place so F-tier just adds a GET /v1/me/notifications endpoint via P42's me:{principalId}:notifications tag + a shared packages/ui component). Quiet hours (no foundation consumer needs "don't email after 10pm recipient-local"; lands later as a recipient-preference column + dispatch-time check, additive). SMS / WhatsApp / push adapters (slot in via the same Channel interface when their first consumer ships). Bounce / complaint webhooks + suppression list automation (lands when first SES bounce noise warrants — manual SES suppression list config ships at 1E as the floor).
Foundation email-presentation polish (deferred — does not block 1E). Two items shipped functional-but-rough in foundation; both are visible to the first real owner that signs up:
- Shared HTML shell + retrofit four foundation templates. Today every template ships plaintext-only (subject + body_text blocks). HTML clients (Gmail / Outlook / Apple Mail, i.e., everyone) render this as monospace walls with raw 600-char URLs, which feels 1990s and degrades the brand. The fix is a shared templates/_layout.email.tmpl defining the HTML shell (logo block, type scale, footer with platform contact, "you got this because..." trust line) that each per-category template extends with Go-template define "body_text" / define "body_html" blocks. Notify already supports body_html end-to-end (column on notifications, sent by email.go:177-182 when present); the work is template-side. Retrofit all four foundation categories in the same PR (owner_welcome, break_glass_opened, member_invite, webhook_subscription_paused): the shell only pays off when adoption is consistent, and four templates is small enough that piecemeal would mean two visual styles in the wild during the rollout window. Cross-template considerations to settle in that PR: the "address the recipient by name" data shape (which producers gain the responsibility to load humans.full_name alongside humans.email + locale + timezone; same ResolveForPrincipal helper noted under the address-recipient delta above), the From-name display string (today raw email; HTML lets us render "RestartiX <[email protected]>"), and whether to load any imagery (recommend NO: images block in many clients, leak read-tracking pixels, and tank deliverability if remotely hosted).
- Owner-welcome magic link should land on our Clinic app, not Clerk's hosted sign-in page. Today auth/clerk/provisioning.go:80-99 returns the URL signintoken.Create produces (a Clerk-hosted *.accounts.dev / auth.<clerk-domain> URL), which is then dropped into the email body. The new owner clicks "Welcome to Demo Clinic on RestartiX" and lands on a domain they don't recognize (true-sawfly-92.accounts.dev in dev; some auth.clerk.com-shaped string in prod) with no RestartiX branding, no "Welcome to Demo Clinic" context, and a generic Clerk sign-in form. Phishy-looking and brand-discontinuous. Proper fix: CreateSignInLink returns a URL on our domain (e.g. https://{slug}.clinic.restartix.pro/welcome?ticket=<clerk_ticket>); the Clinic app's /welcome route consumes the ticket server-side via Clerk's API (SignInToken.Verify or equivalent), renders a RestartiX-branded "Welcome to {org_name}" page, has the user set their password inline, drops the session cookie, redirects to the dashboard. Foundation work touches the auth.Provisioning interface (the return becomes "our URL", not "Clerk's URL"); the Clinic-app /welcome route is 1D.2 territory. Splittable across PRs: backend swaps URL construction first (the route can land as a stub that just bounces to Clerk-hosted as a transition), then Clinic-app implements the real ticket-consuming route, then the stub is removed. Until this lands, the owner-welcome email functionally works but is one of the weakest first impressions in the product.
Why a foundation item. Multiple foundation consumers (1B.11 break-glass alert, 1B.12 invite) and every F-tier feature that ever notifies a human all write through this same primitive. Retrofitting the outbox shape, recipient model, preferences/consents boundary, idempotency contract, audit treatment, or timezone-aware scheduling once 50+ producers exist is exactly the cross-cutting cost foundation discipline exists to prevent. Adding humans.timezone after first prod deploy means a backfill across the whole humans table; doing it now is one column in an editable migration. Same calculus as 1A.15 audit partitioning, 1A.16 postgres extensions, 1A.17 frontend performance.
The 1B.11 spec previously punted the break-glass email to "alongside F7 OR a foundation-tier direct mail send (decide at implementation)." This section closes that punt: break-glass emails go through notify.Send, not a TODO comment.
- [x] Migration: create notifications, notification_deliveries, notification_preferences tables with RLS policies (recipient reads own; org members with organizations.view_directory read org-scoped; sparse-prefs writable by recipient). Shipped in 000010_notifications.
- [x] Edit-in-place: add humans.timezone VARCHAR(64) NULL to 000002_tenancy_rbac.up.sql next to humans.preferred_language.
- [x] Edit-in-place: add organization_settings.default_timezone TEXT NULL to 000003_org_settings.up.sql, which closes the resolution chain (humans → org → platform).
- [x] Data classification entries for every new column in data-classification.md, same PR as the migration (CI gate enforces).
- [x] internal/core/notify/ package: Send, AtRecipientLocal, Channel interface, dispatcher loop with SKIP LOCKED claim, exponential-backoff retry (1m/5m/30m/1h/6h), dead-letter cap at 5 attempts, transactional-category bypass for preferences. Per-recipient rate limit deferred (see Deltas above).
- [x] EmailChannel adapter using aws-sdk-go-v2/service/sesv2 in internal/core/notify/email/. FakeChannel for tests in the notify package itself (so the integration suite can compose it without a build-tag dance).
- [x] Embed MemberInvite + BreakGlassOpened templates × email × en+ro (4 .tmpl files).
- [x] Template helpers: inTZ, inLocale, locale-aware date/time formatting (folded into template.go).
- [x] Wire 1B.11 elevation endpoint to call notify.Send(adminPrincipal, BreakGlassOpened, data, opts.IdempotencyKey(sessionID)) after the session row commits. Wired in breakglass/service.go: fan-out lookup by admin system role + per-(session × admin) idempotency keys.
- [x] Wire 1B.12 invite endpoint to call notify.Send(notify.ToAddress(email), MemberInvite, data, opts.IdempotencyKey(membershipID)). Deferred to BYO ESP migration: 1B.12 chose Clerk's Invitations API for invite delivery while we're on Clerk for auth emails (see 1B.12 decisions). The MemberInvite template + notify.Send plumbing are shipped and ready; the wire flips when Clerk auth emails migrate to our SES at the BYO ESP cutover. Foundation acceptance is satisfied by the break-glass consumer above + the setup_clinic_test.go acceptance test.
- [x] SOUP row for aws-sdk-go-v2/service/sesv2 in reference/soup.md, same PR as the adapter (CI gate enforces).
- [x] Acceptance test in setup_clinic_test.go: enqueue → dispatcher.RunOnce → fake captures rendered email with correct subject + body + locale + timezone. Plus a retry/dead-letter test exercising the failure path with MaxAttempts=2.
- [x] CategoryDefinition.BillingScope + notify.email.ScopedChannel routing: platform-scope categories bypass metering and send from the platform-owned SES identity; tenant-scope categories (F-tier; none today) route through WrapMeteredProvider.
- [x] capabilities.SetLimitLookup + metering.Repository.LookupLimit: system-driven dispatch paths (notify dispatcher, background jobs) get the same soft pre-check as request paths. Wired in cmd/api/main.go alongside SetMeterStore. Tests: TestWrapMeteredProvider_SystemDriven_QuotaExceeded / _QuotaUnderCap_RunsInner / _NoLookup_PassesThrough in capabilities_test.go; TestMetering_LookupLimit_* in metering_test.go.
- [x] notify.Send guard rejects principal-id-only recipients on email-routed categories at queue time. The convention (callers resolve principal → humans.email at the call site; the email channel does not resolve) is now compiler-enforced for any future producer. Tests: TestCategoryNeedsAddress_* in category_test.go.
- [x] AWS SES production identity verification + DKIM for the platform sender domain. Shipped 2026-05-13. restartix.pro domain identity verified in eu-central-1; DKIM CNAMEs + SPF (v=spf1 include:amazonses.com -all) + DMARC (v=DMARC1; p=none;) published in Cloudflare. Sandbox exited account-wide (211k/day quota). Configuration set + bounce/complaint webhook handler still gated on F-tier app-layer work (per production-launch-readiness.md L65-70).
- [x] AWS SES suppression list initial config (auto-add hard bounces + complaints, per-account suppression). Enabled in console 2026-05-12.
- [ ] Foundation email-presentation polish (deferred). Shared HTML shell + retrofit four foundation templates AND owner-welcome magic link lands on our Clinic app's /welcome route instead of Clerk's hosted sign-in page. Both described in the "Foundation email-presentation polish" paragraph above. Functional but rough; the first real owner that signs up sees both rough edges. Closed once those two land; neither blocks 1E.
- [ ] Recipient-timezone profile UI in 1D.3 patient self-service: IANA picker, default to browser-detected at first set. Closed alongside 1D.3.
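The timezone resolution chain named in the checklist (humans → org → platform) can be sketched as a small pure function. This is a minimal illustration, not the shipped helper: resolveTimezone, strPtr, and the "UTC" platform default are assumptions introduced here, and the real lookup presumably reads humans.timezone and organization_settings.default_timezone from the database.

```go
package main

import (
	"fmt"
	"time"
	_ "time/tzdata" // embed the IANA database so LoadLocation works without system zoneinfo
)

// platformDefaultTZ is an assumed platform-level fallback; the plan does not name one.
const platformDefaultTZ = "UTC"

// resolveTimezone walks humans.timezone -> organization_settings.default_timezone
// -> platform default. NULL (nil), empty, or unparseable values fall through.
func resolveTimezone(humanTZ, orgTZ *string) *time.Location {
	for _, candidate := range []*string{humanTZ, orgTZ} {
		if candidate == nil || *candidate == "" {
			continue
		}
		if loc, err := time.LoadLocation(*candidate); err == nil {
			return loc
		}
	}
	loc, err := time.LoadLocation(platformDefaultTZ)
	if err != nil {
		return time.UTC
	}
	return loc
}

// strPtr is a small helper for building nullable column values in examples.
func strPtr(s string) *string { return &s }

func main() {
	fmt.Println(resolveTimezone(strPtr("Europe/Bucharest"), strPtr("Europe/Berlin")))
	fmt.Println(resolveTimezone(nil, strPtr("Europe/Berlin")))
	fmt.Println(resolveTimezone(nil, nil))
}
```

Treating an invalid IANA name the same as NULL keeps a bad profile value from blocking delivery; whether the real dispatcher does that or surfaces an error is not stated in this plan.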
1B. Identity & Tenancy
The actor-identity model + tenancy economics. This is where patient identity now lives — patients are not memberships, patient tiers are not roles. See decisions.md → Why patients are not memberships, and patient tiers are not roles.
Build order inside 1B. Each item here depends on the items in its column or to its left. Items in the same column or rightmost are independent of each other and can run in parallel.
1B.1 Principal Model (foundational — everything else builds on this)
↓
1B.2 Org Settings / Billing / Entitlements
1B.3 Plans / Subscriptions / Overrides (parallel with 1B.2)
1B.4 Patient Tiers Catalog (depends on 1B.3 catalog tables)
1B.5 Plan-Entitlement / Org-Entitlement / Limit Middleware (depends on 1B.2 + 1B.3 + 1B.4)
↓
1B.6 Patient Identity (depends on 1B.1 principals + 1B.4 default tier reference)
↓
1B.7 Patient Subscriptions (depends on 1B.6 patients + 1B.4 tiers)
1B.8 Portal Onboarding (depends on 1B.6 + 1B.7 — provisions all three in one txn)
↓
1B.9 Consents Ledger (depends on 1B.6 patient_profile_id FK)
↓
1B.10 Privacy Notice Templates (depends on 1B.9 consent_purpose_versions)
↓
1B.11 Platform Break-Glass Access (independent of 1B.9/1B.10 except for auditing of consent reads — can run parallel to 1B.10)
1B.12 Member Invite Flow (independent — can run any time after 1B.1)
1B.13 Patient Impersonation Sessions (independent — can run any time after 1B.6; mirrors 1B.11)
1B.14 Locations & Multi-Site Support (independent: can run any time after 1B.2; blocks F4 Scheduling and any specialist/appointment work)
1B.1 through 1B.14 are shipped backend end-to-end. 1B.1–1B.10 carry their UI consumers as well (Console template management, clinic-admin legal-documents editor, portal re-consent modal); 1B.11–1B.14 ship backend + RLS integration tests, with their UI consumers (Console + Clinic admin surfaces for break-glass / invites / impersonation oversight / locations CRUD) deferred to the unified UI pass once the foundation backend is fully locked. 1B.14 closing before any Layer 2 scheduling / specialist / appointment work was the load-bearing constraint: once those tables exist, retrofitting location_id is a cross-cutting backfill the foundation discipline rule exists to prevent.
What moved out of 1B. The original 1B.15 "Per-Org Integrations Catalog" was retired into the new § 1C. Capabilities, Integrations & Metering — its scope expanded into a dedicated sub-phase covering the full integration architecture (Curated Providers, Connected Accounts, Outbound + Inbound Webhooks, Internal Events, Metering, AI Hooks, Entitlements rename). See glossary.md for the canonical taxonomy that drove the move.
1B.1 Principal Model as Root Identity (was 1.24)
Status: shipped. principals is the actor-identity root; humans, agents, service_accounts are siblings sharing the principal_id PK; the singleton 'system' principal attributes trigger fan-out and unauthenticated paths; audit_log.actor_id + actor_type carry the actor; AI provenance lives in a sibling audit_ai_provenance table. Detail: decisions.md → Why principals as the root identity, data-model.md § Area 1.
1B.2 Org-Level Settings, Billing & Entitlements (was 1.19)
Status: shipped. Three typed companion tables (organization_settings, organization_billing, organization_entitlements) auto-created on org INSERT via trigger; entitlement flags are AdminPool-only (regulated trust boundary); organizations.update_settings and organizations.manage_billing permissions seeded. Detail: architecture/org-settings.md.
1B.3 Plans, Subscriptions & Sales Overrides (was 1.20)
Status: shipped. Catalog tables (plans, plan_versions, entitlements, limit_definitions, plan_entitlements, plan_limits); per-org tables (organization_subscriptions, organization_subscription_entitlements, organization_subscription_limits, organization_subscription_overrides); snapshot-on-subscribe (P14b); entitlement projection from regulated entitlements onto organization_entitlements. Detail: architecture/plans-and-subscriptions.md.
1B.4 Patient Tiers Catalog (was 1.21, restructured by 1.26)
Status: shipped. Per-org tier catalog with versioning columns, parallel patient_tier_entitlements / patient_tier_limits mirroring the org-side billing engine, default-tier invariants (single-default partial unique index + atomic flip). No tier→role binding. Detail: architecture/plans-and-subscriptions.md § Patient tiers.
1B.5 Plan-Entitlement / Org-Entitlement / Limit Middleware (was 1.22)
Status: shipped. Four-gate model (Permission / PlanEntitlement / OrgEntitlement / Limit) with distinct error codes (403 / 402 / 403 / 402); RequirePlanEntitlement, RequireOrgEntitlement, EnforceLimit middlewares; Subject.{OrgEntitlements, PlanEntitlements, Limits} extensions; behaviour-aware Limit semantics (hard_block / soft_meter / informational). Detail: architecture/middleware-composition.md.
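As a rough illustration of the four-gate model and its status codes, the sketch below evaluates the gates in the order they are listed (Permission, PlanEntitlement, OrgEntitlement, Limit). The Request struct, the check function, the evaluation order, and the reading that only hard_block limits reject are assumptions made for this example; the shipped middlewares are separate RequirePlanEntitlement / RequireOrgEntitlement / EnforceLimit handlers.

```go
package main

import (
	"fmt"
	"net/http"
)

// LimitBehaviour mirrors the three behaviours named in the plan.
type LimitBehaviour string

const (
	HardBlock     LimitBehaviour = "hard_block"
	SoftMeter     LimitBehaviour = "soft_meter"
	Informational LimitBehaviour = "informational"
)

// Request is a hypothetical flattened view of what the four gates inspect.
type Request struct {
	HasPermission      bool
	HasPlanEntitlement bool
	HasOrgEntitlement  bool
	Usage, Limit       int
	Behaviour          LimitBehaviour
}

// check returns the HTTP status of the first gate that rejects, or 200.
func check(r Request) int {
	switch {
	case !r.HasPermission:
		return http.StatusForbidden // Permission gate: 403
	case !r.HasPlanEntitlement:
		return http.StatusPaymentRequired // PlanEntitlement gate: 402
	case !r.HasOrgEntitlement:
		return http.StatusForbidden // OrgEntitlement gate: 403
	case r.Behaviour == HardBlock && r.Usage >= r.Limit:
		return http.StatusPaymentRequired // Limit gate: 402
	}
	// soft_meter and informational limits are assumed never to block here.
	return http.StatusOK
}

func main() {
	fmt.Println(check(Request{HasPermission: true}))
	fmt.Println(check(Request{
		HasPermission: true, HasPlanEntitlement: true, HasOrgEntitlement: true,
		Usage: 10, Limit: 10, Behaviour: SoftMeter,
	}))
}
```

Keeping 403 for "you may not" and 402 for "your plan does not cover this" lets a client decide between hiding a feature and showing an upgrade prompt without parsing the error body.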
1B.6 Patient Identity (was 2.1, 2.2)
Status: shipped. patient_profiles (portable, no org_id) + patient_caregivers + patients (per-org link with profile_shared, consumer_id, last_used_at); patient-scoped RLS helper current_human_patient_profile_ids(); patients table grants portal access by row existence (no app.access_portal permission); per-org admin endpoints under patients/. Detail: data-model.md § Area 2, decisions.md → Why patients are not memberships.
- [x] patients.profile_shared flips when the profile_sharing consent (org-scope, Tier A toggle in 1B.9) is granted, and back to FALSE when it's withdrawn. Wire as a trigger on consents insert/update. (Shipped in 000008: trigger_consent_profile_sharing_flip.)
1B.7 Patient Subscriptions (was 2.5)
Status: shipped. patient_subscriptions + snapshot tables (patient_subscription_entitlements, patient_subscription_limits, patient_subscription_overrides); every onboarded patient has a defined subscription state from day one (default tier, status active, snapshots frozen at subscribe time). Detail: data-model.md § Area 16.
- [ ] Patient self-service tier change (PATCH /v1/me/patient-subscription): defer until external billing wires up.
- [ ] Override grant/revoke admin endpoints: table + RLS shipped, CRUD pending a concrete use case.
1B.8 Portal Onboarding (was 2.4 portal endpoint)
Status: shipped. POST /v1/portal/onboard provisions patient_profiles + patients + patient_subscriptions (snapshots from the org's default tier) in one AdminPool transaction; idempotent on re-onboard; mounted outside OrganizationContext because that gate would 403 a fresh human with no patients row. Detail: portalonboarding/.
- [x] patient.onboarded event publish: wired in portalonboarding/handler.go via the 1A.9 event bus. Fires only when result.PatientCreated is true (re-onboarding at a clinic where the patient already has a row stays silent). Payload carries patient_profile_id, tier_id / tier_version, and a source discriminator (invite / share_link / self_signup) plus the corresponding source-id when relevant. F7 webhook consumers subscribe by Type = "patient.onboarded".
- [ ] Caregiver onboarding for account-less patients: table + RLS shipped; admin endpoint deferred until a real product use case.
1B.9 Consents Ledger
Single append-on-grant table that records every consent event across both platform-scope and org-scope purposes. The "trail" UX (granted, withdrawn, re-granted, current state) falls out of WHERE patient_profile_id = $1 ORDER BY granted_at DESC. Substantive design rationale in decisions.md → Why clinic is controller, platform is processor.
Status: shipped. Schema + RLS + cascade trigger + permission seeds + initial purpose/version seeds in 000008. Consents domain (model/repository/service/handler) + grant + withdraw + trail-view + catalog endpoints. Two-step onboarding (decisions.md → Why two-step onboarding): step 1 (POST /v1/me/patient-profile) creates the portable patient_profiles row + writes platform-scope consents in one admin tx; step 2 (POST /v1/portal/onboard) requires the profile to exist (409 profile_missing otherwise) and writes the per-clinic patients + patient_subscriptions chain + org-scope consents. Re-consent middleware (412 consent_required with missing list), RequireConsent(purpose) foundation-tier stub, version-supersession path on Service.Grant (so a v2 republish unblocks the gate via re-grant), and patient trail UI at /(patient)/consents consuming /v1/me/consents + /v1/consent-purposes. The current_required_consent_versions helper filters to non-consent legal-basis purposes — optional toggles never block. User.has_patient_profile exposed on /v1/me so the portal /onboard page routes step 1 vs step 2 without an extra round-trip. Withdrawing org_terms from the trail UI fires the cascade with an explicit "Leave clinic" confirmation dialog (the action is per-clinic relationship-ending, not a regular toggle).
Catalog tables:
- [x] consent_purposes: (code, scope, name, description, legal_basis, withdrawable, created_at). scope IN ('platform', 'org'). legal_basis IN ('contract', 'legitimate_interest', 'consent', 'legal_obligation', 'vital_interest'). withdrawable is derived in code (TRUE only when legal_basis='consent') but stored as a column for query speed.
- [x] consent_purpose_versions: (id, purpose_code, organization_id NULL, version, body_translations JSONB, published_at, published_by_principal_id). NULL organization_id = platform-default text. Set organization_id = org override (only valid for org-scope purposes).
- [x] Initial purpose seeds:
  - Platform scope: platform_terms (contract, non-withdrawable), platform_privacy_notice (legitimate_interest, non-withdrawable, informational acceptance).
  - Org scope: org_terms (contract), org_privacy_notice (legal_obligation + legitimate_interest), profile_sharing (consent: patient lets the clinic see DOB, allergies, insurance instead of name only), marketing_email (consent), marketing_sms (consent), analytics (consent), ai_processing (consent).
  - Reserved for F3.5: telemedicine, video_recording, biometric_capture, treatment_specific_* (all consent-basis, registered when F3 ships).
Ledger:
- [x] consents: (id, organization_id NULL, patient_profile_id, purpose_code, purpose_version, source, source_form_id NULL, granted_at, granted_by_principal_id, granted_via_ip, withdrawn_at, withdrawn_by_principal_id, withdrawal_reason). NULL organization_id = platform-scope grant. source IN ('signup_checkbox', 'self_toggle', 'form', 'staff_action', 'api'). source_form_id is NULL except when source='form' (FK to F3's forms table). Append-only on grant: re-grant after withdrawal = new row. Withdrawal = UPDATE that sets withdrawn_at + withdrawn_by_principal_id (the only mutation allowed; rest is INSERT).
Hooks + middleware:
- [x] Sign-up flow (two-step):
  - Step 1 (POST /v1/me/patient-profile, idempotent on the profile): creates the portable patient_profiles row keyed by the calling human + writes platform-scope consents (platform_terms, platform_privacy_notice) in one admin tx. Org-scope codes here fail 400 scope_mismatch (those belong on step 2). Mounted in the principal-RLS group outside RequireCurrentConsents so a fresh human can clear the gate by completing this step.
  - Step 2 (POST /v1/portal/onboard): requires the portable profile to exist (409 profile_missing otherwise); writes org-scope consents (org_terms, org_privacy_notice, plus any optional toggles the patient ticked) and provisions the per-clinic patients + patient_subscriptions chain.
  - Failure to accept any required purpose at either step = whole admin tx rolls back with 400 consents_required and the missing-purpose list. Each fresh consent INSERT audits as a CREATE row sourced as signup_checkbox. Implemented in portalonboarding.Service.SetupProfile (step 1) and portalonboarding.Service.Onboard (step 2).
  - Why split: platform identity (profile + platform terms) is processor-side; per-clinic membership (patients row + clinic terms) is controller-side. A patient who leaves every clinic keeps their portable profile + platform consents; the next clinic they join skips step 1 entirely. The portal /onboard page reads User.has_patient_profile to route between the two screens.
- [x] Self-toggle endpoints in patient settings (1D.3): patient flips marketing_email / marketing_sms / analytics / ai_processing from portal settings; subject and grantor are the same principal. Each toggle is a new ledger row (grant) or an UPDATE on the active row (withdrawal). (Shipped: POST /v1/me/consents, an idempotent grant with source=self_toggle; POST /v1/me/consents/{id}/withdraw.)
- [x] Staff-action endpoints (1D.2): gated by new consents.manage permission; records the staff principal as grantor and the patient as subject (CS rep flips marketing on patient's behalf when the patient calls in). (Shipped: POST /v1/organizations/{id}/patients/{patientId}/consents with source=staff_action; POST .../consents/{consentId}/withdraw.)
- [x] current_required_consent_versions(principal_id, organization_id): RLS-helper-style function returns the set of (purpose, version) the user hasn't accepted yet, restricted to non-consent legal-basis purposes (contract / legitimate_interest / legal_obligation / vital_interest). Optional consent-basis toggles never appear here. Platform purposes always apply; org purposes apply when organization_id is non-NULL.
- [x] Re-consent middleware (middleware.RequireCurrentConsents): wraps /me/* routes after RequirePrincipalRLS. Calls current_required_consent_versions; if non-empty, returns 412 Precondition Failed with {"error": {"code": "consent_required", "missing": [{"purpose_code": ..., "version": ...}, ...]}}. The consent endpoints (/v1/me/consents, /v1/me/consents/{id}/withdraw) are mounted in a sibling group without this gate so the 412 → re-grant loop can close. Service.Grant now supersedes a stale older-version active grant (writes withdrawal_reason='superseded_by_v{N}') before the re-insert, so non-withdrawable purposes can clear version drift.
- [x] Withdrawal endpoints: patient self-withdraw (POST /v1/me/consents/{id}/withdraw); staff-on-behalf-of-patient withdraw (gated by consents.manage). (Both shipped; cascade trigger fires on org_terms withdrawal at either path.)
- [x] Cascade rule: withdrawing the platform-scope platform_terms is not a patient-initiated path; the UI says "to revoke these, delete your account" and triggers the GDPR erasure flow (handled in F11.1). Withdrawing org_terms at clinic A is a single transaction that (1) sets patients.deleted_at = NOW() at clinic A, (2) transitions the active patient_subscriptions row to status='canceled' with canceled_at = NOW() (the row stays in place for billing / audit history and is never hard-deleted), and (3) cascades withdrawal of all org-scope consents at that org via trigger. (Shipped in 000008 as trigger_consent_org_terms_cascade; DB-enforced, defense-in-depth for any code path.) Supersession-guard (added 2026-05-02): the trigger skips when withdrawal_reason LIKE 'superseded_by_v%'. The consents service uses that withdrawal-reason convention when a re-grant supersedes a stale older-version row, and without the guard re-acceptance of org_terms would auto-soft-delete the patient. The cascade fires only on real "leave clinic" intent (no superseded_by_* reason); locked by TestConsents_VersionSupersessionDoesNotFireOrgTermsCascade.
- [x] Re-onboarding semantics (foundation invariant, locked in 2026-05-01): the partial unique index patients_profile_org_active_uniq (000006) allows a returning patient to sign up at the same clinic again; they get a brand-new patients row + brand-new patient_subscriptions row. The previous (soft-deleted) patients row + (canceled) subscription row stay in place as historical record. Each onboarding chapter has its own audit trail via per-row entity_id. The portable patient_profiles row is reused (one identity, many processing chapters). When 1B.9 ships the consents withdrawal flow, the cancel-subscription step above is what makes the historical chapter stay queryable as "patient was at clinic A from time X to time Y." (Mechanically enforced by the cascade trigger from 000008.)
- [x] RequireConsent(purpose) middleware, analogous to RequireOrgEntitlement / RequirePlanEntitlement. Used by features that require a specific consent (telemedicine flow, AI inference, biometric capture). Returns 403 consent_required with missing_purpose. Foundation-tier stub: no production route consumes it today; F3 / F5 / F9 wire it up when those features land. Resolves scope from the catalog (platform-scope = no org context; org-scope = current org).
RLS + permissions:
- [x] RLS on consents: org staff with new consents.view_org see consents in their org; patients see their own across all orgs (via current_human_patient_profile_ids()); platform-scope rows (NULL organization_id) visible to the patient and to break-glass-elevated staff via 1B.11.
- [x] RLS on consent_purposes + consent_purpose_versions: SELECT for everyone (these are catalog rows; the text is by definition public). Mutations via AdminPool only (catalog edits are migrations or admin-tool actions).
- [x] Permission seeding: consents.view_org (granted to specialist + customer_support + admin), consents.manage (admin + customer_support; staff-action grantor path).
Trail view:
- [x] Patient-side: GET /v1/me/consents returns the patient's full consent history grouped by (organization_id, purpose_code) with current state + history rows. Optional ?organization_id= filter narrows to one clinic.
- [x] Staff-side: GET /v1/organizations/{id}/patients/{patient_id}/consents returns the same shape scoped to that patient at that clinic (gated by consents.view_org).
- [x] Catalog: GET /v1/consent-purposes?organization_id={id} returns each purpose paired with the latest applicable version body (org override wins, platform-default fallback). Used by the sign-up consent block + patient settings UI.
- [x] Patient settings UI surface in 1D.3: /(patient)/consents/page.tsx renders the trail grouped by platform-scope and org-scope. Withdrawable purposes (legal_basis = 'consent') get inline Grant / Withdraw buttons; non-withdrawable purposes show "delete account to revoke" (platform) or "leave clinic to revoke" (org) copy. Server actions hit /v1/me/consents and /v1/me/consents/{id}/withdraw. Sign-up consent block ships in /onboard (OnboardForm + JoinClinicButton); required purposes block submit until checked; server enforces regardless. Sidebar entry under nav.consents (en + ro).
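The catalog rule "org override wins, platform-default fallback" can be sketched as a selection over consent_purpose_versions rows. The sketch assumes string IDs and integer versions, and assumes that any org override beats the platform default regardless of version number; PurposeVersion and latestApplicable are illustrative names, not the shipped repository code.

```go
package main

import "fmt"

// PurposeVersion mirrors a subset of consent_purpose_versions.
type PurposeVersion struct {
	PurposeCode    string
	OrganizationID *string // nil = platform-default text
	Version        int
	Body           string
}

// latestApplicable returns the highest-version org override for the purpose,
// or the highest-version platform default when the org has published none.
func latestApplicable(rows []PurposeVersion, purpose, orgID string) (PurposeVersion, bool) {
	var override, fallback *PurposeVersion
	for i := range rows {
		r := &rows[i]
		if r.PurposeCode != purpose {
			continue
		}
		switch {
		case r.OrganizationID == nil:
			if fallback == nil || r.Version > fallback.Version {
				fallback = r
			}
		case *r.OrganizationID == orgID:
			if override == nil || r.Version > override.Version {
				override = r
			}
		}
	}
	if override != nil {
		return *override, true
	}
	if fallback != nil {
		return *fallback, true
	}
	return PurposeVersion{}, false
}

func main() {
	clinicA := "org-a"
	rows := []PurposeVersion{
		{PurposeCode: "org_terms", Version: 1, Body: "platform default"},
		{PurposeCode: "org_terms", OrganizationID: &clinicA, Version: 1, Body: "clinic A v1"},
		{PurposeCode: "org_terms", OrganizationID: &clinicA, Version: 2, Body: "clinic A v2"},
	}
	v, _ := latestApplicable(rows, "org_terms", "org-a")
	fmt.Println(v.Body)
	v, _ = latestApplicable(rows, "org_terms", "org-b")
	fmt.Println(v.Body)
}
```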
1B.10 Privacy Notice Templates
Platform provides a versioned template; clinic fills placeholders + selects toggleable sections; the assembled markdown becomes the org_privacy_notice (and org_terms) text the patient sees and accepts. Maintains the controller/processor split: the clinic owns the legal artefact, the platform provides the scaffolding. Substantive design rationale in decisions.md → Why clinic legal documents are templated, not forms.
Scope decision (locked): the same machinery covers both org_terms and org_privacy_notice — both are clinic-authored, versioned, template-assembled, and gate onboarding. Discriminator column document_type IN ('terms', 'privacy_notice'). One editor surface, one publish path.
Schema (shipped in 000009):
- [x] legal_document_templates, platform-level versioned templates: (id, document_type, version, locale, body_with_placeholders, required_placeholders TEXT[], toggleable_sections JSONB, published_at, published_by_principal_id, created_at). UNIQUE (document_type, version, locale). One row per locale; placeholder values + section toggles are global, locale-agnostic. required_placeholders is the contract the publish path enforces.
- [x] organization_legal_documents, per-org editor state: (id, organization_id, document_type, source_template_version, placeholder_values JSONB, included_sections JSONB, published_version, last_reviewed_by_principal_id, last_reviewed_at, created_at, updated_at). UNIQUE (organization_id, document_type). Mutable: clinic admin saves drafts repeatedly. published_version corresponds to the consent_purpose_versions row minted at publish time.
- [x] Org-create trigger extended: create_organization_companion_rows (000003) updated via CREATE OR REPLACE to insert two organization_legal_documents rows (one per type) with published_version = NULL pointing at the latest available template version. Backfill DO block in 000009 covers pre-existing orgs.
- [x] RLS: legal_document_templates SELECT for everyone (catalog text is public-by-design). organization_legal_documents SELECT for org members (editor state names DPO email + registered address); UPDATE gated by organizations.manage_privacy_notice. INSERT via AdminPool only (trigger + backfill); DELETE never (cascade with organizations).
- [x] Permission seeded: organizations.manage_privacy_notice granted to the admin system role template.
- [x] Seed templates v1 (en + ro) for both terms and privacy_notice with placeholder bodies + the three foundation toggleable sections (video_recording, biometric_capture, cross_border_transfer). Real ANSPDCP-compliant text + lawyer review lands as part of the Romanian compliance pass (Deferred Foundation Extensions).
Backend domain (internal/core/domain/legaldocument/):
- [x] Repo: CRUD on organization_legal_documents (RLS-gated UPDATE on ctx tx) + read on legal_document_templates. CallPublish invokes the SECURITY DEFINER publish_legal_document function from the request's ctx tx so both writes stay atomic and the function's permission/scope checks see the calling principal.
- [x] Service:
  - [x] Assemble(template, placeholderValues, includedSections) (string, error): pure function (testable; no DB). Replaces placeholders, validates required_placeholders, appends toggleable sections in catalog order with explicit-toggle-wins-default-fallback semantics. Unit tests cover the four key cases (placeholder substitution, missing required, blank required, default-section fallback).
  - [x] SaveDraft(orgID, docType, placeholderValues, includedSections, principalID): UPDATE on organization_legal_documents via ctx tx; stamps last_reviewed_*. RLS denial → 403 forbidden.
  - [x] Publish(orgID, docType, principalID): full validation chain (document_type valid → editor row exists → templates exist → required placeholders satisfied → assemble per locale → call publish_legal_document(...)). Returns (before, PublishResult); handler audits consent_purpose_version CREATE + organization_legal_document UPDATE in two rows.
  - [x] Preview(orgID, docType, locale): assembles one locale without persisting. Same validation as Publish minus the required-placeholder pre-check (a partial draft can still preview).
  - [x] ListConsoleTemplates(): latest per (document_type, locale) for the Console template-management surface.
- [x] Handlers (clinic admin / /v1/organizations/{id}/legal-documents/...):
  - [x] GET /: list both editor rows (terms + privacy_notice). Used by dashboard task card + onboarding gate.
  - [x] GET /{type}: current draft + latest templates per locale.
  - [x] PUT /{type}: save draft (no version bump).
  - [x] POST /{type}/preview: assemble without persisting.
  - [x] POST /{type}/publish: version-bump path.
- [x] Handlers (Console / /v1/admin/legal-document-templates):
  - [x] GET /: superadmin-gated list of latest platform templates per (document_type, locale).
  - [x] POST /: superadmin publishes a new platform template version. Atomic per-locale write at MAX(version)+1; server validates document_type, locale uniqueness, non-blank bodies, and that every required placeholder has a marker in every locale's body. One audit legal_document_template CREATE per locale row.
- [x] Permission organizations.manage_privacy_notice exposed as auth.PermOrganizationsManagePrivacyNotice; route layer gates every clinic-admin endpoint, RLS UPDATE policy + publish_legal_document defense-in-depth re-check enforce it at the DB.
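The Assemble step listed under Service can be sketched as a pure function. Several details here are assumptions, since this plan does not spell them out: the {{key}} marker syntax, the Template and Section shapes (the real toggleable_sections column is JSONB), and the choice to leave section bodies unsubstituted. Treat it as an illustration of the validation and explicit-toggle-wins-default-fallback rules, not as the shipped implementation.

```go
package main

import (
	"fmt"
	"strings"
)

// Section is an assumed shape for one toggleable_sections entry.
type Section struct {
	Key       string
	Body      string
	DefaultOn bool
}

// Template is an assumed in-memory view of one legal_document_templates row.
type Template struct {
	Body     string
	Required []string
	Sections []Section // kept in catalog order
}

// Assemble validates required placeholders, substitutes values into the body,
// then appends enabled sections in catalog order.
func Assemble(t Template, values map[string]string, included map[string]bool) (string, error) {
	for _, key := range t.Required {
		if strings.TrimSpace(values[key]) == "" {
			return "", fmt.Errorf("required placeholder %q is missing or blank", key)
		}
	}
	pairs := make([]string, 0, len(values)*2)
	for key, val := range values {
		pairs = append(pairs, "{{"+key+"}}", val) // marker syntax is an assumption
	}
	out := strings.NewReplacer(pairs...).Replace(t.Body)
	for _, s := range t.Sections {
		on, explicit := included[s.Key]
		if !explicit {
			on = s.DefaultOn // no explicit toggle: fall back to the template default
		}
		if on {
			out += "\n\n" + s.Body
		}
	}
	return out, nil
}

func main() {
	tmpl := Template{
		Body:     "Controller: {{clinic_name}}",
		Required: []string{"clinic_name"},
		Sections: []Section{
			{Key: "video_recording", Body: "Video recording applies."},
			{Key: "cross_border_transfer", Body: "Data may leave the EEA.", DefaultOn: true},
		},
	}
	text, err := Assemble(tmpl, map[string]string{"clinic_name": "Demo Clinic"}, map[string]bool{"video_recording": true})
	fmt.Println(text, err)
}
```

Using strings.NewReplacer substitutes every marker in a single pass, so a placeholder value that happens to contain marker-like text is not expanded a second time.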
Onboarding integration:
- [x] POST /v1/portal/onboard returns 409 org_setup_incomplete when either organization_legal_documents row for the target org has published_version IS NULL. Gate runs pre-tx (after self_signup_disabled check, before profile resolution) so a fail leaves zero rows behind. Error context lists the unpublished document_types so the portal can surface a useful "this clinic is finalising" message. Defense-in-depth: without this gate a fresh-but-unfinished clinic would have its patients implicitly accepting the platform-default consent_purpose_versions rows for purposes the clinic is supposed to author. Test harness Harness.PublishLegalDocuments short-circuits the editor for tests that drive /portal/onboard. Integration tests in internal/test/rlstest/legal_document_gate_test.go cover the rejected-when-unpublished, rejected-when-half-published, and passes-when-both-published paths.
Frontend (Clinic Admin / 1D.2):
- [x] /legal-documents/page.tsx (list) + /legal-documents/[type]/page.tsx (editor). Editor renders one input per required_placeholders key + one checkbox per toggleable_sections[].key, seeded from the existing draft (or template defaults). Save Draft / Publish buttons; Publish opens a confirmation modal explaining "every existing patient re-consents on next login." Calls API client methods through server actions (saveDraftAction, publishAction); publishAction runs saveDraft then publish so the version is minted against the latest values. Gated by organizations.manage_privacy_notice at the route layer; the sidebar item is suppressed for non-permission-holders.
- [x] Dashboard task card on (dashboard)/page.tsx: probes listOrganizationLegalDocuments and renders one of three states: hidden (no permission, or all docs published + on the latest template), "Complete your legal documents" (any unpublished; onboarding blocked; priority over the stale prompt), or "Review template update" (all published but at least one editor row is on a stale platform-template version). Suppressed for principals without the permission. Probe failures are swallowed.
Frontend (Console / 1D.1):
- [x] Templates list page + "Publish new template version" form (raw markdown body + sections JSON per-locale tabs, comma-separated required-placeholders input). Both audit-logged. Foundation-level scope: not a real markdown editor — platform team pastes from a lawyer-reviewed Word doc. Polish (markdown preview, structured section editor) is deliberately deferred since this surface is used rarely by a few people. See decisions.md → Why clinic legal documents are templated, not forms for the editor-vs-form-builder boundary.
- [x] Cross-tenant read-only view of which orgs are stale on the latest platform template. GET /v1/admin/legal-document-templates/stale-orgs (superadmin) returns every (org, document_type) where source_template_version < MAX(legal_document_templates.version), sorted document_type → source ASC → org name. Surfaced on the Console templates page as a per-doc-type table (org, source v, latest v, published v).
Re-consent semantics (1B.9 backend, surfaced by 1D.3 portal modal):
- [x] Clinic re-publishes → new `consent_purpose_versions` row at version N+1 → re-consent middleware (`RequireCurrentConsents`) catches existing patients on next request → 412 with `{code: "consent_required", missing: [...]}` → portal `(patient)` layout probes `GET /v1/me/required-consents` → blocking `ReconsentModal` renders missing purposes' bodies → patient accepts → `acceptRequiredConsents` server action grants every missing purpose; consents service supersedes the v1 active grant with `withdrawal_reason='superseded_by_v{N}'` and inserts the v_n row. The `org_terms` cascade trigger skips supersession-driven withdrawals (`withdrawal_reason LIKE 'superseded_by_v%'`) so re-acceptance does NOT trigger leave-clinic; locked by `TestConsents_VersionSupersessionDoesNotFireOrgTermsCascade` (subtests cover both `org_terms` and `org_privacy_notice`).
- [x] Platform template version bump → clinics with `source_template_version < latest` see a "Review template update" prompt. Closed via three pieces: (1) `GET /v1/organizations/{id}/legal-documents` and `GET .../{type}` now return `latest_template_version` per row so every consumer can compute stale-ness without extra round-trips; (2) clinic dashboard task card switches to "Review template update" copy + CTA when the org is published-but-stale (it takes priority over "Complete your legal documents" only when the org is fully published); (3) editor page renders an amber banner with a "Refresh to v_n" button that calls the new `POST /v1/organizations/{id}/legal-documents/{type}/refresh-source-template` endpoint (audited UPDATE on the editor row, returns 409 `already_current` for idempotent UI guard). Refresh bumps `source_template_version` and preserves existing draft values for unchanged keys; new keys in v_n surface as empty inputs the admin must fill before re-publishing.
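As a rough illustration of the 412 step in the re-consent chain above, the sketch below shows one way a middleware could compute the missing purposes and emit the `consent_required` body. It is not the shipped `RequireCurrentConsents`: the function names, the map-based version lookup, and the `/v1/me` request path used in `main` are assumptions made for the example.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"sort"
)

// consentRequired mirrors the 412 body shape described in the plan.
type consentRequired struct {
	Code    string   `json:"code"`
	Missing []string `json:"missing"`
}

// missingPurposes returns every purpose whose granted version is behind the
// current published version. A purpose that was never granted reads as 0.
func missingPurposes(current, granted map[string]int) []string {
	var missing []string
	for purpose, version := range current {
		if granted[purpose] < version {
			missing = append(missing, purpose)
		}
	}
	sort.Strings(missing) // stable order for clients and tests
	return missing
}

// requireCurrentConsents is a hypothetical stand-in for the real middleware:
// it blocks the request with 412 until every current version is granted.
func requireCurrentConsents(current, granted map[string]int, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if missing := missingPurposes(current, granted); len(missing) > 0 {
			w.Header().Set("Content-Type", "application/json")
			w.WriteHeader(http.StatusPreconditionFailed)
			_ = json.NewEncoder(w).Encode(consentRequired{Code: "consent_required", Missing: missing})
			return
		}
		next.ServeHTTP(w, r)
	})
}

func main() {
	// The clinic re-published org_terms, so the patient's v1 grant is stale.
	current := map[string]int{"org_terms": 2, "org_privacy_notice": 1}
	granted := map[string]int{"org_terms": 1, "org_privacy_notice": 1}

	ok := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	})
	rec := httptest.NewRecorder()
	requireCurrentConsents(current, granted, ok).ServeHTTP(rec, httptest.NewRequest(http.MethodGet, "/v1/me", nil))
	fmt.Println(rec.Code, rec.Body.String())
}
```

Once the patient accepts and the grant map catches up to the current versions, the same handler falls through to `next` with no further branching.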
1B.11 Platform Break-Glass Access
Controlled, audited, transparent access for platform staff to identifiable cross-tenant patient data. The processor boundary is the default; break-glass is the documented exception path. Lives in foundation because every Console surface that touches patient data has to know whether it's always-on or break-glass-gated.
Status: primitive shipped. Schema, middleware, elevation endpoints, audit attribution, notification fan-out via 1A.18, and integration tests all closed. Console surface classification + Clinic admin banner light up as 1D.1 / 1D.2 surfaces ship — the foundation primitive is the gate they consume.
Decisions locked during implementation:
- Platform-permission model: pure Go, not data-driven. Per-org RBAC (`permissions`/`roles`/`role_permissions`) is for tenant authorization checked by RLS; platform permissions are checked in Go middleware only. Encoding `break_glass.*` codes + the role → permission map in services/api/internal/core/principal/platform_permissions.go (with `Subject.HasPlatformPermission(code)` as the call site) keeps the two models cleanly separated. New platform permission OR support_engineer scope adjustment is a single-file Go change. The `platform_role_permissions` table reserved at 000002:416 lands when a real consumer needs DB-side platform-permission joins; until then the speculation cost is zero.
- Schema mutations on AdminPool, not AppPool. The original spec wording said "AppPool-INSERT via the elevation endpoint"; we landed on AdminPool-only writes with `REVOKE INSERT, UPDATE, DELETE, TRUNCATE FROM restartix_app`, mirroring `audit_log` and `notifications`. The service-layer `Subject.HasPlatformPermission` is the load-bearing authorization gate; the REVOKE is the DB-layer floor. AppPool-with-WITH-CHECK would gratuitously couple session inserts to the request-tx GUC binding lifecycle, which the test path doesn't always have.
- Active-session uniqueness via partial unique index: `(principal_id, organization_id, scope) WHERE closed_at IS NULL`. Same-principal double-clicked elevation modal hits the constraint; service catches the violation and returns the existing session. No duplicate audit rows, no duplicate notification fan-out (per-admin idempotency keys dedup at the notify layer).
- Lazy expiry finalization. A row with `closed_at IS NULL AND expires_at < NOW()` is closed on the admin pool the first time `Service.ActiveFor` reads it. `closed_at = expires_at` (system-finalized at the natural-end moment), `closed_by_principal_id = NULL` (system close). Keeps the unique index honest without requiring a sweeper cron.
- No redundant audit row from `Service.Open`/`Close`. The session row IS the artifact; the open + close events ride on the calling handler's audit row (same shape as 1A.18 notifications). `audit_log` carries `break_glass_id` linking back via the GUC bound by `set_app_break_glass_session_id` + the redefined `audit_log_insert` (this migration's CREATE OR REPLACE).
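The active-session uniqueness and lazy-expiry decisions above can be sketched with an in-memory stand-in for the table. This is only an illustration under stated assumptions: `Store`, `key`, `finalizeIfExpired`, and `scenario` are hypothetical names, and a Go map keyed on `(principal, org, scope)` substitutes for the partial unique index that Postgres enforces in the real service.

```go
package main

import (
	"fmt"
	"time"
)

// Session is a pared-down stand-in for a break_glass_sessions row.
type Session struct {
	ID        string
	OpenedAt  time.Time
	ExpiresAt time.Time
	ClosedAt  *time.Time // nil while the session is open
	ClosedBy  *string    // nil means a system close
}

// key mirrors the partial unique index columns.
type key struct{ principalID, orgID, scope string }

// Store holds at most one open session per key, like the index does.
type Store struct {
	active map[key]*Session
	nextID int
}

func NewStore() *Store { return &Store{active: map[key]*Session{}} }

// finalizeIfExpired applies the lazy-expiry rule: an open row past its
// expiry is closed at expires_at with no closing principal.
func finalizeIfExpired(s *Session, now time.Time) bool {
	if s.ClosedAt != nil || !s.ExpiresAt.Before(now) {
		return false
	}
	closedAt := s.ExpiresAt
	s.ClosedAt = &closedAt
	s.ClosedBy = nil
	return true
}

// ActiveFor returns the open session for k, finalizing it first if expired.
func (st *Store) ActiveFor(k key, now time.Time) *Session {
	s, ok := st.active[k]
	if !ok {
		return nil
	}
	if finalizeIfExpired(s, now) {
		delete(st.active, k) // the slot is free again for a new session
		return nil
	}
	return s
}

// Open returns the existing open session instead of creating a duplicate.
func (st *Store) Open(k key, now time.Time, ttl time.Duration) (*Session, bool) {
	if existing := st.ActiveFor(k, now); existing != nil {
		return existing, false
	}
	st.nextID++
	s := &Session{ID: fmt.Sprintf("bg-%d", st.nextID), OpenedAt: now, ExpiresAt: now.Add(ttl)}
	st.active[k] = s
	return s, true
}

// scenario replays a double-clicked open followed by a read after expiry.
func scenario() (reused, finalized, closedAtExpiry, systemClose bool) {
	st := NewStore()
	k := key{"p1", "org1", "patient_list"}
	t0 := time.Date(2026, 5, 10, 9, 0, 0, 0, time.UTC)
	first, _ := st.Open(k, t0, time.Hour)
	again, created := st.Open(k, t0.Add(time.Minute), time.Hour)
	reused = !created && again.ID == first.ID
	finalized = st.ActiveFor(k, t0.Add(2*time.Hour)) == nil
	closedAtExpiry = first.ClosedAt != nil && first.ClosedAt.Equal(first.ExpiresAt)
	systemClose = first.ClosedBy == nil
	return
}

func main() {
	fmt.Println(scenario())
}
```

Stamping `closed_at` with the expiry time rather than the read time means the stored duration is the same whether a later request or a background job ends up finalizing the row.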
Schema (shipped in 000011):
- [x] `break_glass_sessions`: `(id, principal_id, organization_id, scope, reason_category, reason_text, reason_ref NULL, opened_at, expires_at, closed_at, closed_by_principal_id NULL)`. `scope IN ('patient_list', 'patient_detail', 'audit_full', 'cross_org_lookup', 'org_management')` (the last added in 1B.11.x). `reason_category IN ('support_ticket', 'security_incident', 'dsar_routing', 'fraud_investigation', 'platform_engineering')`. CHECK `length(btrim(reason_text)) >= 10` + CHECK `expires_at > opened_at AND expires_at <= opened_at + INTERVAL '4 hours'`. RLS-enabled (SELECT for own + org members with `audit_log.view_org`); DML REVOKE'd from `restartix_app` so writes go through admin pool only. Partial unique `(principal_id, organization_id, scope) WHERE closed_at IS NULL` for active-session uniqueness; covering indexes on `(organization_id, opened_at DESC)`, `(principal_id, opened_at DESC)`, partial `(opened_at DESC) WHERE closed_at IS NULL` for the Console "all currently active" surface.
- [x] `audit_log.break_glass_id UUID NULL` was already reserved in 000001; `audit_log_insert` redefined in 000011 to populate it from `current_app_break_glass_id()` (a session GUC bound by `set_app_break_glass_session_id`). Every audit row written inside an elevated session carries `action_context = 'break_glass'` + `break_glass_id = <session.id>` automatically, with no per-handler plumbing.
- [x] Data classification entries for every column registered in data-classification.md → Break-glass sessions. Reason fields ship `pii_basic` + `support_export`-only (operator free-text may carry support context).
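A service would typically pre-validate input against the same rules the CHECK constraints enforce, so callers get a readable error instead of a constraint violation. The sketch below is one possible shape, not the shipped validator: `validateOpen` and its error messages are hypothetical, while the enum values, the 10-character floor, and the 4-hour ceiling come from the schema above.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
	"time"
	"unicode/utf8"
)

var validScopes = map[string]bool{
	"patient_list": true, "patient_detail": true, "audit_full": true,
	"cross_org_lookup": true, "org_management": true,
}

var validReasonCategories = map[string]bool{
	"support_ticket": true, "security_incident": true, "dsar_routing": true,
	"fraud_investigation": true, "platform_engineering": true,
}

const (
	minReasonChars = 10            // CHECK length(btrim(reason_text)) >= 10
	maxWindow      = 4 * time.Hour // CHECK expires_at <= opened_at + 4 hours
)

// validateOpen is a hypothetical pre-flight check mirroring the table CHECKs.
func validateOpen(scope, reasonCategory, reasonText string, expiresInMinutes int) error {
	if !validScopes[scope] {
		return errors.New("invalid scope")
	}
	if !validReasonCategories[reasonCategory] {
		return errors.New("invalid reason_category")
	}
	// One-argument btrim() strips spaces only and length() counts characters,
	// so trim spaces (not all whitespace) and count runes (not bytes).
	if utf8.RuneCountInString(strings.Trim(reasonText, " ")) < minReasonChars {
		return errors.New("reason_text must be at least 10 characters after trimming")
	}
	window := time.Duration(expiresInMinutes) * time.Minute
	if window <= 0 || window > maxWindow {
		return errors.New("expiry window must be within (0, 4h]")
	}
	return nil
}

func main() {
	fmt.Println(validateOpen("patient_list", "support_ticket", "ticket 4821 follow-up", 60))
	fmt.Println(validateOpen("patient_list", "support_ticket", "   short   ", 60))
	fmt.Println(validateOpen("audit_full", "security_incident", "suspicious login review", 241))
}
```

The database constraints remain the floor; a check like this only improves the error surface for well-behaved callers.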
Permissions + middleware (shipped):
- [x] Platform permission constants + role → perms map in platform_permissions.go: `RoleSupportEngineer`, `PlatformPermBreakGlass{PatientList,PatientDetail,AuditFull,CrossOrgLookup,Manage}`. Superadmin holds everything via `IsSuperadmin == true`; `support_engineer` holds `PatientList + PatientDetail + AuditFull` (CrossOrgLookup + Manage stay superadmin-only). `Subject.HasPlatformPermission(code)` is the Go call site.
- [x] `RequireBreakGlass(scope)` middleware factory: reads URL `{paramName}` for the target org id, looks up the active session, returns 403 `break_glass_required` (no session) or 410 `break_glass_expired` (session expired but not closed). Active match binds the session GUC + attaches the session id to context via `BreakGlassSessionIDFromContext`.
- [x] Elevation endpoint `POST /v1/break-glass/sessions`: body `{organization_id, scope, reason_category, reason_text, reason_ref?, expires_in_minutes}`. Service validates platform permission via `HasPlatformPermission(scopeToPermission(scope))`; partial-unique-index conflict returns existing session instead of 409. Per-principal rate-limited via `RATELIMIT_BREAK_GLASS_OPEN_LIMIT` (default 5/min).
- [x] Close endpoint `POST /v1/break-glass/sessions/{id}/close`. Self-close path checks `row.principal_id == subject.principal_id`; manage-close path checks `subject.HasPlatformPermission(PlatformPermBreakGlassManage)`. Both audit-logged with `entity_type = 'break_glass_session'`.
- [x] `GET /v1/break-glass/sessions[?org_id=&principal_id=&only_active=&limit=&offset=]`: RLS-scoped read on the request tx for the elevating principal + org members; admin-pool read for superadmin / `break_glass.manage` holders (cross-org Console "all active sessions" surface). `GET /v1/break-glass/sessions/{id}` for detail.
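The role → permission map described above is small enough to sketch in full. The version below is an assumption-laden illustration rather than the contents of platform_permissions.go: the constant names are shortened, the `break_glass.*` code strings other than `break_glass.manage` are guessed, and it reflects the 1B.11 grants only (before `org_management` was added).

```go
package main

import "fmt"

const (
	RoleSupportEngineer = "support_engineer"

	// Code strings are illustrative; only break_glass.manage appears in the plan.
	PermPatientList    = "break_glass.patient_list"
	PermPatientDetail  = "break_glass.patient_detail"
	PermAuditFull      = "break_glass.audit_full"
	PermCrossOrgLookup = "break_glass.cross_org_lookup"
	PermManage         = "break_glass.manage"
)

// rolePerms is the single place a platform role's grants are declared.
var rolePerms = map[string]map[string]bool{
	RoleSupportEngineer: {
		PermPatientList:   true,
		PermPatientDetail: true,
		PermAuditFull:     true,
	},
}

// Subject is a minimal stand-in for the authenticated platform principal.
type Subject struct {
	IsSuperadmin bool
	PlatformRole string
}

// HasPlatformPermission: superadmin holds everything; other roles consult the map.
func (s Subject) HasPlatformPermission(code string) bool {
	if s.IsSuperadmin {
		return true
	}
	return rolePerms[s.PlatformRole][code] // missing role or code reads as false
}

// scopeToPermission maps a requested session scope to the permission it needs.
func scopeToPermission(scope string) (string, bool) {
	perm, ok := map[string]string{
		"patient_list":     PermPatientList,
		"patient_detail":   PermPatientDetail,
		"audit_full":       PermAuditFull,
		"cross_org_lookup": PermCrossOrgLookup,
	}[scope]
	return perm, ok
}

func main() {
	support := Subject{PlatformRole: RoleSupportEngineer}
	for _, scope := range []string{"patient_list", "cross_org_lookup"} {
		perm, _ := scopeToPermission(scope)
		fmt.Println(scope, support.HasPlatformPermission(perm))
	}
}
```

Keeping the map in Go means a scope adjustment for a role is a one-line diff that shows up in code review, which matches the single-file-change goal stated in the decisions.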
Notification (shipped):
- [x] Always-on email to the clinic admin(s) when a break-glass session opens against their org. Sent via 1A.18: `Service.Open` queries `organization_memberships` for principals with the system `admin` role and calls `notify.Send(notify.To(admin), CategoryBreakGlassOpened, data, IdempotencyKey(<session>:<admin>), Org(org))` per recipient. Per-admin idempotency keys dedup retries at the notification layer. Template fields (`org_name`, `staff_name`, `staff_email`, `scope`, `reason_*`, `opened_at`/`expires_at` as `time.Time`) loaded via `FindOrgName` + `FindHumanIdentity` repository helpers.
- [x] Audit attribution end-to-end: every audit row written inside an elevated session carries `action_context = 'break_glass'` + `break_glass_id = <session.id>`. Locked by TestBreakGlass_Middleware_GatesProtectedRouteAndStampsAudit.
Open follow-ups (consumed by 1D.1 / 1D.2 / 1E):
- [x] Expired-session sweeper for `break_glass_sessions` + `patient_impersonation_sessions`: `cmd/expired-sessions-sweep` (EventBridge Scheduler → ECS RunTask, every 15 min, smallest Fargate sizing). Per-domain `SweepExpired(ctx, repo, now, batchSize)` free functions in `internal/core/domain/{breakglass,impersonation}/sweep.go` find rows with `closed_at IS NULL AND expires_at < now`, call the existing `CloseAdmin` path (stamps `closed_at = expires_at`, `closed_by_principal_id = NULL`), and emit a system-attributed `audit_log UPDATE` row with `action_context = 'break_glass'` / `'impersonation'`. Race-safe via the underlying `WHERE closed_at IS NULL`. 3-test rlstest acceptance at `expired_sessions_sweep_test.go` covers close-stamping, audit attribution, skip-active, skip-already-closed, idempotency-on-rerun. Partial-scope follow-up (not load-bearing): the middleware lazy-finalize path (`Service.ActiveFor` / `Service.Open`) still closes expired rows on next-request WITHOUT writing an audit row, so sessions touched within ~15 min of expiry have no close audit row. Extending that path to mirror the sweep's system-attributed audit shape is a separate cleanup; the sweeper alone closes the originally-stated gap ("sessions opened and never re-touched stay closed_at=NULL forever").
- [ ] In-app banner in Clinic admin UI (1D.2) showing active and recent break-glass sessions against the org. Reads `GET /v1/break-glass/sessions?org_id={current_org_id}` (RLS-scoped via `audit_log.view_org` SELECT policy). Lights up as 1D.2 admin surfaces ship.
- [ ] Console surface classification (1D.1). Foundation ships the `RequireBreakGlass(scope)` primitive; mounting it on each Console route is 1D.1 work alongside the actual surfaces. Specific classifications:
  - Org list, org detail (profile, billing, plan, entitlements) → `aggregate`
  - Patient counter per org → `aggregate`
  - Audit log metadata cross-tenant (timestamps, actions, status codes; diff content masked) → `aggregate`
  - Audit log full content cross-tenant (with diffs, IPs, request bodies) → `break_glass:audit_full`
  - Patient list per org → `break_glass:patient_list`
  - Patient detail per org (profile, subscriptions, consents) → `break_glass:patient_detail`
  - Cross-org patient lookup (rare, narrow) → `break_glass:cross_org_lookup`
  - Humans/users page filtered to staff principals only → `aggregate`
  - Humans/users page including patient principals → `break_glass:patient_list`
Cross-tenant guardrail (locked at the foundation level):
- [x] Foundation principle codified in implementation-plan.md: cross-tenant features operate on anonymised data only. Any feature that needs identifiable cross-tenant data must either go through the break-glass pattern (one-shot, audited, narrow) or surface as an explicit ADR-worthy request to break the rule. Joint controllership avoidance is the underlying reason — see decisions.md → Why clinic is controller, platform is processor.
- [x] DSAR routing flows through the clinic, not the platform. [email protected] auto-responder + portal self-service ("Your clinics" list at `/v1/me.patient_org_ids`) handle 99% of misdirected requests without break-glass. Genuinely orphaned requests (ex-patient, no active account) are break-glass with `reason_category='dsar_routing'`.
1B.11.x Console-Side Break-Glass Primitive
Sub-phase opened 2026-05-10. The 1B.11 backend is complete; this slice closes the Console-side wiring that turns the gate from theatre into reality. Reusable primitive, not per-feature one-offs — see decisions.md → Why one Console-side break-glass primitive and patterns.md → P55.
Status: shipped. Backend smarter middleware + new scope, Console session context + hook + modal + banner + gate wrapper + action wrapper, four routes mounted (PATCH /:id, POST/DELETE /members, POST /staff-invitations), acceptance tests landed, ADR + pattern entry documented.
Decisions locked during implementation:
- One scope (`org_management`), not per-resource. Covers staff/role/member/settings writes. Single elevation session covers a related task without re-prompting per click; the `audit_log.action` row records the specific action so blast radius is reconstructable.
- Smarter middleware on the same route, not duplicate Console routes. `RequirePerOrgPermissionOrBreakGlass(permission, scope, svc, "id")` admits tenant principals via per-org permission and platform principals via active break-glass session. Same staff-invitations route serves the Clinic app (tenant) and the Console (platform-with-elevation). Avoids the doubled-routes-that-drift failure mode.
- Reads stay always-on; only writes need elevation. Mirrors the patient surfaces: aggregate view always-on, identifiable lists / writes elevation-gated. Staff data isn't PHI; the controller-vs-processor risk is on writes, not list reads.
- Platform principals don't bypass even when org members. The whole point is that every cross-tenant write links to an open session; incidental membership doesn't bypass.
- Console-side fetch is `React.cache`-shared, not P42-tagged. Session timing is load-bearing: a session expired 30s ago must not be cached as active. `React.cache` keeps layout + descendant page on one round-trip without crossing the staleness threshold.
Backend (shipped):
- [x] Add `org_management` to the `chk_break_glass_sessions_scope` CHECK in 000011_break_glass.up.sql (pre-prod, edited in place per CLAUDE.md "migrations editable pre-production"). Add `ScopeOrgManagement` to breakglass/model.go + `IsValid` + `scopeToPermission` mapping.
- [x] `PlatformPermBreakGlassOrgManagement` in platform_permissions.go; granted to `support_engineer`. Superadmin holds via `IsSuperadmin == true`.
- [x] `middleware.RequirePerOrgPermissionOrBreakGlass(permission, scope, svc, paramName)`: splits by principal type; tenant via `Subject.HasPermission`, platform via active session lookup + audit-GUC binding.
- [x] Mount on `PATCH /v1/organizations/{id}`, `POST /v1/organizations/{id}/members`, `DELETE /v1/organizations/{id}/members/{principalId}`, `POST /v1/organizations/{id}/staff-invitations` in routes.go. Read endpoints (`GET /members`, `GET /staff-invitations`, `GET /`) stay on `RequirePermission` only, always-on for operational support.
- [x] OpenAPI `BreakGlassScope` enum updated to include `org_management`; `packages/api-client/src/generated.ts` + `services/api/internal/core/server/openapi/spec.gen.go` regenerated.
- [x] Acceptance tests in per_org_permission_or_break_glass_test.go: tenant admin → 200; superadmin without session → 403 `break_glass_required`; superadmin with session → 200 + `bg=<session_id>` in handler context (proves the GUC bind path); tenant without permission → 403; platform expired session → 410 `break_glass_expired`.
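The split-by-principal-type decision that the acceptance tests above pin down can be expressed as a small pure function. This is a sketch of the decision table only, not the shipped middleware: `Principal`, `SessionState`, and `gate` are hypothetical, and the `forbidden` code for a tenant without permission is an assumption (the plan only states the 403 status).

```go
package main

import (
	"fmt"
	"net/http"
)

// Principal is a minimal stand-in for the request subject.
type Principal struct {
	Platform bool            // platform staff (Console) vs tenant principal (Clinic)
	Perms    map[string]bool // per-org permissions; only consulted for tenants
}

// SessionState is the result of looking up a break-glass session for the org.
type SessionState int

const (
	NoSession SessionState = iota
	ActiveSession
	ExpiredSession
)

// gate returns the HTTP status and error code for a write request.
func gate(p Principal, permission string, session SessionState) (int, string) {
	if !p.Platform {
		// Tenant path: the per-org permission alone decides.
		if p.Perms[permission] {
			return http.StatusOK, ""
		}
		return http.StatusForbidden, "forbidden"
	}
	// Platform path: Perms is deliberately ignored, so incidental org
	// membership never substitutes for an open session.
	switch session {
	case ActiveSession:
		return http.StatusOK, ""
	case ExpiredSession:
		return http.StatusGone, "break_glass_expired"
	default:
		return http.StatusForbidden, "break_glass_required"
	}
}

func main() {
	perm := "organizations.manage_members"
	tenantAdmin := Principal{Perms: map[string]bool{perm: true}}
	superadmin := Principal{Platform: true, Perms: map[string]bool{perm: true}}

	fmt.Println(gate(tenantAdmin, perm, NoSession))
	fmt.Println(gate(superadmin, perm, NoSession))
	fmt.Println(gate(superadmin, perm, ActiveSession))
	fmt.Println(gate(superadmin, perm, ExpiredSession))
}
```

In the real middleware the active-session branch would also bind the audit GUC before calling the next handler; that side effect is left out here to keep the decision table readable.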
Console (shipped):
- [x] `apps/console/lib/break-glass.ts`: `getActiveBreakGlassSessionsForOrg(orgId)` (server-only, `React.cache`-wrapped) + `findActiveSession(sessions, scope)`.
- [x] `apps/console/lib/break-glass-actions.ts`: `openBreakGlassSessionAction` + `closeBreakGlassSessionAction` server actions calling the api-client; standard `revalidatePath` + `refresh()` chain.
- [x] `apps/console/lib/with-break-glass.ts`: `withBreakGlass(fn)` server-action wrapper; surfaces `break_glass_required` / `break_glass_expired` as a typed sentinel for `useActionState` consumers.
- [x] `apps/console/components/break-glass/break-glass-session-provider.tsx`: context provider + `useBreakGlassSession(scope)` + `useAllActiveBreakGlassSessions()` hooks. Honours P48 via `useServerSyncedState`.
- [x] `apps/console/components/break-glass/elevation-modal.tsx`: real backend-connected modal (replaces the URL-stub `?bg=active` modal that lived under `components/organizations/`).
- [x] `apps/console/components/break-glass/active-session-banner.tsx`: pinned in the clinic-detail layout; per-session row with reason + minutes-left + close button (calls real backend `closeBreakGlassSession`).
- [x] `apps/console/components/break-glass/require-break-glass.tsx`: `<RequireBreakGlass scope=…>` client wrapper. Defense-in-depth UI gate.
- [x] Clinic-detail layout wraps children in `<BreakGlassSessionProvider>` + renders `<ActiveBreakGlassBanner>` above content; old URL-state stub `clinic-detail-banners.tsx` deleted.
- [x] Patients page rewired from `searchParams.bg === "active"` URL state to `findActiveSession(sessions, "patient_list")` against the server-fetched session list. Old stub `components/organizations/elevation-modal.tsx` deleted; new modal under `components/break-glass/`.
- [x] `InviteStaffDialog`: first integration consumer of the primitive end-to-end. Dialog wraps its content in `<RequireBreakGlass scope="org_management">` so the body switches between elevation prompt (no session) and the invite form (session active). `inviteStaffAction` wraps the api call in `withBreakGlass(...)`; if the session expires between dialog-open and submit, the result's `needsElevation` field flips the body back to the elevation prompt without dropping the user out. Validates the full flow: tenant principals would consume the same route via per-org permission from the Clinic app; Console superadmins consume it via elevation.
Open follow-ups (next consumers, not blocking 1B.11.x close):
- [ ] Mount `RequirePerOrgPermissionOrBreakGlass(perm, ScopeOrgManagement, …)` on org-settings + designations + webhooks + integrations + privacy-notice + domains write routes when those Console surfaces light up. Mechanical wraps; no design left.
- [ ] Member role-change + remove flows on the Console members page mount the `RequireBreakGlass` UI gate + send actions through `withBreakGlass`. Backend already gated.
Inverted onboarding paths: clinic admins invite staff and patients by email (personal invitations); clinics also mint multi-use share-links for QR codes and intake forms (patient-only). One mechanism (Clerk Invitations API) covers both invitation kinds; share-link is a separate code-anchored primitive. Platform-engineer invites are deferred to a non-foundation feature; they will live in a sibling `platform_invites` table (purely additive; foundation contract preserved).
Status: backend complete. Personal-invitation primitive (staff + patient) + share-links primitive (patient-only) + portal-onboarding integration (tier resolution from invite/share-link, atomic redeem, mark-consumed) + resend endpoint + 12 RLS integration tests all shipped. Only the 1D.2 / 1D.3 admin + portal UI surfaces remain — those belong in the 1D slice on top.
Decisions locked during implementation:
- Mechanism: Clerk's `Invitations` API (`invitation.Create`), not custom-token-via-our-SES. While we're on Clerk for auth emails (Layer 1+2 customisation: dashboard template + custom from-domain), the invite email rides the same pipeline. When we flip to BYO ESP (Layer 3) before the dedicated-mode launch, both Clerk auth emails and invite emails migrate together. This avoids the mixed "our-SES for invites + Clerk for auth" middle state. The shipped 1A.18 `MemberInvite` template stays in repo as the visual source-of-truth; copy into Clerk's dashboard "Invitation" template until BYO ESP lands.
- No Svix webhook receiver. Acceptance is detected on every authenticated request: the auth middleware's `OnAuthHook` calls `invites.Service.BindForPrincipal(principalID, email)`, which finds every open invite matching the email and binds them. No webhook signature verification, no `emails.created` subscription. The hook fires on EVERY authenticated request, not just first-sight, because cross-clinic invites for principals that already exist in our system never trigger `created=true` (a specialist hired by a second clinic, a patient referred to a second clinic). The bind is idempotent by design: zero open invites = single index hit, zero rows, return; pending invites = one bind, subsequent calls find consumed_at set and return zero. Re-evaluate Clerk webhooks when a real event use case (e.g. `email.bounced` cleanup, `user.deleted` cascade) justifies the webhook surface area.
- Single mechanism, two binding paths. The webhook-free bind step dispatches by `kind`: `staff` → `organization_memberships` row created in one admin tx (`accepted_at` + `consumed_at` + the row's `invited_at` / `invited_by` reserved in 1A.12). `patient` → `accepted_at` set only; portal onboarding step 2 sets `consumed_at` when the `patients` + `patient_subscriptions` chain commits. Consents still grant explicitly on the consent gates; invite acceptance never bypasses them.
- Schema co-locates both invite kinds in `organization_invites`. Org-scoped, named for what it is. CHECK constraint pins (`kind=staff` ⇒ `role_id` set, `patient_tier_id` NULL) and (`kind=patient` ⇒ `role_id` NULL). Partial unique index `(organization_id, lower(email), kind) WHERE consumed_at IS NULL AND revoked_at IS NULL` blocks duplicate open invites; the service catches the violation and returns `ErrPendingInviteExists` instead of a 500.
- No `audit_log_insert` redefinition. Invitations are normal CRUD events; `entity_type='organization_invite'` rows attribute correctly via the existing actor + envelope chain. Break-glass and impersonation redefine `audit_log_insert` because they need `action_context` overrides; nothing here does.
- `organization_memberships.principal_id` stays NOT NULL. The original spec wording said "pending row with no principal_id yet"; we landed on a separate `organization_invites` shadow table that binds to a freshly created principal at the auth-middleware step, keeping the membership table coherent.
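The kind-dispatched, idempotent bind described in the decisions above might look roughly like the following. It is an in-memory sketch under assumptions: `Invite`, `Membership`, `bindForPrincipal`, and `scenario` are hypothetical, and "open" is taken to mean not accepted, not consumed, not revoked, and not expired (the plan does not spell out the exact predicate).

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// Invite is a pared-down stand-in for an organization_invites row.
type Invite struct {
	OrgID, Email, Kind string // Kind is "staff" or "patient"
	ExpiresAt          time.Time
	AcceptedAt         *time.Time
	ConsumedAt         *time.Time
	RevokedAt          *time.Time
}

// Membership stands in for the organization_memberships row a staff bind creates.
type Membership struct{ OrgID, PrincipalID string }

// bindForPrincipal binds every open invite for the email and reports how
// many were bound. Calling it again with no new invites binds nothing.
func bindForPrincipal(invites []*Invite, principalID, email string, now time.Time) (int, []Membership) {
	bound := 0
	var created []Membership
	for _, inv := range invites {
		if !strings.EqualFold(inv.Email, email) {
			continue
		}
		if inv.AcceptedAt != nil || inv.ConsumedAt != nil || inv.RevokedAt != nil || !inv.ExpiresAt.After(now) {
			continue // already bound, revoked, or expired
		}
		at := now
		inv.AcceptedAt = &at
		if inv.Kind == "staff" {
			// Staff: membership is created and the invite is consumed in one step.
			inv.ConsumedAt = &at
			created = append(created, Membership{OrgID: inv.OrgID, PrincipalID: principalID})
		}
		// Patient: accepted only; onboarding marks it consumed later.
		bound++
	}
	return bound, created
}

// scenario binds one staff and one patient invite, skipping revoked and expired ones.
func scenario() (first, second, members int, patientConsumed bool) {
	now := time.Date(2026, 5, 10, 9, 0, 0, 0, time.UTC)
	revokedAt := now.Add(-time.Hour)
	invites := []*Invite{
		{OrgID: "org-a", Email: "sam@example.com", Kind: "staff", ExpiresAt: now.Add(24 * time.Hour)},
		{OrgID: "org-b", Email: "Sam@Example.com", Kind: "patient", ExpiresAt: now.Add(24 * time.Hour)},
		{OrgID: "org-c", Email: "sam@example.com", Kind: "staff", ExpiresAt: now.Add(24 * time.Hour), RevokedAt: &revokedAt},
		{OrgID: "org-d", Email: "sam@example.com", Kind: "patient", ExpiresAt: now.Add(-time.Minute)},
	}
	var ms []Membership
	first, ms = bindForPrincipal(invites, "principal-1", "sam@example.com", now)
	second, _ = bindForPrincipal(invites, "principal-1", "sam@example.com", now)
	return first, second, len(ms), invites[1].ConsumedAt != nil
}

func main() {
	fmt.Println(scenario())
}
```

Because the second call finds nothing left to bind, running the hook on every authenticated request costs one lookup in the common case and has no side effects.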
Schema (shipped in 000012):
- [x] `organization_invites`: `(id, organization_id, provider_invitation_id, email, kind, role_id NULL, patient_tier_id NULL, invited_by_principal_id, invited_at, expires_at, accepted_at NULL, accepted_principal_id NULL, consumed_at NULL, revoked_at NULL, revoked_by_principal_id NULL)`. RLS policies split by kind: `manage_members` for staff invites, `patients.manage` for patient invites. AppPool DML revoked; service writes through admin pool.
- [x] Partial unique index for "one open invite per (org, email, kind)" + supporting indexes for cross-org email scan (the bind path) and clinic admin oversight (org_recent).
- [x] No `audit_log` column added; see decisions block above.
Clerk SDK + service (shipped):
- [x] internal/core/auth/clerk/invitations.go wraps the Clerk Backend SDK's `invitation` package. `Create` issues a magic-link invite with our `public_metadata` (invite_kind + organization_id) and the kind-specific `redirect_url` (clinic.* for staff, portal.* for patient). `Revoke` is best-effort; the local row is the source of truth for our bind path. `Notify=true` until BYO ESP lands.
- [x] internal/core/domain/invites/: model, errors, repository, service, handler. `CreateStaff` / `CreatePatient` / `Resend` / `Revoke` / `List` / `Get` plus the foundation hook `BindForPrincipal` consumed by the auth middleware, and `MarkPatientConsumed` + `FindAcceptedPatientInviteForPrincipalOrg` consumed by portal onboarding step 2.
- [x] internal/core/domain/sharelinks/: model, errors, repository, service, handler. `Mint` / `List` / `Get` / `Revoke` for clinic admins; `ResolvePublic` for the portal landing page (admin-pool, no auth); `RedeemForPatientOnTx` runs inside the portal-onboarding admin tx so the use_count increment commits/rolls-back together with the patients chain.
- [x] internal/core/middleware/auth.go extended with `OnAuthHook` + `WithOnAuthHook` option. Auth middleware fires registered hooks on every authenticated request after `audit.SetActor`, inside a panic recover so a hook crash doesn't lock anyone out. Wired in routes.go to call `invitesService.BindForPrincipal`. Every-request semantics (vs. first-sight only) is what makes cross-clinic invites for existing principals work; the bind is idempotent by design and returns immediately on a single index hit when there are no open invites.
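The "inside a panic recover" behaviour of the auth hooks above is worth a concrete picture. The following is a minimal sketch, assuming a simplified hook signature; the real `OnAuthHook` type and its registration through `WithOnAuthHook` are not shown in this plan, so `runHooks` and `scenario` are illustrative names only.

```go
package main

import (
	"context"
	"fmt"
)

// OnAuthHook is an assumed signature for a post-authentication callback.
type OnAuthHook func(ctx context.Context, principalID, email string)

// runHooks calls every hook in order. Each call is wrapped in its own
// recover so one panicking hook cannot abort the request or skip the rest.
func runHooks(ctx context.Context, hooks []OnAuthHook, principalID, email string) (panics int) {
	for _, hook := range hooks {
		func() {
			defer func() {
				if r := recover(); r != nil {
					panics++ // a real implementation would log r here
				}
			}()
			hook(ctx, principalID, email)
		}()
	}
	return panics
}

// scenario registers a crashing hook ahead of a well-behaved one.
func scenario() (panics, ran int) {
	hooks := []OnAuthHook{
		func(context.Context, string, string) { panic("boom") },
		func(context.Context, string, string) { ran++ },
	}
	panics = runHooks(context.Background(), hooks, "principal-1", "sam@example.com")
	return panics, ran
}

func main() {
	fmt.Println(scenario())
}
```

Scoping the recover per hook, rather than around the whole loop, is what lets later hooks still run after an earlier one crashes.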
Endpoints (shipped):
- [x] `POST /v1/organizations/{id}/staff-invitations`: gated by `organizations.manage_members`. Body `{email, role_code, expires_in_days?}`. Defaults: 7 days, max 30.
- [x] `POST /v1/organizations/{id}/patient-invitations`: gated by `patients.manage`. Body `{email, patient_tier_id?, expires_in_days?}`. `patient_tier_id` NULL = use org default tier at consume time.
- [x] `GET /v1/organizations/{id}/staff-invitations[?status=&limit=&offset=]` and `GET /.../patient-invitations[...]`: RLS-gated lists. `status` filter: `pending` / `accepted` / `consumed` / `revoked` / `expired`.
- [x] `GET /v1/organizations/{id}/invitations/{inviteId}`: detail. RLS-gated; the row's kind decides which permission admits.
- [x] `POST /v1/organizations/{id}/invitations/{inviteId}/revoke`: service-layer permission gate branches on `row.kind`. Revokes the Clerk-side invitation too (best-effort).
- [x] `POST /v1/organizations/{id}/invitations/{inviteId}/resend`: atomically rotates the provider-side invitation: mints a new one with a fresh magic-link URL + expiry window, revokes the old one. Local row's `provider_invitation_id` is updated in place; rejects on already-accepted / already-revoked rows.
- [x] Share-link endpoints: `POST/GET /v1/organizations/{id}/share-links` + `GET /v1/organizations/{id}/share-links/{shareLinkId}` + `POST /v1/organizations/{id}/share-links/{shareLinkId}/revoke` (all gated by `organizations.manage_share_links`); public `GET /v1/public/share-links/{code}` returns `{org_name, slug, tier_name, valid}` for the portal landing page (per-IP rate-limited under public_resolve).
- [x] `POST /v1/portal/onboard` accepts optional `share_link_code` in body, resolves pending patient invite by `(accepted_principal_id = me, organization_id = current)`, picks tier with precedence `invite > share-link > org default`, atomically increments `share_links.use_count` under "still active" predicate, marks the patient invite consumed, all in the same admin tx. Per-IP rate-limited via the new `share_link_redeem` policy.
- [x] Audit-logged everywhere: invite create / revoke / resend / consume / membership create on bind (with explicit actor override since the bind hook fires before `audit.SetActor`); share-link mint / revoke / redeem; patient invite acceptance.
- [x] `principal.PermOrganizationsManageShareLinks` constant + Clerk webhook-free architecture documented inline.
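Two rules from the onboarding endpoint above lend themselves to a small example: the tier precedence and the capped use-count increment. The sketch is hypothetical throughout. `pickTier`, `ShareLink`, and `raceRedeem` are invented names, a mutex stands in for the conditional UPDATE the real service runs inside its transaction, expiry is left out, and treating an invite with no tier as "fall through to the share-link" is an assumption about how the precedence is applied.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// pickTier applies the precedence invite > share-link > org default.
// An empty string means that source did not specify a tier.
func pickTier(inviteTier, shareLinkTier, orgDefaultTier string) string {
	if inviteTier != "" {
		return inviteTier
	}
	if shareLinkTier != "" {
		return shareLinkTier
	}
	return orgDefaultTier
}

// ErrInactive is returned when a link is revoked or has hit its cap.
var ErrInactive = errors.New("share link no longer active")

// ShareLink is a pared-down stand-in for a share_links row.
type ShareLink struct {
	mu       sync.Mutex
	MaxUses  int // 0 means uncapped in this sketch
	UseCount int
	Revoked  bool
}

// Redeem checks the "still active" predicate and increments in one critical
// section, so concurrent redeemers can never push UseCount past MaxUses.
func (l *ShareLink) Redeem() error {
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.Revoked || (l.MaxUses > 0 && l.UseCount >= l.MaxUses) {
		return ErrInactive
	}
	l.UseCount++
	return nil
}

// raceRedeem fires concurrent redemptions against one capped link.
func raceRedeem(maxUses, attempts int) (succeeded, finalCount int) {
	link := &ShareLink{MaxUses: maxUses}
	var wg sync.WaitGroup
	var mu sync.Mutex
	for i := 0; i < attempts; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			if link.Redeem() == nil {
				mu.Lock()
				succeeded++
				mu.Unlock()
			}
		}()
	}
	wg.Wait()
	return succeeded, link.UseCount
}

func main() {
	fmt.Println(pickTier("", "silver", "basic"))
	fmt.Println(raceRedeem(3, 20))
}
```

The important property is that the check and the increment happen as one step; checking the cap in one statement and incrementing in another would let two concurrent redeemers both pass the check.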
Tests (shipped):
- [x] invites_test.go: 5 cases covering staff bind creates membership + marks accepted+consumed; patient bind sets accepted only (consumed_at NULL); revoked invite skipped on bind; expired invite skipped on bind; partial unique index blocks duplicate open invites per (org, email, kind).
- [x] sharelinks_test.go: 7 cases covering mint persistence + generated code; permission gate denies non-admin; atomic use_count increment; max_uses cap enforced under race; cross-org code rejection; public resolve returns org metadata; revoked code surfaces as 410.
- [x] `auth.InvitationProvider` test stub in api_harness.go: consumption paths (bind + redeem) don't reach the identity provider; create-side helpers get a synthetic `test_provider_*` id.
Open follow-ups (UI):
- [ ] 1D.3 Portal (next-active). `/join/{code}` share-link landing page → calls `GET /v1/public/share-links/{code}` → branded "Join Acme Clinic" pre-auth CTA → Clerk sign-up → onboarding step 2 with `share_link_code` in body. `/onboard` page surfaces a "you've been invited to Acme Clinic" banner when a pending patient invite exists for `(principalID, currentOrgID)`. No new backend endpoints required; this is pure portal-app work.
- [ ] 1D.2 Clinic admin (deferred until the clinic-app refresh; see 1D.2 below). Staff invite list + form + revoke + resend; patient invite list + form + revoke + resend; share-link mint form (tier picker, max_uses, expires_at, note) + list + revoke + copy-code + QR; `/welcome` landing page (staff invite magic-link redirect target). Backend contract is stable; no Go changes needed when this lands.
1B.13 Patient Impersonation Sessions
Clinic-internal access pattern for staff acting on a patient's behalf — assisted form fill, accessibility help, language barriers, troubleshooting. Lives entirely within a clinic's controllership scope (this is not a controller/processor concern; it's a "make staff actions on patient data reviewable" concern). Lands in foundation so the primitive is locked before any feature consumes it.
Design note. This is a deliberately minimal primitive. The audit + transparency mechanism is the load-bearing part; per-action-type scopes, granular permissions, real-time notification toggles can all be added later if a real product need surfaces. Foundation discipline argues against speculation here: clinics trust their staff (they hired them and granted `patients.manage`); finer-grain controls add no security on top of that.
Status: backend complete. Schema, RLS WITH-CHECK policies (staff self + patients.manage org-member + target-patient self), set_app_impersonation_session_id GUC + current_app_impersonation_id reader, redefined audit_log_insert carrying impersonation_id, patients.impersonate permission seeded with admin + customer_support grants, full domain (model/errors/repo/service/handler), RequireImpersonation middleware, open/close/list/get endpoints, /v1/me/patient-impersonation-sessions patient self-read, cross-context exclusion guard (one elevated session at a time per principal × org, bidirectional with break-glass), per-principal rate limit, and 14 RLS integration tests all shipped. Only the 1D.2 / 1D.3 UI surfaces remain — those belong in the 1D slice on top.
Decisions locked during implementation:
- Simple authorship semantics: no data-layer rebind. The original spec's split-author model (forms appearing patient-authored at the data layer + staff actor in audit log) was considered and dropped at design time: every Layer 2+ feature with an "author" column would have to remember `coalesce(acting_as_patient_id, current_principal_id)`, and one missed call site would leak staff names into patient-facing records. Foundation discipline argues against the cross-cutting invariant. Instead: `actor = current_app_principal_id()` always; the audit row carries `impersonation_id` linking back; consumers that want "who really did this" follow the link. Clean foundation invariant, zero per-feature glue.
- Active-session uniqueness `(staff_principal, organization)`. One impersonation at a time per staff member per clinic. Mental model: "I'm currently helping Alice; close that before starting Bob." Partial unique index `closed_at IS NULL`.
- Patient access history reads `patient_impersonation_sessions`, not `audit_log`. No patient SELECT policy on `audit_log` (kept staff/forensic-only). Patient sees session metadata via the table's self-read RLS; per-action drill-down deferred to the future `patient_account_activity` projection (see Deferred Foundation Extensions).
- AppPool + RLS WITH CHECK (not AdminPool). The opening principal is an authenticated org member with `patients.impersonate` and full RLS context. Same write-side pattern as `consents` and `organization_invites`, NOT the audit_log/notifications/break_glass AdminPool pattern (which exist because their writers don't have an org-scoped principal context). Break-glass's AdminPool design was driven by platform staff lacking tenant membership; impersonation doesn't have that issue.
- Cross-context exclusion bidirectional. `impersonation.Service.Open` rejects when the principal already has an active break-glass session for the same `(principal × org)`; `breakglass.Service.Open` rejects symmetrically. The redefined `audit_log_insert` reads BOTH GUCs unconditionally (so a future legitimate compounding case writes both columns correctly without another schema change), but the runtime guards prevent the case from arising today.
- No `acting_as_patient_id` GUC plumbing. Locked design: simple authorship means no rebind helper. The foundation primitive is bounded by what consumers will actually see, with no speculation against an F3/F5 model that we've explicitly chosen not to build.
- `closed_at` uses `clock_timestamp()`, not `NOW()`. NOW() returns transaction-start; same-tx Open+CloseSelf paths (some test/retry scenarios) would make `closed_at < opened_at` and violate the CHECK. clock_timestamp() returns wall-clock at statement time.
Schema (shipped in 000013):
- [x] `patient_impersonation_sessions`: `(id, staff_principal_id, target_patient_id, organization_id, reason, opened_at, expires_at, closed_at, closed_by_principal_id)`. `target_patient_id` FKs `patients(id)` (per-org row); `organization_id` denormalized for RLS efficiency, mirroring `patient_subscriptions`. `reason` is free-text with 10-char trimmed-length floor. `expires_at` enforced ≤ opened_at + 4h by CHECK. Partial unique on `(staff_principal_id, organization_id) WHERE closed_at IS NULL`; supporting indexes for org-recent / staff-recent / target-recent / active-set / FK-target.
- [x] `audit_log.impersonation_id UUID NULL`: reserved in 1A.12 (000001). 000013 redefines `audit_log_insert` to populate it from the GUC (idempotent CREATE OR REPLACE on top of the 000011 break-glass redefinition; signature unchanged so all existing callers remain valid).
- [x] Data classification entries for every column registered in data-classification.md → Patient impersonation sessions. Reason ships `pii_basic` + `support_export`-only (operator free-text may carry clinical context).
Permissions + middleware (shipped):
- [x] `patients.impersonate` permission seeded inline in 000013; granted by default to `admin` and `customer_support` system role templates (specialist deliberately excluded: clinical role, not service role). Custom roles can be granted via the role editor (1D.2) once that ships. Go constant: `principal.PermPatientsImpersonate`.
- [x] `RequireImpersonation(svc, paramName)` middleware factory: reads URL `{paramName}` for the target org id, looks up the active session via `Service.ActiveFor`, returns 403 `impersonation_required` (no session) or 410 `impersonation_expired` (session expired but not closed). Active match binds the session GUC + attaches the session id to context via `ImpersonationSessionIDFromContext`. No scope argument; single-permission primitive.
- [x] Open endpoint: `POST /v1/organizations/{org_id}/patient-impersonation-sessions`, body `{patient_id, reason, expires_in_minutes}`. Service validates `Subject.HasPermission('patients.impersonate')`; cross-context exclusion check; partial-unique-index conflict returns existing session. Per-principal rate-limited via `RATELIMIT_PATIENT_IMPERSONATION_OPEN_LIMIT` (default 5/min). Audit-logged with `action_context = 'impersonation'` + `impersonation_id`.
- [x] Close endpoint: `POST /v1/organizations/{org_id}/patient-impersonation-sessions/{id}/close`. Self-close path runs through the request tx with the `patient_impersonation_update_self` RLS policy; manage-close path checks `subj.HasPermission('patients.manage')` and runs through `patient_impersonation_update_manage`. Both audit-logged with `entity_type = 'patient_impersonation_session'`.
- [x] `GET /v1/organizations/{org_id}/patient-impersonation-sessions[?staff_principal_id=&patient_id=&only_active=&limit=&offset=]`: RLS-scoped read (staff sees own; org members with `patients.manage` see all org sessions). `GET .../{sessionId}` for detail.
- [x] `GET /v1/me/patient-impersonation-sessions[?organization_id=&only_active=&limit=&offset=]`: patient access history. RLS self-read on `patient_impersonation_sessions` cascades through `current_human_patient_profile_ids()` to span every clinic the patient is at; the cross-org account surface (1D.5) consumes the unfiltered shape.
Audit attribution (shipped):
- [x] `actor_id` is the staff principal, not the patient. Per the locked simple-authorship decision above. Audit row carries `action_context = 'impersonation'` + `impersonation_id = <session.id>`. Locked end-to-end by TestImpersonation_Middleware_GatesProtectedRouteAndStampsAudit.
Tests (shipped, 14 cases):
- [x] impersonation_test.go: happy-path open + audit attribution; validation errors (5 sub-cases); permission denied for specialist; patient-must-be-at-org boundary; idempotent same-(staff, org) returns existing; "one thing at a time" (second-patient open returns existing session, doesn't create new); lazy expiry finalization; CloseSelf attribution; CloseSelf re-close (already_closed); CloseManaged by org admin holding `patients.manage`; RLS patient-self-read; RLS other-staff-cannot-see; RLS deny-other-org-patient; cross-context exclusion (impersonation → break-glass blocked, and break-glass → impersonation blocked); end-to-end middleware gating + audit-attribution validation.
Open follow-ups (UI):
- [ ] 1D.2 Clinic admin oversight: per-org "Staff impersonation oversight" view in 1D.2, a list of all sessions across the clinic, filterable by staff member, patient, date range. Same DataTable foundation as the audit log viewer (1D.4). Gated by `patients.manage` (already exists). Per-patient impersonation history shown alongside the per-patient consents view. Backend contract is stable; no Go changes needed when this lands.
- [ ] 1D.3 Patient access history: per-clinic list of staff impersonation sessions on this patient, served via `GET /v1/me/patient-impersonation-sessions`. Shows who opened it, when, the reason, duration. Foundation-tier scope is session metadata only; per-action drill-down ("what entities they touched") is deferred to the patient_account_activity projection (Deferred Foundation Extensions).
Differences from break-glass (deliberate):
| | Break-glass (1B.11) | Patient impersonation (1B.13) |
|---|---|---|
| Who initiates | Platform staff | Clinic staff |
| Whose data they access | Cross-tenant patient data (any clinic) | One specific patient at their own clinic |
| Audit action_context | 'break_glass' | 'impersonation' |
| Linking column on audit_log | break_glass_id | impersonation_id |
| Who gets notified | Clinic admin (always-on email) | Patient (always-recorded in access history; no real-time email in v1) |
| Permission grants | Per-scope (break_glass.patient_list, audit_full, etc.) | Single permission (patients.impersonate) |
| Cross-tenant? | Yes — explicit cross-tenant primitive | No — single-clinic primitive |
| Controllership concern? | Yes — processor boundary | No — within clinic's controllership |
| Writer pool | AdminPool (REVOKE on AppPool) | AppPool + RLS WITH CHECK |
| Authorship semantics | N/A (platform staff only access; doesn't write tenant data routinely) | Simple — actor = current_principal_id() always |
Why break-glass kept granular permissions but impersonation didn't. Break-glass crosses the platform↔clinic trust boundary; clinics genuinely care about "your support engineers can ONLY view audit logs, never patient detail" and procurement reviewers ask about it. Impersonation lives inside one clinic's trust boundary — the clinic already gave the staff member patients.manage; layering finer impersonation-scope permissions on top of that adds no security.
1B.14 Locations & Multi-Site Support
A clinic (organization) may operate at one or more physical locations. Locations are a logistics layer on top of org-scoped tenancy: they partition appointments, schedules, and availability operationally without fragmenting permissions, consents, or patient identity. Lands in foundation because every clinical entity that ships in Layer 2 (specialists, calendars, appointments) needs `location_id` from day one; retrofitting is a cross-cutting backfill the foundation discipline rule exists to prevent.

Design note. This is intentionally minimal. Org stays the trust boundary, entitlements stay org-wide, RBAC stays org-wide ("all staff see all locations"). The only thing locations partition is physical operations: where a specialist physically is at a given moment, where an appointment happens. See patterns.md P40 for the full pattern and the deliberate non-goals.
Status: backend complete. Schema (000014), RLS policies (SELECT for org members + INSERT/UPDATE/DELETE WITH CHECK gated by current_app_has_permission('locations','manage')), locations.manage permission seeded with admin-only grant, full domain (model/errors/repo/service/handler), routes mounted under per-org group with RequireURLOrgMatchesScope("id") (P47) + RequirePermission on mutations, and 13 RLS integration tests all shipped. Only the 1D.1 / 1D.2 UI surfaces remain — those belong in the unified UI pass per CLAUDE.md "UI deferred until foundation locked" stance.
Decisions locked during implementation:
- `closed` is terminal. Service rejects any transition out of `closed` (`active`/`inactive` → 409 `closed_terminal`). Re-opening "the same place" later means creating a new row, which preserves audit trail clarity. The `inactive` status covers the temporary "renovation, lease pending, seasonal closure" case where the site will resume operations. CHECK constraint `chk_locations_closed_at_consistency` pins `closed_at` non-NULL iff `status='closed'` as a structural backstop.
- Slug is mutable, normalised at the service layer. Unlike `organizations.slug` (which lives in DNS hostnames and is immutable), location slugs only appear in deep paths like `/locations/main-floor`. Renaming costs at most a 404 on a stale bookmark; FKs use UUIDs. The validator auto-lowercases + trims (matching the org-domain normalisation pattern at `organization/service.go:96`); the regex enforces `^[a-z0-9]+(-[a-z0-9]+)*$` (no leading/trailing/double hyphens, no underscores, no dots).
- Country as free TEXT, no ISO 3166-1 enforcement at this layer. Romania-launch context makes ISO-2 tempting, but the constraint can be added later non-breakingly when a UI form renders a country picker. Service-layer normalisation only trims whitespace.
- AppPool only, no admin-pool surface. Locations are entirely org-scoped: there is no cross-tenant or pre-membership write path comparable to `patients.CreateAdmin` (portal onboarding) or `breakglass`. The repository skips the dual-pool plumbing.
- DELETE exposed but discouraged. The route + RLS allow hard-delete for the rare "created in error, never used" case. The canonical retire-a-location flow is `PATCH ... {status: "closed"}`. Once Layer 2 ships and FKs reference `locations(id)` (specialist_locations / calendars / appointments), DELETE will RESTRICT naturally on dependent rows; that's the intended steady-state behavior.
Schema (shipped in 000014):
- [x] `locations`: `(id, organization_id, slug, name, timezone NULL, phone NULL, email NULL, address_line1 NULL, address_line2 NULL, city NULL, county NULL, postal_code NULL, country NULL, status DEFAULT 'active', closed_at NULL, created_at, updated_at)`. `status IN ('active', 'inactive', 'closed')` enforced by CHECK. Address fields structured (never freeform). Unique `(organization_id, slug)` via `uq_locations_org_slug`. Standard `set_updated_at` trigger.
- [x] `organization_settings.default_timezone`: already shipped in 000003 with explicit reference to 1B.14's P23 chain. Resolution chain (location.timezone → specialist.scheduling_timezone → org.default_timezone → platform default) closes layer 3 here; layer 2 lights up with F4.
- [x] Address class registration. `locations.address_line1/2/city/county/postal_code/country` registered as `pii_basic` with `support_export` egress in data-classification.md. `locations.timezone/phone/email/status` registered as `org_internal`. Public-face fields (`name`, `slug`) registered as `public`. No `bulk_export` egress; locations are clinic operational data, not patient-export data.
- [x] Indexes. `idx_locations_org ON locations(organization_id)`; partial `idx_locations_active ON locations(organization_id) WHERE status = 'active'` (booking flows + specialist availability pickers only ever care about active).
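The timezone resolution chain named in the schema list is a first-non-empty fallback. A minimal sketch follows, assuming NULL columns map to nil pointers and that the platform default is passed in by the caller; `ResolveTimezone` and `ptr` are hypothetical helpers.

```go
package main

import (
	"fmt"
	"strings"
)

// ResolveTimezone walks location -> specialist -> org -> platform default
// and returns the first value that is set and non-blank.
func ResolveTimezone(location, specialist, org *string, platformDefault string) string {
	for _, tz := range []*string{location, specialist, org} {
		if tz != nil && strings.TrimSpace(*tz) != "" {
			return strings.TrimSpace(*tz)
		}
	}
	return platformDefault
}

// ptr is a small helper for building optional values in the demo.
func ptr(s string) *string { return &s }

func main() {
	// No location timezone: the specialist layer wins over the org layer.
	fmt.Println(ResolveTimezone(nil, ptr("Europe/Bucharest"), ptr("Europe/London"), "UTC"))
}
```

Until the specialist layer ships, callers would pass nil for it and the chain degrades to location, then org, then platform default.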
Permissions + middleware (shipped):
- [x] `locations.manage` permission seeded inline in 000014; granted by default to `admin` system role template only (specialist + customer_support deliberately excluded). Custom roles can be granted via the role editor (1D.2) once that ships. Go constant: `principal.PermLocationsManage`. TS mirror: `PERM_LOCATIONS_MANAGE` in `packages/api-client/src/permissions.ts`.
- [x] Routes: `GET /v1/organizations/{id}/locations[?status=&page=&limit=]`, `POST .../locations`, `GET .../locations/{locationId}`, `PATCH .../locations/{locationId}`, `DELETE .../locations/{locationId}`. All mounted under the per-org route group with `RequireURLOrgMatchesScope("id")` (P47) + `RequirePermission(PermLocationsManage)` on mutations. List + Get inherit the org membership SELECT policy.
RLS (shipped):
- [x] `locations` policy: SELECT for org members (`organization_id = current_app_org_id()`). INSERT/UPDATE/DELETE gated by `current_app_has_permission('locations', 'manage')` in WITH CHECK. No per-location RLS helper (`current_app_location_ids()` deliberately not added); staff see all locations within their org. Per-location scoping is a future ADR if a customer requires it.
Tests (shipped, 13 cases):
- [x] locations_test.go: happy-path Create + audit-attribution; permission-denied for specialist (service-layer); RLS deny INSERT for specialist (repo-direct, defense in depth at DB); slug uniqueness per org; slug collision across orgs allowed; closed is terminal (active→closed→active blocked); active↔inactive roundtrip with `closed_at` staying NULL; RLS cross-org SELECT denied; List scoped to org + status filter (cross-org bleed prevented); invalid slug shapes rejected (8 sub-cases); Delete by admin happy path; PATCH no-op detected as `before == after` (handler skips audit row); PATCH explicitly clearing optional column (`{phone: null}`) sets the DB column to NULL.
Forward-binding (no schema yet — locks the contract for Layer 2):
- [ ] When `specialists` ships (F4), it adds `specialists.scheduling_timezone TEXT NULL` (IANA fallback layer 2 in P23).
- [ ] When `specialists` ships, it adds the join table `specialist_locations(specialist_id, location_id, created_at)`: many-to-many with composite PK.
- [ ] When `specialist_weekly_hours` ships (F4), it adds `location_id UUID NULL FK → locations(id)`; NULL means remote/telerehab availability.
- [ ] When `specialist_schedule_overrides` ships (F4), same convention: `location_id UUID NULL`.
- [ ] When `calendars` ships (F4), it adds `location_id UUID NULL FK → locations(id)`; NULL means org-level virtual (telerehab) calendar.
- [ ] When `appointments` ships (F5), it adds `location_id UUID NULL FK → locations(id)`; NULL means remote / telerehab session.
- [ ] Single-true-availability invariant (enforced when `specialist_weekly_hours` ships): DB-level `EXCLUDE USING gist` constraint on `(specialist_id, day_of_week, time-range)` so no two windows for the same specialist may overlap, regardless of `location_id`. A specialist physically cannot be in two places at once. Same constraint shape on `specialist_schedule_overrides`. Locations partition the labels on availability, never the availability itself.
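The single-true-availability invariant will live in the database, but the overlap rule it enforces can be illustrated in application code. This sketch assumes half-open windows measured in minutes since midnight; `Window`, `Overlaps`, and `FirstConflict` are hypothetical names and no such types exist in the schema yet.

```go
package main

import "fmt"

// Window is a hypothetical weekly availability window.
// Times are minutes since midnight, half-open: [StartMin, EndMin).
type Window struct {
	SpecialistID string
	DayOfWeek    int
	StartMin     int
	EndMin       int
	LocationID   *string // nil = remote / telerehab; ignored by the overlap test
}

// Overlaps reports whether two windows collide for the same specialist and day.
// LocationID is deliberately not compared: locations label availability,
// they never partition it.
func Overlaps(a, b Window) bool {
	return a.SpecialistID == b.SpecialistID &&
		a.DayOfWeek == b.DayOfWeek &&
		a.StartMin < b.EndMin && b.StartMin < a.EndMin
}

// FirstConflict returns the first existing window that collides with candidate.
func FirstConflict(existing []Window, candidate Window) (Window, bool) {
	for _, w := range existing {
		if Overlaps(w, candidate) {
			return w, true
		}
	}
	return Window{}, false
}

func main() {
	clinic := "loc-1"
	existing := []Window{{SpecialistID: "sp-1", DayOfWeek: 1, StartMin: 540, EndMin: 720, LocationID: &clinic}}
	remote := Window{SpecialistID: "sp-1", DayOfWeek: 1, StartMin: 660, EndMin: 780} // remote, but overlaps 11:00-12:00
	_, conflict := FirstConflict(existing, remote)
	fmt.Println(conflict)
}
```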
Open follow-ups (UI):
- [ ] 1D.1 Console org-detail page: locations list per org with CRUD (gated by `locations.manage`). Same DataTable foundation as members / domains. Backend contract is stable; no Go changes needed when this lands. End-to-end wiring is verified by the 13 RLS integration tests above; the UI consumer was deferred to the unified UI pass per CLAUDE.md "UI deferred until foundation locked" stance.
- [ ] 1D.2 Clinic admin "Locations" page: same CRUD scoped to the admin's own org. Same DataTable foundation as members / domains / roles.
- [ ] Patient Portal does not surface locations until F4 booking ships; there is nothing useful to show in foundation (no calendars, no appointments).
Org with zero locations — a pure-telerehab clinic operates with no locations rows. UI skips the location picker; appointment creation accepts location_id = NULL. No "Virtual" placeholder row is ever auto-created.
Org with one location — UI auto-picks the only active location at booking time; no picker shown. Schema-wise indistinguishable from the multi-location case.
Deliberate non-goals (recorded so they don't creep in):
- No per-location entitlements (entitlements stay org-wide on `organization_entitlements`).
- No per-location billing or pricing.
- No `services_per_location` table; service catalog stays org-wide.
- No inter-location transfer workflow; point the next appointment at the other location.
- No room / facility booking. Telerehab is mostly remote; rooms are a future concern if a clinic asks.
- No `current_app_location_ids()` RLS helper; org-scoping is the only RLS dimension.
1C. Capabilities, Integrations & Metering
Cross-cutting infrastructure for how the platform composes capabilities, reaches external systems, and measures consumption. Locks the conventions every feature consumes from Layer 2 onwards. The taxonomy used here lives in glossary.md.
Why a new sub-phase. 1A laid the runtime foundations. 1B locked the identity, tenancy, and entitlement primitives. 1C connects them outward: the capability convention so every feature builds on stable interfaces, the four integration categories so every external touchpoint has a designated home, the metering seam so every metered call is captured before billing exists, and the AI hookup so the platform's stated agent direction has a load-bearing path. Without 1C, Layer 2 features would each invent their own external-call shape and we'd retrofit through every one — exactly what foundation discipline exists to prevent.
Scope discipline. Each sub-phase ships the primitive and (where relevant) one Day-1 consumer. Subsequent consumers ride the same primitive without schema or contract change. Foundation discipline applies: primitive design must be correct now; mechanical extension is fine later. See glossary.md for category definitions (Cat A Curated Provider, Cat B Connected Account, Cat C Outbound Webhook Subscription, Cat D Inbound Webhook, Cat E Internal Event, Cat F External API Access via service-account auth; Cat F's operational flow is deferred to its own sub-phase, and its schema lives in `service_accounts` from 1B.1).
Build order inside 1C.
1C.1 Capability Framework (foundational — every other 1C sub-phase consumes the convention)
↓
1C.2 Curated Providers (Cat A) (extends 1A.18's Channel pattern; provider-resolution table for per-tenant brand-isolation readiness)
1C.3 Internal Events Registry (Cat E) (parallel — typed registry over 1A.9 events.Bus)
↓
1C.4 Outbound Webhook Subscriptions (Cat C — Day-1 Make.com consumer)
1C.5 Connected Accounts (Cat B) (parallel — table + framework; OAuth deferred to first OAuth consumer)
1C.6 Inbound Webhook Framework (Cat D) (parallel — convention; Daily.co recording handler is the first impl)
↓
1C.7 Metering & Quotas (depends on 1C.2 — the seam where capability calls get measured + capped)
1C.8 AI Capability Hooks (depends on 1C.2 + 1C.7 — special-case Curated Provider with mandatory metering + provenance)
↓
1C.9 Entitlements Rename (mechanical sweep, independent but cleans the vocabulary used by all of 1C)

1C.1 Capability Framework
Convention for declaring internal capabilities (named for what they do, not for the provider that implements them) and composing the standard cross-cutting concerns (permission, quota, provider resolution, audit, metering, error classification) around them via wrappers. Every capability ships with the right concerns for free, picked from a small menu of templates.
Status: design locked 2026-05-06; skeleton shipped 2026-05-06 — internal/core/capabilities with the four wrap helpers, sentinel error taxonomy, README + tests, billing capability skeletons (payment / invoicing / patient_payment / payout), cmd/check-capabilities CI guard wired into make check, P50 docs, notify async-outbox exception documented. Resolve / meter / audit hooks remain pluggable seams that 1C.2, 1C.7, and 1C.8 fill in; second real-world consumer rides on 1C.4 (webhook.Deliverer).
What "capability" means here. See glossary.md → Capability. One Go interface, one bounded responsibility, switchable implementation. Examples: email.Channel (shipped, 1A.18), video.Provider (Daily.co today), pdf.Renderer (internal library), ai.LLM (future), webhook.Deliverer (1C.4), calendar.Sync (Cat B / future), sms.Channel (future), payment.Provider + invoicing.Provider (declared at foundation; implementations in F12 — Stripe + FGO for Romania), patient_payment.Provider + clinic_payout.Provider (declared at foundation; implementations in the future marketplace mediation feature — Stripe Connect or equivalent).
Locked decisions:
- Composition shape: functional with one helper per implementation-strategy category. Four templates:
  - `capabilities.WrapMeteredProvider(impl, name, perm)` for metered Curated Provider / Cat A (email, SMS, video, AI text gen: every call costs the platform money and the org is metered for it);
  - `capabilities.WrapProvider(impl, name, perm)` for unmetered Curated Provider / Cat A (auth, storage: usage cost is bundled in platform fees);
  - `capabilities.WrapOutbound(impl, name)` for Outbound Webhook / Cat C delivery (no quota meter, no provider resolution, permission lives at subscription create time not delivery time);
  - `capabilities.WrapInternal(impl)` for Internal Library (mostly a no-op forwarder; permission and audit happen at the calling layer above).

  Functional composition (vs. struct decorators) chosen because the stack is fixed per category, so no runtime customization is needed; decorators speculate against unknown future flexibility.
- Provider selection: per-call resolver for Cat A, wiring-time for everything else. Cat A capabilities go through 1C.2's `platform_service_providers` resolver at every call (logically per-call, physically cached aggressively per `(org_id, capability)` with ~5min TTL invalidating on the row's `updated_at`). The cache-aside pattern from P45 applies: the first call per `(org, capability)` per Core API instance pays the lookup; subsequent calls within TTL pay nothing. Per-call (not startup-only) is required for multi-tenant + per-tenant brand-isolation readiness; startup-only would force one org's provider for the whole process. Non-Cat A capabilities (Internal Library, Cat C outbound) wire their impl at startup in `cmd/api/main.go`.
- Test-double convention: `Fake{Capability}` in same package as interface. Standardized hand-written fake (1A.18's `FakeChannel` is the canonical reference). Tests inject the fake at construction time; assertions read accumulated state via accessor methods (`f.Sends()` etc.). Mocks (gomock) explicitly rejected: verbose, brittle, refactor-fragile. Real-service test layer (smoke / E2E against real SES / Daily.co / Anthropic) is per-capability strategy decided when that capability ships; the foundation pattern is the FakeChannel-based integration test from 1A.18, and the real-service smoke layer lands as part of 1E AWS staging.
- `notify.Channel` (1A.18) audit + dispatcher exception. `notify.Channel` interface and `notify.FakeChannel` already conform to the codified convention; keep as-is. The notification dispatcher itself is a documented EXCEPTION because it's an async outbox pattern, not a synchronous capability call (worker polls deliveries, ships at its own pace, dispatches across multiple channel adapters). Document the exception in patterns.md alongside the convention. Real rewires for `notify.email` (env-config → resolver migration in 1C.2; metering hookup in 1C.7) land in those sub-phases' PRs, not in 1C.1.
- Wrapper ordering (locked once for every Cat A metered capability): permission → quota → resolve → audit (before provider call) → provider call → meter (after success only) → error classification wraps the whole chain. Audit-before means failed calls are auditable with status code; meter-after-success means failures don't burn quota. Matches the existing 1B.5 four-gate model + 1A.1 audit + 1B.5 quota semantics.
- Principal-type-agnostic. The wrapper stack treats all principal types the same: humans, agents, service_accounts (Cat F), and the system principal all flow through identical wrappers. Audit attribution carries `actor_id` + `actor_type`; permissions / quotas / metering operate on org-scope without consulting actor type. This property is essential: accidentally hardcoding "human" assumptions in any wrapper would break Cat F service-account API calls and autonomous-agent calls when those flows light up. See glossary → Principal-type-agnostic primitive.
Wrapper-content matrix by category:
| Category | Permission | Quota | Resolve | Provider call | Meter | Audit | Errors |
|---|---|---|---|---|---|---|---|
| Cat A metered (email, SMS, video, AI text gen) | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| Cat A unmetered (auth via Clerk, storage via S3) | ✓ | — | ✓ | ✓ | — | ✓ | ✓ |
| Cat C outbound (webhook delivery) | — | — | — | ✓ | — | ✓ | ✓ |
| Internal Library (pdf, signing, encryption) | — | — | — | ✓ | — | — | ✓ |
Implementation order inside 1C.1:
- [x] `internal/core/capabilities/` package with the four `Wrap*` helpers as functional composition. Skeleton at 1C.1 close: permission and quota gates wire against existing `principal.Subject` helpers; `resolveProvider` is a no-op forwarder until 1C.2 registers a resolver via `capabilities.SetResolveFunc`; `meterAfterSuccess` is a no-op forwarder until 1C.7 registers a meter via `capabilities.SetMeterFunc`; `auditCall` fires per-capability `AuditFunc` hook (nil = no-op). Sentinel error taxonomy (`ErrUnauthenticated`, `ErrPermissionDenied`, `ErrQuotaExceeded`, `ErrProviderUnavailable`, `ErrTransient`, `ErrPermanent`) is fully wired at 1C.1.
- [x] `Fake{Capability}` documentation as `internal/core/capabilities/README.md` with `notify.FakeChannel` as the canonical reference shape, plus a Cat A wiring example (`ai.text` shape) and the multi-method capability section (`payment.Provider` shape).
- [x] Audited `notify.Channel` against the codified convention. Notify keeps its current shape; dispatcher async-outbox exception documented in `notify/doc.go` with rationale (producer-side gates, consumer-side state machine, no per-call audit row, future meter hooks the dispatcher's success path directly).
- [x] patterns.md P50 (Capability Convention) documenting: the four implementation-strategy categories with examples, the four wrap helpers, the locked wrapper-stack ordering (`permission → quota → resolve → audit → provider → meter → errors`), the principal-type-agnostic property, the `Fake{Capability}` test-double convention, and the load-bearing notify carve-out.
- [x] CI guard `cmd/check-capabilities` wired into `make check`. Verifies (1) the four wrap helpers exist in `internal/core/capabilities/capabilities.go`; (2) every package under `internal/core/{name}/` that looks like a capability (one-method interface + `Fake*` struct) is either wired through one of the wrap helpers OR allow-listed with a documented rationale (notify is the only allow-list entry today). Loose by design at 1C.1; 1C.2 tightens to per-package matching as Cat A capabilities migrate to the resolver.
- [x] Billing capability skeletons declared at foundation: `internal/core/billing/payment/` (`Provider` interface with `CreateCustomer`, `CreateSubscription`, `Charge`, `Refund`, `HandleWebhook`); `internal/core/billing/invoicing/` (`Provider` interface with `IssueInvoice`, `DeliverInvoice`, `RegisterWithAuthority`); `internal/core/billing/patient_payment/` (`Provider` interface with `Charge`, `Refund`, `HandleWebhook` for marketplace mediation); `internal/core/billing/payout/` (`Provider` interface with `ConnectAccount`, `InitiatePayout`, `GetBalance`). All skeletons; no implementations. F12 ships `payment.Provider` + `invoicing.Provider` impls; marketplace mediation feature ships `patient_payment.Provider` + `payout.Provider` impls.
Acceptance:
- [x] Composition shape locked: functional with four category helpers.
- [x] Provider selection locked: per-call resolver for Cat A; wiring-time for non-Cat A.
- [x] Test-double convention locked: `Fake{Capability}` in same package as interface.
- [x] Wrapper ordering locked: permission → quota → resolve → audit → provider → meter → errors.
- [x] `notify.Channel` audit conclusion: keeps current shape; dispatcher documented as async-outbox exception.
- [x] `internal/core/capabilities/` package with four `Wrap*` helpers (skeleton form; fills out as 1C.2 / 1C.7 ship).
- [x] patterns.md P50 documentation.
- [x] `cmd/check-capabilities` CI guard.
- [ ] One additional capability (likely `webhook.Deliverer` from 1C.4) ships against the convention to prove it with a second consumer. Deferred to 1C.4: a synthetic Cat A consumer in `capabilities_test.go` exercises the wrap stack today; the second real-world consumer lands when webhook.Deliverer ships.
1C.2 Curated Providers (Cat A) + Provider Resolution
Cat A is the implementation strategy where a capability calls an external API with platform-owned credentials. The provider name (Daily.co, SES, Twilio, Anthropic, Stripe) is implementation detail; the capability interface is what callers see. The `platform_service_providers` resolution table holds platform-default credentials and per-org brand-isolation overrides; per-call resolver with aggressive caching is the runtime path.
Status: design locked 2026-05-06; shipped 2026-05-06 — migration 000015_platform_service_providers + internal/core/providers (resolver, register, bootstrap, healthcheck) + cmd/check-providers (cron) + cmd/check-cata-resolution (CI guard, enforcing) + Console superadmin endpoints under /v1/admin/platform-service-providers + apps/docs/reference/credential-rotation.md + acceptance test in internal/test/rlstest/provider_resolver_test.go (full lifecycle: default→override→fail-loud→healthcheck→repair). All three current Cat A capabilities (notify.email, S3, auth.clerk) wired through the resolver: email per-call, storage + auth startup-only-resolved + singleton-installed per package SDK constraints.
Why a provider-resolution table at foundation, not when the first clinic asks for per-tenant brand isolation. Foundation work designs for stated platform direction. Per-tenant brand isolation IS platform direction (CLAUDE.md "two tenancy modes" + per-tenant Cat A overrides on either mode). The first paying clinic may demand brand isolation, requiring isolated platform-managed accounts (separate sender domain, separate video account, possibly separate AI account). Without the resolution table at foundation, every Cat A call site needs retrofit later. With it, the override path exists and the table seeds with platform-default rows; per-org override rows get added later without schema change.
Locked decisions:
- Initial scope: migrate all three current Cat A capabilities at 1C.2 close: `notify.email` (1A.18 SES), `internal/integration/s3/` (1A.8 storage), and `auth.clerk` (auth verifier abstraction is already provider-agnostic, just swap credentials source from env to resolver). Same migration shape for each (~10–50 lines per capability). All future Cat A capabilities (Twilio SMS, Daily.co video, Anthropic AI, Stripe payments, future providers) wire through the resolver from Day 1 and never read env directly.
- Failure mode: fail loud on broken per-org override. If a clinic's brand-isolation override row exists but credentials don't decrypt or are rejected by the provider, the call fails with `502 provider_unavailable` rather than silently falling back to the platform default. Silent fallback would break the brand-isolation contract (clinic thinks they're sending from their domain; actually goes from platform default). The healthcheck cron is the early-warning system that catches broken rows before traffic hits.
- Healthcheck: `cmd/check-providers` cron. Runs at deploy time + every 5 min in staging / every 1 min in prod (configurable). Walks every row in `platform_service_providers`, decrypts credentials via 1A.3, optionally pings the provider with a no-op (SES `GetSendQuota`, S3 `HeadBucket`, etc.). Marks broken rows as `status='error'` with `last_error_at` + `last_error`. The transition `active→error` is audit-logged (state change attributed to system principal).
- Audit treatment. State changes to `platform_service_providers` (CREATE / UPDATE / DELETE via Console superadmin endpoints) audit at the app layer with full diff; healthcheck transitions audit; runtime resolution lookups DO NOT audit (operational metadata, would generate millions of rows/day per CLAUDE.md exempt rule). Runtime failures slog at ERROR level with `org_id`, `capability`, `provider_name`, `error_class`, `request_id` for tracing via the telemetry sink. App-layer audit only; no DB-trigger audit backstop at foundation (AdminPool REVOKE + superadmin gating is the access control floor; defense-in-depth via DB triggers can be added later if compliance requires).
- Rotation runbook. Documented per-provider in `apps/docs/reference/credential-rotation.md` (new doc). Standard flow: (1) generate new credentials at provider; (2) superadmin updates the row in Console (server encrypts, bumps `updated_at`); (3) wait ~10 min for cache TTL to expire across the fleet; (4) revoke old credentials at provider. Zero downtime achievable when the provider supports two valid credential sets simultaneously (AWS IAM, Stripe, Twilio, SES all do). Foundation 1C.2 ships the runbook for SES/email; subsequent providers add their section as they ship.
Schema (locked):
- [x] platform_service_providers: (id UUID PK, capability TEXT NOT NULL, organization_id UUID NULL FK → organizations(id), provider_name TEXT NOT NULL, credentials_encrypted BYTEA NOT NULL, config JSONB NOT NULL DEFAULT '{}', status TEXT NOT NULL DEFAULT 'active' CHECK (status IN ('active', 'inactive', 'error')), last_error_at TIMESTAMPTZ NULL, last_error TEXT NULL, last_health_check_at TIMESTAMPTZ NULL, created_at, updated_at). CHECK constraint on provider_name per capability (today: email→ses, storage→aws_s3, auth→clerk; extends per migration as new providers ship). No is_default column; it is derived from organization_id IS NULL. Partial unique (capability) WHERE organization_id IS NULL for one platform default per capability. Partial unique (capability, organization_id) WHERE organization_id IS NOT NULL for one override per (capability, org). CHECK enforcing (status='error') ⇔ (last_error_at + last_error populated).
- [x] RLS: AdminPool only. RLS enabled with no SELECT policy + REVOKE SELECT/INSERT/UPDATE/DELETE/TRUNCATE FROM restartix_app (double-deny mirroring audit_log, notifications). Console superadmin endpoints write through AdminPool.
- [x] Data classification entries shipped: credentials_encrypted → auth_secret (no egress); provider_name / capability / status / last_error* / last_health_check_at / config → org_internal with support_export; id / organization_id / created_at / updated_at → system_metadata with support_export.
- [x] Platform permission principal.PlatformPermProvidersManage ("providers.manage") shipped: Go-only constant, superadmin-only by default (no per-org RBAC row); the platform-permissions layer is the gate.
Note on organization_billing.payment_provider. The plan also called for dropping an enum constraint on this column. Inspection of 000003_org_settings.up.sql showed the column is already plain TEXT with no CHECK constraint (the comment lists 'manual' | 'stripe' | 'chargebee' as forward-compat hints, not as an enforced enum). No migration change needed; F12 will replace this whole shape when the billing engine ships.
Capability-resolution flow (runtime):
Cat A capability call (e.g., email.Channel.Send)
↓
WrapMeteredProvider / WrapProvider helper (1C.1)
↓
Resolver.Resolve(ctx, capability='email', org=ctx.OrgID)
↓
Cache lookup (key: (capability, org_id), TTL ~5min)
├─ Cache hit → return cached impl (~µs)
└─ Cache miss
↓
SELECT FROM platform_service_providers
WHERE capability = $1
AND (organization_id = $2 OR organization_id IS NULL)
AND status = 'active'
ORDER BY organization_id NULLS LAST -- prefer org-specific
LIMIT 1
↓
Row found?
No → 502 provider_unavailable (no platform default!)
Yes → If org-specific row's status='error' → 502 (fail loud per locked decision)
→ Decrypt credentials_encrypted via 1A.3
→ Instantiate provider impl (e.g., ses.NewClient(creds, config))
→ Cache (impl, snapshot of updated_at)
→ Return impl
Cache invalidation:
Pull-based — at lookup, query SELECT updated_at FROM platform_service_providers WHERE id = $cached_id;
if updated_at != cached snapshot → cache miss path.
Cheap (single indexed column read).
Implementation order inside 1C.2:
- [x] Migration creating platform_service_providers with RLS + data-classification entries (000015_platform_service_providers). The platform permission lives in Go (principal.PlatformPermProvidersManage), not in the migration. The organization_billing.payment_provider "drop the enum constraint" item turned out to be a no-op: the column was already plain TEXT (see the schema note above).
- [x] internal/core/providers/ package: Resolver + generic Register[T] typed lookup + per-instance TTL cache + Bootstrap env-seed helper + Healthcheck / HealthcheckAll primitives. cmd/check-providers cron binary uses the same primitives.
- [x] Migrate three existing Cat A capabilities to use the resolver: notify.email (per-call lookup via email.Lookup returned from providers.Register[*EmailProvider]); internal/integration/s3/ (factory + s3.UseProvider singleton install at startup; per-call refactor deferred to first S3 consumer's PR); auth.clerk (factory + clerk.UseProvider calls SDK's process-global SetKey; per-call non-applicable because auth verification runs before org context). cmd/api/main.go bootstraps platform-default rows from env via providers.Bootstrap (idempotent ON CONFLICT DO NOTHING) so behavior is identical post-migration; env vars become non-load-bearing once the row exists.
- [x] Console superadmin endpoints (mounted under /v1/admin/platform-service-providers): GET / (list, optional ?capability=), POST / (create), PATCH /{id} (update), DELETE /{id} (hard delete), POST /{id}/test (on-demand healthcheck). Service holds *providers.Resolver and calls Invalidate after every mutation. Audit shipped with credentials field redacted. Console UI deferred to 1D.
- [x] P31: already scoped to Cat B with a Cat A carve-out pointing at 1C.2.
- [x] CI guard cmd/check-cata-resolution: walks the three Cat A package dirs, flags references to credential-bearing env-config field names. Wired into make check; enforcing at 1C.2 close (allow-list empty).
- [x] apps/docs/reference/credential-rotation.md shipped: full SES/email runbook, stub sections for storage/auth + future providers.
- [x] Acceptance test extending the setup-clinic suite: internal/test/rlstest/provider_resolver_test.go exercises full lifecycle (default → override → fail-loud on inactive/corrupt override → healthcheck flips to error → repair → healthy).
Acceptance:
- [x] Initial scope locked: all three current Cat A capabilities (email, storage, auth) migrate at 1C.2 close; future Cat A wires through resolver from Day 1.
- [x] Failure mode locked: fail loud on broken per-org override; healthcheck cron is the early-warning safety net.
- [x] Audit treatment locked: state changes audit, runtime lookups don't, healthcheck transitions audit, runtime failures slog ERROR with full attribution.
- [x] Rotation runbook locked: generate → update → ~10min cache window → revoke; documented in credential-rotation.md.
- [x] Schema locked: platform_service_providers with NULL org for default + specific org for override; partial uniques; status enum; provider_name TEXT with per-capability CHECK.
- [x] Migration + RLS shipped.
- [x] Resolver package + cache + healthcheck primitives + cmd/check-providers binary shipped.
- [x] Three existing Cat A capabilities migrated; platform-default rows bootstrap from env at startup.
- [x] Console superadmin endpoints shipped (UI deferred).
- [x] CI guard (enforcing) + credential-rotation doc + acceptance test shipped.
1C.3 Internal Events Registry (Cat E)
1A.9 already ships the in-process events.Bus. 1C.3 adds a typed registry that is the single source of truth for event types and their payload schemas. Webhook subscriptions, automation triggers, and notification dispatcher all reference one registry — preventing the drift the audit found (three documents describing the same stream).
Status: design locked 2026-05-06; shipped 2026-05-07 — internal/core/events/registry.go with EventDef (Name + ResourceType + Description + Layer + typed Payload + DeprecatedAt + ReplacedBy), Register / Lookup / All, JSONSchemaOf reflection-based schema generator, and PublishWith typed-payload publish helper that validates payload type against the registry. Per-domain events.go files for organization (8 events) and portalonboarding (1 event) declare typed payload structs and register via init(); the existing 9 publish sites migrated to PublishWith. cmd/dump-events-registry emits JSON or Markdown; cmd/check-events-registry replaces the older cmd/check-events and validates (a) every events.Type constant has a matching Register call and (b) the committed _generated/events-catalog.md is in sync with the registry. The catalog is auto-generated at apps/docs/architecture/_generated/events-catalog.md, included into the architecture events doc via VitePress <!--@include: -->. P51 documented in patterns.md. Hand-edited rows for Layer 2+ events that have no publisher yet were dropped — the registry is the source of truth and grows feature-by-feature.
Locked decisions:
- Payload schema location: Go struct as source of truth + JSON schema generated. Each domain package declares its events as Go structs (already done implicitly today). cmd/gen-event-schemas derives JSON schemas via reflection + struct tags for: (a) the future automation engine UI (F8) which needs JSON schema to render trigger configurators, (b) the webhook docs which auto-render payload shape per event, (c) any external consumer that needs a typed contract. One source of truth + codegen mirrors the existing OpenAPI pattern.
- Retired events: keep in registry with deprecated_at + replaced_by. Events are public contract for clinic webhook subscribers and automation triggers, so they can't silently break. A deprecated event keeps publishing during a grace period; the registry entry surfaces "deprecated" in webhook UI and docs; clinics with subscriptions on the old event get a notice to migrate. After grace period, the registry entry stays (history) but the publish call is removed and the event no longer fires.
- Registry has NO per-event fan-out controls. Earlier draft proposed per-event audit/webhook/notification toggles; over-engineered against actual needs. Subscribers decide what they consume, not events. Audit subscriber consumes everything (universal sink). Notification dispatcher consumes events with a registered notification handler (1A.18's notify.categories map is already the subscription mechanism). Webhook dispatcher (1C.4) consumes events matching each subscription's event_filters array (per-subscription filtering). Automations engine (F8) consumes events its rules subscribe to. Each subscriber owns its own consumption logic. Registry per-event is just {Name, PayloadType, Classification (doc/UX hint only), DeprecatedAt, ReplacedBy}.
- Distributed ownership with central registration. Each domain package (appointments, patients, consents, invites, breakglass, impersonation, legaldocument, ...) declares its events in events.go and registers them via init(). The events package becomes a thin coordinator that auto-discovers via init-time registration. Single source of truth for "what events exist" = grep all domain packages OR run cmd/dump-events-registry. Matches the existing platform pattern (each domain owns its model.go, repository.go, errors.go).
- Docs auto-generate from the registry. cmd/dump-events-registry emits JSON; VitePress build pipeline calls it; webhook events docs + automation trigger docs render from the dump (no hand-edited lists). This is the "registry IS the spec" delivery: drift between code and docs becomes mechanically impossible.
Reusable pattern for future code-first registries:
This pattern (Go-side init-registration + cmd/dump-{name}-registry + docs auto-gen) is documented in patterns.md as P51 — Code-first registries with generated documentation. Adopt for future cases where a small set of values has multiple documentation consumers and is naturally defined in code. Likely future adopters: per-org permissions catalog (currently hand-written, drift risk). Don't preemptively extend to permissions in this PR — that's foundation discipline (don't speculate; adopt when drift surfaces).
Implementation order inside 1C.3:
- [x] internal/core/events/registry.go + schema.go: EventDef struct + Register + Lookup + All + PublishWith (typed-payload publish that validates against the registry then round-trips to map[string]any) + JSONSchemaOf (reflection-based schema gen with uuid / date-time format hints). Tests cover registration, lookup, sort order, duplicate / conflict semantics, publish round-trip, type-mismatch rejection, deprecation metadata, and schema generation across mixed-type structs.
- [x] Per-domain events.go files for the two domains that publish today: organization (8 events: created / updated / member_added / member_role_changed / member_removed / domain.added / domain.verified / domain.removed) and portalonboarding (1 event: patient.onboarded). Each event has a typed payload struct (OrgCreatedPayload, OrgMemberAddedPayload, …) registered via init(). The 9 existing publish sites migrated to PublishWith(eventXxx, orgID, resourceID, typed payload). Domains with no publisher today (consents, invites, breakglass, impersonation, legaldocument, patient_subscriptions, service_plans) get events.go when they wire up publishing; this is foundation discipline (no speculation; the registry catalogs intent + reality, not aspiration).
- [x] cmd/dump-events-registry binary: -format=json emits the full catalog with payload JSON schemas; -format=md emits a layered Markdown table for embedding via VitePress include. Folds the optional cmd/gen-event-schemas into JSONSchemaOf per the design's "could be folded" caveat.
- [x] cmd/check-events-registry: replaces the older cmd/check-events (deleted). Validates (1) every events.Type constant has a matching events.Register call, (2) the committed _generated/events-catalog.md matches the freshly-generated form, (3) no events.Publish(...) / events.PublishWith(...) call uses a string-literal name. Wired into make check.
- [x] VitePress integration: apps/docs/architecture/_generated/events-catalog.md is the auto-generated artifact, included from apps/docs/architecture/events.md via <!--@include: -->. Regenerated by make events-docs from the repo root (or services/api). Hand-edited rows for Layer 2+ events without publishers were deleted; the registry is the source of truth and grows feature-by-feature. Webhook docs (1C.4) and automation trigger docs (F8) will consume the same dump when they ship; nothing to wire there at 1C.3 close.
- [x] apps/docs/architecture/patterns.md: P51 (Code-First Registries with Generated Documentation) added under a new "Capability & Integration Architecture" group, with index entry. Cross-references P28 (events) and P39 (column classification, the same registry-with-CI-guard discipline that predated this pattern).
- [x] Acceptance test: internal/core/events/acceptance_test.go exercises the full end-to-end flow: synthetic payload struct → Register → Lookup round-trip → JSONSchemaOf reflects the typed shape → PublishWith validates type + round-trips through bus → subscriber sees Data with omitempty respected → All() includes the entry.
Acceptance:
- [x] Payload schema location locked: Go-truth + JSON-schema generated.
- [x] Retired event handling locked: deprecated_at + replaced_by in registry.
- [x] Fan-out control locked: NO per-event controls; subscribers own their consumption logic.
- [x] Registry physical location locked: distributed per-domain with central init-registration.
- [x] Docs generation locked: auto-gen from registry; pattern documented as P51 for future adopters.
- [x] internal/core/events/registry.go + per-domain events.go files shipped.
- [x] cmd/dump-events-registry shipped (JSON + Markdown formats; gen-event-schemas folded in via JSONSchemaOf).
- [x] cmd/check-events-registry CI guard shipped, replaces cmd/check-events, wired into make check.
- [x] Generated catalog at apps/docs/architecture/_generated/events-catalog.md included into the architecture events doc via VitePress <!--@include: -->. Webhook events docs / automation trigger docs land in 1C.4 / F8 against the same dump.
- [x] P51 documented in patterns.md.
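To make the init-time registration and typed-payload check concrete, here is a reduced sketch of a code-first registry in Go. It is an illustration under assumptions, not the contents of internal/core/events/registry.go: the EventDef fields are trimmed, ValidatePayload is a hypothetical stand-in for the type check that PublishWith performs, and the event name "organization.created" and the payload fields are made up for the example.

```go
package main

import (
	"fmt"
	"reflect"
	"sort"
	"time"
)

// EventDef is a reduced, hypothetical registry entry.
type EventDef struct {
	Name         string
	PayloadType  reflect.Type
	DeprecatedAt *time.Time // nil while the event is live
	ReplacedBy   string     // successor event name, if deprecated
}

var registry = map[string]EventDef{}

// Register adds a definition; registering the same name twice is an error.
func Register(def EventDef) error {
	if _, exists := registry[def.Name]; exists {
		return fmt.Errorf("event %q already registered", def.Name)
	}
	registry[def.Name] = def
	return nil
}

// Lookup returns the definition registered under name, if any.
func Lookup(name string) (EventDef, bool) {
	def, ok := registry[name]
	return def, ok
}

// All returns every registered event name in sorted order.
func All() []string {
	names := make([]string, 0, len(registry))
	for name := range registry {
		names = append(names, name)
	}
	sort.Strings(names)
	return names
}

// ValidatePayload rejects unknown events and payloads of the wrong Go type.
func ValidatePayload(name string, payload any) error {
	def, ok := Lookup(name)
	if !ok {
		return fmt.Errorf("event %q is not registered", name)
	}
	if got := reflect.TypeOf(payload); got != def.PayloadType {
		return fmt.Errorf("event %q expects payload %v, got %v", name, def.PayloadType, got)
	}
	return nil
}

// OrgCreatedPayload uses illustrative fields; the real struct may differ.
type OrgCreatedPayload struct {
	OrganizationID string `json:"organization_id"`
	Name           string `json:"name"`
}

// Each domain package would register its own events from init().
func init() {
	def := EventDef{Name: "organization.created", PayloadType: reflect.TypeOf(OrgCreatedPayload{})}
	if err := Register(def); err != nil {
		panic(err)
	}
}

func main() {
	fmt.Println(All())
	fmt.Println(ValidatePayload("organization.created", OrgCreatedPayload{Name: "Clinic A"}))
	fmt.Println(ValidatePayload("organization.created", map[string]any{"name": "Clinic A"}))
}
```

Because registration happens in init(), a dump or check binary only needs to import the domain packages to see the full catalog.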
1C.4 Outbound Webhook Subscriptions (Cat C)
Clinic-configurable URL + signing secret + event-type filter. We POST signed payloads to the URL when matching events fire on events.Bus. Make.com, Zapier, n8n, custom clinic backends, Slack incoming webhooks — all the same row type from our side. The foundation marketplace primitive for outbound integrations.
Day-1 consumer. First paying clinic uses Make.com for CRM sync. 1C.4 ships the primitive AND the Make.com flow end-to-end at foundation close. No "framework only" — the framework is exercised by a real consumer.
Status: design locked 2026-05-06; shipped 2026-05-07 in two commits — Part 1 (foundation primitives) e4390da (schema 000016 + edits to 000004 + permission + classification + signing helper + domain package); Part 2 (runtime) wires the events.Bus subscriber + dispatcher (internal/core/webhooks/dispatcher) mirroring notify.Dispatcher one-to-one, the auto-pause notify category + en/ro templates, the partition-runner extraction (internal/core/partitions) used by both audit and webhooks, the route group at /v1/organizations/{id}/outbound-webhook-subscriptions/* with EnforceLimit("max_webhook_subscriptions", 1), the cmd/api bootstrap, the integration guide at apps/docs/reference/webhook-integration-guide.md, and a 3-test acceptance suite in internal/test/rlstest/webhooks_test.go. The Make.com end-to-end smoke against the integration guide closes in 1E staging.
Locked decisions:
- Signing scheme: HMAC-SHA256 over timestamp.body (Stripe convention). Headers X-RestartiX-Signature: sha256=<hex>, X-RestartiX-Timestamp: <unix>, X-RestartiX-Event: <event_name>. Receiver validates signature AND that timestamp is within ±5 min of "now" (rejects stale/future-dated payloads). Signing secret is shown ONCE at subscription create time + on regenerate; never readable thereafter.
- Dual-secret rotation window. Schema carries signing_secret_encrypted (current) + signing_secret_previous_encrypted NULL (previous, valid for 24h after rotation). On POST /{id}/regenerate-secret: write current → previous, generate new → current, return new secret to clinic (one-time response). Dispatcher signs with current; receiver verifies with EITHER. After 24h, previous cleared by background sweep (or on next mutation; sweep is more reliable). Mirrors Stripe rotation experience.
- Retry policy. Retry on 5xx + network errors + timeouts. Don't retry 4xx (clinic's URL configuration is wrong; won't get better). Exponential backoff 1m / 5m / 30m / 1h / 6h. Dead-letter at 5 attempts. Mirrors 1A.18 notification dispatcher exactly: same SKIP LOCKED claim pattern, same backoff intervals, same cap. Engineers learn one outbox pattern.
- Auto-pause on consecutive failures. After 10 consecutive dead-lettered deliveries for the same subscription, auto-pause (status='paused') + send a 1A.18 notification to the clinic admin ("your Make.com webhook subscription is paused; URL appears to be down at $url"). Clinic admin re-enables (PATCH .../{id} with status='active') when fixed. Protects us from indefinitely retrying broken endpoints.
- Per-subscription rate limit. Configurable per subscription, default 100 deliveries/min. Excess events QUEUE with FIFO (writes a pending row in deliveries; worker picks up at next tick). Cap value driven by entitlement quota (max_webhook_deliveries_per_minute); dedicated tier could have higher caps.
- Payload envelope shape (locked):

      {
        "event": "appointment.scheduled",
        "event_id": "evt_01hzg5...",
        "occurred_at": "2026-05-06T12:34:56Z",
        "organization_id": "01hzg5...",
        "data": { "...": "..." }
      }

  event_id enables clinic-side dedup of retries (we send the same event_id for retried deliveries of the same event). data carries the typed payload from the events registry (1C.3), with its schema generated from the Go struct.
- Actor info OMITTED from envelope (locked). The envelope intentionally does NOT include actor_id or actor_type. Receivers that need actor attribution (e.g., "was this triggered by a human admin or an agent or a service_account?") query our audit log via Cat F service-account API access. Keeps the envelope minimal, avoids leaking internal principal model across the boundary, and prevents versioning churn if the actor model evolves. Decision can be revisited if a customer specifically asks; it is not worth speculating on now.
- Replay endpoint deferred. POST /{id}/replay-deliveries (re-send a window of past deliveries) is NOT in foundation. Add when first customer asks. Schema accommodates it (outbound_webhook_deliveries.payload is the source for replays).
- Worker model. Mirror 1A.18 exactly: in-process polling goroutine, one per Core API instance, polls every 1–2s with ... WHERE status IN ('pending', 'retry') AND next_attempt_at <= NOW() FOR UPDATE SKIP LOCKED LIMIT N. Migration to a separate cmd/webhook-dispatcher binary is mechanical when volume warrants (same pattern).
- Wildcard event filters deferred. Foundation: explicit event names only in event_filters. Wildcards (patient.*) discourage explicit allow-listing and complicate the registry-driven UI. Add later if a real customer asks.
Schema (locked):
- [x] outbound_webhook_subscriptions: (id UUID PK, organization_id UUID NOT NULL FK, target_url TEXT NOT NULL, signing_secret_encrypted BYTEA NOT NULL, signing_secret_previous_encrypted BYTEA NULL, signing_secret_rotated_at TIMESTAMPTZ NULL, event_filters TEXT[] NOT NULL, status TEXT NOT NULL DEFAULT 'active' CHECK (status IN ('active', 'paused', 'revoked')), failure_count INT NOT NULL DEFAULT 0, last_success_at TIMESTAMPTZ NULL, last_failure_at TIMESTAMPTZ NULL, created_by_principal_id UUID NOT NULL FK → principals(id), created_at, updated_at). RLS: org members with organizations.manage_webhooks SELECT; INSERT/UPDATE/DELETE WITH CHECK same permission. AppPool DML.
- [x] outbound_webhook_deliveries: (id UUID, subscription_id UUID FK, event_id UUID NOT NULL, event_name TEXT NOT NULL, payload JSONB NOT NULL, status TEXT CHECK (status IN ('pending', 'retry', 'success', 'failed', 'dead_lettered')), attempt_count SMALLINT NOT NULL DEFAULT 0, next_attempt_at TIMESTAMPTZ NULL, claimed_at TIMESTAMPTZ NULL, claimed_by_worker_id TEXT NULL, last_attempt_at TIMESTAMPTZ NULL, last_response_status_code INT NULL, last_response_body TEXT NULL, dead_lettered_at TIMESTAMPTZ NULL, created_at). Range-partitioned monthly on created_at per P41: one row per attempt, append-only, time-ordered, multi-month retention. PK (id, created_at). RLS: org members with organizations.manage_webhooks SELECT (joined via subscription_id). REVOKE INSERT/UPDATE/DELETE from restartix_app; dispatcher writes via AdminPool. Migration seeds the current month only; the partition runner (cmd/audit-partition-roll, scope expanded in 1C.4 via internal/core/partitions) extends the runway.
- [x] Data classification: signing_secret_encrypted + signing_secret_previous_encrypted = auth_secret; payload registers as variable-class. The field carries event payloads which already have classifications via the events registry, so the deliveries table inherits the most-permissive class of any included event payload. Practical effect: deliveries table has support_export egress for ops debugging; no bulk_export. Document per CLAUDE.md data-classification rules.
- [x] Permission seed: new organizations.manage_webhooks permission, granted to admin system role template only. Subscription count gated by entitlement max_webhook_subscriptions quota (default_behavior=hard_block, period_kind=lifetime); per-minute delivery cap via max_webhook_deliveries_per_minute quota (default_behavior=soft_meter, period_kind=per_minute, new value added to chk_limit_def_period_kind). Tier defaults: pro = 10 subs / 100 deliveries-per-minute; dedicated = unlimited (NULL caps).
Endpoints (locked):
- [x] GET /v1/organizations/{id}/outbound-webhook-subscriptions[?status=&limit=&offset=]: list. RLS-gated.
- [x] POST /v1/organizations/{id}/outbound-webhook-subscriptions: create. Body {target_url, event_filters: [...]}. Server generates signing_secret, returns {id, target_url, event_filters, status, signing_secret} (secret one-time read). Validates event_filters against the events registry (1C.3); unknown event names rejected with 400 unknown_event_name listing the offending names. Quota enforced via EnforceLimit("max_webhook_subscriptions", 1).
- [x] GET /v1/organizations/{id}/outbound-webhook-subscriptions/{id}: read. Returns row WITHOUT signing_secret.
- [x] PATCH /v1/organizations/{id}/outbound-webhook-subscriptions/{id}: update target_url, event_filters, or status. Same registry validation on event_filters. Status only transitions active ↔ paused; revoke goes through DELETE.
- [x] DELETE /v1/organizations/{id}/outbound-webhook-subscriptions/{id}: soft-delete (status = revoked); preserves history.
- [x] POST /v1/organizations/{id}/outbound-webhook-subscriptions/{id}/regenerate-secret: rotates signing secret per the dual-secret pattern. Returns new secret one-time.
- [x] POST /v1/organizations/{id}/outbound-webhook-subscriptions/{id}/test: fires synthetic test payload (event subscription.test) inline (signs + POSTs via the handler's HTTPClient, with no persistence). Returns receiver status code + body truncated to 4 KiB for the clinic admin to debug their Make scenario in real-time.
- [x] GET /v1/organizations/{id}/outbound-webhook-subscriptions/{id}/deliveries[?status=&limit=&offset=]: list recent deliveries with status. RLS-gated. Useful for "did event X fire successfully?" debugging.
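The event_filters validation on create and PATCH amounts to a set-membership check against the registry. A minimal sketch, assuming a hypothetical UnknownEventNames helper and an in-memory knownEvents map standing in for the 1C.3 registry (the two event names are the examples used in this section, not a statement of what is registered today):

```go
package main

import (
	"fmt"
	"sort"
)

// knownEvents is a stand-in for the events registry lookup.
var knownEvents = map[string]bool{
	"patient.onboarded":     true,
	"appointment.completed": true,
}

// UnknownEventNames returns the filters that are not registered events,
// de-duplicated and sorted so the 400 response body is stable.
// Wildcards such as "patient.*" are not registered names, so they are
// reported as unknown, which matches the explicit-names-only decision.
func UnknownEventNames(filters []string) []string {
	seen := map[string]bool{}
	var unknown []string
	for _, f := range filters {
		if !knownEvents[f] && !seen[f] {
			seen[f] = true
			unknown = append(unknown, f)
		}
	}
	sort.Strings(unknown)
	return unknown
}

func main() {
	filters := []string{"patient.onboarded", "patient.*", "invoice.paid"}
	if bad := UnknownEventNames(filters); len(bad) > 0 {
		// A handler would map this to 400 unknown_event_name.
		fmt.Println("unknown_event_name:", bad)
	}
}
```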
Day-1 Make.com flow:
- Clinic admin creates a Make scenario with "Webhooks" trigger, gets a URL.
- Clinic admin creates a webhook subscription via the UI (1D, deferred): pastes URL, picks events from a registry-driven dropdown (e.g., patient.onboarded, appointment.completed), receives signing secret one-time.
- Clinic configures Make scenario to verify our X-RestartiX-Signature header.
- On first matching event, Make scenario fires; clinic's CRM is synced.
- Clinic uses the test endpoint to validate the round-trip without waiting for a real event.
UI placement (deferred to 1D, locked here for clarity):
- Marketplace page lists "Webhook Subscriptions" as a single card (with logos: Make.com, Zapier, n8n, "custom") that links to the management section. NOT one card per third-party tool — they're all the same row type.
- Webhook Subscriptions section (under Settings → Integrations or similar) — list / create / edit / delete / test / view deliveries. Backend contract is stable per the endpoints above.
- Connected Accounts (Cat B) get distinct cards on the Marketplace; that's a different surface (1C.5).
Implementation order inside 1C.4:
- [x] Migration 000016_outbound_webhooks creating both tables with RLS + permissions + data-classification + audit triggers (per-delivery transitions exempt from audit per CLAUDE.md operational-metadata rule; subscription state changes audit at the application layer with full diff).
- [x] internal/core/domain/webhooks/ package: model / errors / repository / service / handler.
- [x] internal/core/webhooks/dispatcher/ package: events.Bus subscriber + outbox worker (SKIP LOCKED claim, exponential backoff, dead-letter, auto-pause logic). Mirror 1A.18's notify.dispatcher shape one-to-one.
- [x] Endpoints mounted under per-org route group with RequireURLOrgMatchesScope("id") (P47) + RequirePermission(PermOrganizationsManageWebhooks). Create additionally gates on EnforceLimit("max_webhook_subscriptions", 1).
- [x] HMAC signing helper in internal/core/webhooks/signing/ with Sign(secret, timestamp, body) string + Verify + VerifyWithRotation (dual-secret) + ParseTimestampHeader + 9 unit tests covering round-trip, tampered body / timestamp / dual-secret mismatch, stale-window, drift acceptance, header parse.
- [x] Integration guide apps/docs/reference/webhook-integration-guide.md with code samples for verifying signatures (Node/Python/Go) + Make.com recipe + Test endpoint usage. Event payload schemas auto-rendered from the 1C.3 registry catalog at /architecture/events.
- [x] Entitlement quotas (max_webhook_subscriptions, max_webhook_deliveries_per_minute) added to the entitlements catalog (1C.9 rename; wire via the new naming). The chk_limit_def_period_kind CHECK in 000004 expanded to include 'per_minute'.
- [x] Auto-pause notification: new notify.CategoryWebhookSubscriptionPaused + en/ro email templates, plus a dispatcher-side AdminAutoPauseNotifier that resolves clinic admins via role_permissions @> manage_webhooks and fans out per-recipient. Idempotency keyed on (subscription_id, failure_count, principal_id) so re-firing the same auto-pause condition deduplicates.
- [x] Partition runner: internal/core/partitions package with shared EnsureMonthly helper. audit.EnsurePartitions and webhookdispatcher.EnsurePartitions both delegate; cmd/audit-partition-roll rolls both sets each tick (binary name retained for scheduler back-compat).
- [x] cmd/api bootstrap: webhook events.Bus subscriber + dispatcher constructed at startup, dispatcher runs in its own goroutine, both stop gracefully on SIGTERM before events.Shutdown.
- [x] Acceptance test in internal/test/rlstest/webhooks_test.go (3 tests): (1) subscriber → fire event → fake server captures signed POST → verify signature + envelope shape + delivery row transitioned to success; (2) auto-pause path: 10 dead-lettered deliveries → subscription transitions active → paused exactly once + recording notifier fires once; (3) end-to-end notifier writes a notifications row with category webhook_subscription_paused to the qualifying admin.
Acceptance:
- [x] Signing scheme + replay window + dual-secret rotation locked.
- [x] Retry policy + auto-pause + per-subscription rate limit locked.
- [x] Payload envelope shape locked.
- [x] API surface locked (replay endpoint deferred).
- [x] Worker model locked (mirror 1A.18).
- [x] UI placement clarified (single marketplace card; dedicated management section; deferred to 1D).
- [x] Schema + RLS + permissions + entitlement quotas shipped (commit e4390da).
- [x] Domain package + dispatcher + signing helper shipped.
- [x] Endpoints shipped (UI deferred to 1D).
- [x] Integration guide shipped; Make.com end-to-end smoke test closes in 1E staging.
- [x] Acceptance test extension to internal/test/rlstest/webhooks_test.go (split out of setup_clinic_test.go for focus).
1C.5 Connected Accounts (Cat B)
Per-org table where clinic admins connect external services they own (Google Calendar, Slack, HubSpot, future EHRs). Foundation ships the table + framework. OAuth callback infrastructure deferred to first OAuth-using consumer (likely F-tier scheduling for Google Calendar).
Status: design locked 2026-05-06; shipped 2026-05-07 — migration 000017_connected_accounts (integration_services + organization_integrations with RLS, permission seed, app-layer audit), internal/core/domain/integrations/ (model / errors / repository / service / handler), internal/core/integrations/ (Connector interface + init-time Register / Lookup / Reset), per-org route group at /v1/organizations/{id}/integrations/* with RequireURLOrgMatchesScope("id") + RequirePermission(PermOrganizationsManageIntegrations), public catalog endpoint at /v1/integration-services (rate-limited under public_resolve), data-classification entries (auth_secret on credentials_encrypted; no egress), integration guide at apps/docs/reference/connected-account-integration-guide.md, 3-test rlstest acceptance suite (integrations_test.go). No real Cat B catalog rows seeded at foundation — first F-tier consumer (likely Google Calendar at F4 Scheduling) ships the first row + connector implementation + OAuth callback handler in its own PR.
Locked decisions:
- Hybrid auth shape: typed universal columns + encrypted credentials blob + plaintext per-service config JSONB. Mirrors existing platform pattern (`audit_log` has typed fields + JSONB metadata; `notification_deliveries` similar). Single table shape accommodates all auth types (OAuth, API key, signing-secret-only) without per-pattern migrations.
  - Typed columns (queryable, indexable): `id`, `organization_id`, `integration_service_id`, `auth_type`, `external_account_id`, `title`, `status`, `oauth_expires_at` (so we can run "expiring soon" sweeps without decrypting), `last_used_at`, `last_error_at`, `last_error`, `created_by_principal_id`, `created_at`, `updated_at`.
  - `credentials_encrypted BYTEA` for secrets. Contents vary by `auth_type` (OAuth: `{access_token, refresh_token, scopes}`; API key: `{api_key}`; webhook-in-only: `{signing_secret}`). AES-GCM via 1A.3 helper, version-stamped.
  - `config JSONB` plaintext for non-secret per-service config (calendar IDs, scope subsets, custom field mappings, webhook event filters); queryable for ops + UI without decryption.
- Status lifecycle: five values. `connected` (auth working, healthy); `expired` (OAuth refresh-token rejected, requires re-OAuth; distinct user-facing recovery); `revoked` (clinic admin disconnected via UI); `error` (provider returned 401/403 repeatedly via healthcheck or runtime); `pending` (created but OAuth flow not yet completed; auto-deleted if not transitioned within 30 min). The `expired` vs. `error` split matters because the user-facing recovery is different: `expired` shows a "Reconnect" button; `error` shows "Provider unavailable, retrying."
- OAuth client ownership: platform-level per provider (Cat A inside Cat B). RestartiX registers ONE Google Cloud project, ONE Slack app, ONE HubSpot OAuth app, etc. Per-org clients (Option B from design discussion) explicitly rejected: each clinic would have to create their own Google Cloud project, configure OAuth consent screen, get verified by Google, paste credentials into our UI. Brutal UX, especially for non-technical clinic admins. The OAuth client itself (client_id + client_secret per provider) is Cat A; it lives in `platform_service_providers` keyed by capability `oauth_client_google` / `oauth_client_slack` / etc. The resulting per-clinic access + refresh tokens are Cat B; they live in `organization_integrations`. The two layers compose: Cat A holds the keys to mint tokens; Cat B holds the tokens themselves.
- Catalog seeded EMPTY at foundation. No Cat B integration ships in 1C.5. Each F-tier consumer adds an `integration_services` row + first `organization_integrations` consumer in its own PR. First likely consumer: Google Calendar at F4 Scheduling. Foundation discipline: don't speculate against unknown future config shapes.
- Per-service config validation deferred to per-service connectors. Foundation just provides the `config JSONB` column. First connector adds Go-side validation (`internal/core/integrations/connectors/google_calendar/validate.go`) when it ships. The catalog row carries a `config_schema JSONB` placeholder column for future JSON-schema-based validation if a UI generic config-form needs it; foundation leaves it NULL.
- Cross-tenancy: catalog is platform-scoped, no per-org overrides. `integration_services` rows are seeded by platform team via migrations. Clinics consume; clinics don't write. No use case for org-customized catalog ever surfaced.
- OAuth callback infrastructure deferred. First OAuth-using consumer adds: `/oauth/callback/{provider}` route, state-token CSRF protection (signed JWT carrying `(org_id, principal_id, integration_service_id, return_url)`), auth code → token exchange via the platform OAuth client (resolved via 1C.2 Cat A resolver), refresh-token rotation worker. Foundation 1C.5 just lays the table + service skeleton.
- OAuth requires interactive human consent. OAuth connections (`auth_type='oauth2'`) can ONLY be created by human principals: the OAuth dance requires a browser redirect + provider consent UI. Service_accounts (Cat F) and agents cannot trigger OAuth flows. They CAN create API-key-auth connections (`auth_type='api_key'`) programmatically. The `created_by_principal_id` column accepts any principal type; the auth-flow-vs-principal-type compatibility is enforced at the handler/connector level, not the schema.
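The "AES-GCM via 1A.3 helper, version-stamped" shape of `credentials_encrypted` can be illustrated with a minimal sketch. This is not the 1A.3 helper itself: the one-byte version prefix, the blob layout, and the names `sealCredentials` / `openCredentials` are assumptions made for illustration, and real keys would come from KMS rather than a fixed byte slice.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"encoding/json"
	"errors"
	"fmt"
)

// blobVersion is a hypothetical one-byte version stamp; the real 1A.3 format may differ.
const blobVersion byte = 1

// newGCM builds an AES-GCM AEAD from a 16-, 24-, or 32-byte key.
func newGCM(key []byte) (cipher.AEAD, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// sealCredentials encrypts creds as: version byte | nonce | ciphertext+tag (assumed layout).
func sealCredentials(key []byte, creds map[string]any) ([]byte, error) {
	plaintext, err := json.Marshal(creds)
	if err != nil {
		return nil, err
	}
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	out := append([]byte{blobVersion}, nonce...)
	// The version byte is passed as additional data so it is authenticated too.
	return gcm.Seal(out, nonce, plaintext, []byte{blobVersion}), nil
}

// openCredentials reverses sealCredentials and rejects unknown versions or tampering.
func openCredentials(key, blob []byte) (map[string]any, error) {
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	if len(blob) < 1+gcm.NonceSize() || blob[0] != blobVersion {
		return nil, errors.New("credentials blob: bad version or length")
	}
	nonce, ciphertext := blob[1:1+gcm.NonceSize()], blob[1+gcm.NonceSize():]
	plaintext, err := gcm.Open(nil, nonce, ciphertext, []byte{blobVersion})
	if err != nil {
		return nil, err
	}
	var creds map[string]any
	if err := json.Unmarshal(plaintext, &creds); err != nil {
		return nil, err
	}
	return creds, nil
}

func main() {
	key := make([]byte, 32) // all-zero demo key; never use a fixed key in practice
	blob, err := sealCredentials(key, map[string]any{"api_key": "example"})
	if err != nil {
		panic(err)
	}
	creds, err := openCredentials(key, blob)
	fmt.Println(creds["api_key"], err)
}
```

Keeping the version outside the ciphertext lets a future format change be detected before decryption is attempted, which is one plausible reason to version-stamp the blob.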
Schema (locked):
- [x] `integration_services`: platform catalog of supported integrations. `(id UUID PK, slug TEXT NOT NULL UNIQUE, name TEXT NOT NULL, description TEXT, auth_type TEXT NOT NULL CHECK (auth_type IN ('oauth2', 'api_key', 'webhook_in_only')), oauth_scopes TEXT[], oauth_client_capability TEXT NULL, icon_url TEXT, status TEXT NOT NULL DEFAULT 'available' CHECK (status IN ('available', 'beta', 'deprecated')), config_schema JSONB, created_at, updated_at)`. `oauth_client_capability` is the FK-by-string to the Cat A capability holding the OAuth client_id/secret (e.g., `'oauth_client_google'`). Schema CHECK enforces `oauth_client_capability IS NOT NULL` exactly when `auth_type = 'oauth2'`. RLS: SELECT for everyone (catalog is public-by-design so the marketplace UI works); mutations via AdminPool only (catalog edits = migrations or superadmin admin tool).
- [x] `organization_integrations`: per-org connections. `(id UUID PK, organization_id UUID NOT NULL FK, integration_service_id UUID NOT NULL FK → integration_services(id), auth_type TEXT NOT NULL, external_account_id TEXT NOT NULL, title TEXT NOT NULL, status TEXT NOT NULL DEFAULT 'pending' CHECK (status IN ('pending', 'connected', 'expired', 'revoked', 'error')), oauth_expires_at TIMESTAMPTZ NULL, credentials_encrypted BYTEA NOT NULL, config JSONB NOT NULL DEFAULT '{}', last_used_at TIMESTAMPTZ NULL, last_error_at TIMESTAMPTZ NULL, last_error TEXT NULL, created_by_principal_id UUID NOT NULL FK → principals(id), created_at, updated_at)`. UNIQUE `(organization_id, integration_service_id, external_account_id)`: multiple Google Calendars per org (per specialist) is fine; same external account twice is not. Partial indexes for OAuth-expiring sweep + pending-GC sweep. RLS: org members with `organizations.manage_integrations` SELECT; INSERT/UPDATE/DELETE WITH CHECK same permission. AppPool DML for the per-org happy path; AdminPool for OAuth callback handler (which writes outside the request's tx since the OAuth dance flows through a redirect).
- [x] Data classification: `credentials_encrypted` = `auth_secret` (no egress targets); `external_account_id` + `title` + `auth_type` + `status` + `last_error` = `org_internal` with `support_export`; `config` registers as `org_internal` with `support_export` (per-service review obligation when each connector ships). Catalog (`integration_services`) is `public` for marketplace pre-auth render.
- [x] Permission seed: new `organizations.manage_integrations` permission, granted to `admin` system role template only.
Framework (locked):
- [x] `internal/core/domain/integrations/`: Connected-Account domain package (model / errors / repository / service / handler). Reads/writes per-org rows; orchestrates connector validation + credential encryption + healthcheck.
- [x] `internal/core/integrations/`: `Connector` interface + init-time registry (`Register` / `Lookup` / `RegisteredSlugs` / `Reset`). Per-connector packages register impls via `init()`, mirroring the events registry pattern from 1C.3. Final shape (creds passed as opaque `[]byte` so the framework stays per-impl-type-erased):

  ```go
  type Connector interface {
      Slug() string
      ValidateConfig(ctx context.Context, config map[string]any) error
      ValidateCredentials(ctx context.Context, creds map[string]any) error
      Healthcheck(ctx context.Context, creds []byte, config map[string]any) error
      RefreshOAuthToken(ctx context.Context, creds []byte) ([]byte, error)
  }
  ```
- [x] OAuth callback infrastructure NOT shipped at foundation. The `Service.Create` path explicitly rejects `auth_type='oauth2'` with `400 oauth_requires_callback_flow`. The first OAuth-using consumer ships `/oauth/callback/{provider}` route + state-token CSRF + auth-code → token exchange + refresh-token rotation worker.
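The init-time registry described above could look roughly like the following sketch. The `Connector` interface is copied from the text; everything else (the mutex-guarded map, panicking on a duplicate slug, sorted output from `RegisteredSlugs`, and the `fakeConnector` type) is an assumption for illustration, not the shipped implementation.

```go
package main

import (
	"context"
	"fmt"
	"sort"
	"sync"
)

// Connector matches the interface shape quoted in this section.
type Connector interface {
	Slug() string
	ValidateConfig(ctx context.Context, config map[string]any) error
	ValidateCredentials(ctx context.Context, creds map[string]any) error
	Healthcheck(ctx context.Context, creds []byte, config map[string]any) error
	RefreshOAuthToken(ctx context.Context, creds []byte) ([]byte, error)
}

var (
	mu       sync.RWMutex
	registry = map[string]Connector{}
)

// Register adds a connector; a duplicate slug panics (assumed behaviour) so
// a collision surfaces at process start rather than at request time.
func Register(c Connector) {
	mu.Lock()
	defer mu.Unlock()
	if _, dup := registry[c.Slug()]; dup {
		panic("integrations: duplicate connector slug " + c.Slug())
	}
	registry[c.Slug()] = c
}

// Lookup returns the connector registered under slug, if any.
func Lookup(slug string) (Connector, bool) {
	mu.RLock()
	defer mu.RUnlock()
	c, ok := registry[slug]
	return c, ok
}

// RegisteredSlugs lists registered slugs in sorted order.
func RegisteredSlugs() []string {
	mu.RLock()
	defer mu.RUnlock()
	slugs := make([]string, 0, len(registry))
	for s := range registry {
		slugs = append(slugs, s)
	}
	sort.Strings(slugs)
	return slugs
}

// Reset clears the registry; intended for tests.
func Reset() {
	mu.Lock()
	defer mu.Unlock()
	registry = map[string]Connector{}
}

// fakeConnector is a hypothetical no-op connector used only for this demo.
type fakeConnector struct{ slug string }

func (f fakeConnector) Slug() string                                          { return f.slug }
func (f fakeConnector) ValidateConfig(context.Context, map[string]any) error  { return nil }
func (f fakeConnector) ValidateCredentials(context.Context, map[string]any) error {
	return nil
}
func (f fakeConnector) Healthcheck(context.Context, []byte, map[string]any) error { return nil }
func (f fakeConnector) RefreshOAuthToken(_ context.Context, creds []byte) ([]byte, error) {
	return creds, nil
}

func main() {
	// A real connector package would call Register from its init() function.
	Register(fakeConnector{slug: "example_service"})
	c, ok := Lookup("example_service")
	fmt.Println(c.Slug(), ok, RegisteredSlugs())
}
```

Because registration happens in `init()`, importing a connector package is enough to make it discoverable; `Reset` exists so tests can start from an empty registry.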
Endpoints (locked):
- [x] `GET /v1/organizations/{id}/integrations[?status=&service_slug=&limit=&offset=]`: list connections. RLS-gated.
- [x] `GET /v1/organizations/{id}/integrations/{id}`: read. Returns row WITHOUT `credentials_encrypted` (decrypted credentials NEVER leave the API surface).
- [x] `POST /v1/organizations/{id}/integrations`: create. Body `{integration_service_id, auth_type, credentials, config}`. Used today by API-key auth_type; OAuth auth_type goes through `/oauth/callback/{provider}` handler instead (deferred). Server validates against the catalog row + per-connector validator.
- [x] `PATCH /v1/organizations/{id}/integrations/{id}`: update `title` or `config` (NOT credentials; those rotate via separate flow).
- [x] `DELETE /v1/organizations/{id}/integrations/{id}`: soft-delete (`status='revoked'`). Subsequent connector calls fail; clinic re-creates if they want to reconnect.
- [x] `POST /v1/organizations/{id}/integrations/{id}/test`: runs the connector's `Healthcheck` method on demand. Returns success/failure + error context for the clinic admin to debug.
- [x] Public catalog endpoint: `GET /v1/integration-services` lists available integrations from the catalog. Public-resolve style (no auth required, rate-limited per IP under `public_resolve`) so the marketplace landing page works pre-login, similar to org resolve.
UI placement (1D, locked here for clarity):
- Marketplace page lists each `integration_services` row as a discrete card (Google Calendar, Slack, HubSpot, etc., one card per service). Each card shows status if connected, "Connect" button if not. Clicking "Connect" kicks off OAuth (deferred infra) or opens the API-key form depending on `auth_type`.
- Connected Accounts management section: list of active `organization_integrations` rows for this org, with status, last used, disconnect, test, edit-config actions.
- Distinct from 1C.4's webhook subscriptions UI placement: Connected Accounts has one card per service; Outbound Webhooks has one card total. Both surface from the marketplace.
Implementation order inside 1C.5:
- [x] Migration `000017_connected_accounts` creating `integration_services` + `organization_integrations` with RLS + permission + data-classification entries.
- [x] `internal/core/domain/integrations/` package with model / errors / repository / service / handler.
- [x] `Connector` interface declaration + registration mechanism (init-time, per package).
- [x] Endpoints mounted under per-org route group with `RequireURLOrgMatchesScope("id")` (P47) + `RequirePermission(PermOrganizationsManageIntegrations)` for mutations. Public catalog endpoint mounted under `/v1/integration-services` (no auth, rate-limited).
- [x] P31 verified: `organization_integrations.credentials_encrypted` is the canonical Cat B per-org credential store the pattern describes.
- [x] Acceptance test in `internal/test/rlstest/integrations_test.go` (3 tests): (1) framework round-trip: fixture catalog row → org creates connection (API-key auth_type) → connector validate hooks fire → service.RunHealthcheck dispatches → revoke + idempotent re-revoke; (2) OAuth-rejected-from-direct-create: auth_type='oauth2' Create returns ErrOAuthRequiresCallbackFlow; (3) RLS defense-in-depth: specialist role's repo.Insert blocked at the DB layer despite bypassing the service permission check.
Acceptance:
- [x] Hybrid auth shape locked.
- [x] Five-status lifecycle locked.
- [x] OAuth client ownership: platform-level (Cat A holds clients; Cat B holds per-clinic tokens). The two-layer composition is the architectural insight.
- [x] Catalog empty at foundation; first F-tier consumer seeds first row + OAuth infra.
- [x] Per-service config validation deferred to per-service connectors.
- [x] Cross-tenancy: catalog platform-scoped, no per-org overrides.
- [x] Schema + RLS + permission seed shipped.
- [x] `internal/core/domain/integrations/` package + `Connector` interface + registration mechanism shipped.
- [x] Endpoints shipped; public catalog endpoint live.
- [x] Acceptance test (framework-only — no real Cat B integration at foundation).
1C.6 Inbound Webhook Framework (Cat D)
Convention for `/webhooks/{provider}` route mounting + per-provider signature verification + once-and-only-once dedup + state update + Internal Event emission (Cat E). Mostly a documented convention plus a small dedup table and a CI guard.
Status: design locked 2026-05-06; shipped 2026-05-07 framework-only — migration 000018_inbound_webhook_dedup (monthly-partitioned dedup table with REVOKE on restartix_app, current-month seed), internal/core/inboundwebhooks/dedup/ repo helpers (WasProcessed + MarkProcessed, AdminPool only) + EnsurePartitions registered in cmd/audit-partition-roll, P52 documented in patterns.md, cmd/check-inbound-webhooks CI guard (AST-based; scans internal/integration/*/inbound/ packages for the four required call sites; passes with zero handlers today + four unit tests verify the guard's correctness on synthetic inputs), integration guide at apps/docs/reference/inbound-webhook-guide.md, 3-test rlstest acceptance suite (inbound_dedup_test.go). The original spec premise of a Daily.co retrofit was stale (no Daily.co handler exists in the codebase) — first F-tier consumer ships the first per-provider verifier + handler + Cat E event registration in their own PR.
Locked decisions:
- Per-provider verification helpers (no generic abstraction). Each provider's signature scheme lives in its own package: `internal/integration/stripe/inbound/verify.go`, `internal/integration/dailyco/inbound/verify.go`, `internal/integration/clerk/svix/verify.go`, `internal/integration/ses/sns/verify.go`. Each package exports `Verify(req *http.Request, secret []byte) error` (or equivalent). Signature schemes don't share enough structure for a shared abstraction to be worth the indirection: Stripe is `t=...,v1=...` HMAC of timestamp+body; SES SNS uses X.509 cert chain validation; Svix has its own three-header format; Google echoes back our opaque token. False abstraction over them would be brittle and obscure. Convention is the framework; per-provider helpers are the work.
- Dedup table at foundation: `inbound_webhook_dedup`. Range-partitioned monthly per P41. `(provider TEXT, event_id TEXT, processed_at TIMESTAMPTZ)` with PK `(provider, event_id, processed_at)` (composite for partition compatibility). Repo helpers `WasProcessed(ctx, provider, eventID) (bool, error)` and `MarkProcessed(ctx, provider, eventID)` enforce once-and-only-once across every provider from Day 1. Retention bounded by max provider retry windows (~30 days for Stripe, less for others; drop partitions older than 60 days). Worth shipping at foundation rather than deferring per-provider because the cost is one table + two helpers, and ALL inbound webhook handlers benefit immediately.
- Per-connection inbound tokens stored in `organization_integrations.config` (Option A). For Cat B providers that push notifications (Google Calendar, Microsoft 365), each per-org connection registers a push channel; the provider echoes back a token we generated at registration. Token lives in the connection's `config JSONB` (e.g., `config.push_channel.token`). Lookup via GIN index on `config`. NOT clinic-facing; purely internal routing. Distinct from Cat C signing secrets (which are clinic-managed and client-visible at create time). Co-locates token with the connection it belongs to; deletion is automatic via cascade. At expected scale (~9k Cat B connections at the high end), JSONB-scan with GIN is microseconds; if a future provider's push volume changes that, migrating to a dedicated table is a denormalization, not a model change.
- Standard inbound flow (locked): every inbound webhook handler runs in this order: (1) verify signature using the per-provider helper, return `401` on mismatch; (2) call `dedup.WasProcessed(provider, event_id)`, return `200` early if already seen (provider treats this as ack); (3) call domain service to update state via the request's tx; (4) call `dedup.MarkProcessed(provider, event_id)` in the same tx; (5) commit; (6) on success, emit a Cat E Internal Event via 1C.3 registry describing the inbound effect (e.g., `appointment.recording_available`, `payment.received`, `auth.user_synced`). The standard fan-out (audit, notification dispatcher, outbound webhook dispatcher, automations engine) consumes the event downstream.
- Audit existing Daily.co handler at 1C.6 close. Today's Daily.co recording webhook verifies the signature and updates the appointment record but doesn't dedup or emit an internal event. Retrofit at 1C.6: add dedup, emit `appointment.recording_available` on success. Same audit will catch any other inbound webhook that's drifted from the convention. Document any unavoidable exceptions explicitly.
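The six-step standard inbound flow from the locked decisions above can be sketched as a small, dependency-injected function. This is an illustration under stated assumptions: the `inbound` struct, the in-memory `memDedup`, and the generic hex HMAC-SHA256 check are all hypothetical stand-ins (no real provider's signature scheme is reproduced), and the transaction that wraps steps 3 to 5 in the real flow is only indicated by a comment.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"errors"
	"fmt"
	"net/http"
)

// memDedup is an in-memory stand-in for the inbound_webhook_dedup helpers.
type memDedup struct{ seen map[string]bool }

func (d *memDedup) WasProcessed(provider, eventID string) bool { return d.seen[provider+"/"+eventID] }
func (d *memDedup) MarkProcessed(provider, eventID string)     { d.seen[provider+"/"+eventID] = true }

// sign returns the hex HMAC-SHA256 of body; a generic scheme used only for this demo.
func sign(body, secret []byte) string {
	mac := hmac.New(sha256.New, secret)
	mac.Write(body)
	return hex.EncodeToString(mac.Sum(nil))
}

// verifyHMAC compares signatures in constant time.
func verifyHMAC(body []byte, signature string, secret []byte) error {
	if !hmac.Equal([]byte(sign(body, secret)), []byte(signature)) {
		return errors.New("signature mismatch")
	}
	return nil
}

// inbound bundles the collaborators a handler needs (hypothetical shapes).
type inbound struct {
	secret []byte
	dedup  *memDedup
	apply  func(eventID string) error      // domain state update
	emit   func(eventType, eventID string) // Cat E event emission
}

// process runs the standard flow and returns the HTTP status to send back.
func (h *inbound) process(provider, eventID, signature string, body []byte) int {
	if err := verifyHMAC(body, signature, h.secret); err != nil {
		return http.StatusUnauthorized // (1) reject bad signatures
	}
	if h.dedup.WasProcessed(provider, eventID) {
		return http.StatusOK // (2) already seen: ack without side effects
	}
	// (3)-(5) share one database transaction in the real flow.
	if err := h.apply(eventID); err != nil {
		return http.StatusInternalServerError // nothing marked, so a provider retry reprocesses
	}
	h.dedup.MarkProcessed(provider, eventID)
	h.emit("payment.received", eventID) // (6) emit only after success
	return http.StatusOK
}

// runDemo delivers one event twice, then a forged delivery.
func runDemo() (statuses []int, emitted int) {
	secret := []byte("demo-secret")
	h := &inbound{
		secret: secret,
		dedup:  &memDedup{seen: map[string]bool{}},
		apply:  func(string) error { return nil },
	}
	h.emit = func(string, string) { emitted++ }
	body := []byte(`{"id":"evt_1"}`)
	sig := sign(body, secret)
	statuses = append(statuses,
		h.process("example", "evt_1", sig, body),
		h.process("example", "evt_1", sig, body),
		h.process("example", "evt_2", "forged", body))
	return statuses, emitted
}

func main() {
	fmt.Println(runDemo()) // [200 200 401] 1
}
```

The replayed delivery is acknowledged with `200` but emits nothing, which is the property the dedup step exists to guarantee.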
Schema (locked):
- [x] `inbound_webhook_dedup`: `(provider TEXT NOT NULL, event_id TEXT NOT NULL, processed_at TIMESTAMPTZ NOT NULL DEFAULT NOW())`. Range-partitioned monthly on `processed_at`. PK `(provider, event_id, processed_at)`. RLS enabled with no policies; REVOKE INSERT/UPDATE/DELETE/SELECT from `restartix_app`, so the table is invisible to the app role for both reads and writes; AdminPool (owner) bypasses RLS. Retention: drop partitions older than 60 days (configurable; covers max provider retry windows; sweep is a future operational concern).
- [x] No new permission seed: inbound webhook handlers run on a sibling router group with their own auth shape (signature verification, not session auth). Dedup table is operational infrastructure, not user-facing.
Per-provider package shape (locked):
```
internal/integration/{provider}/inbound/
├── verify.go   // Verify(req *http.Request, secret []byte) error
├── parse.go    // Parse(body []byte) (Event, error) -- typed event extraction
└── handler.go  // mount on /webhooks/{provider}; runs the standard flow
```

The router group (mounted at `/webhooks/`) is auth-naked from the JWT side; verification happens via the per-provider helper. CSRF doesn't apply because there's no session. Rate limiting per-provider via 1A.13's existing infrastructure (e.g., reject if a provider sends >100 requests/sec to its endpoint, which would indicate a runaway loop or attack).
Implementation order inside 1C.6:
- [x] Migration `000018_inbound_webhook_dedup` (monthly-partitioned) with RLS + REVOKE on `restartix_app` (read + write) + current-month seed (mirrors 1A.15 audit_log partition shape).
- [x] `internal/core/inboundwebhooks/dedup/` package with `WasProcessed` + `MarkProcessed` repo helpers (AdminPool path only) + `EnsurePartitions` registered in `cmd/audit-partition-roll`.
- [-] Per-provider `verify.go` packages: DEFERRED. No inbound provider handler exists at foundation; original spec premise (Daily.co handler) was stale. First F-tier consumer ships the first per-provider package alongside its handler.
- [-] Audit Daily.co handler at 1C.6 close: DEFERRED for the same reason; nothing to retrofit.
- [x] Documentation as P52 (Inbound Webhook Convention) in patterns.md covering: route mount path, signature verification, dedup, state update, internal event emission, error handling.
- [x] CI guard `cmd/check-inbound-webhooks` walks every `internal/integration/*/inbound/` package (the convention's home for handlers) and asserts the four required call sites: `*Verify*`, `dedup.WasProcessed`, `dedup.MarkProcessed`, `events.Publish` / `PublishWith` / `NewEvent`. AST-based, with four unit tests proving the guard rejects non-compliant fixtures and accepts compliant ones. Wired into `make check`. Passes with zero handlers today (no `inbound/` packages); first F-tier consumer triggers the first non-trivial run.
- [x] Integration guide apps/docs/reference/inbound-webhook-guide.md: for engineers adding new inbound webhooks; references the per-provider package shape and the standard flow.
- [x] Acceptance test `internal/test/rlstest/inbound_dedup_test.go` (3 tests): WasProcessed/MarkProcessed round-trip; AppPool blocked from SELECT + INSERT (REVOKE proof); repeated MarkProcessed remains idempotent at the protocol surface. Per-provider replay-path tests land with the first F-tier consumer.
Acceptance:
- [x] Verification helpers locked: per-provider, no generic abstraction.
- [x] Dedup table locked: `inbound_webhook_dedup` ships at foundation, monthly-partitioned, used by every provider.
- [x] Per-org inbound tokens locked: stored in `organization_integrations.config` (Option A); GIN-indexed JSONB scan; not clinic-facing.
- [-] Daily.co handler retrofit scope locked: add dedup + internal event emission. DEFERRED — no Daily.co handler exists in the codebase; spec premise was stale. Retrofit will land with the first F-tier consumer that adds the first per-provider handler.
- [x] Schema (dedup table) + repo helpers shipped.
- [-] Per-provider verify packages aligned with convention. DEFERRED — no per-provider handlers at foundation; convention is documented in P52 + integration guide for the first F-tier consumer.
- [-] Daily.co handler refactored. DEFERRED — same as above.
- [x] P52 documented in patterns.md.
- [x] CI guard `cmd/check-inbound-webhooks` shipped.
- [x] Integration guide published.
1C.7 Metering & Quotas
Per-capability usage records captured at the capability seam. Per-org quotas enforced as hard limits at the seam — exceeding fails the call. Pricing engine + invoicing deferred to a later subject; this layer just measures + caps. Critical foundation work because AI cost-per-call is high and runaway-cost protection is non-optional from Day 1.
Status: shipped 2026-05-07. Three-table schema landed in migration 000019 with the usage.view_org permission and full data-classification entries; internal/core/metering/ exposes the AdminPool-backed Repository implementing capabilities.MeterStore plus LoadLimits, EnsureQuotaRow, AdvanceExpiredQuotas, RollupClosedPeriod, SyncOrgLimits, and a TelemetryEmitter hook. The capabilities wrap stack moved meterAfterSuccess → meterAroundCall (innermost; Reserve before inner, Refund on failure, Record on success) and now resolves the metering org via principal.Subject (request paths) or ContextWithMeteringOrg (dispatcher / system paths). notify.email.NewMeteredChannel wraps the SES adapter through WrapMeteredProvider; the new cmd/usage-quota-reset and cmd/usage-summary-rollup crons handle period boundaries and closed-period rollups; cmd/audit-partition-roll rolls the usage_records monthly partitions alongside the existing tables. Subscription mutations call subscriptions.Service.SetLimitSyncer to project plan-derived caps to usage_quotas.limit_units. Acceptance suite at internal/test/rlstest/metering_test.go covers atomic gate, refund, period reset, summary rollup, sync, and AppPool write blocks.
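The `meterAroundCall` ordering mentioned above (Reserve before the inner call, Refund on failure, Record on success) can be sketched as follows. The `meterStore` interface here is a simplified, hypothetical stand-in for `capabilities.MeterStore`, whose real signature is not shown in this document; `traceStore` and `errProvider` exist only for the demo.

```go
package main

import (
	"errors"
	"fmt"
)

var (
	ErrQuotaExceeded = errors.New("quota exceeded")
	errProvider      = errors.New("provider failed") // demo-only error
)

// meterStore is a hypothetical, simplified stand-in for capabilities.MeterStore.
type meterStore interface {
	Reserve(org, capability string, units int64) error
	Refund(org, capability string, units int64) error
	Record(org, capability string, units int64) error
}

// meterAroundCall reserves quota, runs inner, then refunds or records.
func meterAroundCall(store meterStore, org, capability string, units int64, inner func() error) error {
	if err := store.Reserve(org, capability, units); err != nil {
		return err // gate failed: the provider is never called
	}
	if err := inner(); err != nil {
		// Failed calls must not burn quota, so hand the units back.
		if refundErr := store.Refund(org, capability, units); refundErr != nil {
			return errors.Join(err, refundErr)
		}
		return err
	}
	return store.Record(org, capability, units)
}

// traceStore enforces a fixed cap and logs the order of store calls.
type traceStore struct {
	used, limit int64
	calls       []string
}

func (s *traceStore) Reserve(org, capability string, units int64) error {
	s.calls = append(s.calls, "reserve")
	if s.used+units > s.limit {
		return ErrQuotaExceeded
	}
	s.used += units
	return nil
}

func (s *traceStore) Refund(org, capability string, units int64) error {
	s.calls = append(s.calls, "refund")
	s.used -= units
	return nil
}

func (s *traceStore) Record(org, capability string, units int64) error {
	s.calls = append(s.calls, "record")
	return nil
}

func main() {
	s := &traceStore{limit: 1}
	_ = meterAroundCall(s, "org-1", "email", 1, func() error { return errProvider })
	_ = meterAroundCall(s, "org-1", "email", 1, func() error { return nil })
	err := meterAroundCall(s, "org-1", "email", 1, func() error { return nil })
	fmt.Println(s.calls, s.used, err)
}
```

Placing this wrapper innermost means the reservation is taken as late as possible, right before the provider call, so outer middleware failures never touch the counter.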
Why foundation, not feature. AI feature cost can blow up an org's monthly bill in hours if a runaway agent loop hits an unmetered LLM endpoint. Without per-org quotas, a misconfigured automation could rack up $10K of LLM cost on one clinic before anyone notices. Metering + caps are mandatory before any AI feature ships, which means foundation. Pricing / invoicing / billing UI is later — separate subject built on top of this.
Locked decisions:
Three-table model. Three distinct tables, three distinct purposes:
- `usage_records`: append-only event log. One row per metered call. Source of truth.
- `usage_quotas`: running counter for the CURRENT period only. Resets at period boundary. Real-time counter that gates runtime calls.
- `usage_summaries`: closed-period historical totals. Created by end-of-period cron. Survives quota resets; provides historical record for billing/analytics.

Retention. `usage_records` 12-month hot retention; older monthly partitions archive to S3 (mirrors 1A.15 audit_log archive pattern). `usage_summaries` keeps longer (closed-period historical record for billing). `usage_quotas` is one row per (org, capability, period); no retention question; it's small and lives forever.

Quota enforcement: atomic-increment-with-refund-on-failure (Option B). Single SQL on every metered call:

```sql
UPDATE usage_quotas
SET current_units = current_units + $units
WHERE organization_id = $org
  AND capability = $cap
  AND period_end_at > NOW()
  AND (limit_units IS NULL OR current_units + $units <= limit_units)
RETURNING current_units;
```

If no row updated → quota exceeded; fail with `402 quota_exceeded` BEFORE the provider call. Race-free at the DB level: concurrent calls can't both pass when only one slot remains. On provider-call failure, decrement (refund) so failed calls don't burn quota. Standard cloud-API pattern.

Period boundaries: calendar UTC. `period IN ('day', 'week', 'month')`: three granularities. Each capability picks at registration time (e.g., AI tokens daily for cost protection; emails weekly or monthly; storage monthly). Reset cron runs at boundary (UTC midnight daily, Monday UTC weekly, first-of-month UTC monthly); sets `current_units = 0` and bumps `period_start_at` / `period_end_at`. Document timezone explicitly in clinic-facing usage UI so admins aren't surprised.

Day-1 metered capability: email only. 1C.7 wires `notify.email` through metering at foundation as the exercise consumer. Capability `email`, unit_type `emails_sent`, default monthly quota (configurable per plan via `plan_limits`). Real metering data accumulates by 1E staging. Storage / AI / video / SMS metering land at their respective consumers (each one decides its unit semantics: bytes vs. ops for storage; input/output tokens for AI; minutes for video). Foundation discipline: don't speculate on units we don't yet have a consumer for.

Aggregation cadence: end-of-period cron. Atomic-increment already keeps `usage_quotas.current_units` real-time-accurate (which is what runtime gating needs). The cron rolls `usage_records` → `usage_summaries` at period close (one bulk INSERT per period boundary). Continuous summary updates explicitly rejected: doubles every write path; speculation against an unknown future need.

Quota source: two-family entitlements (see glossary.md → Entitlement). `usage_quotas.limit_units` syncs from `organization_subscription_limits` (the QUOTA family: `limit_definitions` / `plan_limits` / `organization_subscription_limits`, distinct from the BOOLEAN family renamed in 1C.9). Sync is one-way: subscription_limits drive quotas; never reverse. On subscription create → snapshot from plan_limits → write quota row. On override → write through to quota row. On period reset → no-op for limit (limit doesn't change at boundary; current_units does).

Quotas are org-scoped, not principal-scoped. A single `usage_quotas` row per `(organization_id, capability, period)`: every actor in the org (humans, agents, service_accounts) shares the same counter. Clinic's plan governs total monthly usage regardless of who triggers a call. Per-actor-type sub-quotas (e.g., "agents can use up to 30% of the org's AI quota") are NOT in the design: speculation against an unknown future need; quota schema doesn't accommodate it (no `principal_type` discriminator) and adding later is a non-trivial migration if real demand surfaces. Document the choice explicitly so a future reviewer doesn't accidentally add per-principal sub-quotas without ADR-level discussion.

Live vs. historical clinic visibility:
- "How much have I used THIS period?" → `usage_quotas.current_units` (real-time, gated UI surface in 1D; clinic admin sees live counters against their plan).
- "What's my history?" → `usage_summaries` (after period close; per-month rollup).
- "Show me every email sent on date X for billing dispute?" → `usage_records` (event log, available for support within retention window).

Telemetry forwarding. Usage records forward to the telemetry sibling service for analytics: per-tier usage patterns, capacity planning, AI cost trends across orgs. PII pseudonymized at forwarding (org_id hashed; capability/units/cost are non-PII so they pass through). Same pipeline as audit forwarding (1A's pattern).
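The atomic-increment gate in the locked decisions above relies on a single conditional UPDATE so that check and increment cannot be separated. The sketch below mimics that behaviour in memory with a mutex, purely to show the admission semantics; `quotaRow`, `admitted`, and the omission of the `period_end_at` check are simplifications introduced here, not part of the shipped `internal/core/metering/` package.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

var ErrQuotaExceeded = errors.New("quota_exceeded")

// quotaRow mirrors the usage_quotas columns that the gate touches.
type quotaRow struct {
	mu           sync.Mutex
	currentUnits int64
	limitUnits   *int64 // nil means unlimited, like NULL limit_units
}

// Reserve is the in-memory analogue of the conditional UPDATE ... RETURNING:
// the limit check and the increment happen under one lock.
func (q *quotaRow) Reserve(units int64) (int64, error) {
	q.mu.Lock()
	defer q.mu.Unlock()
	if q.limitUnits != nil && q.currentUnits+units > *q.limitUnits {
		return q.currentUnits, ErrQuotaExceeded // "no row updated"
	}
	q.currentUnits += units
	return q.currentUnits, nil
}

// Refund hands units back after a failed provider call.
func (q *quotaRow) Refund(units int64) {
	q.mu.Lock()
	defer q.mu.Unlock()
	q.currentUnits -= units
}

// admitted fires n concurrent single-unit reservations and counts successes.
func admitted(q *quotaRow, n int) int {
	var wg sync.WaitGroup
	var mu sync.Mutex
	ok := 0
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			if _, err := q.Reserve(1); err == nil {
				mu.Lock()
				ok++
				mu.Unlock()
			}
		}()
	}
	wg.Wait()
	return ok
}

func main() {
	limit := int64(3)
	q := &quotaRow{limitUnits: &limit}
	fmt.Println(admitted(q, 10)) // 3: only three of ten concurrent calls pass
	q.Refund(1)                  // a failed provider call frees one slot
	_, err := q.Reserve(1)
	fmt.Println(err) // <nil>
}
```

In Postgres the row lock taken by the UPDATE plays the role of the mutex, which is why no separate SELECT-then-UPDATE step is needed.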
Schema (locked):
- [x] `usage_records`: `(id UUID, organization_id UUID NOT NULL FK, capability TEXT NOT NULL, units BIGINT NOT NULL, unit_type TEXT NOT NULL, cost_cents INT NULL, principal_id UUID NULL FK, occurred_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), metadata JSONB NOT NULL DEFAULT '{}')`. Range-partitioned monthly on `occurred_at` per P41. PK `(id, occurred_at)`. RLS: org members with new `usage.view_org` permission SELECT; AdminPool writes only. REVOKE INSERT/UPDATE/DELETE from `restartix_app`.
- [x] `usage_quotas`: `(id UUID PK, organization_id UUID NOT NULL FK, capability TEXT NOT NULL, period TEXT NOT NULL CHECK (period IN ('day', 'week', 'month')), limit_units BIGINT NULL, current_units BIGINT NOT NULL DEFAULT 0, period_start_at TIMESTAMPTZ NOT NULL, period_end_at TIMESTAMPTZ NOT NULL, last_reset_at TIMESTAMPTZ NULL, updated_at)`. UNIQUE `(organization_id, capability, period)`. NULL `limit_units` = unlimited (e.g., enterprise tier; `usage_quotas` row still exists for tracking). RLS: org members with `usage.view_org` SELECT; AdminPool writes only.
- [x] `usage_summaries`: `(id UUID PK, organization_id UUID NOT NULL FK, capability TEXT NOT NULL, period TEXT NOT NULL, period_start_at TIMESTAMPTZ NOT NULL, period_end_at TIMESTAMPTZ NOT NULL, total_units BIGINT NOT NULL, total_cost_cents BIGINT NOT NULL DEFAULT 0, calls_count INT NOT NULL, created_at)`. UNIQUE `(organization_id, capability, period, period_start_at)`. RLS: org members with `usage.view_org` SELECT; cron writes via AdminPool.
- [x] Data classification: all three tables register their columns in `data-classification.md`. `metadata JSONB` on `usage_records` registers as variable-class (per-capability metadata may carry `pii_basic` if e.g., recipient_email is included; reviewed per consumer).
- [x] Permission seed: new `usage.view_org` permission, granted to `admin` and `customer_support` system role templates (operations need to debug; specialists don't).
Implementation order inside 1C.7:
- [x] Migration creating three tables + RLS + permission + data classification entries.
- [x] `internal/core/metering/` package: atomic-increment query helper (`Reserve(ctx, org, capability, period, units)`), refund helper (`Refund(...)`), record-write helper (`Record(...)`), `LoadLimits` for org-scope middleware, `EnsureQuotaRow` + `SyncOrgLimits` for the subscription-mutation path, `AdvanceExpiredQuotas` + `RollupClosedPeriod` for the crons, `EnsurePartitions` registered with `cmd/audit-partition-roll`. The capabilities wrap stack ships `meterAroundCall` (replacing the old `meterAfterSuccess` skeleton) so Reserve/Record/Refund compose innermost.
- [x] Period-reset cron `cmd/usage-quota-reset`: runs at calendar boundaries; loops `AdvanceExpiredQuotas` until no expired rows remain.
- [x] Period-close cron `cmd/usage-summary-rollup`: rolls day every run, week on UTC Monday, month on UTC-1st (`-force` overrides for backfill).
- [x] `notify.email.SESChannel` enters `WrapMeteredProvider` via `notifyemail.NewMeteredChannel`. Capability `email`, period `month`, unit_type `emails_sent`, units 1. Dispatcher delivers under the system principal so the metered adapter attaches the notification's `organization_id` via `capabilities.ContextWithMeteringOrg`.
- [x] Quota source sync: `subscriptions.Service.syncLimits` (wired via `SetLimitSyncer`) calls `metering.Repository.SyncOrgLimits` after every create / update / override / revoke. Aggregates caps across active subscriptions (SUM, NULL = unlimited propagates) and writes through to `usage_quotas.limit_units` for every registered `(limit_code → capability/period)` entry.
- [x] Telemetry forwarding hook: `metering.TelemetryEmitter` interface + `Repository.SetTelemetryEmitter`. Foundation 1C.7 leaves the emitter unset; the hook fires immediately after a successful `usage_records` insert when 1A's telemetry sink (or whichever later phase ships first) wires it.
- [x] Acceptance suite at `internal/test/rlstest/metering_test.go`: atomic Reserve gate (cap=3 → fourth Reserve hits `capabilities.ErrQuotaExceeded`); Refund-decrements-counter; `AdvanceExpiredQuotas` zeroes a stale row and bumps the window forward; `RollupClosedPeriod` aggregates seeded `usage_records` into one `usage_summaries` row and is idempotent on a second run; `SyncOrgLimits` writes through `subscription_limits.cap_value=250` to `usage_quotas.limit_units=250`; AppPool tx is rejected with `permission denied` on INSERT to all three tables.
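The calendar-UTC period boundaries that the reset and rollup crons work against can be computed as in the sketch below. It assumes half-open windows `[start, end)` (consistent with the `period_end_at > NOW()` gate) and Monday-start weeks (consistent with "Monday UTC weekly"); the function names `periodWindow` and `formatWindow` are hypothetical and not taken from the metering package.

```go
package main

import (
	"fmt"
	"time"
)

// periodWindow returns the half-open UTC window [start, end) containing now
// for one of the three granularities: "day", "week", or "month".
func periodWindow(period string, now time.Time) (start, end time.Time, err error) {
	now = now.UTC()
	day := time.Date(now.Year(), now.Month(), now.Day(), 0, 0, 0, 0, time.UTC)
	switch period {
	case "day":
		return day, day.AddDate(0, 0, 1), nil
	case "week":
		// time.Weekday has Sunday = 0; shift so that Monday maps to offset 0.
		offset := (int(day.Weekday()) + 6) % 7
		start = day.AddDate(0, 0, -offset)
		return start, start.AddDate(0, 0, 7), nil
	case "month":
		start = time.Date(now.Year(), now.Month(), 1, 0, 0, 0, 0, time.UTC)
		return start, start.AddDate(0, 1, 0), nil
	}
	return time.Time{}, time.Time{}, fmt.Errorf("unknown period %q", period)
}

// formatWindow parses an RFC 3339 timestamp and returns the window bounds as strings.
func formatWindow(period, at string) (start, end string, err error) {
	t, err := time.Parse(time.RFC3339, at)
	if err != nil {
		return "", "", err
	}
	s, e, err := periodWindow(period, t)
	if err != nil {
		return "", "", err
	}
	return s.Format(time.RFC3339), e.Format(time.RFC3339), nil
}

func main() {
	for _, p := range []string{"day", "week", "month"} {
		s, e, _ := formatWindow(p, "2026-05-07T15:30:00Z")
		fmt.Println(p, s, e)
	}
}
```

Normalising to UTC before truncating is what keeps a request arriving shortly after local midnight in a UTC+3 clinic inside the previous UTC day's window, which is the behaviour the usage UI is asked to document.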
Acceptance:
- [x] Three-table model locked: `usage_records` (event log) + `usage_quotas` (live counter) + `usage_summaries` (closed-period historical).
- [x] Retention locked: 12-month hot for records; older partitions archive to S3.
- [x] Atomic-increment-with-refund enforcement model locked.
- [x] Calendar UTC boundaries with `day` / `week` / `month` granularities locked.
- [x] Day-1 metered capability locked: email at foundation; storage/AI/video deferred to consumers.
- [x] Aggregation cadence locked: end-of-period cron only.
- [x] Quota source locked: reads from `organization_subscription_limits` (quota family from glossary's two-family entitlement structure).
- [x] Live + historical clinic visibility model locked.
- [x] Telemetry forwarding locked.
- [x] Schema + RLS + permission seed shipped.
- [x] `internal/core/metering/` package + reset cron + rollup cron shipped.
- [x] `notify.email` wired through metering middleware.
- [x] Telemetry pipe wired (hook only: `TelemetryEmitter` interface; concrete sink lands when 1A's telemetry pipeline ships).
- [x] Acceptance test covering quota gating + refund + period reset + summary rollup.
1C.8 AI Capability Hooks
AI is Cat A in shape (curated provider, switchable, platform credentials by default) plus extra observability: provenance audit, model registry with pricing history, per-call cost capture, streaming support. This sub-phase lays the foundation hooks; the first AI feature consumer wires the actual LLM / embedding / transcription / vision / classification providers.
Status: shipped 2026-05-07. Schema landed in migration 000020 (ai_models + ai_model_pricing_history + FK from audit_ai_provenance.model_id); audit_log_insert extended with OUT params (audit_log_id, audit_log_created_at) so audit.RecordWithProvenance writes both rows in the same transaction. Five AI capability skeleton packages under internal/core/ai/ (LLM streaming + tools, embeddings / transcription / vision / classification simpler shapes) with Fake test doubles. capabilities.WrapMeteredAI + meterDeferred middleware + Reservation / SettleResult / SettleEntry types implement the variable-cost flow described in P53; metering.Repository.BeginReservation returns the production handle, RecordWithCost writes per-direction usage_records with cost_cents snapshots. internal/core/domain/aimodels/ Console superadmin endpoints under /v1/admin/ai-models (list / get / create-with-initial-pricing / patch / price-change), gated on PlatformPermAIModelsManage = "ai_models.manage" (superadmin-only by default). cmd/check-ai-models CI guard wired into make check; data classification entries cover the new tables. Acceptance suite at internal/test/rlstest/ai_provenance_test.go covers same-tx audit+provenance write, deferred reservation + per-direction settle, Cancel-refunds-full + idempotency, pricing lookup at-time + price change. UI deferred to 1D per the unified UI pass rule. OpenAPI for the admin endpoints deferred to 1D alongside 1C.2's platform-service-providers admin endpoints (same pattern: admin OpenAPI lands when the Console UI does).
Why foundation work before any AI feature ships. AI is stated platform direction (CLAUDE.md: "this platform is built around AI agents as first-class actors"; apps/docs/product/ai-agents.md). Audit provenance + model registry + cost capture + streaming are cross-cutting — every AI-using feature needs them. Retrofitting after F-tier AI features land is exactly the cross-cutting cost foundation discipline prevents.
Locked decisions:
- One interface per AI task. Separate Go packages: internal/core/ai/llm/, internal/core/ai/embeddings/, internal/core/ai/transcription/, internal/core/ai/vision/, internal/core/ai/classification/. Each has its own interface with task-specific methods. Reasons: tasks have genuinely different shapes (LLM has tools/streaming; embeddings don't; transcription has audio I/O; vision has images); different providers specialize per task (Anthropic LLM, Voyage embeddings, Deepgram transcription, Google Vision OCR); validation status is per-(model, task); type safety catches "trying to embed a string with the LLM provider." False unification rejected. Matches the Cat A capability pattern (email.Channel, video.Provider, pdf.Renderer are each their own package).
- Model registry with pricing history. Two tables: current state on ai_models, historical pricing changes in ai_model_pricing_history. Pricing changes for AI providers are inevitable (Anthropic and OpenAI both adjusted prices multiple times in 2024-2025); historical pricing is necessary for accurate billing reconstruction (closed-period invoice generation, customer disputes, cost-calculation bug recovery). Worth shipping at foundation: the schema cost is small compared with a later retrofit.
- AI credentials in platform_service_providers (Cat A resolver from 1C.2). Same pattern as email / storage / video / payments. AI capabilities (ai_text_generation, ai_embedding, ai_transcription, ai_vision, ai_classification) seed platform-default rows in platform_service_providers as their first consumer ships. Foundation 1C.8 doesn't seed any rows; the first AI feature does. Per-tenant brand-isolation overrides work the same way as any other Cat A capability.
- Provenance wiring via metering middleware extension. The metering wrapper from 1C.7 (WrapMeteredProvider) is extended with optional provenance config. When the WithProvenance(...) option is passed, the wrapper writes an audit_ai_provenance row in the same tx as the audit row, with (audit_log_id, model_id, inputs_hash, confidence). inputs_hash is the SHA-256 of the canonicalized prompt; it lets compliance auditors verify "was this prompt the one we ran?" without storing the actual prompt content (PII for clinical features). Confidence is provider-supplied where available; NULL is allowed for providers that don't return one. (As shipped, provenance is written through audit.RecordWithProvenance rather than a wrapper option; see the AI metering primitive item in the implementation order below.)
- Streaming support in the LLM interface from Day 1. LLM.Generate(ctx, ...) (Stream, error) returns a stream interface with Next() (Token, bool, error) + Close() (Usage, error). The caller iterates tokens; Close() returns final usage info (input/output token counts), which the metering layer writes as one usage_record at stream completion. Without a streaming-aware design from Day 1, every AI feature has to bolt streaming on after the fact, which is a cross-cutting retrofit cost. ~30 lines of interface design.
- BYO-LLM via Cat B deferred until the first clinic asks. Foundation accommodates it: the organization_integrations schema already takes AI provider tokens via the same shape as other Cat B connections (auth_type='api_key', credentials_encrypted={api_key}, config={model_preferences}). When BYO-LLM lands, it composes cleanly with the Cat A resolver: either the clinic's per-org override row in platform_service_providers references the Cat B credentials by relation, or the resolver falls through to the Cat B side. Mechanics are decided when the first BYO consumer surfaces.
- AI agent identity in audit. When an AI feature call runs on behalf of an AI agent (per the principals model from 1B.1), the call's principal context is the agent (actor_type='agent'); the audit_log row attributes correctly via existing infrastructure. The agent's parent_principal_id (delegation column reserved in 1B.1, currently semantically undefined) carries the human who delegated to the agent. No new plumbing at 1C.8; the principals model already supports this.
- Agent provisioning shape deferred to the first AI feature. The agents table schema exists (1B.1) but no flow creates agent rows today. When the first AI feature ships, it decides between: (a) lazy creation, where an agent row is created inline on the first AI call delegated by a human; (b) eager creation, where the agent row is created when the human enables an AI feature; or (c) per-task agents, with one agent per (human, AI feature) combo. Foundation 1C.8 ships skeleton interfaces + provenance hooks that work with any of these provisioning shapes; the choice falls to the first consumer. The audit + metering chain attributes to whatever principal context the call carries, so the framework is provisioning-agnostic.
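The streaming-first LLM interface from the locked decisions above can be sketched as follows. The method shapes Next() (Token, bool, error) and Close() (Usage, error) come from the plan; everything else is an assumption for illustration: the prompt string parameter (the plan elides Generate's arguments), the reading of the bool as "a token was produced", the Token and Usage field names, and the FakeLLM double that simply upper-cases the prompt word by word.

```go
package main

import (
	"context"
	"fmt"
	"strings"
)

// Token and Usage field names are assumptions; the plan only names the types.
type Token struct{ Text string }

type Usage struct{ InputTokens, OutputTokens int }

// Stream follows the shape given in the plan.
type Stream interface {
	Next() (Token, bool, error) // bool is assumed to mean "a token was produced"
	Close() (Usage, error)      // final usage, reported once at completion
}

// LLM is a sketch; the real Generate takes arguments the plan elides.
type LLM interface {
	Generate(ctx context.Context, prompt string) (Stream, error)
}

// FakeLLM is a hypothetical test double that echoes the prompt in upper case.
type FakeLLM struct{}

type fakeStream struct {
	words []string
	pos   int
}

func (FakeLLM) Generate(ctx context.Context, prompt string) (Stream, error) {
	if err := ctx.Err(); err != nil {
		return nil, err
	}
	return &fakeStream{words: strings.Fields(prompt)}, nil
}

func (s *fakeStream) Next() (Token, bool, error) {
	if s.pos >= len(s.words) {
		return Token{}, false, nil
	}
	t := Token{Text: strings.ToUpper(s.words[s.pos])}
	s.pos++
	return t, true, nil
}

func (s *fakeStream) Close() (Usage, error) {
	return Usage{InputTokens: len(s.words), OutputTokens: s.pos}, nil
}

// Collect drains a stream and returns the text plus the usage from Close,
// which is the point where a metering layer would write its usage record.
func Collect(ctx context.Context, m LLM, prompt string) (string, Usage, error) {
	st, err := m.Generate(ctx, prompt)
	if err != nil {
		return "", Usage{}, err
	}
	var out []string
	for {
		tok, ok, err := st.Next()
		if err != nil {
			st.Close()
			return "", Usage{}, err
		}
		if !ok {
			break
		}
		out = append(out, tok.Text)
	}
	usage, err := st.Close()
	return strings.Join(out, " "), usage, err
}

// collectDemo runs Collect against the fake with a background context.
func collectDemo(prompt string) (string, Usage, error) {
	return Collect(context.Background(), FakeLLM{}, prompt)
}

func main() {
	text, usage, err := collectDemo("hello streaming world")
	fmt.Println(text, usage, err)
}
```

Because usage only becomes known at Close, a non-streaming caller can reuse the same interface by draining the stream, as Collect does.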
Schema (locked):
- [x] ai_models: (id UUID PK, model_provider TEXT NOT NULL, model_name TEXT NOT NULL, model_version TEXT NOT NULL, capability TEXT NOT NULL CHECK (capability IN ('text_generation', 'embedding', 'transcription', 'vision', 'classification')), unit_type TEXT NOT NULL, validation_status TEXT NOT NULL CHECK (validation_status IN ('experimental', 'validated', 'deprecated', 'retired')), validation_notes TEXT, status TEXT NOT NULL DEFAULT 'active' CHECK (status IN ('active', 'deprecated', 'retired')), introduced_at TIMESTAMPTZ NOT NULL, retired_at TIMESTAMPTZ NULL, created_at, updated_at). UNIQUE (model_provider, model_name, model_version). Foundation seeds empty. RLS: SELECT for everyone (registry is public-by-design; surfaced in patient-facing AI transparency UIs); mutations via AdminPool only (superadmin actions, audited).
- [x] ai_model_pricing_history: (id UUID PK, model_id UUID NOT NULL FK → ai_models(id), cost_per_input_unit_cents NUMERIC NOT NULL, cost_per_output_unit_cents NUMERIC NOT NULL, effective_from TIMESTAMPTZ NOT NULL, effective_to TIMESTAMPTZ NULL, changed_by_principal_id UUID NULL FK → principals(id), notes TEXT, created_at). Partial unique (model_id) WHERE effective_to IS NULL: at most one current pricing row per model. Historical pricing for date X: WHERE effective_from <= X AND (effective_to IS NULL OR effective_to > X). RLS: SELECT for AdminPool only (pricing detail is platform-confidential); mutations via AdminPool with audit. usage_records.cost_cents snapshots pricing AT CALL TIME so historical reconstruction stays accurate even if pricing rows are amended later.
- [x] Permission seeded inline in 1C.8 migration: ai_models.manage granted to superadmin only via the platform-permissions Go layer (no per-org RBAC row). Constants: principal.PlatformPermAIModelsManage.
- [x] audit_ai_provenance already shipped in 1A.5 / 1A.15. 1C.8 verifies the FK target (model_id references new ai_models.id); the migration adds the FK constraint at this point.
- [x] Data classification entries: ai_models columns mostly org_internal with support_export egress (provider/model names are not patient-PII); validation_notes may carry medical-device-readiness context (org_internal + support_export). ai_model_pricing_history rows are org_internal only (pricing is platform-confidential). audit_ai_provenance.inputs_hash is org_internal (hash, not raw prompt).
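The at-time pricing predicate in the schema above (effective_from <= X AND (effective_to IS NULL OR effective_to > X)) translates directly into Go. The sketch below is illustrative only: PricingRow, PriceAt, CostCents, and utc are hypothetical names, a nil EffectiveTo models the open-ended current row, and float64 is used for brevity where the NUMERIC columns would call for a decimal type in real billing code.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// PricingRow is a hypothetical in-memory view of ai_model_pricing_history.
type PricingRow struct {
	EffectiveFrom time.Time
	EffectiveTo   *time.Time // nil models effective_to IS NULL (current row)
	InputCents    float64    // cost_per_input_unit_cents
	OutputCents   float64    // cost_per_output_unit_cents
}

var ErrNoPricing = errors.New("no pricing row covers that instant")

// PriceAt applies: effective_from <= at AND (effective_to IS NULL OR effective_to > at).
func PriceAt(rows []PricingRow, at time.Time) (PricingRow, error) {
	for _, r := range rows {
		if r.EffectiveFrom.After(at) {
			continue // row starts after the requested instant
		}
		if r.EffectiveTo == nil || r.EffectiveTo.After(at) {
			return r, nil
		}
	}
	return PricingRow{}, ErrNoPricing
}

// CostCents prices one call; the result is what a cost_cents snapshot would hold.
func CostCents(p PricingRow, inUnits, outUnits int64) float64 {
	return float64(inUnits)*p.InputCents + float64(outUnits)*p.OutputCents
}

// utc is a small helper for building midnight-UTC timestamps.
func utc(y int, m time.Month, d int) time.Time {
	return time.Date(y, m, d, 0, 0, 0, 0, time.UTC)
}

func main() {
	change := utc(2025, 6, 1)
	rows := []PricingRow{
		{EffectiveFrom: utc(2025, 1, 1), EffectiveTo: &change, InputCents: 0.5, OutputCents: 2},
		{EffectiveFrom: change, InputCents: 0.25, OutputCents: 1},
	}
	p, err := PriceAt(rows, utc(2025, 3, 15))
	fmt.Println(p.InputCents, p.OutputCents, err)
	fmt.Println(CostCents(p, 1000, 200))
}
```

Because the interval is half-open, the instant of a price change resolves to the new row, so exactly one row matches any covered timestamp.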
Implementation order inside 1C.8:
- [x] Migration creating ai_models + ai_model_pricing_history with RLS + permissions + data classification.
- [x] internal/core/ai/llm/ package with LLM interface (streaming-first), Stream type, Usage type. Skeleton; no provider impls.
- [x] internal/core/ai/embeddings/ package with Embeddings interface. Skeleton.
- [x] internal/core/ai/transcription/ package with Transcription interface. Skeleton.
- [x] internal/core/ai/vision/ package with Vision interface. Skeleton.
- [x] internal/core/ai/classification/ package with Classification interface. Skeleton.
- [x] Fake{Capability} per package, following the test-double convention from 1C.1.
- [x] AI metering primitive shipped: capabilities.WrapMeteredAI + meterDeferred middleware + MeterStore.BeginReservation / Reservation / SettleResult / SettleEntry types + metering.Repository.RecordWithCost for per-direction cost_cents snapshots. Provenance lands via audit.RecordWithProvenance (extension on audit.Recorder); the original WrapMeteredProvider.WithProvenance framing was superseded by this cleaner separation (see the acceptance note below).
- [x] Console superadmin endpoints: GET /v1/admin/ai-models, POST /v1/admin/ai-models (creates model + initial pricing-history row in one tx), PATCH /v1/admin/ai-models/{id} (updates non-pricing fields), POST /v1/admin/ai-models/{id}/price-change (closes current pricing-history row, inserts new). Console UI deferred to 1D per the unified UI pass rule.
- [x] CI guard cmd/check-ai-models walks every AI capability call site and asserts the provider impl references a model_id that exists in the registry. Foundation discipline: no AI call without a registered model.
- [x] Documentation: apps/docs/reference/ai-models.md (new) describing how to add a model row, validation status meanings, and the pricing-change procedure. Cross-references the SOUP entries from 1A.13 (every AI/ML model is a SOUP entry too: model_provider / model_version / validation_status fields per the CLAUDE.md medical-device-readiness rule).
- [x] Acceptance test at internal/test/rlstest/ai_provenance_test.go: same-tx audit+provenance write, deferred reservation + per-direction settle, Cancel-refunds-full + idempotency, pricing lookup at-time + price change.
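The deferred-reservation flow exercised by the acceptance test (hold an estimate, then settle per direction or cancel for a full refund, both idempotent) can be sketched in memory. None of the names below are the shipped API: Ledger, Record, Begin, Settle, and Cancel are hypothetical stand-ins for the BeginReservation / Reservation / SettleEntry shapes, cap checks are omitted, and counting input and output units against one shared counter is an assumption.

```go
package main

import "fmt"

// Record is a hypothetical per-direction usage row with a cost snapshot.
type Record struct {
	Direction string // "input" or "output"
	Units     int64
	CostCents float64
}

// Ledger is a hypothetical stand-in for the live counter plus the event log.
type Ledger struct {
	Used    int64
	Records []Record
}

// Reservation holds estimated units until the real usage is known.
type Reservation struct {
	ledger *Ledger
	held   int64
	done   bool
}

// Begin places a hold for the estimated units before the provider call starts.
func (l *Ledger) Begin(estimate int64) *Reservation {
	l.Used += estimate
	return &Reservation{ledger: l, held: estimate}
}

// Settle swaps the hold for actual usage and writes one record per direction.
// A second call is a no-op, which keeps the operation idempotent.
func (r *Reservation) Settle(entries ...Record) {
	if r.done {
		return
	}
	r.done = true
	r.ledger.Used -= r.held
	for _, e := range entries {
		r.ledger.Used += e.Units
		r.ledger.Records = append(r.ledger.Records, e)
	}
}

// Cancel refunds the whole hold, for example when the call fails before any
// output is produced. It is also idempotent.
func (r *Reservation) Cancel() {
	if r.done {
		return
	}
	r.done = true
	r.ledger.Used -= r.held
}

func main() {
	l := &Ledger{}
	res := l.Begin(1000)
	res.Settle(
		Record{Direction: "input", Units: 120, CostCents: 0.36},
		Record{Direction: "output", Units: 300, CostCents: 4.5},
	)
	fmt.Println("used after settle:", l.Used, "records:", len(l.Records))

	failed := l.Begin(500)
	failed.Cancel()
	failed.Cancel() // second cancel changes nothing
	fmt.Println("used after cancel:", l.Used)
}
```

The same primitive covers streaming calls (usage known at stream close) and non-streaming calls whose usage arrives late, since both only need a hold followed by one reconciliation step.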
Acceptance:
- [x] One interface per AI task locked: ai/llm, ai/embeddings, ai/transcription, ai/vision, ai/classification.
- [x] Model registry shape locked: ai_models + ai_model_pricing_history (history shipped at foundation, not deferred).
- [x] AI credentials in platform_service_providers (Cat A) locked.
- [x] Provenance wiring via WrapMeteredProvider.WithProvenance extension locked (superseded at implementation by audit.RecordWithProvenance; see the AI metering primitive item below).
- [x] Streaming support in LLM interface from Day 1 locked.
- [x] BYO-LLM via Cat B deferred until first clinic asks.
- [x] AI agent identity in audit: existing principals + parent_principal_id model is sufficient; no new plumbing.
- [x] Schema + RLS + permission seeds shipped (migration 000020 + PlatformPermAIModelsManage constant + classification entries for ai_models / ai_model_pricing_history / renamed audit_ai_provenance.model_id).
- [x] Five AI capability interfaces shipped (skeletons + Fake doubles in internal/core/ai/{llm,embeddings,transcription,vision,classification}; no provider impls at foundation).
- [x] AI metering primitive shipped: capabilities.WrapMeteredAI + meterDeferred middleware, MeterStore.BeginReservation + Reservation / SettleResult / SettleEntry types, metering.Repository.RecordWithCost for per-direction cost_cents snapshots. Provenance lands via audit.RecordWithProvenance (extension on audit.Recorder): the wrap layer is agnostic to provenance shape, and the AuditFunc closure decides per capability. (The original "WrapMeteredProvider.WithProvenance extension" framing in the design notes was superseded by this cleaner separation: metering gets a generic post-call reconciliation primitive that works for streaming-late-output AND non-streaming-late-usage; provenance gets its own audit-recorder API; capability impls compose the two as needed.)
- [x] Console superadmin endpoints for model registry management shipped at /v1/admin/ai-models (UI deferred to 1D; OpenAPI also deferred to match the 1C.2 platform-service-providers pattern).
- [x] CI guard + documentation + acceptance test (cmd/check-ai-models wired into make check + apps/docs/reference/ai-models.md + internal/test/rlstest/ai_provenance_test.go covering same-tx audit+provenance, split-direction settle, Cancel-refund + idempotency, pricing lookup at-time + price change).
1C.9 Entitlements Rename
Mechanical rename of features / plan_features / organization_capabilities → entitlements / plan_entitlements / organization_entitlements (plus the snapshot family organization_subscription_entitlements, patient_tier_entitlements, patient_subscription_entitlements, the entitlement_code / entitlement_enabled / entitlement_column columns, and the current_app_has_org_entitlement SQL helper). Resolves the architectural-vs-billing-vocabulary collision settled in glossary.md.
Status: shipped 2026-05-06. Existing functionality unchanged. See glossary.md → Entitlement and Forbidden terms for the rationale — architectural "Capability" means an internal Go interface, architectural "Feature" means user-facing functionality; the DB tables collided with both and were renamed in this sub-phase before any new 1C code references them.
What changed:
- [x] DB migrations (early-dev, edit-in-place per CLAUDE.md): renamed features → entitlements, plan_features → plan_entitlements, organization_capabilities → organization_entitlements, plus the snapshot tables (organization_subscription_features → organization_subscription_entitlements, patient_tier_features → patient_tier_entitlements, patient_subscription_features → patient_subscription_entitlements). Updated FK names, indexes, triggers, RLS policies, the create_organization_companion_rows trigger function, and audit log entity_type strings.
- [x] Go domain code: internal/core/domain/orgcapabilities/ → internal/core/domain/orgentitlements/ (full directory rename). Type renames (Capabilities → Entitlements, Feature → Entitlement, PlanFeature → PlanEntitlement, SubscriptionFeature → SubscriptionEntitlement, OrganizationCapability → OrganizationEntitlement). Repository / service / handler method renames (ListFeatures → ListEntitlements, ListPlanFeatures → ListPlanEntitlements, etc.). Override-kind enum value 'feature' → 'entitlement'. Audit context constant ContextCapabilityChange → ContextOrgEntitlementChange (string 'capability_change' → 'org_entitlement_change').
- [x] Go middleware + principal: RequireFeature → RequirePlanEntitlement, RequireCapability → RequireOrgEntitlement. Subject.HasFeature / Subject.HasCapability → Subject.HasPlanEntitlement / Subject.HasOrgEntitlement. Subject.Features / Subject.Capabilities fields → Subject.PlanEntitlements / Subject.OrgEntitlements. SQL helper current_app_has_capability(cap_code) → current_app_has_org_entitlement(entitlement_code). Service projection method subscriptions.Service.RecomputeCapabilities → RecomputeOrgEntitlements.
- [x] Wire-protocol error codes: feature_unavailable → plan_entitlement_unavailable; capability_disabled → org_entitlement_disabled; missing_feature / missing_capability → missing_entitlement.
- [x] OpenAPI spec: schema names (OrganizationCapabilities → OrganizationEntitlements, Feature → Entitlement, PlanFeature → PlanEntitlement, SubscriptionFeature → SubscriptionEntitlement), endpoint paths (/v1/features → /v1/entitlements, /v1/organizations/{id}/capabilities → /v1/organizations/{id}/entitlements), operationIds, and prose updated.
- [x] API client (packages/api-client/): generated types regenerated via pnpm openapi; Go DTOs regenerated via make openapi.
- [x] UI labels (Console superadmin only; clinic / portal carry no entitlement-context strings yet): "Capabilities" card title → "Entitlements"; activity-feed mock action strings, audit-log filter enum, palette stat card label all updated.
- [x] Documentation: data-model.md, plans-and-subscriptions.md, org-settings.md, patterns.md, middleware-composition.md, decisions.md, dependency-map.md, data-classification.md, error-envelope.md, gdpr-compliance.md, implementation-plan top-level + foundation.md cross-references all updated. Glossary's Forbidden Terms table marks the three renames as completed 2026-05-06.
- [ ] Romanian translations: no entitlement-context strings exist in clinic / portal i18n bundles yet; no work needed in this sub-phase. Will land alongside whichever Layer 2 feature first surfaces an entitlement label to clinic / patient users.
Resolved design questions:
- User-facing label in Clinic admin UI — settled: friendly labels for clinic-facing surfaces ("Plan benefits" / "What's included") when those surfaces ship; strict "Entitlements" for Console superadmin (already in place).
- Naming for the two shapes of entitlement (boolean gates vs. quota limits): kept the existing split of entitlements catalog (boolean gates) + limit_definitions catalog (quotas). 1C.9 only renamed the boolean side.
Acceptance:
- [x] All six table renames land in migrations (3 originally scoped + 3 snapshot tables for consistency).
- [x] All Go code migrated; grep -rn confirms zero features\b|plan_features|organization_capabilit|feature_code|capability_column|cap_code|current_app_has_capability|RequireFeature|RequireCapability|HasFeature|HasCapability|orgcapabilities|capability_change matches in entitlement contexts under services/api/.
- [x] OpenAPI + API client regenerated; Go build passes; pnpm typecheck passes.
- [x] UI labels updated (Console). en + ro for clinic / portal not applicable yet: no entitlement-context strings exist in those bundles.
- [x] Glossary's Forbidden Terms table reflects the rename completion.
- [ ] CI guard: a check that the forbidden words don't appear in new code in entitlement contexts. Deferred; defaulting to manual review at PR time until a forbidden-words sweeper script lands alongside the broader CI guard buildout in 1C.
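As a rough idea of what the deferred forbidden-words sweeper could look like, the sketch below reuses the grep pattern from the acceptance list and reports matching lines. It is an illustration only: Sweep and Hit are hypothetical names, the input is a string rather than a file walk, and the sketch cannot tell entitlement contexts apart from legitimate uses, which a real guard would have to handle (for example with an allowlist).

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// forbidden mirrors the grep pattern quoted in the acceptance list.
var forbidden = regexp.MustCompile(`features\b|plan_features|organization_capabilit|feature_code|capability_column|cap_code|current_app_has_capability|RequireFeature|RequireCapability|HasFeature|HasCapability|orgcapabilities|capability_change`)

// Hit is one forbidden term found on a 1-based line number.
type Hit struct {
	Line int
	Term string
}

// Sweep scans source text line by line and returns every forbidden match.
func Sweep(src string) []Hit {
	var hits []Hit
	for i, line := range strings.Split(src, "\n") {
		for _, term := range forbidden.FindAllString(line, -1) {
			hits = append(hits, Hit{Line: i + 1, Term: term})
		}
	}
	return hits
}

func main() {
	src := "r.Use(mw.RequireFeature(code))\n" +
		"r.Use(mw.RequirePlanEntitlement(code))\n" +
		"SELECT * FROM plan_features"
	for _, h := range Sweep(src) {
		fmt.Printf("line %d: %s\n", h.Line, h.Term)
	}
}
```

A CI wrapper would exit non-zero whenever Sweep returns any hits for files changed in the PR.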
Clinical "services" rename — deferred
The clinical-domain rename (services → offerings, service_plans → enrollments) is locked in the glossary as canonical taxonomy but the actual file/code rename is deferred until that area is built. No preemptive sweep. See glossary.md → Offering and the Forbidden terms deferral note.
1D. Admin Surfaces
Three apps with end-to-end admin functionality so each audience can run their own house.
Status (2026-05-07): Inventory audit produced at apps/docs/implementation-plan/1d-ui-inventory.md — 124 mounted routes, 24 domain handlers, 73 in-scope UI surfaces post-decision. OpenAPI catch-up + api-client regen shipped at commit 58dc1c4 (1D-prep — 71 missing operations added, D-9 naming alignment, GET /v1/me/clinics for D-8). All 12 decisions (D-1 through D-12) settled in the inventory doc; outcomes baked into the subsection scopes below. Open: the 1D.0 prerequisite gap-fillers + the 1D.4 primitives PR must close before per-app surfaces start.
1D.0 Prerequisite Backend Gap-Fillers
Small endpoints surfaced by the 2026-05-07 inventory that block specific per-app surfaces. Bundled here so 1D.1 / 1D.2 / 1D.3 / 1D.5 can ship without each one carrying its own backend slice. Foundation-tier endpoints, all small (read-only or single-row mutation), all RBAC + RLS + audit per the standard rules.
Open:
- [x] GET /v1/admin/permissions: read-only catalog over permissions, ordered by code (COLLATE "C" for stable ASCII order). Superadmin-gated at route layer. Drops the "migration introduced" column from the original surface; it is not load-bearing for C5, and the permissions table doesn't carry it. Add later if Console needs it.
- [x] GET /v1/admin/role-templates: read-only viewer over system role templates (organization_id IS NULL AND is_system = TRUE) with permission grants resolved per-row. Foundation seeds 3 templates (admin / specialist / customer_support). Superadmin-gated. Propagation editor remains deferred per D-2.
- [x] POST /v1/admin/platform-memberships + GET /v1/admin/platform-memberships + DELETE /v1/admin/platform-memberships/{principalId}: grant / list / revoke-all. Role enum app-layer-validated against {superadmin, support_engineer}. Every grant + revoke audit-logged with the new audit.ContextPlatformMembershipChange constant. Self-revoke guard NOT shipped: a caller can revoke themselves; recovery is direct SQL per the table's migration comment. Add a guard in Console UI if needed.
- [x] Decision settled, aggregator shipped: GET /v1/admin/subscriptions[?org_id=&status=&tier_id=&page=&limit=] is a paginated cross-org subscriptions aggregator with org_name + org_slug + active_overrides_count enrichment per row. Same apiquery pagination as other list endpoints (default 50, hard cap 500). The iterate-orgs workaround was dead-on-arrival at production scale (5k+ active subscriptions migrating in on launch day per CLAUDE.md → Production Scale).
- [ ] Decision settled, endpoint deferred: per-org aggregate-stats endpoint (GET /v1/organizations/{id}/stats). Storage tracker doesn't exist; MRR shape isn't settled; no production-blocking gap. Console C8 cards ship mock at 1D close and the endpoint lands later (1E observability or production-launch-readiness) when storage tracking and MRR semantics are settled.
- [ ] Deferred per D-2: System role template editor (mutation endpoints + propagation handler; grants propagate to all org clones, revocations do not). Cross-tenant propagation semantics need design before this ships. C6 stays a read-only viewer in 1D.
- [ ] Deferred (gap #7): Custom roles editor for clinics (POST/PATCH/DELETE /v1/organizations/{id}/roles + role_permissions mutations). Clinic L7 ships as system-roles-only in 1D; custom roles per org wait until cross-tenant propagation and per-org permission-catalog UX are designed.
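The pagination bounds quoted for the subscriptions aggregator (default 50, hard cap 500) reduce to a small clamp. This sketch does not reproduce the apiquery package: ClampLimit, ClampPage, and Offset are hypothetical helpers, and both the 1-based page numbering and the fallback to defaults on invalid input are assumptions.

```go
package main

import (
	"fmt"
	"strconv"
)

const (
	DefaultLimit = 50  // page size when the caller sends none
	MaxLimit     = 500 // hard cap regardless of what the caller asks for
)

// ClampLimit parses the raw ?limit= value and applies the default and the cap.
// Assumption: empty, non-numeric, and non-positive values fall back to the default.
func ClampLimit(raw string) int {
	n, err := strconv.Atoi(raw)
	if err != nil || n <= 0 {
		return DefaultLimit
	}
	if n > MaxLimit {
		return MaxLimit
	}
	return n
}

// ClampPage parses the raw ?page= value; pages are assumed to be 1-based.
func ClampPage(raw string) int {
	n, err := strconv.Atoi(raw)
	if err != nil || n < 1 {
		return 1
	}
	return n
}

// Offset converts a 1-based page and a limit into a SQL OFFSET.
func Offset(page, limit int) int {
	return (page - 1) * limit
}

func main() {
	limit := ClampLimit("9999")
	page := ClampPage("3")
	fmt.Println("limit:", limit, "page:", page, "offset:", Offset(page, limit))
}
```

Clamping on the server keeps a single oversized request from scanning the whole cross-org subscription set.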
1D.1 Console UI (platform operator)
Console manages the platform: orgs, users, plans, overrides, entitlement flags, platform-wide audit. Layer-1/2 of the four-layer authorisation model lives here; layer-3/4 lives in the Clinic admin UI.
Status: partially shipped. Foundation pieces below are open; the cross-tenant audit log viewer + DataTable foundation already shipped. Inventory rows referenced as Cn map to 1d-ui-inventory.md.
Already shipped:
- [x] Organisations CRUD with profile + atomic owner provisioning (C7): owner provisioned synchronously through auth.PrincipalProvisioner (Clerk createUser), magic-link welcome email via the OwnerWelcome notify category. New staff onboard via the staff-invitations primitive (item below), not provisioning. See the ADR "Why owner uses provisioning, staff and patients use invitation" in decisions.md.
- [x] Org detail: profile edit, members section, custom domains section (C9, C10, C14). The members section currently lists confirmed members; the staff-invite affordance now writes through the staff-invitations primitive (Clerk Invitations API + bind-on-first-auth) per the ADR above; the pending-invitations surface (list / revoke / resend) follows below.
- [x] Cross-tenant audit log viewer at /audit-logs with server-side pagination + sort + filters + detail Sheet (C28; see 1D.4).
Open (always-on / aggregate scope):
- [ ] Console staff-invitation surface (1B.12; backend shipped, dialog landed): add a "Pending invitations" sub-section to the org detail members card: list pending/accepted/revoked (calls GET /v1/organizations/{id}/staff-invitations?status=), inline revoke + resend per row (POST .../invitations/{inviteId}/revoke|resend). The "Add staff" dialog already writes through POST /v1/organizations/{id}/staff-invitations; this grows the visibility side. Mirrors the Clinic-app surface defined in 1D.2.
- [ ] Users page (C1): list staff humans across the platform, with search by email/name, memberships across orgs (uses organization_memberships.last_used_at from 1A.11), block/unblock, and per-user audit trail. Listing patient principals requires elevation (see break-glass below).
- [ ] Platform memberships management (C4): grant/revoke superadmin + future support_engineer. Consumes the shipped 1D.0 endpoints: GET /v1/admin/platform-memberships[?role=&principal_id=], POST /v1/admin/platform-memberships, DELETE /v1/admin/platform-memberships/{principalId}. Every grant + revoke audit-logged with action_context = 'platform_membership_change' (the backend writes this; the UI does not need to set it). Reminder: OpenAPI + packages/api-client typed wrappers bundle with this UI per project_ui_deferred_until_foundation; add entries to openapi.yaml when wiring the page.
- [ ] Permission catalog viewer (C5, read-only): every registered permission with (code, resource, action, description). Consumes the shipped 1D.0 endpoint: GET /v1/admin/permissions. The endpoint returns rows sorted by code with COLLATE "C" for stable ASCII order (no client-side sort needed). The "Migration introduced" column was dropped from the original surface; it is not present in the table or the response. OpenAPI + api-client bundle with this UI.
- [ ] System role templates viewer (C6, read-only per D-2): list system role templates (admin, specialist, customer_support) and the permissions they grant. Consumes the shipped 1D.0 endpoint: GET /v1/admin/role-templates (returns each template with its permissions: [{code, resource, action}] array resolved server-side). Editor (mutation endpoints + propagation handler) deferred; see 1D.0 deferred items. OpenAPI + api-client bundle with this UI.
- [ ] Plans / entitlements / limits catalog viewers (C18, read-only): every plan version, every entitlement code (regulated highlighted), every limit definition.
- [x] Platform-scope consent purpose editor (C32, from 1B.9): list every scope='platform' purpose paired with its current platform-default consent_purpose_versions row + per-locale body editor for publishing a new version (POST /v1/admin/platform-consent-purpose-versions). Inserts at MAX(version)+1 for (purpose_code, organization_id IS NULL); triggers re-consent across the entire platform via 1B.9's current_required_consent_versions helper. Confirmation modal explains the cross-tenant blast radius before publish. Audit-logged.
- [ ] Consent purpose catalog viewer (read-only, from 1B.9): every consent_purposes row with its scope, legal_basis, withdrawable. Org-scope purposes' platform-default fallback bodies are visible through the platform editor above. Edits to consent_purposes itself remain migration-only.
- [x] Privacy notice template management (C31, from 1B.10): list legal_document_templates; create new version; publish writes one row per locale at MAX(version)+1. Audit-logged.
- [ ] Org subscription management (C12, C16): list orgs with current base plan + active add-ons + active overrides; change base plan; attach / cancel add-ons; grant usage packs. Cross-org aggregator (C16) consumes the shipped 1D.0 endpoint: GET /v1/admin/subscriptions[?org_id=&status=&tier_id=&page=&limit=]. Each row carries organization_name + organization_slug + active_overrides_count (server-side enrichment; no per-row lookup). Pagination via the standard apiquery envelope. Per-org mutations (change plan / attach add-on / cancel) hit existing per-org subscription routes. OpenAPI + api-client bundle with this UI.
- [ ] Sales overrides (C12, C17): grant a per-subscription override with required reason and optional expiry; list active overrides; revoke. Cross-org list (C17) reads from the same aggregator (GET /v1/admin/subscriptions; active_overrides_count per row points at orgs with active overrides, and per-row detail comes from existing per-org override endpoints). Grant / revoke endpoints already exist at POST /v1/organizations/{id}/subscriptions/{subId}/overrides + POST .../overrides/{ovId}/revoke. OpenAPI + api-client bundle with this UI.
- [ ] Org entitlement flags (C12): per-org page showing all organization_entitlements flags; toggle (audit-logged with action_context = 'org_entitlement_change').
- [ ] Org billing editor (C12): write access for billing email, address, encrypted tax ID, payment_provider config.
- [ ] Console superadmin's own preferences page (C35, "Settings"): consumes GET /v1/me + PATCH /v1/me. Per D-6, this is NOT a place to edit organization_settings.feature_flags; that's engineering-internal and not a UI surface anywhere.
- [ ] Clinic overview cards (C8): patient counter, MRR, storage-used per org. Aggregate-stats endpoint deferred at 1D.0 (storage tracker doesn't exist; MRR shape unsettled; no production-blocking gap). Cards ship mock at 1D close; the endpoint lands later in 1E observability or production-launch-readiness when storage tracking + MRR semantics are settled.
- [ ] Audit log metadata viewer cross-tenant (C13; C28 already built, this extends the per-clinic slice): timestamps, actions, status codes; diff content masked unless break-glass-elevated.
- [ ] Platform service providers (C26; confirmed in 1D.1 scope per D-5): Console superadmin CRUD over platform_service_providers (1C.2). List / create / get / update / delete platform-default + per-org-override providers (email/ses, storage/aws_s3, auth/clerk). Sidebar entry /platform-providers added at 1D close. Load-bearing for dedicated-tier rollout (Cat A per-org overrides). Gated by providers.manage or superadmin.
- [ ] AI models registry (C27; confirmed in 1D.1 scope per D-5): Console superadmin CRUD over the 1C.8 AI model registry. List / create / get / update / pricing. Sidebar entry /ai-models added at 1D close. Load-bearing for AI cost configuration. Gated by ai_models.manage or superadmin.
- [ ] Break-glass sessions list (C29): list active + recent sessions; close session; view session detail. Foundational: required before C11's elevation modal can wire the real flow. Sidebar entry /break-glass.
Removed from Console scope (decisions 2026-05-07):
- Patient tiers cross-tenant (C19): D-1, stays clinic-side only.
- Notification templates editor (C21): D-3, foundation templates stay migration-managed; F-tier features add per-template editing if needed.
- Webhooks cross-tenant (C24): D-1, Cat C subscriptions stay clinic-side.
- Connectors cross-tenant (C25): D-1, Cat B integrations stay clinic-side.
- Feature flags page (C33): D-6, organization_settings.feature_flags JSONB is engineering-internal, not a UI surface.
- Locales catalog (C34): D-7, i18n config is system-only (next-intl messages).
- System health (C30): D-11, moved to 1E (consumes staging KMS / S3 / RDS / SES that exist after 1E.3). Page stays mock at 1D close.
- F-tier sidebar stubs (/specialties, /services, /exercises, /forms, /announcements, /sales, /marketing, /compliance, /onboarding): D-12, removed from sidebar at 1D close. They reappear when their backends ship.
Open (break-glass / elevated scope, gated by RequireBreakGlass):
- [ ] Patient list per org (break_glass:patient_list).
- [ ] Patient detail per org — profile, subscriptions, consent trail (break_glass:patient_detail).
- [ ] Audit log full content cross-tenant — diffs, IPs, request bodies (break_glass:audit_full).
- [ ] Cross-org patient lookup — narrow surface for DSAR routing of orphaned ex-patients (break_glass:cross_org_lookup).
- [ ] Active break-glass sessions list — show all currently-open sessions across the platform; close / extend / audit (gated by break_glass.manage).
- [ ] Elevation modal — common UI pattern that wraps any restricted route. Captures reason_category, reason_text, reason_ref, expires_in_minutes. Posts to POST /v1/break-glass/sessions.
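The elevation modal's request can be pictured as a small JSON payload. The sketch below is illustrative only: the four field names come from the list above, while the Go type name, the required/optional split, the validation rules, and the example values are assumptions rather than the shipped contract.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"strings"
)

// ElevationRequest mirrors the four fields the elevation modal captures.
// The struct name and the choice to make reason_ref optional are assumptions.
type ElevationRequest struct {
	ReasonCategory   string `json:"reason_category"`
	ReasonText       string `json:"reason_text"`
	ReasonRef        string `json:"reason_ref,omitempty"`
	ExpiresInMinutes int    `json:"expires_in_minutes"`
}

// Validate applies hypothetical client-side checks before the POST.
func (r ElevationRequest) Validate() error {
	if strings.TrimSpace(r.ReasonCategory) == "" {
		return errors.New("reason_category is required")
	}
	if strings.TrimSpace(r.ReasonText) == "" {
		return errors.New("reason_text is required")
	}
	if r.ExpiresInMinutes <= 0 {
		return errors.New("expires_in_minutes must be positive")
	}
	return nil
}

// EncodeElevation validates the request and returns the JSON body that
// would be sent to POST /v1/break-glass/sessions.
func EncodeElevation(r ElevationRequest) ([]byte, error) {
	if err := r.Validate(); err != nil {
		return nil, err
	}
	return json.Marshal(r)
}

func main() {
	body, err := EncodeElevation(ElevationRequest{
		ReasonCategory:   "dsar", // example category, not a confirmed enum value
		ReasonText:       "Routing an access request",
		ExpiresInMinutes: 30,
	})
	fmt.Println(string(body), err)
}
```

Keeping validation next to the payload type means the same checks can run in the modal and in a server-side handler test, although the authoritative validation would live behind RequireBreakGlass on the server.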
1D.2 Clinic Admin UI (org self-service)
Clinic admin manages their own org without depending on a superadmin.
Status: parked behind the clinic-app refresh. The refresh comes first; every item below lights up against backends that are already stable in master. Inventory rows referenced as Ln map to 1d-ui-inventory.md. One backend gap remains (custom roles editor — see 1D.0 deferred items).
Open:
- [ ] Org profile edit page (L1, gated by organizations.update).
- [ ] Members section (L6): list with staff/patient split, inline role-change dropdown, remove (gated by organizations.manage_members).
- [ ] Personal invitations surface (1B.12 — backend shipped):
  - [ ] Staff-invite tab under Members: list pending/accepted/revoked (calls GET /v1/organizations/{id}/staff-invitations?status=), invite-by-email form (POST /v1/organizations/{id}/staff-invitations body {email, role_code, expires_in_days?}, gated by organizations.manage_members), inline revoke + resend actions per row (POST .../invitations/{inviteId}/revoke|resend).
  - [ ] Patient-invite list page under Patients: same shape but gated by patients.manage, body uses {email, patient_tier_id?, expires_in_days?}, endpoint pair is .../patient-invitations.
  - [ ] Both surfaces should show "pending" / "accepted" / "consumed" / "revoked" / "expired" filter chips driven by the status query param.
- [ ] Patient share-links surface (1B.12 — backend shipped): mint form with optional tier picker + max_uses + expires_at + note (POST /v1/organizations/{id}/share-links, gated by organizations.manage_share_links); list (GET .../share-links) with copy-code button + QR-code rendering (the public landing URL is https://{slug}.portal.restartix.pro/join/{code}); revoke per row (POST .../share-links/{id}/revoke); audit trail filterable to share-link redemptions in the per-org audit view.
- [ ] /welcome landing page (1B.12 — backend shipped): the redirect target after auth-provider sign-up for staff who accepted an invite. The auth middleware's OnAuthHook has already created the membership row by the time the page loads (works for both new-user and existing-user flows); the page just shows "Welcome to Acme Clinic — you're now a Specialist" with links to relevant org surfaces.
- [ ] Custom domains section (L3): list, add, verify, remove (gated by organizations.manage_domains).
- [ ] Roles section (L7, system-roles-only at 1D close): list cloned system roles + their permissions (read-only); permissions catalog rendered as grouped checklist. Custom-role CRUD deferred per gap #7 (1D.0 deferred items) — no roles mutation surface exists in any handler. Adds organizations.manage_roles permission seeding now so the UI can be wired against the future mutations without a permission migration later.
- [ ] Per-org audit log viewer (L21, read-only, filterable) — consumes 1A.1 writes; surfaces break-glass reads against the org with action_context='break_glass' filter.
- [x] Legal documents editor (from 1B.10) — handles BOTH org_terms and org_privacy_notice through one editor surface. List page at /legal-documents; per-document editor at /legal-documents/[type] with structured form (one input per required_placeholders key, one checkbox per toggleable_sections[].key, defaults from template). Save Draft / Publish-with-confirmation; publish triggers re-consent for every existing patient. Dashboard task card surfaces unpublished documents to admins. Gated by organizations.manage_privacy_notice.
- [ ] Settings page (L2, from 1B.2): marketing prefs, retention override, support locale, telerehab toggle (entitlement-mirrored, read-only) (gated by organizations.update_settings). Per D-6, feature_flags JSONB is engineering-internal and NOT exposed in the form — the PATCH .../settings endpoint may accept it for engineering use, but the UI does not surface or edit it.
- [ ] Billing page (L4, from 1B.2, read-only): current plan, period dates, billing contact, upcoming renewal — write access lives in Console.
- [ ] Locations section (L5, from 1B.5): list, create, update, close (status), delete (gated by locations.manage).
- [ ] Patient Tiers section (L17, from 1B.4): list, add, edit, archive tiers; default-tier toggle (gated by patient_tiers.manage).
- [ ] Patients list + detail (L11, L12): search, paginate, sort, archive; per-patient detail composing consents (L13), subscription (L14), impersonation history (L15, L16).
- [ ] Per-patient consents view (L13, from 1B.9): list a patient's consent history at this clinic, current state per purpose, withdrawal as staff-action when needed (gated by consents.view_org / consents.manage).
- [ ] Outbound webhook subscriptions (L18, Cat C — 1C.4): create / list / get / update / revoke / rotate-secret / list-deliveries / fire-test. Gated by organizations.manage_webhooks + EnforceLimit(max_webhook_subscriptions). Sidebar entry /webhooks.
- [ ] Connected accounts (L19, Cat B — 1C.5 framework-only): per-org create / list / get / update / delete / test against the integration_services catalog. Gated by organizations.manage_integrations. Likely stub-only at 1D close — first Cat B catalog row + Connector impl + OAuth callback handler ships with first F-tier consumer (per 1C.5). Sidebar entry /connectors ships with stub copy until then.
- [ ] Break-glass session banner (L22): when a platform staff break-glass session is open against this org, show an in-app banner with who/when/scope/reason; recent (closed within 30d) sessions listed in a "Platform support access" section.
Removed from Clinic admin scope (decisions 2026-05-07):
- F-tier sidebar stubs (/calendar, /treatment-plans, /exercises, /forms, /specialists, /services, /specialties, /segments, /custom-fields, /pdf-templates, /automations, /reports, /billing-invoicing, /video-calls) — D-12, removed at 1D close. They reappear when their backends ship.
- [ ] Staff impersonation oversight (1B.13 — backend shipped): list of all patient_impersonation_sessions across the clinic — filterable by staff member, patient, date range. Reads GET /v1/organizations/{id}/patient-impersonation-sessions[?staff_principal_id=&patient_id=&only_active=&limit=&offset=] (RLS-gated by patients.manage). DataTable foundation (1D.4). Per-patient impersonation history shown alongside the per-patient consents view. Backend contract is stable — no Go changes needed when this lands.
1D.3 Patient Self-Service (Portal)
Portal must be non-empty for a logged-in patient with no medical features. Mirror of "manage my org" for the patient audience.
Open:
- [ ] Sign-up consent block (consumes 1B.9): clean checkbox UX for platform_terms + platform_privacy_notice (required); plus the org-scope required purposes (org_terms if the clinic published one, org_privacy_notice); plus optional marketing_email / marketing_sms / analytics / ai_processing / profile_sharing toggles. Submission writes the consents rows in the same transaction as POST /v1/portal/onboard (1B.8).
- [ ] Onboarding form ("needs onboarding" UX): when is_patient_at_current_org=false, show form that posts to POST /v1/portal/onboard (already includes the consent block). Redirect to dashboard on success.
- [x] Patient-invite banner on /onboard (1B.12): when the auth middleware's OnAuthHook has bound a patient invite for the current org, the page surfaces a "you've been invited to Acme Clinic" banner above the onboarding form. Discovery uses GET /v1/me/pending-invitations — admin-pool projection narrowly scoped to the calling principal's accepted-but-not-consumed invites. Works for both new-user (bind fires on first sign-in) and cross-clinic existing-user flows (bind fires on every authenticated request).
- [ ] /join/{code} share-link landing page (1B.12 — backend shipped): anonymous landing page that reads code from the URL, calls GET /v1/public/share-links/{code} (per-IP rate-limited, no auth) to render branded "Join {org_name} — {tier_name}" CTA. Click → Clerk sign-up flow. After sign-up, the portal stashes share_link_code in a cookie (or URL state) and the /onboard page submits it on POST /v1/portal/onboard. 410 from public resolve renders "this link is no longer active"; 404 renders "link not found." Needs branding tokens from the org (logo, name) — fetch via the resolve response.
- [x] Re-consent modal: blocking dialog when a consent_purpose_versions bump means the patient hasn't accepted the latest version. The portal (patient) layout probes GET /v1/me/required-consents (discovery endpoint mounted outside the RequireCurrentConsents 412 gate); non-empty result renders ReconsentModal alongside the page. Modal is genuinely blocking — no escape, outside-click, or close button — only Accept dismisses. Accept calls acceptRequiredConsents server action which re-grants every missing purpose; consents service supersedes the v1 active grant with withdrawal_reason='superseded_by_v{N}' (the org_terms cascade trigger correctly skips this path so re-acceptance does NOT trigger leave-clinic).
- [ ] My profile page: view + edit patient_profiles fields (encrypted phone via 1A.3); read-only fields surface for what the org-side admin owns.
- [ ] My subscription page: view active patient_subscriptions + tier features/limits; status; period dates.
- [ ] My consents page (consumes 1B.9 trail view): full per-org and platform-level history with current state per purpose; toggles for withdrawable purposes (marketing_*, analytics, ai_processing, profile_sharing); "delete account to revoke" affordance for non-withdrawable platform purposes; "leave clinic" affordance for non-withdrawable org purposes (org_terms, org_privacy_notice — sets patients.deleted_at at that clinic).
- [ ] My clinics page (P9 / overlaps with A4): consumes the shipped GET /v1/me/clinics endpoint (D-8, commit 58dc1c4) — returns one row per clinic the patient is at with {org_id, name, slug, primary_contact, dpo_email, ...} for DSAR routing without crossing the processor boundary. Same handler as 1D.5's A4; rendered with Portal chrome here, with platform chrome at 1D.5.
- [ ] Out of 1D scope: Data export request (GDPR Art. 15/20) — F11 backend, listed for completeness only. UI work waits for F11.
- [ ] Out of 1D scope: Account deletion request (full GDPR erasure across all orgs) — F11.1 backend, listed for completeness only. UI work waits for F11.1.
- [ ] Access history view (1B.13 — backend shipped): per-clinic list of staff impersonation sessions on this patient — who opened it, when, the reason text, duration. Reads GET /v1/me/patient-impersonation-sessions[?organization_id=&only_active=&limit=&offset=] (RLS self-read on patient_impersonation_sessions cascades through current_human_patient_profile_ids() to span every clinic the patient is at; cross-org account surface 1D.5 consumes the unfiltered shape). Foundation-tier scope is session metadata only; per-action drill-down ("what entities were touched") is deferred to the future patient_account_activity projection (see Deferred Foundation Extensions) — patients never get SELECT on audit_log directly.
- [x] Sign-out + locale selector + theme (P5).
Removed from Patient Portal scope (decisions 2026-05-07):
- F-tier sidebar stubs (/appointments, /exercises, /treatment-plan, /forms, plus any others) — D-12, removed at 1D close. They reappear when their backends ship.
1D.4 Shared UI Patterns
Per D-4: this subsection ships FIRST as one packages/ui PR before any per-app 1D.1 / 1D.2 / 1D.3 / 1D.5 surface starts. Every per-app consumer composes against the same versioned primitives. Stricter than parallel-prototype because prototypes hardened on one app's first surface diverge from the version a second app starts against.
Status: partially shipped. Open items at the bottom — all four un-built primitives ship in the same PR.
Already shipped:
- [x] DataTable (TanStack-backed, server-driven sort + filter + pagination + Sheet detail), MultiSelectFilter, AsyncMultiSelectFilter, DateRangeFilter.
- [x] App shell + brand theme (sidebar/inset, OKLCH brand tokens, Poppins, light/dark sidebar, min-w-0 boundary).
- [x] Listing-page pattern (fill mode, sticky toolbar, edge-to-edge tables).
- [x] Branded 404 pages with i18n.
- [x] Re-consent modal (1B.10) — first instance of the persistent-blocking-banner pattern.
Open (one PR, ships before any per-app surface):
- [x] Empty / loading / error states standardised in packages/ui/patterns/ — EmptyState (icon + title + description + action; card/bare variants), LoadingState (Skeleton-based; table/card/page variants with role="status" + aria-live), ErrorState (matches EmptyState shape so list pages swap one for the other on load result).
- [x] Toast / notification system for action results — sonner-backed. <Toaster /> mounts once at root layout; toast.success() / toast.error() from @workspace/ui/components/sonner. Ephemeral by design — never the canonical surface for form errors (<FormError />) or persistent state (<PersistentBanner />).
- [x] Server-validation rendering (422 with field errors → form-level error display) — FormErrorState shape {error?, field_errors?} extends the existing useActionState {error} pattern. <FormError state /> renders form-level; fieldError(state, name) returns the per-field message for FormField's error prop. No form-library lock-in.
- [x] Permission-aware UI helper: <RequirePermission code="..." /> wrapper for routes, buttons, table actions + useHasPermission hook + <PermissionsProvider> (Context, fed from /v1/me's current_permissions + is_superadmin). Superadmins bypass every per-org gate. Authoritative server-side gate is unchanged — this is UX, not security.
- [x] Generalize the persistent banner pattern beyond re-consent — <PersistentBanner variant="info|warning|destructive|security" title description action? icon?> in patterns/persistent-banner.tsx. Consumers: C11 / L22 break-glass active banner, future impersonation banners, future compliance-review banners.
- [x] QR-code component in packages/ui/components/ — <QRCode value size? level?> thin wrapper over qrcode.react SVG renderer. Consumed by L10 share-link mint UI.
- [x] Edit locks primitive (backend + frontend; see Edit Locks below) — shipped 2026-05-10. internal/core/locks/ package (Store + Service + Handler + RequireLockHeld middleware + ResourceDef registry), 4 HTTP endpoints under /v1/organizations/{id}/locks/{resource}/{resourceId}, useEditLock hook + <EditLockBanner /> in @workspace/ui, integration tests in internal/core/locks/store_integration_test.go, P54 documented in patterns.md. Foundation ships the framework — the registry starts empty; F-tier consumers register their resource types in their own init() functions.
- [x] Documented as canonical patterns in packages/ui/README.md — toast usage, form validation rendering, permission-gated UI, empty/loading/error states, persistent banners, edit locks, QR codes, plus the "adding a new primitive" workflow.
1D.4 Edit Locks (Design)
Pessimistic edit locks prevent two staff from concurrently editing the same record. WooCommerce-style: the first staff to open a detail page acquires a TTL'd Redis lock; subsequent openers see a read-only banner ("Maria is editing — since 14:32") and a "Take over" button. Mutations to lockable resources are guarded server-side: if the caller doesn't hold the lock, the write returns 409 Conflict. version columns on mutable tables are kept as defense-in-depth — locks are UX, version columns are correctness.
Generalises the appointment-slot Redis hold pattern: same primitive (TTL'd Redis key, owner-bound, atomic acquire), broader scope (any lockable resource type).
Decisions settled (2026-05-08):
- Granularity: per-record. Each domain registers its lockable resource type with the lock middleware. Patient detail's sub-tabs (appointments / forms / treatment plan) lock independently — Maria can edit a consent while Andrei edits an appointment on the same patient.
- Lock identity: per-principal. Same staff member with two tabs open does NOT lock themselves out. Self-races between tabs are caught by the version column on the underlying table, not the lock.
- TTL: 120s lock / 45s heartbeat. Lower API chatter than WooCommerce's 150/15; recovers within ~2 min of tab close.
- Takeover: allowed, audited. Second user clicks "Take over" → Redis key is overwritten with the new holder, the original session gets booted on next heartbeat (which now fails with lock_lost). Audit row: lock.takeover with (resource_type, resource_id, prior_holder, new_holder).
- Read-only banner for non-holders — page loads but inputs are disabled with a banner showing holder + acquired-at + "Take over" button. Friendlier than a hard refusal.
- Audit scope: takeover + write-blocked only. lock.takeover and lock.write_blocked (guard rejection) are security-significant. Acquire / heartbeat / release are operational metadata, exempt per CLAUDE.md "operational-metadata bumps are exempt".
- Defense-in-depth: keep version columns on every mutable table the lock protects. Lock prevents the common case; version catches Redis hiccups, expired-mid-save races, and self-races between tabs.
- No bypass. Org admins don't get a force-release shortcut — takeover is the escape hatch and it's audited. Keeps the model simple.
Backend (internal/core/locks/) — shipped 2026-05-10:
- [x] Redis-backed lock store with atomic Acquire (SET NX + holder-returning conflict shape), Heartbeat (Lua-script EXPIRE-only-if-still-mine), Release (Lua-script DEL-only-if-still-mine), Takeover (unconditional SET, returns prior holder), Get, HeldBy. Key shape lock:{org_id}:{resource_type}:{resource_id}; lock value JSON {principal_id, acquired_at}. Default TTL 120s.
- [x] HTTP endpoints POST/PATCH/DELETE/GET /v1/organizations/{id}/locks/{resource}/{resourceId}. Mounted under the per-org route group so P47 URL ≡ scope guard inherits.
- [x] Mutate-guard middleware locks.RequireLockHeld(svc, resourceType, paramName) — extracts URL param, checks Redis, returns 409 with {holder_principal_id, acquired_at} in error envelope context. Applied via r.With(...) chain on PATCH/DELETE routes after the permission gate.
- [x] Resource-type registry: locks.RegisterResource(ResourceDef{Type, Permission, Description}) at init time. Foundation registry starts empty; F-tier consumers register their resource types in their own init() functions. Handler validates URL resource segment against the registry; unknown types → 400.
- [x] Audit on takeover: LOCK_TAKEOVER action verb (extending the open-ended audit.Action const), EntityType: "edit_lock", Before/After carrying both holders. Emitted via the standard audit.Record path inside the request tx (the takeover IS a write that commits, so the audit row commits with it).
- [x] Audit on write-blocked — audit.RecordOutOfTx(ctx, event) shipped (internal/core/audit/recorder.go). Opens an isolated AdminPool tx so the audit row commits independently of the request tx (which rolls back on 409). The locks service's CheckHeld write-blocked path now emits a LOCK_WRITE_BLOCKED audit row with entity_type = "edit_lock", entity_id = resource_id, Before/After carrying holder + caller principal. Slog stays as the operational debug signal; audit is the forensic source of truth.
Frontend (@workspace/ui/hooks/use-edit-lock) — shipped 2026-05-10:
- [x] useEditLock hook — acquires on mount, heartbeats every 45s, releases on unmount, polls every 10s while held_by_other, best-effort navigator.sendBeacon on pagehide (consumer-supplied URL). Returns a typed status (acquiring | held_by_self | held_by_other | lost | error) the form switches on. Action-agnostic — accepts caller-supplied acquire/heartbeat/release/get callbacks so the hook stays in @workspace/ui (no Next.js coupling).
- [x] <EditLockBanner /> component in @workspace/ui/components/edit-lock-banner — two variants: amber held_by_other (with optional "Take over" button) + destructive lost (with "Try to acquire again" button). Caller supplies localised label strings (next-intl in scope at the consumer).
- [ ] Form-level integration patterns — first F-tier consumer wires the hook + banner into its detail page (deferred until F1+).
- [x] Server-action 409 handling — the api-client throws ApiError with error.status === 409 and error.code === "lock_held_by_other" | "lock_lost"; the consumer's server-action wrapper catches and surfaces via the action's typed result so useEditLock can transition state cleanly.
Migration touch:
- [ ] Add version INTEGER NOT NULL DEFAULT 1 column to mutable tables that will be edit-locked. Pre-prod, so edit the original CREATE TABLE migration in place. List of affected tables determined per-feature when the lock subscribes — foundation lands the primitive; each per-app surface in 1D.1/1D.2/1D.3 wires its detail page through the hook and adds the column to the table it edits.
- [x] Pattern entry P54 (Edit Locks) added to patterns.md.
Out of scope here:
- Realtime presence ("Maria is editing this" surfacing without page-level acquire). HTTP polling on GET /v1/organizations/{id}/locks/{resource}/{resourceId} covers the read-only banner refresh; WebSocket-driven presence is a Layer 6+ concern if it ever ships.
- Collaborative-edit merge (Figma-style). Hard non-goal — the platform is staff-blocking-staff, not concurrent-edit.
- Locks across browser sessions for the same principal — per-principal identity means the same staff can edit from multiple tabs without locking themselves out; self-races are version-column-only.
1D.5 Cross-Org Account Surface (Patient Platform-Level View)
Each clinic portal scopes the patient to that clinic's view by RLS — demo.portal.restartix.pro shows Demo's data, acme.portal.restartix.pro shows Acme's, never blended. That posture is correct (the platform is processor for each clinic separately; blending is joint controllership per Art. 26). But a patient enrolled at multiple clinics still needs one place to see the union of their own data, manage cross-org actions (account deletion, DSAR routing per clinic), and discover which clinics they're at.
This surface is the answer. It runs on a platform-owned hostname (account.restartix.pro or similar) with no clinic branding, no X-Organization-ID header, and no per-org RLS context. The session is patient-portable: set_app_principal (no org), so existing RLS policies (consents_select_self, patients_select_self, patient_subscriptions_select_self, etc.) return the cross-org union via the current_app_org_id() IS NULL branch they already carry.
Why this is a foundation item, not a feature: the Portal-per-clinic policies were tightened during 1B's RLS hardening to prevent Demo→Acme bleed; the cross-org "see everything you have across all your clinics" UX is the corresponding patient-side affordance. Without it, a patient at multiple clinics has no single place to manage account-wide concerns. Must ship before staging cuts over (1E.3) so 1E.2's setup-a-clinic acceptance test can validate the full multi-clinic flow against real hostnames.
Open:
- [ ] Hostname provisioning: account.restartix.pro (or final name) — DNS, ACM cert, Route53 entry; identical infra shape to console.restartix.pro.
- [ ] Frontend app: new Next.js app under apps/account/ (or extend an existing one with a new layout). Auth via Clerk same as portals. No org-resolver — the proxy does NOT set X-Organization-ID.
- [ ] Middleware composition: the account.* host hits /v1/me/* routes through RequirePrincipalRLS only, with CurrentOrganizationID = uuid.Nil enforced. attachRLSConn dispatches to set_app_principal(P) (no org context) — patient-side RLS policies then return the cross-org union.
- [ ] My consents (cross-org view): consume the same GET /v1/me/consents endpoint with no org header. Returns every active + withdrawn consent across every clinic the patient has ever been at, plus platform-scope rows. Toggles for self-withdrawable purposes work per-row (the row carries organization_id; the withdraw call is org-attributed correctly because the row's column drives the cascade).
- [ ] My clinics page (A4): consumes the shipped GET /v1/me/clinics endpoint (D-8, commit 58dc1c4) — returns each clinic's name, primary contact, and DPO email (field-filtered subset of organization_billing; no other billing data leaks) for DSAR routing without crossing the processor boundary. Same handler as Portal's P9; rendered with platform chrome here.
- [ ] My profile (portable): view + edit patient_profiles (the portable identity, no organization_id). Same handler as Portal's /v1/me/patient-profile — read returns the portable row; edit propagates to every clinic the patient is at without per-org duplication.
- [ ] Account deletion request: full-account erasure entry-point (consumes F11.1's job pipeline when it ships; trigger + UI + queued job ship in 1B's account-deletion subset). Distinct from "leave clinic X" (which lives in the clinic's own portal under org_terms withdraw).
- [ ] Cross-org data export request: GDPR Art. 15/20 pull spanning every clinic the patient is at. Each clinic's slice routes to that clinic for fulfilment via the same per-clinic queue used by the portal-side export request.
- [ ] Access history view, cross-org (1B.13 — backend shipped): every staff impersonation session against this patient across every clinic. Reads GET /v1/me/patient-impersonation-sessions (no organization_id filter — RLS self-read returns the union); patient sees who at which clinic accessed their record, when, and why. Foundation-tier scope is session metadata only; per-action drill-down via the future patient_account_activity projection if/when patient UX needs more — patients do NOT get SELECT on audit_log.
- [ ] Locale + theme + sign-out: same pattern as Portal.
- [ ] Acceptance test: a patient enrolled at two clinics signs in to account.restartix.pro, sees both clinics in their list, sees consents from both clinics in one view, withdraws a marketing consent at clinic A and verifies it stays granted at clinic B (per-clinic scope preserved despite the unified view), triggers a cross-org export, and signs out. Add to 1E.2's setup-a-clinic acceptance test before staging cuts over.
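The cross-org behaviour the items above rely on comes from the "no org context" branch of the self-read policies. The sketch below models that visibility rule in plain Go so the expected result sets are easy to reason about. It is a simplified assumption of the policy shape (owner matches AND either no current org or the row's org equals it), not the actual SQL of consents_select_self, and every type and function name in it is hypothetical.

```go
package main

import "fmt"

// Consent is a pared-down stand-in for a consents row.
type Consent struct {
	OrgID            string
	PatientProfileID string
	Purpose          string
	Granted          bool
}

// visibleToSelf models a self-read policy: the row must belong to the caller,
// and either no org is set on the session (account surface) or the row's org
// matches the session org (per-clinic portal). An empty currentOrg plays the
// role of current_app_org_id() IS NULL.
func visibleToSelf(c Consent, profileID, currentOrg string) bool {
	if c.PatientProfileID != profileID {
		return false
	}
	return currentOrg == "" || c.OrgID == currentOrg
}

// SelectSelf returns the rows a session would see under that rule.
func SelectSelf(rows []Consent, profileID, currentOrg string) []Consent {
	var out []Consent
	for _, c := range rows {
		if visibleToSelf(c, profileID, currentOrg) {
			out = append(out, c)
		}
	}
	return out
}

func main() {
	rows := []Consent{
		{"demo", "p1", "marketing_email", false}, // withdrawn at Demo
		{"acme", "p1", "marketing_email", true},  // still granted at Acme
		{"acme", "p2", "marketing_email", true},  // another patient's row
	}
	portal := SelectSelf(rows, "p1", "demo") // clinic portal session
	account := SelectSelf(rows, "p1", "")    // account surface session
	fmt.Println(len(portal), len(account))
}
```

In this model the clinic portal session sees only the Demo row, while the account session sees both of the patient's rows with their per-clinic state intact, which is the property the acceptance test checks when a withdrawal at clinic A must leave clinic B's grant untouched.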
Out of scope here: clinical data (treatment plans, exercises, telerehab) — those are per-clinic by design and the patient sees them at the clinic's portal. Cross-org medical surfaces are a Layer 5+ concern if they ever ship.
1E. Foundation Gate
Three closes happen here: documentation, acceptance test, staging deployment.
1E.1 Foundation Gate Documentation
Status: shipped. STOP callout, Phase Discipline section in CLAUDE.md, gate referenced from /new-domain + /new-migration skills, four CI gates wired into make check (check-classification, check-soup, check-migrations, check-events).
1E.2 Setup-a-Clinic Acceptance Test (was 1.23)
Status: shipped end-to-end against local. Scenarios cover org provisioning + companion fan-out, plan management (Free → Pro, add-on stacking, override grant/revoke), capability flag flip, tier management (add/flip default/inactive), patient signup + portal onboarding, four-gate middleware paths, audit attribution (human actor, system actor). Detail: setup_clinic_test.go.
- [ ] Re-run against staging once 1E.3 ships.
1E.3 AWS Staging Deployment
Foundation isn't done until it works in the target environment. Custom-domain TLS via Cloudflare for SaaS, two-pool RLS against managed Postgres, multi-subdomain cookies, Clerk in production mode, KMS-backed encryption, S3 round-trips — none of this validates on *.localhost. Single-AZ staging only; full prod hardening (Multi-AZ, autoscaling ceilings tuned to real traffic, alerting fan-out) is the production deploy in F11. 1E.3 is the foundation gate — staging only, no real patients. Real-clinic launch is the separate operational gate at production-launch-readiness.md.
The full topology, sizing, and cost shape are in aws-infrastructure.md; the deploy mechanics in deployment.md; the Terraform module layout in iac-layout.md; the architectural rationale (why ECS Fargate over App Runner, why Aurora Serverless v2 for staging, why Cloudflare for SaaS for custom domains, why Terraform) in decisions.md. 1E.3 is the work that takes those documents from "specified" to "running."
Stack settled (2026-05-07). ECS Fargate everywhere · Aurora Serverless v2 (single-AZ, scale-to-zero) for staging Postgres · ElastiCache Redis (single-node) · S3 + KMS + Secrets Manager + ECR · SES (production identity verification opens here) · Cloudflare for DNS, CDN, WAF, and per-tenant custom-domain TLS via Cloudflare for SaaS · Terraform as the IaC tool with state in S3 and native conditional-write locking (use_lockfile = true; no DynamoDB).
Provisioning (Terraform-first, no Console clicks for anything reproducible):
- [ ] Create Terraform module skeleton per iac-layout.md: infra/modules/network, infra/modules/database, infra/modules/ecs-service, infra/modules/cache, infra/modules/storage, infra/modules/observability; infra/envs/staging and infra/envs/production consume them
- [x] State backend: S3 bucket restartix-tfstate (encrypted, versioned, public-blocked) with native conditional-write locking (use_lockfile = true per env). No DynamoDB table needed. Shipped 2026-05-13 via infra/envs/bootstrap apply; S3 native locking migration landed same day (replacing the initial DynamoDB-lock-table setup from 7f9470b).
- [x] GitHub Actions OIDC provider + deploy IAM role with least-privilege Terraform-apply policy. Shipped 2026-05-13. Two roles: restartix-deploy-staging (trusts repo:RestartiX/restartix-platform:ref:refs/heads/master) and restartix-deploy-production (trusts repo:RestartiX/restartix-platform:environment:production, gated by GitHub Environment approval). ARNs in infra/envs/bootstrap outputs.
- [ ] Provision staging VPC: 2 public + 2 private subnets, t4g.nano NAT instance (single AZ — staging accepts the SPOF), VPC endpoints for S3 / ECR / Secrets Manager / KMS / CloudWatch Logs
- [ ] Provision Aurora Serverless v2 cluster in staging: 0.5–2 ACU range, scale-to-zero enabled, single-AZ, 1-day backup retention, parameter group with rds.force_ssl=1 + shared_preload_libraries=pg_stat_statements, extensions per 1A.16
- [ ] Provision ElastiCache Redis (cache.t4g.micro single node, encryption in transit + at rest)
- [ ] Provision ECR repos with lifecycle policy (last 20 tagged, untagged > 7d deleted)
- [ ] Provision S3 buckets: restartix-uploads-staging, restartix-audit-archive-staging with versioning + block public access + lifecycle policies (audit archive: Standard → Glacier IA at 90d → Deep Archive at 365d)
- [ ] Provision customer-managed KMS key in eu-central-1, used as Secrets Manager envelope key for restartix/{env}/encryption (column-encryption keyring + pg_dump envelope key). Key policy: Fargate task role gets kms:Decrypt against the SM context; operations role gets full lifecycle. Direct kmsKeyring (per-data-key KMS calls) is Phase 2 — not wired here.
- [ ] Provision Secrets Manager secrets per aws-infrastructure.md → Secrets management
- [ ] Provision ALB with ACM wildcard cert for *.restartix.pro (DNS validation via CNAME records created in Cloudflare) + listener rules for host-based routing across all services
- [ ] Provision ECS cluster + task definitions + services for: Core API, Telemetry API (sizing per aws-infrastructure.md → Telemetry sub-stack once those decisions land), clinic, portal, console, pgbouncer
- [ ] Provision EventBridge Scheduler rules for audit-partition-roll, usage-quota-reset, usage-summary-rollup, check-providers, expired-sessions-sweep
- [x] Provision SES production identity for the staging sender domain — DKIM + SPF + DMARC records added in Cloudflare; sandbox-exit ticket opened with AWS Support (24–48h turnaround). Shipped 2026-05-13 for restartix.pro (platform sender domain; same identity serves staging + production). Sandbox was already exited at the account level; no ticket needed.
- [ ] Provision baseline CloudWatch alarms per monitoring.md → ECS Fargate & CloudWatch Monitoring
- [ ] Cloudflare for SaaS configured: zone settings, custom-hostname API token, SSL/TLS edge certificate origin pointing at ALB
Application-layer prereqs (Core API code, lands before or with the IaC):
- [x] Cloudflare for SaaS Custom Hostnames Go client: new package services/api/internal/integration/cloudflare-saas/ wrapping the Custom Hostnames API (POST /custom_hostnames to register, GET /custom_hostnames/{id} to poll provisioning status, DELETE /custom_hostnames/{id} to deregister). Auth via API token from Secrets Manager (restartix/{env}/cloudflare). Shipped 2026-05-13. Hand-rolled net/http + httptest-based tests; 16 test cases covering happy paths + each sentinel error + the structured Cloudflare error envelope. Config env vars (CLOUDFLARE_SAAS_API_TOKEN, CLOUDFLARE_ZONE_ID, optional CLOUDFLARE_API_BASE_URL) added to config.go. Console handler at POST /v1/admin/organizations/{id}/custom-domain is not in scope here; that consumer ships when the custom-domain admin UI lands. Status polling strategy (inline UI vs scheduled task) is the consumer's call; the client exposes the primitives.
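Sketch of the register primitive, for orientation only. This is not the shipped package: Client, Register, decodeResult, and ErrAPI are hypothetical names, the zone-scoped path /zones/{zone_id}/custom_hostnames and the ssl payload are assumptions about how the client maps onto Cloudflare's v4 API, and only three result fields are decoded.

```go
package main

import (
	"bytes"
	"context"
	"encoding/json"
	"errors"
	"fmt"
	"io"
	"net/http"
)

// ErrAPI is a hypothetical sentinel for a failed Cloudflare envelope.
var ErrAPI = errors.New("cloudflare: api error")

// CustomHostname holds the subset of "result" this sketch reads.
type CustomHostname struct {
	ID       string `json:"id"`
	Hostname string `json:"hostname"`
	Status   string `json:"status"`
}

// envelope mirrors the success/errors/result wrapper of the v4 API.
type envelope struct {
	Success bool `json:"success"`
	Errors  []struct {
		Code    int    `json:"code"`
		Message string `json:"message"`
	} `json:"errors"`
	Result CustomHostname `json:"result"`
}

// decodeResult unwraps the envelope and maps failures onto ErrAPI.
func decodeResult(body []byte) (CustomHostname, error) {
	var env envelope
	if err := json.Unmarshal(body, &env); err != nil {
		return CustomHostname{}, fmt.Errorf("decode envelope: %w", err)
	}
	if !env.Success {
		if len(env.Errors) > 0 {
			e := env.Errors[0]
			return CustomHostname{}, fmt.Errorf("%w: %d %s", ErrAPI, e.Code, e.Message)
		}
		return CustomHostname{}, ErrAPI
	}
	return env.Result, nil
}

// Client carries the values that config.go is described as exposing.
type Client struct {
	BaseURL string // assumed default: https://api.cloudflare.com/client/v4
	ZoneID  string
	Token   string
	HTTP    *http.Client // nil falls back to http.DefaultClient
}

// Register creates a custom hostname and returns its initial state.
func (c *Client) Register(ctx context.Context, hostname string) (CustomHostname, error) {
	payload, err := json.Marshal(map[string]any{
		"hostname": hostname,
		"ssl":      map[string]string{"method": "http", "type": "dv"}, // assumed validation settings
	})
	if err != nil {
		return CustomHostname{}, err
	}
	url := c.BaseURL + "/zones/" + c.ZoneID + "/custom_hostnames"
	req, err := http.NewRequestWithContext(ctx, http.MethodPost, url, bytes.NewReader(payload))
	if err != nil {
		return CustomHostname{}, err
	}
	req.Header.Set("Authorization", "Bearer "+c.Token)
	req.Header.Set("Content-Type", "application/json")

	hc := c.HTTP
	if hc == nil {
		hc = http.DefaultClient
	}
	resp, err := hc.Do(req)
	if err != nil {
		return CustomHostname{}, err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return CustomHostname{}, err
	}
	return decodeResult(body)
}
```

Keeping envelope decoding in its own function lets the error-envelope cases be tested without a network round trip; the poll and deregister calls would reuse the same decoder.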
Deploy + validation:
- [ ] CI/CD pipeline operational per deployment.md: branch protection on master, GitHub Actions builds + pushes to ECR + runs migrations as one-shot ECS task + triggers ECS rolling deploys, with a manual approval gate before production deploys (production environment isn't built yet at 1E.3; the gate exists in the workflow definition for the F11 production rollout)
- [ ] Deploy Core API + Telemetry API + Clinic + Portal + Console to staging
- [ ] Re-run 1E.2's setup-a-clinic test list against staging end-to-end
- [ ] Custom-domain end-to-end via Cloudflare for SaaS: org adds real custom domain in Console → backend calls Cloudflare for SaaS Custom Hostnames API → returns CNAME target → record set in clinic-side DNS → Cloudflare provisions Let's Encrypt cert → app renders at the custom domain → cookies and Clerk both work
- [ ] Two-pool RLS validated against Aurora Serverless v2: admin + restricted-role pool both acquire connections cleanly under synthetic load; restricted role's default privileges enforce RLS as designed
- [ ] Multi-subdomain cookies: org-id set on clinic.restartix.pro is read on {slug}.clinic.restartix.pro
- [ ] Clerk in production mode (not test keys); sign-in / sign-up / JWT verification / blocked-user 403 all work
- [ ] HTTPS / HSTS verified end-to-end (Cloudflare → ALB → Fargate); security headers present in responses
- [ ] Customer-managed CMK envelopes restartix/{env}/encryption; Core API boots successfully and round-trips ciphertext through the in-memory keyring loaded from the KMS-protected SM secret (verifies the SM → keyring → AES-GCM path end-to-end against real KMS)
- [ ] S3 bucket from 1A.8 wired (org-scoped uploads work in real env via signed URLs)
- [ ] SES production identity verified; FakeChannel replaced with real EmailChannel pointing at SES; foundation MemberInvite + BreakGlassOpened templates send through to a real inbox
- [ ] Scheduled tasks (audit-partition-roll etc.) firing on schedule, audited correctly
- [ ] Make.com end-to-end smoke test for outbound webhook subscriptions (1C.4 closing item)
- [ ] Documentation reflects the deployed staging shape: aws-infrastructure.md, deployment.md, iac-layout.md, scaling-architecture.md, monitoring.md, backup-disaster-recovery.md
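Sketch of the keyring → AES-GCM round trip named in the CMK validation item above, under stated assumptions: the Keyring type, the key-ID field, and the nonce-prepended ciphertext layout are hypothetical, and the key bytes are taken as already loaded (in staging they would come from the KMS-protected Secrets Manager secret).

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"errors"
	"fmt"
)

// Keyring maps a key ID to a 32-byte AES-256 key held in memory.
type Keyring struct {
	Active string            // key ID used for new ciphertext
	Keys   map[string][]byte // all keys still accepted for decryption
}

// aead builds an AES-GCM instance for the given key ID.
func (k *Keyring) aead(id string) (cipher.AEAD, error) {
	key, ok := k.Keys[id]
	if !ok {
		return nil, fmt.Errorf("unknown key id %q", id)
	}
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// Seal encrypts with the active key and returns the key ID plus
// nonce||ciphertext, so the nonce travels with the stored value.
func (k *Keyring) Seal(plaintext []byte) (string, []byte, error) {
	aead, err := k.aead(k.Active)
	if err != nil {
		return "", nil, err
	}
	nonce := make([]byte, aead.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return "", nil, err
	}
	return k.Active, aead.Seal(nonce, nonce, plaintext, nil), nil
}

// Open splits the nonce off and authenticates + decrypts the rest.
func (k *Keyring) Open(id string, sealed []byte) ([]byte, error) {
	aead, err := k.aead(id)
	if err != nil {
		return nil, err
	}
	if len(sealed) < aead.NonceSize() {
		return nil, errors.New("ciphertext too short")
	}
	nonce, ct := sealed[:aead.NonceSize()], sealed[aead.NonceSize():]
	return aead.Open(nil, nonce, ct, nil)
}
```

A boot-time self-check could Seal a fixed probe string and Open it again, failing startup if the round trip errors; whether the real keyring does this is not stated in the plan.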
Out of scope here (closes in F11 production deploy):
- Multi-AZ posture (RDS Multi-AZ, NAT Gateway with HA, two pgbouncer tasks across AZs)
- Production RDS Postgres instance: staging stays on Aurora Serverless v2; production is a separate infra/envs/production apply
- Read replicas
- Production-grade alarming + dashboards beyond the staging baseline
- Cloudflare WAF rule tuning beyond the managed ruleset
- Auto-scaling ceilings tuned to real traffic
- Sentry production project (staging Sentry project is enough for 1E.3)
- Automated cross-region replication for backups (Layer 3 of backup-disaster-recovery.md)
Cost target: under $100/mo idle (currently estimated ~$97/mo). Telemetry is not part of the 1E.3 staging gate — it ships as a Layer 2 service after foundation closes (~+$7/mo when added). See aws-infrastructure.md → Cost: staging and Telemetry sub-stack.
Locked: consent and processor design
The substantive design for consents, controllership, and break-glass access is recorded in decisions.md → Why clinic is controller, platform is processor. Summary of what landed:
- Single consents ledger spanning platform-scope and org-scope purposes, with a legal_basis discriminator (contract / legitimate_interest / consent / legal_obligation / vital_interest) and a withdrawable derived flag. Ships in 1B.9.
- Privacy notice template + clinic fill-in instead of a fixed platform notice. Ships in 1B.10.
- Break-glass access for any identifiable cross-tenant patient data in Console — per-org scope, time-bound, justification-required, always-on clinic notification, audited. Ships in 1B.11.
- Cross-tenant features anonymise by default — codified as a foundation principle. Joint controllership (Art. 26) is the failure mode this rule prevents.
- DSAR routing flows through the clinic, never the platform. The platform's role is auto-respond + portal self-service ("your clinics" list); break-glass is the last-resort path for orphaned requests.
- Tier B medical consents (telemedicine, video recording, biometric capture, treatment-specific) layer on top of 1B.9 in F3.5: same table, source='form' rows, multi-modal signature capture (in-portal click, drawn-on-tablet, sent-to-phone).
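Sketch of the legal_basis discriminator and the derived withdrawable flag from the first bullet above. The Go names are hypothetical, and the derivation rule (only consent-based rows are withdrawable) is an assumption about how the flag is computed; the authoritative rule lives in 1B.9 and decisions.md.

```go
package main

import "fmt"

// LegalBasis is the discriminator stored on each consents row.
type LegalBasis string

const (
	BasisContract           LegalBasis = "contract"
	BasisLegitimateInterest LegalBasis = "legitimate_interest"
	BasisConsent            LegalBasis = "consent"
	BasisLegalObligation    LegalBasis = "legal_obligation"
	BasisVitalInterest      LegalBasis = "vital_interest"
)

// ParseLegalBasis rejects values outside the five listed codes.
func ParseLegalBasis(s string) (LegalBasis, error) {
	switch b := LegalBasis(s); b {
	case BasisContract, BasisLegitimateInterest, BasisConsent,
		BasisLegalObligation, BasisVitalInterest:
		return b, nil
	}
	return "", fmt.Errorf("unknown legal basis %q", s)
}

// Withdrawable derives the flag from the basis.
// Assumption: only rows resting on consent can be withdrawn by the patient.
func (b LegalBasis) Withdrawable() bool {
	return b == BasisConsent
}
```

Deriving the flag rather than storing it independently keeps a row from claiming a contract basis while also being marked withdrawable.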
Deferred Foundation Extensions
Designed but not in any sub-phase's checklist — explicit so they don't get forgotten when the trigger arrives. Each entry names: what, current status, the trigger that lights it up, where the design lives.
Platform-level non-human actors (observability agents, cross-org metric aggregators, cross-org audit aggregators)
- Status: principals.organization_id is nullable so a platform-level agent / service-account row CAN exist; what's missing is the grant mechanism, because platform_memberships is human-only by CHECK constraint.
- Trigger: first observability or cross-org operational feature.
- Decision when triggered: drop the human-only CHECK on platform_memberships and add non-superadmin platform role codes (e.g. metrics_observer). One table, expanded to non-humans when needed; no separate grant table.
- Design ref: data-model.md → Area 1 future-sibling note; the 1B.1 ADR's discussion of platform-level actors.
Service account authentication flow (the entire integration-auth surface — not just per-key scoping)
- Status: schema ships in 1B.1 (service_accounts table with api_key_hash, api_key_prefix, lifecycle columns). Operational flow does not exist:
  - No endpoint to create a service account or generate an API key
  - No middleware that resolves an inbound API key to a principal_id
  - No revocation / rotation endpoints
  - No Console / Clinic admin UI surface
  - No per-key scoping (one key per service account, scope = principal's role)
  - No per-key rate-limit knobs
- Trigger: first concrete external integration that needs service-account auth (Zapier, EHR sync, custom backend integration).
- Decision when triggered: design the full lifecycle (key creation endpoint that returns the secret once + stores the hash, auth middleware that resolves Authorization: Bearer sa_live_* keys to a principal_id, revoke/rotate endpoints, admin UI). Per-key narrowing (allowed_scopes TEXT[] or a junction table) and per-key rate-limits land in the same wave. Pin the scoping shape against the integration's actual needs rather than designing speculatively.
- Design ref: data-model.md → service_accounts.
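Sketch of the two core primitives in that lifecycle: issuing a key whose secret is returned once, and resolving a bearer key to a principal. Everything here is hypothetical since the flow is deliberately undesigned: the function names, SHA-256 as the hash, the 8-character prefix length, and the lookup callback standing in for a query on service_accounts.api_key_hash.

```go
package main

import (
	"crypto/rand"
	"crypto/sha256"
	"encoding/hex"
	"errors"
	"strings"
)

const keyPrefix = "sa_live_"

// ErrBadKey is returned for any header that does not resolve.
var ErrBadKey = errors.New("invalid service-account key")

// IssuedKey is what a creation endpoint would produce.
type IssuedKey struct {
	Secret string // returned to the caller once, never stored
	Prefix string // stored as api_key_prefix for display in admin UI
	Hash   string // stored as api_key_hash
}

// hashKey returns the hex SHA-256 of the full secret.
func hashKey(secret string) string {
	sum := sha256.Sum256([]byte(secret))
	return hex.EncodeToString(sum[:])
}

// IssueKey generates 32 random bytes and derives the stored fields.
func IssueKey() (IssuedKey, error) {
	raw := make([]byte, 32)
	if _, err := rand.Read(raw); err != nil {
		return IssuedKey{}, err
	}
	secret := keyPrefix + hex.EncodeToString(raw)
	return IssuedKey{
		Secret: secret,
		Prefix: secret[:len(keyPrefix)+8],
		Hash:   hashKey(secret),
	}, nil
}

// Resolve maps an Authorization header value to a principal ID.
// lookup stands in for the database read by hash.
func Resolve(header string, lookup func(hash string) (string, bool)) (string, error) {
	const bearer = "Bearer "
	if !strings.HasPrefix(header, bearer) {
		return "", ErrBadKey
	}
	secret := strings.TrimPrefix(header, bearer)
	if !strings.HasPrefix(secret, keyPrefix) {
		return "", ErrBadKey
	}
	principalID, ok := lookup(hashKey(secret))
	if !ok {
		return "", ErrBadKey
	}
	return principalID, nil
}
```

Looking the key up by its hash means the plaintext secret never needs to be stored or compared directly; revocation would then be a lifecycle-column check inside the lookup.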
Delegation feature using parent_principal_id (the column ships today, semantics defined per-feature)
- Status: column exists on principals; nothing reads or writes it.
- Trigger: first feature requiring "principal acts on behalf of human X" attribution, most likely an AI-agent feature in F-tier.
- Decision when triggered: define delegation semantics (parent's permissions cap the child's? time-bound? per-resource?), surface them in audit reads, add UI for granting/revoking delegation.
- Design ref: principals ADR ("Scope kept tight"); data-model.md → principals row.
Patient-facing account-activity projection (patient_account_activity or similar): curated activity feed surfaced in Portal + cross-org account surface (1D.5). Foundation principle codified by this entry: patient transparency surfaces are projections, never raw audit_log exposure. audit_log stays staff/forensic-only; patients see purpose-built feeds with privacy-appropriate framing.
- Status: not built. Today, patients see their own data via per-table self-read RLS: consents (consent trail at /me/consents), patient_impersonation_sessions (access history at /me/patient-impersonation-sessions when 1B.13 lands), break_glass_sessions indirectly via the clinic admin banner. This covers foundation-tier transparency. A unified "things that happened on my account" feed crossing all of these is not covered.
- Trigger: first concrete patient UX that needs more than a single source-of-truth table self-read, e.g. a unified activity dashboard, or a per-action drill-down on impersonation sessions ("what entities did the staff touch during this session"). Patient-side audit-row drill-downs land here, not via direct audit_log SELECT.
- Decision when triggered: design the projection table (likely patient_account_activity partitioned monthly per the events-partitioned/state-not rule, populated by triggers from audit_log + sessions + consents + login events). Filter at the trigger to patient-relevant rows only; skip operational-metadata bumps. Keep the surface tight: no raw request_id, ip_address, user_agent, or technical audit columns; only "X happened on Y date" framing.
- Design ref: this section; the 1B.13 design discussion that surfaced the principle ("audit_log is technical and raw, patient-facing wants account-activity framing").
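Sketch of the projection rule in application code, to make the "filter and reframe" idea concrete. The plan places this logic in database triggers; the Go form, the AuditRow and ActivityItem shapes, and the action codes in the allow-list are all hypothetical.

```go
package main

import "time"

// AuditRow is a hypothetical subset of audit_log columns.
type AuditRow struct {
	Action     string
	PatientID  string
	OccurredAt time.Time
	RequestID  string // technical, never projected
	IPAddress  string // technical, never projected
	UserAgent  string // technical, never projected
}

// ActivityItem is the patient-facing shape: "X happened on Y date".
type ActivityItem struct {
	PatientID  string
	Summary    string
	OccurredOn string // date only, YYYY-MM-DD
}

// summaries is an allow-list from action code to patient wording.
// Actions missing from it (operational-metadata bumps) are skipped.
var summaries = map[string]string{
	"consent.granted":      "You granted a consent",
	"consent.withdrawn":    "You withdrew a consent",
	"impersonation.opened": "Clinic staff accessed your account on your behalf",
}

// Project returns the feed item for a row, or false when the row is
// not patient-relevant. Technical columns are dropped by construction
// because ActivityItem has no field to carry them.
func Project(r AuditRow) (ActivityItem, bool) {
	text, ok := summaries[r.Action]
	if !ok || r.PatientID == "" {
		return ActivityItem{}, false
	}
	return ActivityItem{
		PatientID:  r.PatientID,
		Summary:    text,
		OccurredOn: r.OccurredAt.UTC().Format("2006-01-02"),
	}, true
}
```

An allow-list fails closed: a new audit action stays invisible to patients until someone writes its patient-facing wording.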
Session permission-revocation sweep (covers 1B.11 break-glass + 1B.13 impersonation)
- Status: gap intentionally accepted in foundation. When a principal who has an open session (break-glass or impersonation) loses the permission that authorised it (membership removed, role demoted, custom role edited to drop the permission), open sessions are NOT auto-closed; they live until expires_at (max 4h). The middleware re-checks the session row, not the permission, on each request. Same gap exists for both primitives.
- Trigger: compliance review (Romanian DPA, clinic procurement) flags the residual window, or a real incident makes the gap concrete.
- Decision when triggered: hook on organization_memberships UPDATE/DELETE and role_permissions UPDATE → close any open sessions for the affected (principal × org) with closed_at = NOW() and a system-close reason. Cheap to bolt on later; the partial unique index on active sessions and the close-path service-layer logic are already in place.
- Why deferred: max 4h cap bounds the residual window; product impact is low; foundation discipline argues against speculation. The gap is documented here so future incident review or a deliberate hardening pass can find it.
- Design ref: 1B.11 break-glass middleware + 1B.13 impersonation middleware; this entry.
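Sketch of the sweep's selection rule over an in-memory slice, standing in for the eventual UPDATE against the session tables. The Session shape, the function name, and the reason string are hypothetical; the real hook would run in the service layer or a trigger.

```go
package main

import "time"

// Session is a hypothetical common shape for break-glass and
// impersonation session rows.
type Session struct {
	ID          string
	PrincipalID string
	OrgID       string
	ExpiresAt   time.Time
	ClosedAt    *time.Time // nil while the session is open
	CloseReason string
}

// CloseForRevocation closes every still-open, unexpired session held
// by (principal × org) and returns how many rows it closed.
func CloseForRevocation(sessions []Session, principalID, orgID string, now time.Time) int {
	closed := 0
	for i := range sessions {
		s := &sessions[i]
		if s.PrincipalID != principalID || s.OrgID != orgID {
			continue // different principal or org: untouched
		}
		if s.ClosedAt != nil || !s.ExpiresAt.After(now) {
			continue // already closed or already expired
		}
		closedAt := now
		s.ClosedAt = &closedAt
		s.CloseReason = "system:permission_revoked" // assumed reason code
		closed++
	}
	return closed
}
```

Scoping the sweep to the affected (principal × org) pair leaves the same principal's sessions in other orgs alone, matching the per-org scope of both primitives.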
Marketplace Mediation (Patient → Clinic Payments via Platform) — strategic future product offering where the platform mediates patient-to-clinic payments end-to-end (patient pays platform; platform pays clinic minus a fee). Distinct from Option A (clinic uses their own payment provider for patient billing; platform never touches the money — supported today via Cat C webhooks). Both options coexist: clinics with existing payment infrastructure stay on Option A; clinics that want a turnkey solution opt into marketplace mediation when it ships.
- Status: strategic design pending. Foundation accommodates via four Cat A capability skeletons declared in 1C.1 (payment.Provider, invoicing.Provider, patient_payment.Provider, clinic_payout.Provider); no implementations yet. Foundation does NOT lock the engine.
- Trigger: company is ready to make the strategic move (legal review of payment institution licensing, fee model decision, capacity to onboard each clinic with KYB/KYC, dedicated payment-ops engineering work). Likely 12+ months post-launch; pace driven by demand from clinics that lack their own payment infrastructure.
- Decision when triggered: pick mediation provider per market (Stripe Connect for international + EU; Romanian alternative TBD, Netopia has marketplace features but may need legal review; PSD2 considerations for EU). Define fee model (per-transaction percentage vs. flat vs. hybrid; per-tier differentiation). Build patient-facing payment UI in portal (Cat A patient_payment.Provider impl). Build clinic onboarding flow for KYB/KYC (Stripe Connect Express or Custom). Build payout management (clinic_payout.Provider impl + payout scheduling). Define refund/dispute handling (platform reverses; clinic balance debited). Romanian-specific: VAT + e-Factura split between platform's fees and clinic's revenue (separate ADR with accounting/legal review).
- What foundation reserves to keep this option open (no schema or code change today; just principles to honor):
  - patient_subscriptions (1B.7) stays informational at foundation but the schema accommodates extension with payment metadata when marketplace ships.
  - The four billing capability interfaces declared at foundation (1C.1 list) cover the marketplace use cases: patient_payment.Provider for patient-facing payment, clinic_payout.Provider for sending money to clinics. Real impls slot in via the same Cat A pattern as everything else.
  - organization_integrations.config (Cat B, 1C.5) accommodates clinic-side integrations like a clinic's own Stripe/Netopia account FOR Option A; webhook subscriptions push subscription state into our system from the clinic's payment provider (Cat C / Cat D mix depending on direction).
- Design ref: glossary → Marketplace mediation; patterns.md eventual P-entry when first impl ships; this entry.
Tenant Isolation Foundation (tenancy_mode reservation): foundation establishes the tenancy-topology discriminator and the draft-state lifecycle reservation. The only structural axis foundation locks in is identity (per-tenant Clerk org); per-tenant storage and per-tenant encryption are deferred entitlements, not foundation work, and ship later in one PR each alongside the operational mechanism they depend on. Both modes (shared and dedicated) target SMB clinics; hospital networks and dedicated-infrastructure tiers (per-tenant RDS/Redis/CloudHSM) are permanently out of scope (see CLAUDE.md → Project Overview).
- Status: architectural commitment settled. Today only shared mode is sellable end-to-end; dedicated is a schema reservation: no creation flow accepts it and no provisioning code provisions per-tenant Clerk orgs. Full spec at features/platform/tenant-isolation.md. Runtime dedicated-mode build deferred until a paying contract funds the operational setup.
- Foundation pre-work in 1B (shipped), humans partition column ready for per-tenant identity namespace:
  - [x] Add humans.provider_org_id TEXT NULL column to migrations/core/000002_tenancy_rbac.up.sql (no FK initially; the FK target lands with the dedicated-mode runtime feature).
  - [x] Replace humans.email NOT NULL UNIQUE with UNIQUE (email, provider_org_id) NULLS NOT DISTINCT in the same migration. Functionally identical for shared-mode tenants today (all have NULL provider_org_id); future-proofs for dedicated mode where the same email can exist once per auth-provider tenant.
- Foundation pre-work in 1E (shipped), organizations.tenancy_mode + draft-state reservation:
  - [x] Add organizations.tenancy_mode TEXT NOT NULL DEFAULT 'shared' CHECK (tenancy_mode IN ('shared', 'dedicated')) to migrations/core/000002_tenancy_rbac.up.sql. Single-enum topology discriminator; see decisions.md → Why tenancy_mode is a single enum, not multi-axis.
  - [x] Add organizations.activated_at TIMESTAMPTZ NULL to the same migration. NULL = draft (org row exists but unroutable from public endpoints); non-NULL = active. Every creation path today sets activated_at = NOW() in the same transaction as the INSERT; the NULL state is a reservation for the future dedicated-mode async provisioner. See decisions.md → Why activated_at as the org draft-state mechanism.
  - [x] Add partial index CREATE INDEX idx_organizations_draft ON organizations(id) WHERE activated_at IS NULL for cheap draft-org lookups when the future provisioner needs them.
  - [x] Column-classification registry entries for tenancy_mode and activated_at (data-classification.md → organizations).
  - [x] Column entries in data-model.md → organizations.
  - [x] Public org resolve handler (GET /v1/public/organizations/resolve) gates on activated_at IS NOT NULL; returns 404 for draft orgs.
  - [x] Owner-welcome dispatch gated on activated_at IS NOT NULL; welcome email is fired by activation transition, not by raw INSERT.
- Deferred (out of foundation scope; ships when the first paying dedicated contract closes; see features/platform/tenant-isolation.md → Deferred design surface for the canonical narrative):
  - Per-tenant Clerk org provisioner (Clerk Backend API integration; writes the dedicated provider_org_id per tenant).
  - Re-introduced finalize-provisioning endpoint with proper preconditions (Clerk org exists, platform_service_providers overrides written) that flips activated_at = NOW() and queues the welcome email.
  - Addons via entitlements catalog: own_s3_bucket (ships with the exit / portability tool) and own_cmk (ships with the documented crypto-shred runbook). Available on either tenancy mode; not coupled to tenancy_mode = 'dedicated'.
  - Terraform module for per-tenant infrastructure (S3 bucket + CMK + IAM bindings + Clerk org wiring).
  - Operational templating for dedicated mode (custom DNS / ACM cert / per-tenant SES / SMS / Daily.co domain; the universal branding pieces work on shared mode too).
  - Dedicated-mode DPA template (legal-counsel work; can run in parallel).
  - Pricing model: one-time setup fee + premium MRR uplift + termination service fee.
- Design ref: features/platform/tenant-isolation.md; decisions.md → Why tenancy_mode is a single enum, not multi-axis; decisions.md → Why tenant-isolation has its own controllership story; decisions.md → Why activated_at as the org draft-state mechanism.
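Sketch of the activated_at gate described in the 1E pre-work: draft orgs resolve as 404, and the welcome email keys off the draft → active transition. The Organization struct, ResolveStatus, and Activate are hypothetical names; the shipped handler and dispatch code are not shown in this plan.

```go
package main

import (
	"net/http"
	"time"
)

// Organization carries the two columns the gate depends on.
type Organization struct {
	ID          string
	TenancyMode string     // "shared" or "dedicated"
	ActivatedAt *time.Time // nil = draft, non-nil = active
}

// Active reports whether the org is routable from public endpoints.
// A nil receiver (org not found) is treated as not active.
func (o *Organization) Active() bool {
	return o != nil && o.ActivatedAt != nil
}

// ResolveStatus is the status the public resolve endpoint would send.
// Draft orgs are indistinguishable from missing ones.
func ResolveStatus(o *Organization) int {
	if !o.Active() {
		return http.StatusNotFound
	}
	return http.StatusOK
}

// Activate flips draft → active exactly once and reports whether the
// transition happened, so the caller can queue the owner-welcome
// email on the transition rather than on the INSERT.
func (o *Organization) Activate() bool {
	if o.ActivatedAt != nil {
		return false
	}
	now := time.Now().UTC()
	o.ActivatedAt = &now
	return true
}
```

Shared-mode creation would call Activate in the same transaction as the INSERT, so the draft state is only ever observable once an async dedicated-mode provisioner exists.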