Skip to content

AI Models Registry

Foundation 1C.8 deliverable. Engineer-facing reference for the AI model registry — ai_models + ai_model_pricing_history — and the wiring contract every AI provider impl follows.

The registry, the metering reservation flow, the audit + provenance same-tx write, and the cost snapshot together implement P53 AI Capability Provenance and Variable-Cost Metering. Read that pattern first; this doc is the operational how-to.

When to add a row

The AI feature consumer that wires a real provider impl (Anthropic, OpenAI, Voyage, Deepgram, Google Vision, …) seeds the corresponding ai_models row + initial ai_model_pricing_history row in the same PR. Every UUID returned by aimodels.Repository.LookupActivePricing must point at a registered model — the cmd/check-ai-models CI guard enforces the import-graph invariant; the live audit_ai_provenance.model_id FK enforces the runtime invariant.

Foundation seeds zero rows. The five AI capability skeleton packages (internal/core/ai/{llm,embeddings,transcription,vision,classification}) ship interface declarations + Fake test doubles only — no provider impls.

Registering a model

Two writes in one transaction: the ai_models row + the initial ai_model_pricing_history row. The Console superadmin endpoint POST /v1/admin/ai-models does both in one tx; programmatic registration goes through aimodels.Service.Create.

Required fields:

  • model_provider — e.g. "anthropic", "openai", "voyage". Free-form (the CI guard validates against the registration path, not an enum).
  • model_name — e.g. "claude-opus", "text-embedding-3-large".
  • model_version — e.g. "4.7", "v3". Combined with provider + name to form the UNIQUE key.
  • capability — one of text_generation, embedding, transcription, vision, classification. CHECK constraint enforces.
  • unit_type — what usage_records.unit_type will be set to when this model is invoked. Conventionally "tokens" for LLM/embeddings/classification, "audio_seconds" for transcription, "images" (or "tokens" if multimodal) for vision.
  • validation_statusexperimental for newly registered models; validated once the model passes the per-(model, task) validation regimen the medical-device-readiness rule from CLAUDE.md requires for clinical features.
  • cost_per_input_unit_cents — NUMERIC string (foundation accepts fractional cents like "0.0003" for Anthropic-style pricing).
  • cost_per_output_unit_cents — same shape; set equal to input rate or "0" for capabilities with no output direction (embeddings, transcription).

The pricing row is created with effective_from = NOW() and effective_to = NULL (current). Future rows (price changes) close it.

Changing pricing

POST /v1/admin/ai-models/{id}/price-change is the only path. The endpoint runs in one transaction:

  1. Closes the model's current pricing row (sets effective_to = effective_from of the new row).
  2. Inserts the new pricing row with effective_to = NULL.

The partial unique index ai_model_pricing_history_current (ON model_id WHERE effective_to IS NULL) guarantees at most one current row per model. A future price change (caller-supplied effective_from > NOW()) is a permitted shape — the close happens at that future timestamp, and LookupActivePricing(modelID, NOW()) continues to return the old row until the boundary passes.

changed_by_principal_id is the superadmin who issued the change; recorded for the audit trail. notes is free-form context ("Anthropic raised input rate 2025-08-15 per their announcement").

Looking up active pricing at call time

The AI provider impl reads the row in effect at call time and snapshots into usage_records.cost_cents so closed-period billing reconstruction is accurate forever:

go
pricing, err := aiModels.LookupActivePricing(ctx, modelID, time.Now().UTC())
if err != nil { return ... }

inputCost  := computeCost(usage.InputTokens,  pricing.CostPerInputUnitCents)
outputCost := computeCost(usage.OutputTokens, pricing.CostPerOutputUnitCents)

reservation.Settle(ctx, capabilities.SettleResult{
    Entries: []capabilities.SettleEntry{
        {UnitType: "input_tokens",  Units: usage.InputTokens,  CostCents: &inputCost},
        {UnitType: "output_tokens", Units: usage.OutputTokens, CostCents: &outputCost},
    },
    PrincipalID: principalID,
})

The two-row Settle is the canonical shape for capabilities with split-direction pricing. Single-direction capabilities (embeddings only have an input direction) emit one entry.

Recording AI provenance with the audit row

When the call involves an AI model, the AuditFunc closure of the wrap helper calls audit.RecordWithProvenance instead of audit.Record. Both rows commit together — same-tx FK + lifecycle:

go
prov := audit.AIProvenance{
    ModelID:    modelID,
    InputsHash: sha256OfCanonicalisedPrompt(req),
    Confidence: usage.ConfidencePtr(),
}
auditID, err := audit.RecordWithProvenance(ctx, audit.Event{...}, prov)

InputsHash is the platform's "is this the prompt we ran?" evidence — compliance auditors need it without storing the raw prompt content (clinical features may include patient PII). The hash is computed against a canonicalised representation of the prompt the impl chooses; the choice is per-capability and documented in the impl's package comment.

Confidence is provider-supplied where available (some Anthropic Claude responses surface it; many providers don't). NULL is valid.

Validation status and clinical features

experimental models cannot be invoked for clinical features. The capability impl checks the validation status from ai_models before resolving the provider; an experimental model returns capabilities.ErrProviderUnavailable with a typed reason. Validation status flips to validated only after the per-(model, task) validation regimen passes — exact requirements live in the medical-device-readiness rule in CLAUDE.md and per-feature SaMD docs.

deprecated models still callable but warned (telemetry surfaces a deprecation flag); first AI feature consumer wires the warning. retired models are refused at the capability seam — the validation_status CHECK constraint + the retired_at consistency CHECK ensure the row's lifecycle is internally coherent.

Cross-references