Exercise Content Pipeline
Operational reference for how raw exercise primitives land in S3, how the composer turns them into rendered videos on Bunny Stream, and how to add a new exercise to the platform.
For the compositional model itself (asset bundle shape, recipe contract, variant model), see features/exercise-library/composition.md. For the architectural pattern, see P56.
The pipeline at a glance
```
Filming / audio team
        │
        │  prepares manifest.json + names files per convention
        ▼
aws s3 sync
        │
        ▼
s3://restartix-exercise-assets-{env}/{exercise}/    ← raw primitives, our control
        │
        │  composer downloads bundle when called
        ▼
exercise-composer service                           ← stateless, Fargate task
        │
        │  ffmpeg bake (intro → sets → outro), upload to Bunny
        ▼
Bunny Stream library                                ← transcoded HLS, CDN delivery
        │
        │  bunny_video_id stored in Core API DB
        ▼
Patient app                                         ← plays HLS playlist via Bunny's player or HLS.js
```

The composer is read-only on S3 and write-only on Bunny. The patient app reads only from Bunny.
AWS S3: source-of-truth buckets
| Env | Bucket | Region | Provisioning | Composer access | Versioning |
|---|---|---|---|---|---|
| dev | restartix-exercise-assets-dev | eu-central-1 | Manual (Console) | local-dev IAM user, read+write | off |
| staging | restartix-exercise-assets-staging | eu-central-1 | Terraform (storage-s3 module) | Fargate task role, read-only | on |
| production | restartix-exercise-assets-production | eu-central-1 | Terraform (storage-s3 module) | Fargate task role, read-only | on |
All three: SSE-S3 encryption, public access blocked (all 4 toggles), no lifecycle rules, no Object Lock. Composer never writes back to S3 by design — outputs go to Bunny, not back into the assets bucket.
Why composer is read-only
Renders are outputs, not derivations stored back into the source bucket. Bunny owns the render lifecycle. Constraining composer to read-only on S3 makes the data-flow direction explicit and prevents accidental corruption of the source-of-truth primitives.
Object layout under s3://{bucket}/
```
{exercise-slug}/
├── manifest.json
├── intro-video-{1,2,3}.mp4
├── pauza-video-{1,2,3}.mp4
├── outro-video-{1,2,3}.mp4
├── rep-left-video-{1,2,3}.mp4
├── rep-right-video-{1,2,3}.mp4
└── audio/{lang}/
    ├── intro-vo-{1,2,3}.mp3
    ├── pauza-vo-{1,2,3}.mp3
    ├── outro-vo-{1,2,3}.mp3
    ├── rep-left-vo-{1,2,3}.mp3
    └── rep-right-vo-{1,2,3}.mp3
```

The composer auto-discovers variant counts from filenames matching `{slot}-{n}.{ext}`. Missing files don't crash the composer — they just narrow the variant pool for that slot. (A render request that needs a slot with zero variants fails validation.)
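The auto-discovery rule can be illustrated with a small local sketch (a stand-in for the composer's behaviour, not its actual code):

```shell
# Stand-in bundle: full intro pool, one rep-left variant, no outro files at all.
demo=$(mktemp -d); cd "$demo"
touch intro-video-1.mp4 intro-video-2.mp4 intro-video-3.mp4 rep-left-video-1.mp4

# Variant count per slot = number of files matching {slot}-{n}.mp4
count_variants() { ls "$1"-*.mp4 2>/dev/null | wc -l; }

echo "intro-video:    $(count_variants intro-video) variants"    # 3 → full pool
echo "rep-left-video: $(count_variants rep-left-video) variants" # 1 → narrowed pool
echo "outro-video:    $(count_variants outro-video) variants"    # 0 → a recipe needing this slot fails validation
```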
Composer's IAM policy (locked shape)
When the composer ships on Fargate (post-1E close), its task role attaches this policy per env:
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "ExerciseAssetsRead",
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:ListBucket"],
      "Resource": [
        "arn:aws:s3:::restartix-exercise-assets-{env}",
        "arn:aws:s3:::restartix-exercise-assets-{env}/*"
      ]
    }
  ]
}
```

No `s3:PutObject`, no `s3:DeleteObject`, no `s3:PutObjectTagging`. By design.
Local dev credentials
The restartix-platform-local-dev IAM user has read+write on the dev bucket only. Its keys live in services/api/.env.local (and are reused verbatim by services/exercise-composer/.env.local). Reach them via the user's password manager under "RestartiX AWS local dev". No LocalStack — local dev hits the real dev bucket.
Bunny Stream: rendered video delivery
| Env | Library | Region | Provisioning | Credentials |
|---|---|---|---|---|
| dev | manually provisioned (e.g. restartix-exercise-renders-dev) | Frankfurt | Bunny dashboard | local .env.local |
| staging | manually provisioned | Frankfurt | Bunny dashboard | AWS Secrets Manager restartix/staging/bunny-bootstrap |
| production | manually provisioned | Frankfurt | Bunny dashboard | AWS Secrets Manager restartix/production/bunny-bootstrap |
No Terraform Bunny provider. Bunny libraries are provisioned manually in Bunny's dashboard, one-time bootstrap per env. Each library has its own API key — separate libraries per env keep production reputation and quota separate from staging tests.
What the composer needs from each library
| Field | Where in Bunny dashboard | Composer env var |
|---|---|---|
| Library ID (numeric) | Library overview / URL | BUNNY_STREAM_LIBRARY_ID |
| API Key (full access) | Library → API tab → "API Key" (not "Read-only API Key") | BUNNY_STREAM_API_KEY |
| CDN hostname | Library → API tab → "CDN hostname" (e.g. vz-{token}.b-cdn.net) | BUNNY_STREAM_CDN_HOSTNAME |
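A minimal `.env.local` shape wiring the three fields above into the composer (all values here are placeholders, not real credentials):

```shell
# services/exercise-composer/.env.local — placeholder values only
BUNNY_STREAM_LIBRARY_ID=123456
BUNNY_STREAM_API_KEY=00000000-0000-0000-0000-000000000000
BUNNY_STREAM_CDN_HOSTNAME=vz-example.b-cdn.net
```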
Replication
For dev/staging libraries: only Frankfurt is enabled as a region. The Singapore / LA / NY replicas Bunny pre-selects by default add cost ($0.005/GB per replica per delivery) without serving real users at those stages. Production library replication should be decided based on patient geography at launch — likely Frankfurt + London for EU coverage.
Replicas cannot be removed after the library is created. Start lean.
Cost shape
For a 200-exercise library at 3 variants per slot, ~10 common prescriptions per exercise, 1 language:
- Encoding (one-time per render): ~$0.025/min × ~3 min × 2,000 renders = ~$150 one-time
- Storage: ~10 GB (transcoded HLS ladder) × $0.03/GB-month = ~$0.30/month
- CDN delivery: ~$0.005/GB; a patient session pulls ~50 MB = $0.00025/session
Per-language overhead: only the audio side (~6 GB across 200 exercises). Adding a 2nd language ≈ +$0.18/month storage.
Trivial at platform scale.
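The one-time encoding figure above can be sanity-checked with shell arithmetic (all inputs are the estimates from the list; milli-dollars keep the math in integers):

```shell
renders=$((200 * 10))              # 200 exercises × ~10 common prescriptions
encode_md=$((25 * 3 * renders))    # $0.025/min = 25 milli-dollars/min, ~3 min per render
echo "renders: $renders"                          # renders: 2000
echo "encoding: \$$((encode_md / 1000)) one-time" # encoding: $150 one-time
```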
Adding a new exercise
Two flows, depending on the exercise's kind:
- `reps_based` — primitives in S3 + composer renders per recipe. The flow below covers this case end-to-end.
- `duration_based` — a single pre-baked MP4. See Duration-based import at the end of this section.
For both, no admin UI exists today — both flows are manual.
1. Receive content from filming/audio team (reps_based)
The filming team delivers a folder of files for the new exercise. Typical raw delivery:
```
Detensionari Lombare/
├── 001 Detensionari Lombare - Intro.mp4
├── 001 Detensionari Lombare - Outro.mp4
├── 001 Detensionari Lombare - Pauza.mp4
├── 001 Detensionari Lombare 5 Stanga.mp4
├── 001 Detensionari Lombare 5 Dreapta.mp4
├── 001 Detensionari Lombare 1-20 Stanga.mp3
└── 001 Detensionari Lombare 1-20 Dreapta.mp3
```

(Filenames may vary; the team will deliver whatever naming convention they use internally.)
2. Rename files to the composer's convention
Reorganise into the layout the composer expects:
```
lumbar-detensioning/
├── manifest.json
├── intro-video-1.mp4        ← was "001 ... - Intro.mp4"
├── pauza-video-1.mp4        ← was "001 ... - Pauza.mp4"
├── outro-video-1.mp4        ← was "001 ... - Outro.mp4"
├── rep-left-video-1.mp4     ← was "001 ... 5 Stanga.mp4"
├── rep-right-video-1.mp4    ← was "001 ... 5 Dreapta.mp4"
└── audio/ro/
    ├── rep-left-vo-1.mp3    ← was "001 ... 1-20 Stanga.mp3"
    └── rep-right-vo-1.mp3   ← was "001 ... 1-20 Dreapta.mp3"
```

The framing VO (intro/pauza/outro) is currently baked into the framing videos' audio tracks. Extract it via ffmpeg into separate mp3s so it can be language-swapped later:
```shell
ffmpeg -hide_banner -y -i intro-video-1.mp4 -vn -c:a libmp3lame -b:a 192k audio/ro/intro-vo-1.mp3
ffmpeg -hide_banner -y -i pauza-video-1.mp4 -vn -c:a libmp3lame -b:a 192k audio/ro/pauza-vo-1.mp3
ffmpeg -hide_banner -y -i outro-video-1.mp4 -vn -c:a libmp3lame -b:a 192k audio/ro/outro-vo-1.mp3
```

This is an interim step until the filming team delivers intro/pauza/outro as silent video + separate VO per language (the locked future state per the composition spec). For exercises where the framing video has dialogue baked in, the lip-sync mismatch with future-language VO will surface; those exercises will need re-shooting.
3. Add 2nd and 3rd variants (when available)
If the filming team has delivered 3 variants per slot, name them -1, -2, -3. If only one variant exists today, create placeholders by duplicating it:

```shell
cp intro-video-1.mp4 intro-video-2.mp4
cp intro-video-1.mp4 intro-video-3.mp4
# … same for every slot
```

The composer happily picks among "3 variants" that are byte-identical; the render works, just without the anti-repetitiveness benefit, until the real 2nd and 3rd variants arrive.
4. Write the manifest
```json
{
  "exercise": "lumbar-detensioning",
  "reps_per_video_block": 5,
  "counts_per_audio_master": 20,
  "sides": ["left", "right"],
  "languages": ["ro"]
}
```

For bilateral exercises that don't switch sides (e.g. forward fold, plank), set `"sides": ["bilateral"]` instead of `["left", "right"]` — the composer's variant model extends naturally, but the recipe shape would need adjustment. (Today only `["left", "right"]` is implemented.)
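A quick pre-upload sanity check with `jq` can catch a malformed manifest before syncing (a hypothetical convenience, assuming `jq` is installed; the manifest content is the example above):

```shell
cat > /tmp/manifest.json <<'EOF'
{
  "exercise": "lumbar-detensioning",
  "reps_per_video_block": 5,
  "counts_per_audio_master": 20,
  "sides": ["left", "right"],
  "languages": ["ro"]
}
EOF

# jq -e exits non-zero if any required key is missing or any list is empty:
jq -e 'has("exercise") and has("reps_per_video_block")
       and (.sides | length > 0) and (.languages | length > 0)' /tmp/manifest.json \
  && echo "manifest ok"
```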
5. Sync to the dev bucket
```shell
aws s3 sync lumbar-detensioning/ \
  s3://restartix-exercise-assets-dev/lumbar-detensioning/ \
  --exclude '.DS_Store' \
  --exclude '*.tmp'
```

aws s3 sync does delta uploads (only changed files) and doesn't delete remote files by default. Run it again after updating any file.
Verify:
```shell
aws s3 ls s3://restartix-exercise-assets-dev/lumbar-detensioning/ --recursive
```

Should show all expected files. Counts:
- 1 manifest.json
- 9 framing videos (3 per intro/pauza/outro)
- 6 rep videos (3 per side × 2 sides)
- 9 framing VO mp3s per language
- 6 rep VO mp3s per language
Total: 31 files with a single language (the manifest and 15 videos are language-independent; each language adds its 15 VO mp3s).
6. Test a render
Composer running locally on port 9400:
```shell
curl -s -X POST http://localhost:9400/v1/compose \
  -H 'Content-Type: application/json' \
  -d '{
    "exercise": "lumbar-detensioning",
    "language": "ro",
    "sets": [
      {"side":"left","reps":5},
      {"side":"right","reps":5}
    ],
    "seed": 42
  }' | jq .
```

Returns a bunny_video_id after ~5-10s. Open the returned playback_hls_url in Safari (or VLC, or the Bunny dashboard's library view) — Bunny needs ~30-60s to transcode after upload, so the first hit may show "not ready".
If the render fails, check:
- Composer logs (slog output) for the specific error
- That `aws s3 ls` shows the expected files (the composer fails fast if the manifest is missing or required variants are absent)
- That the manifest's `languages` array includes the requested language
- That every set's reps is a multiple of 5 in [5, 20]
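The last check's rule (reps a multiple of 5, within [5, 20]) as a standalone shell sketch — this mirrors the documented constraint, not the composer's actual validation code:

```shell
valid_reps() {
  [ $(( $1 % 5 )) -eq 0 ] && [ "$1" -ge 5 ] && [ "$1" -le 20 ]
}

for reps in 5 10 20 7 25; do
  if valid_reps "$reps"; then echo "$reps: ok"; else echo "$reps: rejected"; fi
done
# prints: 5: ok / 10: ok / 20: ok / 7: rejected / 25: rejected
```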
Duration-based import
duration_based exercises are single pre-baked MP4s — no primitives, no S3 asset bundle, no composer involvement. They arrive as a single video file (currently: imported from the old platform's legacy library; future: any new exercise authored as a fixed-content video instead of composed primitives).
The import workflow:
- Upload the MP4 to Bunny Stream directly (Bunny dashboard or API), into the per-slug collection (`{exercise-slug}/`). Note the resulting `bunny_video_id` + `duration_seconds`.
- Insert the `exercises` row with `kind='duration_based'`, `status='draft'`, no `asset_version` requirement (set to 1 by default).
- Insert one `exercise_renders` row with `recipe_hash='_imported'`, `language='ro'` (or whichever language the video carries), `recipe=NULL`, `bunny_video_id`, `status='ready'`, `asset_version=1`. The row is the catalog preview by default.
- Update `exercises.catalog_render_id` to point at the render row.
No aws s3 sync — there are no S3 primitives for duration_based exercises. The Bunny video IS the entire content.
When more languages are added later, upload another MP4 to the same Bunny collection and INSERT another exercise_renders row with the new language.
If the underlying video needs to be replaced (a re-encode, a correction): re-upload to Bunny, get the new bunny_video_id, UPDATE the existing render row's bunny_video_id + bump asset_version. The cache stays consistent because there's only one row.
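The single-row replacement can be sketched with sqlite3 as a stand-in for the real database (a simplified schema and hypothetical GUIDs; the production table lives in the Core API's Postgres):

```shell
db=$(mktemp -u).sqlite

sqlite3 "$db" <<'SQL'
CREATE TABLE exercise_renders (bunny_video_id TEXT, asset_version INT);
INSERT INTO exercise_renders VALUES ('old-guid', 1);

-- Re-upload happened: point the one render row at the new video and bump the version.
UPDATE exercise_renders
SET bunny_video_id = 'new-guid',
    asset_version  = asset_version + 1;

SELECT bunny_video_id || ' @ v' || asset_version FROM exercise_renders;
SQL
# prints: new-guid @ v2
```

Because the exercise has exactly one render row per language, the update is atomic from the cache's point of view: there is never a moment where two rows disagree.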
A migration script for the legacy import wave (hundreds of pre-existing videos) lives in services/api/migrations/ once the F9.1 import pass happens. For now, manual.
Bunny dashboard pointers
For debugging or browsing renders:
- Library overview: `https://dash.bunny.net/stream/{library-id}` — list of all videos, status, sizes, encoded variants
- Per-video page: shows embed code, HLS URL, MP4 download links, encoding logs, view counts
- API tab: where credentials live; "Webhook URL" for transcode-complete callbacks (not wired today)
Bunny videos can be deleted from the dashboard if you want to start clean during testing.
Asset versioning
The exercise_renders cache table shipped in commit 3d95e38 (F9.1 Phase 1). Each exercise carries an asset_version INT column; each render row records the asset_version it was baked against. The workflow:
- Filming team updates a variant (e.g. re-shoots `rep-left-video-2.mp4` with better lighting).
- You `aws s3 sync` the updated file to the bucket.
- You bump `exercises.asset_version` for that exercise.
- Cache rows with the old `asset_version` become stale; the next prescription request that's a cache miss (or a stale-version match — implementation detail in `Service.EnsureRender`) triggers a re-render. Old `bunny_video_id`s linger in Bunny for a grace period before cleanup.
Today (Phase 1 shipped):
- ✅ Schema: `asset_version` column on `exercises`, recorded on every `exercise_renders` row
- ✅ Cache lookup: `Service.EnsureRender` matches on `recipe_hash + language + asset_version`; a mismatched version is treated as a miss
- ❌ Bump endpoint: no Console action yet — manual `UPDATE exercises SET asset_version = asset_version + 1 WHERE slug = '...'` for now (this is a Phase 2 task; see composition.md → Phase 2 backlog)
- ❌ Eager catalog preview re-render on bump: same Phase 2 task
For Phase 1 testing without the bump endpoint, the manual SQL is fine. Patient-facing flows aren't live yet, so there's no risk of mid-session staleness.
What's NOT in this pipeline
- Patient recordings — the platform may eventually record sessions (telemetry, supervised reviews). Those are a separate concern, separate storage, separate provider — not in this pipeline.
- Live video calls — Daily.co handles those, not Bunny. Different concern.
- Marketing / brand video — not in this bucket. If we ever ship those, they need their own bucket (or live on whatever marketing CDN is chosen).
- AI-generated content — out of scope for v1.
Cross-references
- P56 Exercise Video Composition Pipeline — the architectural pattern
- features/exercise-library/composition.md — the compositional model spec
- services/exercise-composer/README.md — service implementation details (build, env vars, code structure)
- experiments/exercise-composer/README.md — sandbox for iterating on editorial decisions
- reference/external-providers.md — Bunny Stream provider entry
- reference/file-storage.md — the platform's other S3 bucket (for user-uploaded files; distinct from this exercise-assets bucket)