PDF Caching and Performance Optimization

Overview

PDF generation is an on-demand process, but signed (immutable) documents are cached to S3 to improve performance and reduce resource usage. This document describes the caching strategy, invalidation rules, and performance considerations.

Caching Strategy

When to Cache

1. Form is signed → PDF can be cached
2. First PDF request → generate, upload to S3 as:
   s3://{bucket}/documents/{org_id}/{document_id}/{audience}.pdf
3. Subsequent requests → serve cached S3 file via signed URL
4. Cache key: org_id + document_id + audience + template_id
5. Cache invalidation: only when the pdf_template is updated or an admin manually invalidates a document's cache (both admin actions)

When NOT to Cache

For unsigned or in-progress forms, PDFs are generated on every request (preview mode) and are never cached.

| Form Status | Caching Behavior | Rationale |
| --- | --- | --- |
| pending | Not cached | Form data incomplete, preview only |
| in_progress | Not cached | Form data changing, preview only |
| completed | Not cached | Form not yet signed, may still change |
| signed | Cached | Form immutable, PDF deterministic |
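
The rule in the table reduces to a single predicate. The sketch below is illustrative only: the `FormStatus` type, its constants, and `ShouldCache` are hypothetical names chosen here, not identifiers confirmed by the codebase.

```go
package main

import "fmt"

// FormStatus mirrors the status values listed in the table (hypothetical type).
type FormStatus string

const (
	StatusPending    FormStatus = "pending"
	StatusInProgress FormStatus = "in_progress"
	StatusCompleted  FormStatus = "completed"
	StatusSigned     FormStatus = "signed"
)

// ShouldCache reports whether a PDF rendered for a form in this status may be
// stored in S3. Only signed forms are immutable, so only they qualify.
func ShouldCache(status FormStatus) bool {
	return status == StatusSigned
}

func main() {
	for _, s := range []FormStatus{StatusPending, StatusInProgress, StatusCompleted, StatusSigned} {
		fmt.Printf("%-12s cache=%v\n", s, ShouldCache(s))
	}
}
```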

Cache Key Structure

go
type CacheKey struct {
    OrganizationID int64  // Organization context
    DocumentID     int64  // Unique document
    Audience       string // "patient", "specialist", "admin"
    TemplateID     int64  // Template version
}

S3 Storage Path

s3://{bucket}/documents/{org_id}/{document_id}/{audience}.pdf

Example:

s3://restartix-documents/documents/org-1/doc-123/patient.pdf
s3://restartix-documents/documents/org-1/doc-123/specialist.pdf
s3://restartix-documents/documents/org-1/doc-123/admin.pdf
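
As a rough sketch of how a cache key could map onto the storage path, the method below formats the object key (the part after the bucket name). `ObjectKey` is a hypothetical helper, and rendering the IDs as plain decimal numbers is an assumption; the example paths above suggest the real segments may carry prefixes such as `org-` and `doc-`.

```go
package main

import "fmt"

// CacheKey repeats the struct from the "Cache Key Structure" section.
type CacheKey struct {
	OrganizationID int64
	DocumentID     int64
	Audience       string
	TemplateID     int64
}

// ObjectKey returns the S3 object key for this cache entry, following the
// documents/{org_id}/{document_id}/{audience}.pdf layout.
// Assumption: IDs are rendered as plain decimal numbers.
func (k CacheKey) ObjectKey() string {
	return fmt.Sprintf("documents/%d/%d/%s.pdf", k.OrganizationID, k.DocumentID, k.Audience)
}

func main() {
	k := CacheKey{OrganizationID: 1, DocumentID: 123, Audience: "patient", TemplateID: 7}
	fmt.Println(k.ObjectKey())
}
```

Note that `TemplateID` does not appear in the path. That is consistent with the invalidation rules: a template change cannot be detected from the path alone, so cached files have to be deleted explicitly.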

Why Three Separate Files?

Each audience sees different data:

  • Patient: Private fields excluded
  • Specialist: All fields included
  • Admin: All fields included

Caching all three versions ensures fast serving for all user types.
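
To make the audience difference concrete, here is a minimal sketch of per-audience filtering. The `Field` type, its `Private` flag, and `VisibleFields` are hypothetical; the document only states that private fields are excluded for patients and included for specialists and admins.

```go
package main

import "fmt"

// Field is a hypothetical form field; Private marks data hidden from patients.
type Field struct {
	Name    string
	Value   string
	Private bool
}

// VisibleFields returns the fields an audience may see. Only the "patient"
// audience has private fields removed; other audiences see everything.
func VisibleFields(fields []Field, audience string) []Field {
	if audience != "patient" {
		return fields
	}
	visible := make([]Field, 0, len(fields))
	for _, f := range fields {
		if !f.Private {
			visible = append(visible, f)
		}
	}
	return visible
}

func main() {
	fields := []Field{
		{Name: "diagnosis", Value: "sprain"},
		{Name: "clinician_notes", Value: "follow up in 2 weeks", Private: true},
	}
	for _, a := range []string{"patient", "specialist", "admin"} {
		fmt.Printf("%s sees %d field(s)\n", a, len(VisibleFields(fields, a)))
	}
}
```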

Cache Flow

Cache Hit (Signed Document)

Client: GET /v1/reports/123/pdf?audience=patient
        │
        ▼
Handler: Check form.status = 'signed'
        │
        ▼
Service: Build cache key (org_id, doc_id, audience, template_id)
        │
        ▼
Cache: Check S3 for existing file
        │
        ▼ (CACHE HIT)
Service: Generate pre-signed S3 URL (15-min expiry)
        │
        ▼
Handler: Return 302 Redirect to S3 URL
        │
        ▼
Client: Downloads PDF directly from S3

Cache Miss (First Request)

Client: GET /v1/reports/123/pdf?audience=patient
        │
        ▼
Handler: Check form.status = 'signed'
        │
        ▼
Service: Build cache key
        │
        ▼
Cache: Check S3 (file not found)
        │
        ▼ (CACHE MISS)
Service: Generate PDF
        │
        ▼
Cache: Upload PDF to S3
        │
        ▼
Service: Generate pre-signed S3 URL
        │
        ▼
Handler: Return 302 Redirect to S3 URL
        │
        ▼
Client: Downloads PDF from S3
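
The hit and miss flows differ only in whether a render and upload happen before the URL is issued. A minimal sketch of that check-then-populate step follows, using an in-memory map in place of S3 so it can run anywhere. `ObjectStore`, `memStore`, and `EnsureCached` are hypothetical names, not the service's actual API.

```go
package main

import (
	"fmt"
	"sync"
)

// ObjectStore is a hypothetical stand-in for the S3 operations the flow needs.
type ObjectStore interface {
	Exists(key string) bool
	Put(key string, data []byte)
}

// memStore is an in-memory ObjectStore used only to make the sketch runnable.
type memStore struct {
	mu      sync.Mutex
	objects map[string][]byte
}

func newMemStore() *memStore { return &memStore{objects: map[string][]byte{}} }

func (m *memStore) Exists(key string) bool {
	m.mu.Lock()
	defer m.mu.Unlock()
	_, ok := m.objects[key]
	return ok
}

func (m *memStore) Put(key string, data []byte) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.objects[key] = data
}

// EnsureCached renders and uploads the PDF only when key is absent.
// It returns true on a cache hit and false when it had to render.
func EnsureCached(store ObjectStore, key string, render func() ([]byte, error)) (bool, error) {
	if store.Exists(key) {
		return true, nil
	}
	pdf, err := render()
	if err != nil {
		return false, err
	}
	store.Put(key, pdf)
	return false, nil
}

func main() {
	store := newMemStore()
	renders := 0
	render := func() ([]byte, error) { renders++; return []byte("%PDF-1.7"), nil }

	for i := 0; i < 2; i++ {
		hit, _ := EnsureCached(store, "documents/1/123/patient.pdf", render)
		fmt.Printf("request %d: hit=%v renders=%d\n", i+1, hit, renders)
	}
}
```

In this sketch two simultaneous first requests could both miss and both render. Because a signed document's PDF is deterministic, the duplicate upload is wasteful rather than incorrect.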

No Cache (Unsigned Document - Preview)

Client: GET /v1/reports/123/pdf?audience=specialist
        │
        ▼
Handler: Check form.status = 'in_progress'
        │
        ▼
Service: Skip cache check (preview mode)
        │
        ▼
Service: Generate PDF
        │
        ▼
Handler: Stream PDF directly (200 OK, application/pdf)
        │
        ▼
Client: Receives PDF inline
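
From the handler's point of view, the three flows end in one of two response styles: a 302 redirect for signed documents or an inline 200 response for previews. The sketch below shows that branch using only the standard library. `writePDFResponse`, `respond`, and the placeholder URL are illustrative assumptions, and in a real handler the URL would come from the S3 pre-signing step.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// writePDFResponse picks the response style for the two cases: a 302 redirect
// to a pre-signed URL for signed forms, or the PDF bytes streamed inline for
// previews. signedURL and pdf are supplied by the caller.
func writePDFResponse(w http.ResponseWriter, r *http.Request, formStatus, signedURL string, pdf []byte) {
	if formStatus == "signed" {
		http.Redirect(w, r, signedURL, http.StatusFound)
		return
	}
	w.Header().Set("Content-Type", "application/pdf")
	w.WriteHeader(http.StatusOK)
	_, _ = w.Write(pdf)
}

// respond runs writePDFResponse against an in-memory recorder and returns the
// status code and Location header, which is enough to see the branch taken.
func respond(formStatus string) (int, string) {
	req := httptest.NewRequest(http.MethodGet, "/v1/reports/123/pdf?audience=specialist", nil)
	rec := httptest.NewRecorder()
	writePDFResponse(rec, req, formStatus, "https://example.invalid/presigned", []byte("%PDF-1.7"))
	return rec.Code, rec.Header().Get("Location")
}

func main() {
	for _, s := range []string{"signed", "in_progress"} {
		code, loc := respond(s)
		fmt.Printf("%-12s -> %d %s\n", s, code, loc)
	}
}
```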

Cache Invalidation

Automatic Invalidation Triggers

  1. Template Update: When a pdf_template is updated via PUT /v1/pdf-templates/{id}, all cached PDFs using that template are invalidated.
go
func (s *TemplateService) UpdateTemplate(ctx context.Context, id int64, updates TemplateUpdate) error {
    // Update template in database
    if err := s.store.UpdateTemplate(ctx, id, updates); err != nil {
        return err
    }

    // Invalidate all cached PDFs using this template
    return s.cache.InvalidateByTemplate(ctx, id)
}
  2. Manual Invalidation: Admins can manually invalidate the cache for a specific document via:
DELETE /v1/documents/{id}/cache

Invalidation Implementation

go
// Invalidate all cached versions of a document
func (c *DocumentCache) Invalidate(ctx context.Context, documentID int64) error {
    audiences := []string{"patient", "specialist", "admin"}
    for _, audience := range audiences {
        key := buildS3Key(documentID, audience)
        if err := c.s3.DeleteObject(ctx, key); err != nil {
            return err
        }
    }
    return nil
}

// Invalidate all documents using a specific template
func (c *DocumentCache) InvalidateByTemplate(ctx context.Context, templateID int64) error {
    // Query all document_ids using this template
    docIDs, err := c.store.GetDocumentIDsByTemplate(ctx, templateID)
    if err != nil {
        return err
    }

    // Invalidate each document
    for _, docID := range docIDs {
        if err := c.Invalidate(ctx, docID); err != nil {
            return err
        }
    }
    return nil
}
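
Both functions above return on the first failed delete, which can leave later audiences or documents cached against an outdated template. One possible alternative, sketched here under assumptions, is to attempt every delete and report the failures together. `InvalidateAll` and `deleteFunc` are hypothetical names, and `errors.Join` requires Go 1.20 or later.

```go
package main

import (
	"errors"
	"fmt"
)

// deleteFunc is a hypothetical stand-in for the S3 delete call.
type deleteFunc func(key string) error

// errSimulated is used by the demo to imitate a failing delete.
var errSimulated = errors.New("simulated S3 failure")

// InvalidateAll attempts to delete every key and reports all failures
// together, so one failed delete does not stop the remaining deletes.
func InvalidateAll(keys []string, del deleteFunc) error {
	var errs []error
	for _, key := range keys {
		if err := del(key); err != nil {
			errs = append(errs, fmt.Errorf("delete %s: %w", key, err))
		}
	}
	// errors.Join returns nil when errs is empty.
	return errors.Join(errs...)
}

func main() {
	var deleted []string
	del := func(key string) error {
		if key == "documents/1/123/specialist.pdf" {
			return errSimulated
		}
		deleted = append(deleted, key)
		return nil
	}
	keys := []string{
		"documents/1/123/patient.pdf",
		"documents/1/123/specialist.pdf",
		"documents/1/123/admin.pdf",
	}
	err := InvalidateAll(keys, del)
	fmt.Println("deleted:", len(deleted), "error:", err)
}
```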

Performance Metrics

Expected Cache Hit Ratio

| Scenario | Cache Hit Ratio | Notes |
| --- | --- | --- |
| Production (typical) | ~80% | Most requests are for signed documents |
| Development/Testing | ~20% | Frequent template changes, preview requests |
| Initial rollout | ~0% | Cold cache, all first requests |

PDF Generation Performance

| Metric | Target | Typical |
| --- | --- | --- |
| Cache hit response time | < 100ms | 50-80ms (S3 redirect) |
| Cache miss (first generation) | < 1s | 400-700ms |
| Preview (unsigned, no cache) | < 1s | 400-700ms |
| Concurrent render limit | 3 | Prevents OOM |

Resource Usage

| Resource | Per Render | Notes |
| --- | --- | --- |
| Memory | 50-100MB | Chrome tab overhead |
| CPU | 1-2 cores | chromedp render |
| Time | 200-500ms | Typical document |
| S3 storage | ~100KB | Average PDF size |
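
The per-render figures combine with the concurrent render limit into rough capacity bounds. The helpers below are a back-of-the-envelope sketch, not measured results; the function names are hypothetical and the model assumes every render slot stays busy.

```go
package main

import "fmt"

// PeakRenderMemoryMB is the worst-case memory used by renders alone when
// every slot is occupied.
func PeakRenderMemoryMB(maxConcurrent, perRenderMB int) int {
	return maxConcurrent * perRenderMB
}

// RendersPerSecond is the steady-state throughput when every slot is busy
// and each render takes renderMillis milliseconds (integer division).
func RendersPerSecond(maxConcurrent, renderMillis int) int {
	return maxConcurrent * 1000 / renderMillis
}

func main() {
	fmt.Println("peak render memory (MB):", PeakRenderMemoryMB(3, 100))
	fmt.Println("renders/s at 500ms each:", RendersPerSecond(3, 500))
	fmt.Println("renders/s at 200ms each:", RendersPerSecond(3, 200))
}
```

With a limit of 3, the upper memory figure of 100MB, and render times of 200-500ms, this gives at most 300MB of render memory and roughly 6 to 15 uncached renders per second per instance.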

Concurrency Control

Semaphore-Based Limiting

To prevent resource exhaustion from concurrent PDF generation:

go
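const maxConcurrentRenders = 3

// renderSemaphore is a counting semaphore: each buffered slot is one in-flight render.
var renderSemaphore = make(chan struct{}, maxConcurrentRenders)

func (s *Service) GeneratePDF(ctx context.Context, ...) ([]byte, error) {
    // Acquire a slot, but stop waiting if the request is cancelled while queued
    select {
    case renderSemaphore <- struct{}{}:
    case <-ctx.Done():
        return nil, ctx.Err()
    }
    // Release the slot when the render finishes
    defer func() { <-renderSemaphore }()

    // Generate PDF (chromedp render)
    return s.renderer.RenderPDF(ctx, html, opts)
}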

This ensures:

  • Maximum 3 concurrent PDF renders
  • Additional requests wait in queue
  • Prevents memory exhaustion
  • Prevents Chrome process overload
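
To see the limiting behaviour in isolation, the following self-contained sketch runs many jobs through a channel-based semaphore and records the highest number that ever ran at once. `Limiter`, `Do`, and `PeakConcurrency` are hypothetical names, and the 1ms sleep stands in for a render.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// Limiter is a counting semaphore built on a buffered channel.
type Limiter struct{ slots chan struct{} }

func NewLimiter(n int) *Limiter { return &Limiter{slots: make(chan struct{}, n)} }

// Do runs fn once a slot is free, or returns early if ctx is cancelled
// while waiting in the queue.
func (l *Limiter) Do(ctx context.Context, fn func() error) error {
	select {
	case l.slots <- struct{}{}:
	case <-ctx.Done():
		return ctx.Err()
	}
	defer func() { <-l.slots }()
	return fn()
}

// PeakConcurrency pushes jobs through a limiter and reports the largest
// number of jobs observed running at the same time.
func PeakConcurrency(limit, jobs int) int64 {
	l := NewLimiter(limit)
	var current, peak int64
	var wg sync.WaitGroup
	for i := 0; i < jobs; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			_ = l.Do(context.Background(), func() error {
				n := atomic.AddInt64(&current, 1)
				// Raise peak to n unless another goroutine already recorded more.
				for {
					p := atomic.LoadInt64(&peak)
					if n <= p || atomic.CompareAndSwapInt64(&peak, p, n) {
						break
					}
				}
				time.Sleep(time.Millisecond) // stand-in for a render
				atomic.AddInt64(&current, -1)
				return nil
			})
		}()
	}
	wg.Wait()
	return atomic.LoadInt64(&peak)
}

func main() {
	fmt.Println("peak concurrent jobs:", PeakConcurrency(3, 20))
}
```

The reported peak never exceeds the limit, regardless of how many jobs are queued.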

Cache Warming

On Form Signing

When a form is signed, proactively generate and cache PDFs for all audiences:

go
func (s *FormService) SignForm(ctx context.Context, formID int64) error {
    // Update form status to 'signed'
    if err := s.store.SignForm(ctx, formID); err != nil {
        return err
    }

    // Get associated document
    doc, err := s.store.GetDocumentByFormID(ctx, formID)
    if err != nil {
        return err
    }

    // Warm cache for all audiences (async)
    go s.warmDocumentCache(context.Background(), doc.ID)

    return nil
}

func (s *FormService) warmDocumentCache(ctx context.Context, docID int64) {
    audiences := []string{"patient", "specialist", "admin"}
    for _, audience := range audiences {
        // Generate and cache PDF (errors logged, not returned)
        if _, err := s.documentService.GeneratePDF(ctx, docID, audience); err != nil {
            log.Error("cache warming failed", "doc_id", docID, "audience", audience, "error", err)
        }
    }
}
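
Because warming runs in a detached goroutine with `context.Background()`, nothing bounds how long it may run. A possible refinement, sketched here under assumptions, gives the whole warm-up a deadline and returns the audiences that failed so they can be logged or retried. `WarmAudiences`, `generateFunc`, `failFor`, and the 30-second budget are all hypothetical.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// generateFunc is a hypothetical stand-in for documentService.GeneratePDF.
type generateFunc func(ctx context.Context, audience string) error

var errRender = errors.New("simulated render failure")

// WarmAudiences renders each audience under a shared deadline and returns the
// audiences that failed, so the caller can log or retry them.
func WarmAudiences(parent context.Context, timeout time.Duration, gen generateFunc) []string {
	ctx, cancel := context.WithTimeout(parent, timeout)
	defer cancel()

	var failed []string
	for _, audience := range []string{"patient", "specialist", "admin"} {
		if err := gen(ctx, audience); err != nil {
			failed = append(failed, audience)
		}
	}
	return failed
}

// warmWithDefaults applies an assumed 30-second budget for the whole warm-up.
func warmWithDefaults(gen generateFunc) []string {
	return WarmAudiences(context.Background(), 30*time.Second, gen)
}

// failFor builds a fake generator that fails only for the named audience.
func failFor(bad string) generateFunc {
	return func(ctx context.Context, audience string) error {
		if err := ctx.Err(); err != nil {
			return err // deadline exceeded or cancelled
		}
		if audience == bad {
			return errRender
		}
		return nil
	}
}

func main() {
	fmt.Println("failed audiences:", warmWithDefaults(failFor("admin")))
}
```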

On Template Update

After updating a template, optionally warm cache for recent documents:

go
func (s *TemplateService) UpdateTemplate(ctx context.Context, id int64, updates TemplateUpdate) error {
    // Update template
    if err := s.store.UpdateTemplate(ctx, id, updates); err != nil {
        return err
    }

    // Invalidate existing cache
    if err := s.cache.InvalidateByTemplate(ctx, id); err != nil {
        return err
    }

    // Warm cache for recently accessed documents (optional)
    go s.warmTemplateCache(context.Background(), id)

    return nil
}

Monitoring and Alerts

Key Metrics to Track

  1. Cache hit ratio: cache_hits / (cache_hits + cache_misses)
  2. Average generation time: P50, P95, P99 of PDF generation duration
  3. Cache invalidations: Number of invalidations per day
  4. Failed generations: Count of PDF generation errors
  5. S3 storage usage: Total size of cached PDFs

Alert Thresholds

Alert when:

  • Cache hit ratio drops below 70%
  • P95 generation time exceeds 1 second
  • Failed generations exceed 1% of requests
  • Concurrent render queue length exceeds 10
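
The hit-ratio formula and the four thresholds above can be evaluated from a single metrics snapshot. The sketch below is one way to express that; the `Snapshot` type and its field names are hypothetical and are not tied to any particular metrics system.

```go
package main

import "fmt"

// Snapshot is a hypothetical point-in-time view of the tracked metrics.
type Snapshot struct {
	CacheHits         int
	CacheMisses       int
	Requests          int
	FailedGenerations int
	P95Millis         int // P95 PDF generation time in milliseconds
	QueueLength       int // requests waiting for a render slot
}

// HitRatio computes cache_hits / (cache_hits + cache_misses), or 0 when
// there has been no cacheable traffic yet.
func (s Snapshot) HitRatio() float64 {
	total := s.CacheHits + s.CacheMisses
	if total == 0 {
		return 0
	}
	return float64(s.CacheHits) / float64(total)
}

// Alerts returns one message per threshold that the snapshot violates.
func (s Snapshot) Alerts() []string {
	var alerts []string
	if s.CacheHits+s.CacheMisses > 0 && s.HitRatio() < 0.70 {
		alerts = append(alerts, "cache hit ratio below 70%")
	}
	if s.P95Millis > 1000 {
		alerts = append(alerts, "P95 generation time above 1 second")
	}
	// Integer form of failed/requests > 1%.
	if s.Requests > 0 && s.FailedGenerations*100 > s.Requests {
		alerts = append(alerts, "failed generations above 1% of requests")
	}
	if s.QueueLength > 10 {
		alerts = append(alerts, "render queue length above 10")
	}
	return alerts
}

func main() {
	s := Snapshot{CacheHits: 60, CacheMisses: 40, Requests: 100, FailedGenerations: 2, P95Millis: 1200, QueueLength: 11}
	fmt.Printf("hit ratio: %.2f\n", s.HitRatio())
	for _, a := range s.Alerts() {
		fmt.Println("ALERT:", a)
	}
}
```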

Cache Cleanup

Automatic Cleanup

Implement a periodic cleanup job to remove:

  • Cached PDFs for deleted documents
  • Cached PDFs older than retention policy (e.g., 90 days for unpublished drafts)
go
func (c *DocumentCache) Cleanup(ctx context.Context, retentionDays int) error {
    cutoff := time.Now().AddDate(0, 0, -retentionDays)

    // List all objects in cache prefix
    objects, err := c.s3.ListObjects(ctx, "documents/")
    if err != nil {
        return err
    }

    // Delete objects older than cutoff
    for _, obj := range objects {
        if obj.LastModified.Before(cutoff) {
            if err := c.s3.DeleteObject(ctx, obj.Key); err != nil {
                log.Error("cleanup failed", "key", obj.Key, "error", err)
            }
        }
    }

    return nil
}

Best Practices

  1. Cache signed documents only - Don't cache preview PDFs for in-progress forms
  2. Invalidate on template change - Ensure cached PDFs reflect latest template
  3. Warm cache on signing - Pre-generate PDFs when form is signed
  4. Monitor cache hit ratio - Target 80%+ for production workloads
  5. Limit concurrent renders - Prevent resource exhaustion
  6. Use S3 lifecycle policies - Auto-delete old cached PDFs
  7. Track generation time - Alert on performance degradation
  8. Test cache invalidation - Verify invalidation works correctly

Future Optimizations

Not in scope for initial release:

  1. CDN caching - CloudFront in front of S3 for global distribution
  2. Progressive rendering - Stream PDF generation for faster time-to-first-byte
  3. Pre-rendering - Generate PDFs in background immediately after form completion
  4. Compression - Compress PDFs with Ghostscript or similar
  5. Lazy image loading - Only embed images when generating specific audience version
  6. Template compilation cache - Cache compiled Go templates in memory
  7. Browser context pooling - Reuse Chrome browser contexts across renders