Courtix
Healthtech

HIPAA-aligned OCR and extraction pipeline for a clinical lab network

A multi-tenant document intelligence platform that ingests scanned lab test requisitions, extracts structured data with OCR and LLM-assisted mapping, and returns auditable, HIPAA-aligned records to the client's lab information system.

TypeScript, React, Python, Django, Celery, PostgreSQL, Redis, AWS Textract, AWS S3, AWS KMS
Client: Clinical laboratory group (US)
Year: 2025
Duration: 9 months
Outcome: HIPAA-aligned pipeline handling requisitions across every US state

The challenge

A clinical laboratory network was manually keying hundreds of scanned test requisitions per day into its lab information system. Forms arrived from many clinics, each on its own template and in widely varying quality: handwritten notes, faxes, photos taken on a phone at a patient's bedside. Staff turnover and transcription errors were pushing operational costs up and making every HIPAA audit harder to pass.

They needed a platform that could ingest any document, extract structured data reliably, and produce an auditable record for every field decision, all without sending PHI anywhere it wasn't supposed to go.

Our approach

  • Designed a multi-tenant architecture where every client organisation is fully isolated at the database, storage and audit-log layers.
  • Built a staged pipeline: preprocessing (deskew, denoise, normalise), OCR extraction via AWS Textract, LLM-assisted field mapping through a managed inference layer, and structured output validation with schema-level constraints.
  • Wrapped every step in a Celery job with retry, idempotency and per-document audit trails. Every field that lands in the lab system carries a trace back to the source region of the scanned document.
  • Built the platform to meet the HIPAA Security Rule from day one: encryption at rest via KMS, signed-URL S3 access, tenant-scoped access control, tamper-evident audit logging, and a documented data retention and deletion process.
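
To illustrate the retry and idempotency behaviour described above, here is a minimal sketch in plain Python rather than Celery. The names `idempotency_key`, `run_stage` and the in-memory `_results` store are hypothetical and are not taken from the production codebase; the sketch assumes a stage result can be keyed by the document's content hash plus the stage name, and that `RuntimeError` stands in for a transient failure.

```python
import hashlib
from typing import Callable, Dict

# Hypothetical in-memory result store; a real deployment would need a durable one.
_results: Dict[str, dict] = {}


def idempotency_key(document_bytes: bytes, stage: str) -> str:
    """Derive a stable key from the document content and the stage name."""
    digest = hashlib.sha256(document_bytes).hexdigest()
    return f"{stage}:{digest}"


def run_stage(
    document_bytes: bytes,
    stage: str,
    fn: Callable[[bytes], dict],
    max_retries: int = 3,
) -> dict:
    """Run a stage once per document, retrying failures treated as transient."""
    key = idempotency_key(document_bytes, stage)
    if key in _results:
        # Already processed: return the stored result instead of re-running.
        return _results[key]

    last_error = None
    for _attempt in range(max_retries):
        try:
            result = fn(document_bytes)
            _results[key] = result
            return result
        except RuntimeError as exc:  # assumed to be transient in this sketch
            last_error = exc
    raise RuntimeError(
        f"stage {stage!r} failed after {max_retries} attempts"
    ) from last_error
```

Calling `run_stage` twice with the same bytes and stage name runs the stage function only until it first succeeds; the second call returns the stored result, which is what makes a redelivered job safe to process again.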

Architecture highlights

  • Django REST API + React/TypeScript frontend
  • Celery + Redis for durable, retryable document processing
  • PostgreSQL with per-tenant schema isolation and row-level access checks
  • AWS Textract + managed LLM APIs behind a pluggable inference layer
  • Append-only audit log with per-document traceability
  • Infrastructure as code, environment parity from dev to prod
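
One common way to make an append-only audit log tamper-evident is to chain entries by hash, so that changing any earlier entry invalidates every later one. The sketch below shows that idea only; it is an assumption about how such a log could work, not a description of the production design, and `append_entry`, `verify_chain` and the payload fields are hypothetical.

```python
import hashlib
import json
from typing import List

GENESIS = "0" * 64  # placeholder predecessor hash for the first entry


def _entry_hash(prev_hash: str, payload: dict) -> str:
    """Hash the previous entry's hash together with a canonical payload encoding."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_entry(log: List[dict], payload: dict) -> dict:
    """Append an entry that commits to everything logged before it."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    entry = {
        "payload": payload,
        "prev_hash": prev_hash,
        "hash": _entry_hash(prev_hash, payload),
    }
    log.append(entry)
    return entry


def verify_chain(log: List[dict]) -> bool:
    """Recompute every hash in order; any edited or reordered entry breaks the chain."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev_hash"] != prev_hash:
            return False
        if entry["hash"] != _entry_hash(prev_hash, entry["payload"]):
            return False
        prev_hash = entry["hash"]
    return True
```

A chain like this detects modification but does not prevent it, so it would normally be paired with storage-level controls such as write-once retention.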

Outcome

  • HIPAA-aligned architecture reviewed by the client's compliance team and operating in production
  • Every extracted field is traceable back to its source region in the scanned document and is signed off by a user or an automated rule
  • Throughput capacity matched the lab network's sustained daily volume, with headroom for seasonal spikes
  • Staff time on manual data entry dropped dramatically, and the operations team moved from keying to exception handling
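
The field-level traceability described above can be pictured as a small record type. The sketch below is illustrative only: `SourceRegion`, `ExtractedField` and `is_releasable` are hypothetical names, and it assumes regions are stored as page-relative ratios between 0 and 1, similar to the bounding boxes AWS Textract reports.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SourceRegion:
    """Where on the scan a value came from, as ratios of the page size."""
    page: int
    left: float
    top: float
    width: float
    height: float


@dataclass(frozen=True)
class ExtractedField:
    name: str
    value: str
    region: SourceRegion
    approved_by: Optional[str] = None  # hypothetical: a user id or a rule name


def is_releasable(field: ExtractedField) -> bool:
    """Release a field only if it has an in-page region and a sign-off."""
    r = field.region
    in_bounds = (
        r.page >= 1
        and r.left >= 0.0
        and r.top >= 0.0
        and r.width > 0.0
        and r.height > 0.0
        and r.left + r.width <= 1.0
        and r.top + r.height <= 1.0
    )
    return in_bounds and field.approved_by is not None
```

Under these assumptions, a field with no sign-off or with a region that falls outside the page would be held back for review rather than written to the lab system.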

(Specific accuracy and throughput numbers are redacted per engagement; reference calls available on request.)
