Applied AI · Clinical Documentation

Clinical GenAI Agent & Analytics

Designed to turn messy dictated rehab notes into structured recovery metrics — with validation, audit, and human review built in.

Dictation → API → Extract → Storage

01 · Problem

Recovery data is trapped inside free text.

Rehab documentation is largely dictated. Abbreviations, incomplete sentences, and inconsistent terminology mean recovery progress sits in prose, not in fields a team can analyze.

Manual review of those notes is slow and inconsistent. Operational and clinical teams can't easily compare patient progress, visit outcomes, or therapy indicators across sites or time.

Free-text dictation

No structured analytics

Slow manual review

Lloyd worked 9+ years inside hospital rehab and clinical operations — the documentation patterns and workflow constraints are first-hand domain knowledge.

02 · System architecture

From dictation to dashboard, with humans in the loop.

A pipeline that respects clinical context — validation gates, audit logs, and a review queue before anything goes downstream.

The spine

Dictated note: React intake
FastAPI: Python orchestration
Preprocess: De-id + chunk
LLM extraction: Schema-constrained
Validation: Rules + confidence
PostgreSQL: Time-series + audit
Review queue: Low-confidence
Analytics: Recovery trends + KPIs

Reads downstream: Recovery KPIs · Time-series trends · Audit trail · Review backlog

Backend

FastAPI + Pydantic. Async orchestration; every stage gates the next.
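The gating idea can be sketched with plain asyncio, leaving FastAPI out so the example has no dependencies. The stage functions, the NoteContext class, and the StageRejected exception are hypothetical names introduced for illustration; the point is only that a stage which raises prevents every later stage from running.

```python
import asyncio
from dataclasses import dataclass, field


class StageRejected(Exception):
    """Raised by a stage to stop the note from moving further down the pipeline."""


@dataclass
class NoteContext:
    text: str
    trail: list = field(default_factory=list)  # names of stages that passed


async def preprocess(ctx: NoteContext) -> NoteContext:
    # Gate: an empty note never reaches the extraction stage.
    if not ctx.text.strip():
        raise StageRejected("empty note")
    ctx.text = " ".join(ctx.text.split())  # collapse runs of whitespace
    return ctx


async def extract(ctx: NoteContext) -> NoteContext:
    # Placeholder for the model call; a real stage would await an LLM client here.
    await asyncio.sleep(0)
    return ctx


# Hypothetical stage order; the real pipeline has more stages than this.
STAGES = [preprocess, extract]


async def run_pipeline(text: str) -> NoteContext:
    ctx = NoteContext(text=text)
    for stage in STAGES:
        ctx = await stage(ctx)  # an exception here means later stages never run
        ctx.trail.append(stage.__name__)
    return ctx
```

Calling asyncio.run(run_pipeline(note_text)) returns the context with a trail of the stages that passed, which is also a natural thing to write to an audit log.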

Privacy

De-identification strips identifiers before any model call. Synthetic data only here.
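A minimal sketch of the de-identification step, assuming a rule-based scrub that runs before any text leaves the service. The three patterns below are illustrative assumptions, not the pipeline's actual rule set; real de-identification also has to handle names, addresses, and other identifiers that simple regular expressions will miss.

```python
import re

# Illustrative patterns only (assumptions, not a complete identifier list).
# Rules run in order, so the MRN rule sees the text before the others do.
PATTERNS = [
    ("[MRN]", re.compile(r"\bMRN[:#]?\s*\d{5,10}\b", re.IGNORECASE)),
    ("[PHONE]", re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b")),
    ("[DATE]", re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b")),
]


def deidentify(text: str) -> str:
    """Replace each matched identifier with a fixed placeholder token."""
    for token, pattern in PATTERNS:
        text = pattern.sub(token, text)
    return text
```

The date rule requires two slashes, so clinical scores such as "Pain 2/10" pass through untouched.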

LLM layer

Schema-constrained extraction with per-field confidence; structured outputs reduce retries.
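One way to picture schema-constrained output with per-field confidence is a strict parser that rejects anything outside the expected shape. This sketch uses standard-library dataclasses as a stand-in for Pydantic models so it runs without dependencies. The field names, the SchemaError class, and the value/confidence JSON shape are assumptions modeled on the fields shown in this case study.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class ExtractedField:
    value: object
    confidence: float


class SchemaError(ValueError):
    """The model response does not match the expected schema."""


# Hypothetical field names and types.
SCHEMA = {
    "mobility_level": str,
    "assistance_required": str,
    "gait_distance_m": (int, float),
    "pain_score": int,
}


def parse_extraction(raw: str) -> dict:
    """Parse a model response; raise SchemaError instead of guessing."""
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise SchemaError(f"not valid JSON: {exc}") from exc
    if not isinstance(payload, dict):
        raise SchemaError("top-level value must be an object")
    unknown = set(payload) - set(SCHEMA)
    if unknown:
        raise SchemaError(f"unexpected fields: {sorted(unknown)}")
    parsed = {}
    for name, expected in SCHEMA.items():
        item = payload.get(name)
        if not isinstance(item, dict) or set(item) != {"value", "confidence"}:
            raise SchemaError(f"{name}: expected value and confidence")
        value, conf = item["value"], item["confidence"]
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            raise SchemaError(f"{name}: wrong value type")
        if isinstance(conf, bool) or not isinstance(conf, (int, float)):
            raise SchemaError(f"{name}: confidence must be a number")
        if not 0.0 <= conf <= 1.0:
            raise SchemaError(f"{name}: confidence outside [0, 1]")
        parsed[name] = ExtractedField(value, float(conf))
    return parsed
```

A malformed response fails here, at parse time, so the caller can route the note to review rather than re-prompting.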

Storage

PostgreSQL with normalized metrics, time-series observations, and an append-only audit log.
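The append-only property can be enforced in the database rather than by convention. The sketch below uses SQLite from the Python standard library as a stand-in for PostgreSQL so it runs anywhere; the column names follow the extraction_audit_log table in this case study, while the trigger names and helper functions are hypothetical. In PostgreSQL the same effect is usually reached with a trigger or by revoking UPDATE and DELETE privileges on the table.

```python
import sqlite3

# SQLite stand-in for the PostgreSQL audit table (an assumption for portability).
DDL = """
CREATE TABLE extraction_audit_log (
    note_id       INTEGER NOT NULL,
    stage         TEXT    NOT NULL,
    model_version TEXT,
    tokens        INTEGER,
    "at"          TEXT    NOT NULL DEFAULT CURRENT_TIMESTAMP
);
CREATE TRIGGER audit_no_update BEFORE UPDATE ON extraction_audit_log
BEGIN SELECT RAISE(ABORT, 'extraction_audit_log is append-only'); END;
CREATE TRIGGER audit_no_delete BEFORE DELETE ON extraction_audit_log
BEGIN SELECT RAISE(ABORT, 'extraction_audit_log is append-only'); END;
"""


def open_audit_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(DDL)
    return conn


def log_stage(conn, note_id, stage, model_version=None, tokens=None):
    """Append one audit row; rows can be added but never changed or removed."""
    conn.execute(
        "INSERT INTO extraction_audit_log (note_id, stage, model_version, tokens) "
        "VALUES (?, ?, ?, ?)",
        (note_id, stage, model_version, tokens),
    )
    conn.commit()
```

With the triggers in place, any UPDATE or DELETE against the table raises a database error and leaves the existing rows intact.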

Designed pipeline. No real patient data; all examples in this case study are synthetic.

03 · Extraction in action

Watch a messy note become structured data.

Three stages, one synthetic note. Step through it or let it run.

Dictated note

Patient 042 · synthetic · Session 04

Pt ambulated 30m with rolling walker, contact-guard assist. Pain 2/10 at rest, 3/10 with weight-bearing. Tolerated full 45-min session, no signs of fatigue. Transfers sit-to-stand independent. Plan: progress to single-point cane next visit, discharge planning to begin within two weeks.

Structured extraction

JSON output

Mobility level: ambulatory · walker (confidence 0.94)
Assistance required: contact-guard (confidence 0.92)
Gait distance (m): 30 (confidence 0.96)
Pain score: 3 (confidence 0.90)
Therapy tolerance: tolerated full session (confidence 0.91)
Discharge readiness: approaching · 2 weeks (confidence 0.88)

Validation & routing

Validation

Schema valid
Ranges valid
Required fields
Confidence ≥ 0.85

Routing

→ PostgreSQL
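The validation and routing decision above can be sketched as plain rules over the structured output, with no model call involved. The 0.85 threshold and the pain scale come from the example; the field names, the gait distance ceiling, and the reason strings are hypothetical. The sketch assumes the schema check has already passed, so each field is a dict with a numeric or string value and a confidence.

```python
CONFIDENCE_THRESHOLD = 0.85  # threshold shown in the validation panel

# Hypothetical required fields and ranges; the gait ceiling is an assumption.
REQUIRED = ("mobility_level", "assistance_required", "gait_distance_m", "pain_score")
RANGES = {"pain_score": (0, 10), "gait_distance_m": (0, 2000)}


def validate(extraction: dict) -> list:
    """Return a list of reasons the extraction cannot be stored as-is."""
    reasons = []
    for name in REQUIRED:
        if name not in extraction:
            reasons.append(f"missing:{name}")
    for name, (low, high) in RANGES.items():
        if name in extraction and not low <= extraction[name]["value"] <= high:
            reasons.append(f"out_of_range:{name}")
    for name, item in extraction.items():
        if item["confidence"] < CONFIDENCE_THRESHOLD:
            reasons.append(f"low_confidence:{name}")
    return reasons


def route(extraction: dict) -> tuple:
    """Send clean extractions to storage and everything else to human review."""
    reasons = validate(extraction)
    if reasons:
        return ("review_queue", reasons)
    return ("postgresql", [])
```

The reasons list maps naturally onto the reason column of the review_queue table, so a reviewer sees why a note was held back.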

1. Note: A dictated rehab note arrives via the intake form. Free text, dictation patterns, abbreviations.

Interactive prototype · synthetic example · no PHI.

04 · Data model & analytics

Time-series recovery, modeled for analysis.

Normalized observation tables, append-only audit, and a review queue — designed so recovery trends are a query, not a project.

patients_demo

Demographic anchor — synthetic only.

id · cohort · enrolled_at

rehab_notes

Raw dictated note bodies.

id · patient_id · dictated_at · source

extracted_metrics

Per-note structured output.

note_id · field · value · confidence

metric_observations

Time-series flatten for analytics.

patient_id · metric · value · observed_at

extraction_audit_log

Append-only audit trail.

note_id · stage · model_version · tokens · at

review_queue

Routed low-confidence extractions.

note_id · reason · status · routed_at
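To illustrate the claim that a recovery trend is a single query, here is a sketch against the metric_observations shape listed above. It uses SQLite from the Python standard library as a stand-in for PostgreSQL, and the four pain scores are invented for the example; they are not the dashboard's figures. The trend function and the column types are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE metric_observations (
           patient_id  INTEGER NOT NULL,
           metric      TEXT    NOT NULL,
           value       REAL    NOT NULL,
           observed_at TEXT    NOT NULL  -- ISO dates sort chronologically as text
       )"""
)
# Invented values for illustration only.
conn.executemany(
    "INSERT INTO metric_observations VALUES (?, ?, ?, ?)",
    [
        (42, "pain_score", 5, "2024-01-02"),
        (42, "pain_score", 4, "2024-01-05"),
        (42, "pain_score", 4, "2024-01-09"),
        (42, "pain_score", 3, "2024-01-12"),
    ],
)

TREND_SQL = """
SELECT COUNT(*) AS sessions,
       AVG(value) AS mean_value,
       (SELECT value FROM metric_observations
         WHERE patient_id = :p AND metric = :m
         ORDER BY observed_at DESC LIMIT 1)
     - (SELECT value FROM metric_observations
         WHERE patient_id = :p AND metric = :m
         ORDER BY observed_at ASC LIMIT 1) AS change
FROM metric_observations
WHERE patient_id = :p AND metric = :m
"""


def trend(conn, patient_id: int, metric: str) -> tuple:
    """Return (session count, mean value, last minus first) for one metric."""
    return conn.execute(TREND_SQL, {"p": patient_id, "m": metric}).fetchone()
```

Because every metric lands in one narrow table, the same query serves pain score, gait distance, or any other metric by changing a parameter.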

Recovery trends

Patient 042 · synthetic · 8-session window

Avg pain score

2.8

-1.4

Gait distance Δ

+22m

+18m

Therapy tolerance

94%

+11%

Discharge readiness

0.81

+0.22

Recovery curve

Composite recovery index, sessions 1–8 (synthetic).

Filters: Cohort · ortho post-op | Sessions 1–8 | Confidence ≥ 0.6

Mock dashboard — synthetic data. The real pipeline targets Tableau or Power BI as the analytics surface.

05 · Cost-aware design

Engineered for fewer, cheaper calls.

Token cost is a design constraint, not an afterthought.

Prompt compression

Strip filler, normalize abbreviations, drop redundant context before the call.
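A rough sketch of that preprocessing, assuming a small filler list and abbreviation map. Both are hypothetical; a real deployment would need a clinically reviewed abbreviation dictionary, since the same short form can mean different things in different settings.

```python
import re

# Hypothetical abbreviation map and filler list, for illustration only.
ABBREVIATIONS = {
    "pt": "patient",
    "cga": "contact-guard assist",
    "wb": "weight-bearing",
}
FILLER = re.compile(r"\b(?:um|uh|you know|okay so)\b[,.]?\s*", re.IGNORECASE)


def compress(note: str) -> str:
    """Drop dictation filler, normalize known abbreviations, tidy whitespace."""
    text = FILLER.sub("", note)

    def expand(match: re.Match) -> str:
        word = match.group(0)
        return ABBREVIATIONS.get(word.lower(), word)

    # Only whole alphabetic words are looked up, so "30m" is left alone.
    text = re.sub(r"\b[A-Za-z]+\b", expand, text)
    return " ".join(text.split())
```

Normalizing abbreviations can lengthen a note slightly; the intended gain is consistent input, which helps both the extractor and any cache keyed on note text.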

Caching repeated patterns

Hash-keyed cache for recurring note shapes — same input, no second call.
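The caching idea can be sketched as a dictionary keyed by a hash of the normalized note text. The ExtractionCache class and its in-memory store are assumptions for illustration; a shared deployment would more likely use an external store, and the key should be computed on de-identified text so no identifiers end up in it.

```python
import hashlib


class ExtractionCache:
    """Memoize extraction results by a SHA-256 hash of the normalized note."""

    def __init__(self, extract_fn):
        self._extract = extract_fn  # the expensive call, e.g. the model request
        self._store = {}
        self.calls = 0  # how many times the expensive call actually ran

    @staticmethod
    def key(note: str) -> str:
        # Lowercase and collapse whitespace so trivial variations share a key.
        normalized = " ".join(note.lower().split())
        return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

    def get(self, note: str):
        k = self.key(note)
        if k not in self._store:
            self.calls += 1
            self._store[k] = self._extract(note)
        return self._store[k]
```

Two notes that differ only in case or spacing resolve to the same key, so the second lookup is served from the store without another call.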

Split extraction from validation

Validation runs on structured output, not by re-prompting the model.

Schema-constrained output

Structured outputs cut retries; malformed responses are caught at parse, not by another call.

Design choices, not measured savings. The pipeline is designed to reduce unnecessary token usage and repeated model calls.

06 · Engineering challenges

Where the work actually was.

Five concrete problems this pipeline is built to handle.

01

Messy clinical language

Dictated notes contain abbreviations, incomplete sentences, and inconsistent phrasing. The extractor has to be tolerant without inventing data.

02

Data integrity

Invalid or missing metrics are flagged before being stored. Validation gates run on every extraction.

03

Analytics readiness

Free-text recovery descriptions become normalized metrics suitable for time-series analysis.

04

LLM cost control

Caching, compressed prompts, and structured outputs are layered to reduce unnecessary API calls.

05

Human review

Low-confidence extractions route to a review queue. The pipeline never treats an uncertain extraction as fact.

07 · What this demonstrates

One project, eight capabilities.

Designed end-to-end so each layer is legible in isolation.

Healthcare AI
Healthcare data engineering
Clinical workflow understanding
PostgreSQL modeling
Python backend (FastAPI)
LLM system design
Analytics dashboarding
Responsible AI design

9+ yrs

Hospital rehab + clinical ops background

8

Pipeline stages, human-in-the-loop

0

PHI — synthetic data only

Operator audiences: Rehab clinics · Therapy networks · Healthcare AI teams

Documentation that knows when to ask for a human.

Happy to walk through the extraction schema, the validation policy, and what a production FHIR integration would look like.