
Building clinical AI that passes regulatory review

Clinical AI doesn't fail review because the model is weak. It fails because the system around it can't answer the questions a reviewer asks: how was it validated, what happens when it's uncertain, and who is accountable.

Reviewers don't grade accuracy — they grade evidence

A high benchmark score is table stakes. What review actually probes:

  • Intended use — narrow, written, and matched by your validation set.
  • Validation — on representative data, with subgroup performance, not one aggregate number.
  • Human oversight — where a clinician stays in the loop, and how.
  • Traceability — every decision reconstructable after the fact.
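
To make "subgroup performance, not one aggregate number" concrete, here is a minimal sketch of a per-subgroup report for a binary classifier. The function name `subgroup_report`, the record layout, and the `min_n` cutoff of 30 are all assumptions made for illustration; they are not taken from any regulator's guidance.

```python
from collections import defaultdict


def subgroup_report(records, min_n=30):
    """Per-subgroup sensitivity and specificity for binary labels (0 or 1).

    `records` is an iterable of (subgroup, y_true, y_pred) tuples.
    `min_n` is a hypothetical cutoff used to flag subgroups that are too
    small to support a claim; choose it from your own validation plan.
    """
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "tn": 0, "fn": 0})
    for group, y_true, y_pred in records:
        # "t"/"f" = prediction correct or not; "p"/"n" = predicted class.
        key = ("t" if y_true == y_pred else "f") + ("p" if y_pred == 1 else "n")
        counts[group][key] += 1

    report = {}
    for group, c in counts.items():
        positives = c["tp"] + c["fn"]
        negatives = c["tn"] + c["fp"]
        n = positives + negatives
        report[group] = {
            "n": n,
            # None means the metric is undefined for this subgroup.
            "sensitivity": c["tp"] / positives if positives else None,
            "specificity": c["tn"] / negatives if negatives else None,
            "underpowered": n < min_n,
        }
    return report
```

Reporting `n` and an `underpowered` flag next to each metric keeps a strong number on a tiny subgroup from being read as evidence.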

Design for the audit from day one

Retrofitting traceability onto a shipped model is the most expensive mistake I see.

Bake it in: versioned models and prompts, immutable decision logs, and a clear "uncertain → defer to human" path. If you can't replay why the system said what it said, you don't have a clinical system — you have a liability.
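
One way to approximate an immutable decision log is a hash chain, where each entry commits to the one before it, so any later edit is detectable. The sketch below is illustrative only: `DecisionLog`, its field names, and the in-memory list are assumptions. A real deployment would also need write-once storage and access controls, which this does not provide.

```python
import hashlib
import json
from dataclasses import dataclass, field

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(prev_hash, payload):
    # Canonical JSON so the same payload always hashes to the same value.
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


@dataclass
class DecisionLog:
    """Append-only, tamper-evident log (hypothetical structure)."""

    entries: list = field(default_factory=list)

    def append(self, *, model_version, prompt_version, inputs, output, timestamp):
        payload = {
            "model_version": model_version,
            "prompt_version": prompt_version,
            "inputs": inputs,
            "output": output,
            "timestamp": timestamp,
        }
        prev = self.entries[-1]["hash"] if self.entries else GENESIS
        entry = {"payload": payload, "prev_hash": prev, "hash": _digest(prev, payload)}
        self.entries.append(entry)
        return entry

    def verify(self):
        """Return True only if no entry has been altered, removed, or reordered."""
        prev = GENESIS
        for entry in self.entries:
            if entry["prev_hash"] != prev:
                return False
            if entry["hash"] != _digest(prev, entry["payload"]):
                return False
            prev = entry["hash"]
        return True
```

Because every entry records the model and prompt versions alongside the inputs and output, replaying a decision means loading those exact versions and re-running the logged inputs.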

The shortlist

  1. Write the intended-use statement first; let it constrain everything.
  2. Validate on representative data, report subgroups.
  3. Log every input, output, and model and prompt version, immutably.
  4. Make the uncertainty path a feature, not an afterthought.
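
As a sketch of what an explicit uncertainty path can look like, the function below routes each prediction either to automatic handling or to a clinician, and records why. The names `route` and `Routed` and the 0.9 threshold are hypothetical. A real threshold should come from your validation data and intended-use statement, and top-class probability is only one possible uncertainty signal.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Routed:
    action: str            # "auto" or "defer_to_clinician"
    label: Optional[str]   # top-scoring label, if any
    confidence: float
    reason: str


def route(probabilities, threshold=0.9):
    """Route a prediction given a dict of label -> probability.

    `threshold` is an illustrative value, not a recommendation.
    """
    if not probabilities:
        # No usable model output is itself a reason to defer.
        return Routed("defer_to_clinician", None, 0.0, "no_prediction")

    label, confidence = max(probabilities.items(), key=lambda kv: kv[1])
    if confidence < threshold:
        return Routed("defer_to_clinician", label, confidence, "below_threshold")
    return Routed("auto", label, confidence, "above_threshold")
```

Returning a structured result with a `reason` field means deferrals can be logged and counted like any other decision, rather than disappearing into an exception handler.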

Pass these and the regulatory framework, whether CQC, FDA, or CE marking, becomes paperwork around a system that was already built to be defensible.