Human-in-the-loop: why AI should advise doctors, never replace them

Every few months, a headline appears: “AI outperforms doctors at diagnosing X.” The story usually cites a study where a model achieved 94% accuracy on a curated dataset while a panel of physicians averaged 87%. The implication is clear — the machines are better now. Hand over the stethoscope.

We think this framing is dangerously wrong.

Not because the accuracy numbers are fabricated — they are usually real. But because accuracy on a test set and clinical judgement are not the same thing. They are not even close.

What a benchmark does not measure

A diagnostic benchmark gives a model a clean image or a structured set of symptoms and asks: what is the most likely condition? The model is very good at this. Pattern matching at scale is exactly what neural networks were designed to do.

But a real clinical encounter is not a pattern-matching exercise. A real encounter includes a 67-year-old woman who insists her chest pain is “just indigestion” because she is terrified of hospitals. It includes a teenager who will not make eye contact and whose symptoms do not add up until you realise they are afraid to talk in front of their parent. It includes the subtle tremor in someone's hand that is not in their chart and is not part of their complaint but tells an experienced physician something important.

No model captures this. No benchmark measures it.

The copilot model

At HealNote, we use what we call the copilot model. The AI processes structured data — intake forms, symptom patterns, medical histories, lab values — and presents the physician with a clear, organised view. It may highlight patterns. It may surface relevant literature. It may flag that three of the patient's symptoms commonly co-occur with a condition the doctor has not yet considered.

But it does not diagnose. It does not prescribe. It does not decide.

The doctor reads the AI's analysis the way a pilot reads an instrument panel — as one source of information among several, filtered through years of training and the irreplaceable context of being in the room with another human being.

The best clinical AI is not the one that is most accurate. It is the one that makes the doctor most effective.

Why this is harder to build

Building an autonomous diagnostic engine is, paradoxically, easier than building a good copilot. An autonomous system just needs to be right. A copilot needs to be right, and trustworthy, and transparent, and fast, and non-intrusive, and aware of its own uncertainty.

When HealNote's AI is not confident about something, it says so. Explicitly. We do not hide uncertainty behind a polished interface. If the model thinks there are three plausible interpretations of a symptom set, it shows the doctor all three with its reasoning. The doctor then applies the context that no model has — the patient sitting across from them — and makes the call.

Confidence scores are visible, not hidden
Differential suggestions are ranked but never singular
The AI explains its reasoning in plain language
Doctors can override, annotate, and teach the system
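
To make those four properties concrete, here is a minimal sketch of how such a suggestion list might be represented. It is not HealNote's actual implementation: every name in it (Suggestion, Differential, build_differential, the margin threshold of 0.15) is a hypothetical chosen for illustration, and the conditions and numbers are made up.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Suggestion:
    condition: str
    confidence: float  # the model's own estimate in [0, 1]; shown, never hidden
    reasoning: str     # plain-language explanation for the physician


@dataclass
class Differential:
    ranked: List[Suggestion]           # ordered by confidence, highest first
    uncertain: bool                    # True when no candidate is clearly ahead
    physician_decision: Optional[str] = None
    physician_note: Optional[str] = None

    def record_decision(self, condition: str, note: str = "") -> None:
        # The physician's call is stored as given, whether or not it
        # matches anything the model suggested.
        self.physician_decision = condition
        self.physician_note = note


def build_differential(candidates: List[Suggestion], margin: float = 0.15) -> Differential:
    # Assumption: a single-item list is refused outright, so the output
    # can never read as a verdict.
    if len(candidates) < 2:
        raise ValueError("a differential needs at least two candidates")
    ranked = sorted(candidates, key=lambda s: s.confidence, reverse=True)
    # Assumption: "uncertain" means the top two are within `margin` of each other.
    uncertain = (ranked[0].confidence - ranked[1].confidence) < margin
    return Differential(ranked=ranked, uncertain=uncertain)


# Illustrative values only.
differential = build_differential([
    Suggestion("condition B", 0.38, "Two of the three reported symptoms fit."),
    Suggestion("condition A", 0.41, "All three reported symptoms commonly co-occur."),
    Suggestion("condition C", 0.12, "Fits the history but not the lab values."),
])
differential.record_decision("condition D", note="Patient context not captured in the chart.")
```

In this sketch the physician's decision sits in its own field and is never checked against the ranked list, so an override is just an ordinary recorded decision rather than an exception the software has to permit.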

The trust equation

There is a practical reason for this approach beyond ethics: trust. A doctor who does not trust their tools will not use them, and should not. But trust is not the same as deference. A physician who blindly follows an AI's suggestion is not practising medicine; they are rubber-stamping an algorithm.

We have watched doctors interact with HealNote in early deployments. The ones who trust it most are the ones who have seen it be transparent about uncertainty. When a tool admits what it does not know, professionals trust it more with what it does know. That is how trust works between humans too.

The line we will not cross

There will be pressure — from investors, from the market, from competitors — to make the AI more autonomous. To let it order tests. To let it suggest prescriptions. To move from “copilot” to “autopilot.”

We have decided where the line is, and we are not crossing it. The physician decides. Always. The AI informs, organises, surfaces, highlights, and explains. The human heals.

This is not a limitation of our technology. It is the foundation of our philosophy. Medicine is a relationship between two people. AI's job is to make that relationship better — not to replace half of it.