Building AI diagnostics that clinicians actually trust

There is a graveyard of clinical AI products that were technically impressive and completely unused. They had strong papers behind them. They had accuracy numbers that would make any researcher proud. And they sat untouched in clinic workflows because nobody trusted them enough to change how they worked.

We studied these failures obsessively before building our own diagnostic tools. The pattern was always the same: brilliant model, terrible integration, zero understanding of how doctors actually think.

How doctors actually think

Clinical reasoning is not a decision tree. It is not “if symptom A and symptom B, then diagnosis C.” It is a process of progressive refinement — doctors form a mental model early, then update it continuously as new information arrives. They hold multiple hypotheses in parallel. They weigh not just probability but consequence: a 5% chance of something deadly gets more attention than a 60% chance of something benign.
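One way to picture the probability-versus-consequence trade-off is as a simple expected-harm score. The sketch below is illustrative only: the `Hypothesis` class, the `severity` scale, and the numbers are assumptions made for this example, not a description of how any clinician or product actually scores a differential.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    name: str
    probability: float  # estimated chance the hypothesis is true, 0..1
    severity: float     # hypothetical harm-if-missed score, higher is worse

    @property
    def attention(self) -> float:
        # Expected harm: probability weighted by consequence.
        return self.probability * self.severity


def rank_by_attention(hypotheses):
    """Order hypotheses by expected harm rather than by raw probability."""
    return sorted(hypotheses, key=lambda h: h.attention, reverse=True)


# Illustrative values only: a likely benign cause and an unlikely deadly one.
differential = [
    Hypothesis("benign condition", probability=0.60, severity=1.0),
    Hypothesis("life-threatening condition", probability=0.05, severity=100.0),
]
ranked = rank_by_attention(differential)
```

With these made-up weights, the 5% hypothesis scores 5.0 and the 60% hypothesis scores 0.6, so the unlikely but dangerous possibility is ranked first. Real clinical judgement is far richer than one multiplication, but the ordering shows why a list ranked purely by probability can feel wrong to a physician.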

Most clinical AI ignores this. It takes inputs, runs inference, and outputs a ranked list of diagnoses. The doctor looks at the list, thinks “I already considered that,” and ignores it. Or worse — the AI suggests something the doctor had not considered, but gives no reasoning, so the doctor has no way to evaluate it. It is just a label with a confidence score. That is not useful. That is a black box wearing a white coat.

What we built instead

HealNote's diagnostic assistance works differently. Instead of outputting answers, it outputs reasoning.

When a patient's intake data is processed, the system generates what we call a “clinical brief.” This is not a diagnosis — it is a structured summary of what the data suggests, organised the way a physician thinks:

Primary symptom cluster and duration
Relevant patterns from medical history
Differential considerations with supporting evidence
Flags for urgent or time-sensitive presentations
Explicit gaps — what the data does not tell us

The last point is the most important one. When the AI highlights what it does not know — “no family history was provided,” “duration of symptoms is ambiguous,” “medication list may be incomplete” — it gives the doctor specific questions to ask. It turns uncertainty into actionable next steps.
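The structure described above could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not HealNote's actual schema: the `ClinicalBrief` fields mirror the five items listed, while `GAP_CHECKS`, `find_gaps`, and the intake field names are hypothetical, and the check simply treats any missing or empty intake field as a gap.

```python
from dataclasses import dataclass, field


@dataclass
class ClinicalBrief:
    primary_symptoms: str                                # symptom cluster and duration
    history_patterns: list = field(default_factory=list)
    differentials: list = field(default_factory=list)    # considerations with evidence
    urgent_flags: list = field(default_factory=list)
    gaps: list = field(default_factory=list)             # what the data does not tell us


# Hypothetical mapping: intake field -> (gap statement, question for the doctor).
GAP_CHECKS = {
    "family_history": (
        "No family history was provided.",
        "Ask about relevant conditions in close relatives.",
    ),
    "symptom_duration": (
        "Duration of symptoms is ambiguous.",
        "Ask when the symptoms started.",
    ),
    "medications": (
        "Medication list may be incomplete.",
        "Ask the patient to list current medications.",
    ),
}


def find_gaps(intake: dict) -> list:
    """Return a gap entry for every checked field that is missing or empty."""
    gaps = []
    for field_name, (statement, question) in GAP_CHECKS.items():
        if not intake.get(field_name):
            gaps.append({"gap": statement, "ask": question})
    return gaps


# Example intake: no family history, blank duration, one medication listed.
intake = {"symptom_duration": "", "medications": ["ibuprofen"]}
brief = ClinicalBrief(primary_symptoms="headache", gaps=find_gaps(intake))
```

Pairing each gap with a concrete question is the design point: the brief does not just admit uncertainty, it hands the doctor the next thing to ask.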

The most useful thing an AI can tell a doctor is not what it thinks. It is what it does not know — and what the doctor should ask next.

The speed problem

A diagnostic tool that takes thirty seconds to load is a diagnostic tool that no one uses. In a clinic seeing forty patients a day, every second matters. This is not hyperbole — we timed it. Doctors will wait about three seconds for a tool to respond before they move on and do things the old way.

Our clinical brief is generated in under two seconds, so it is ready before the doctor opens the patient's chart. By the time they sit down with the patient, they have already scanned the summary and have a starting framework for the conversation.

This is the difference between a tool that interrupts a workflow and one that enhances it. The doctor does not have to stop, navigate to a separate screen, wait for inference, and interpret results. The information is simply there, in the right place, at the right time.

Learning from disagreement

When a doctor's assessment differs from the AI's suggestion, that is not a failure; it is a learning opportunity. We track these disagreements carefully. Sometimes the doctor saw something the model missed. Sometimes the model surfaced something the doctor had not considered but, on reflection, found valid.

Over time, this feedback loop makes the system better — not by overriding clinical judgement, but by learning what kinds of patterns are most useful to highlight and what kinds of suggestions are most often ignored.
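A feedback loop of this kind might start with something as simple as counting how often each kind of suggestion is ignored. The sketch below is an assumption-laden illustration: the `Feedback` record, the `suggestion_kind` categories, and the three outcome labels are hypothetical names chosen for this example, not the system's real data model.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Feedback:
    suggestion_kind: str  # hypothetical category, e.g. "urgent_flag"
    outcome: str          # "accepted", "ignored", or "overridden"


def ignore_rates(log):
    """Fraction of suggestions of each kind that the doctor ignored."""
    totals = defaultdict(int)
    ignored = defaultdict(int)
    for item in log:
        totals[item.suggestion_kind] += 1
        if item.outcome == "ignored":
            ignored[item.suggestion_kind] += 1
    return {kind: ignored[kind] / totals[kind] for kind in totals}


# Made-up log entries for illustration.
log = [
    Feedback("urgent_flag", "accepted"),
    Feedback("urgent_flag", "accepted"),
    Feedback("common_differential", "ignored"),
    Feedback("common_differential", "ignored"),
    Feedback("common_differential", "accepted"),
    Feedback("common_differential", "overridden"),
]
rates = ignore_rates(log)
```

In this invented log, urgent flags are never ignored while common differentials are ignored half the time, which would suggest highlighting the former and trimming the latter. Overrides are counted separately from ignores on purpose: a doctor who actively disagrees has engaged with the suggestion, which is a different signal from one who scrolls past it.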

The system gets smarter by watching doctors be doctors. Not by replacing them.

Clinical AI that is not trusted is clinical AI that does not exist. We would rather build something that doctors use every day and find genuinely helpful than something that wins benchmarks and gathers dust. The measure of our success is not our accuracy score — it is whether the doctor's day got better.