Doing well what shouldn't be done
Economists say there's nothing worse than doing well something that shouldn't be done. In the healthcare sector, the classic example is "cutting off the ears of the entire population." It's something that shouldn't be done, but it could be done, and done well.
That is, cutting off the ears of literally everyone, from birth, and doing it without complications, with excellent anesthesia and surgical technique, without infections or disfiguring scars, and reaching the entire population, even in the most remote corners of the country.
We can imagine the pride of politicians, managers, and healthcare professionals for efficiently carrying out such a complex task. Without caution, the same thing could soon happen in the healthcare sector itself with the use of "artificial intelligence."
"Scribes"
In Catalonia (and other places around the world), the use of "scribes" is being implemented.
These "artificial intelligence" programs are capable of recording the consultation conversation between a healthcare professional and a patient, saving time and automatically recording it in the electronic medical record. But what's the point of recording this conversation in the electronic medical record? Barely 5% of the record of the entire doctor-patient encounter is useful. The rest is garbage that floods electronic medical records, hindering the best healthcare.
"Scribes" can increase the volume of garbage and noise in clinical records and lead to worse health outcomes.
Diagnoses
Doctors face various difficulties in diagnosing, and in many cases, "artificial intelligence" can help improve the process and achieve more and better diagnoses. But physicians should use only accurate and timely diagnoses and know how to "not diagnose" when diagnosing does not improve care or prognosis.
For example, in hospital emergency rooms, almost 40% of abdominal pain cases are "resolved" without a final diagnosis. And pursuing a diagnosis, for example, in nonspecific abdominal pain in adolescence, can lead to tests that border on cruelty, without adding anything to the patient's progress.
"Improving the diagnostic process" with "artificial intelligence" can be harmful to patients and populations. We have a serious problem of "overdiagnosis" in clinical care, and the use of "artificial intelligence" can lead us to exacerbate the problem through the resulting unnecessary "therapeutic cascades." This is what we call "the tyranny of diagnosis."
More Isn't Always Better
Technological fascination is leading to an uncritical acceptance of the use of "artificial intelligence" in medicine, with a disturbing assumption that more is better. When the first automobiles began to circulate, they were seen as "modernity" and occupied public spaces with the approval of authorities and citizens. Over the years, they became a public health problem due to environmental pollution and the reduction of living spaces.
Today, restrictions on automobiles are common in cities, and it is astonishing how easily it was accepted that more is better in this sector of transportation. This rejection has led, for example, to Paris promoting car-free streets with trees and gardens, with up to 60% of the total area being car-free, and 40% of the population having given up car ownership.
Will We Learn?
Artificial intelligence has its applications, but it's important to learn from other "modern technologies" that have preceded it and be rational in its use. The benefits promised by artificial intelligence can be harmful illusions. Technological dazzle must not fool us into believing that more is better when it comes to artificial intelligence applications in the healthcare sector.
NOTE
"Artificial intelligence," like all resources in medicine, has a rational use. That means weighing its advantages and disadvantages, for example regarding "scribes" in medical records, and not attributing to it a real "intelligence" that it lacks.
In the rush toward, and blindness about, the medical applications of "artificial intelligence," including scribes, follow the money. From the USA: "The global market for AI in healthcare, including medical scribes, is projected to reach $45.2 billion by 2026" https://www.deepcura.com/post/navigating-the-future-insights-into-the-ai-medical-scribe-market
Addendum
7 August 2025
UK. GPs should report all suspected inaccuracies caused by AI to the regulator's yellow card scheme, which is used for pharmacovigilance of adverse incidents and safety concerns over medicines and medical devices.
NOTE
On December 6, 2025, we received an email which we are reproducing with the author's permission, changing the necessary elements to protect her identity. It reads:
"I'm Rosa Muñoz; you may not remember me. I'm a family physician. We exchanged greetings before the pandemic, and I've participated in several Primary Care Innovation Seminars, but I don't think we've crossed paths since then. Anyway, since I read your post of June 7, 2025, about 'artificial intelligence' on the blog 'Health, Money, and Primary Care,' I felt like saying hello. And while I'm at it, I wanted to let you know that I'm currently working in a hospital where a scribe is mandatory. It's having problems, as you can imagine: from not coding primary care diagnoses to 'hallucinating' and transcribing things that weren't said, while omitting important data (it records biological data fairly well but ignores psychosocial aspects). I also don't like that all the case histories now seem the same, and I can't identify my clinical 'style.' I found your point about only 5% of the record of the entire doctor-patient encounter being useful, and the rest being 'garbage or noise,' quite interesting. I also found the idea of using a yellow card to report adverse effects of medical devices, in addition to medications, very interesting."
From Twitter (X), 17 Dec 2025 https://twitter.com/sdho/status/2001328800647561216
My doctor at Allina [a nonprofit health care system based in Minneapolis, Minnesota, United States] showed me a new AI tool that they're using to summarize medical histories. It can review your whole chart and provide a human-readable summary. He copy-pasted the summary into my notes for me to look at later.
The errors were astounding.
The AI bot falsely claimed I had a history of sleep apnea (nothing even resembling that).
It claimed I was "diagnosed with heart disease" on a specific date, because that was a date a test to *rule out heart disease* was ordered (and it was ruled out).
It mischaracterized a real spinal issue with the wrong diagnosis.
Because I once had antibiotics for a bullseye tick bite, it said I had a clinically significant "history of rash".
I requested he remove or correct this in my note, but I am really concerned how errors like this could compound over time.
This visit's misinformed summary goes into my record that will then be further misunderstood the next time the bot reviews my info.
To clarify, this was not an issue with the voice-based note-taking app.
This was specifically a tool that reviews chart history and summarizes.
There's no record in the note what's AI-generated versus human-written, so I don't have faith it's going to identify this as particularly untrustworthy.
On May 12, 2026, the Ontario (Canada) Auditor General's Office published a report on the use of "artificial intelligence" (AI) in medical consultation.
Sick and wrong: Ontario auditors find doctors' AI note takers routinely blow basic facts
60% of evaluated AI Scribe systems mixed up prescribed drugs in patient notes, auditors say
Brandon Vigliarolo. Published Thu 14 May 2026, 20:50 UTC
The AI systems approved for Ontario healthcare providers routinely missed critical details, inserted incorrect information, and hallucinated content that neither patients nor clinicians mentioned, according to a provincial audit of 20 approved vendors' systems.
An auditor for the Ontario, Canada government found that AI agents tasked with turning doctor/patient conversations into structured notes routinely hallucinated false treatments, replaced drug names with entirely different drugs, and missed crucial information.
Carl Heneghan and Tom Jefferson. May 23, 2026
The AI Scribe Illusion: Why More Technology May Mean More Work for Doctors
Every hallucination, omission and error still belongs to the doctor
https://trusttheevidence.substack.com/p/the-ai-scribe-illusion-why-more-technology
The two old geezers in the TTE Office have long understood that medicine is not merely the accumulation of information, but the disciplined filtering of it. The experienced GP does not write down every utterance in a consultation; he selects the salient features, interprets any ambiguity and discards the noise. Precisely why the current enthusiasm for AI medical scribes should worry us.
We are being sold a fantasy of frictionless efficiency: a doctor speaks as the machine listens, producing a perfect record of the consultation, and time is saved. Yet the evidence emerging from early procurement exercises and audits suggests the opposite occurs: longer notes, more inaccuracies, greater medico-legal risk, and ultimately more work for clinicians.
The recent Ontario Auditor General’s report on artificial intelligence in government offers a glimpse into this future.
The review evaluated 20 approved AI scribe systems during the procurement process and found that each demonstrated at least one significant form of inaccuracy. Nine produced outright hallucinations, inventing clinical details or treatment suggestions that had never been discussed, while 12 generated incorrect information, including recording the wrong medications. Seventeen systems omitted key mental health details in at least one test scenario, and six produced persistently incomplete notes across both evaluations.
Far from demonstrating a mature and reliable technology, the findings exposed a strikingly high error rate in systems intended to generate formal medical records, raising serious concerns about patient safety, clinical reliability and the hidden workload created by the need for meticulous human checking of every generated note.
This matters because medical notes are not neutral artefacts; they determine referrals, shape future care, and increasingly form the basis of litigation. A hallucinated phrase stating “no masses found” when no examination occurred is not a trivial software glitch; once entered into the notes, it acquires authority merely by existing as part of the permanent record.
The policy advocates of AI scribes rarely confront the central paradox: every error generated by the machine requires human verification. The doctor, therefore, becomes not merely a clinician but a proofreader of machine-generated prose.
The problem is further compounded by the fact that human beings are poor at detecting subtle inaccuracies embedded in fluent language. A short, concise note written directly by the clinician is often safer than a verbose AI transcript, which requires line-by-line checking. Yet the AI systems are incentivised toward comprehensiveness because developers fear omission more than excess.
A patient may spend ten minutes discussing irrelevant background before, almost casually, revealing the one symptom that truly matters. The experienced GP identifies the signal within the noise, whereas AI systems, by contrast, are designed as statistical pattern recognisers and predictors of likely text sequences.
Consequently, the notes become swollen with conversational debris, as the AI cannot reliably distinguish between a passing thought and a definitive diagnosis.
There is another problem that enthusiasts also prefer to ignore: AI scribes learn from existing documentation, but much of contemporary medical documentation is already poor. Defensive medicine has produced notes saturated with boilerplate, copied phrases, and medico-legal padding. If AI systems are trained on bloated, defensive records, they will only reproduce and amplify them: garbage in, garbage out.
Indeed, one can imagine an escalating cycle in which AI-generated notes grow longer and less clinically meaningful. Junior clinicians trained on these notes develop poorer documentation habits, and the next generation of AI models then trains on this degraded corpus. Over time, the medical record risks becoming less useful precisely because it contains too much superfluous information.
The Ontario audit exposed another uncomfortable reality: these systems were not rigorously evaluated before rollout. Some vendors failed to provide independent security reports or privacy assessments, yet were still approved. Others were assessed using low-weighted criteria for accuracy and bias. Evaluators even noted that systems inaccurately transcribe conversations involving medication names and mental health discussions.
The implications are profound. Medicine depends heavily on language nuance: Accents, idioms, hesitations and emotional tone all carry diagnostic significance. An experienced GP, hearing a patient quietly say, “I’m just not coping anymore,” recognises the importance immediately. An AI scribe may either miss it entirely or over-interpret it into a psychiatric diagnosis that was never intended.
There is also the illusion of cost reduction. Policymakers see doctors spending hours on administrative work and assume that automation will reduce staffing pressures. Yet if clinicians must carefully review every AI-generated note, workload may actually increase.
The central question is not whether the AI made the mistake, but who is responsible when it impacts the patient.
In medicine, responsibility cannot be outsourced to software. The clinician who signs the note, prescribes the drug, or makes the referral still carries professional, ethical, and legal accountability, which changes the economic argument around AI scribes. If the GP remains liable for every hallucination, omission or misleading phrase, then every AI-generated note must be reviewed with the same care as if it had been written from scratch. The machine may type faster, but the doctor still has to think, interpret and verify.
The irony is that the best medical notes are often the shortest: “Chest pain worse on exertion, radiates to jaw, ECG normal, safety-netted,” concisely reflects judgement, prioritisation and clinical synthesis. Something AI struggles with because synthesis requires understanding what not to include.
The promise of AI scribes rests on a misunderstanding of what clinicians actually do. The danger is not that AI scribes will replace doctors; it is that they will slowly bury doctors beneath mountains of plausible but unreliable prose while politicians extol it as efficiency savings.
This post was written by two old geezers, not by an AI scribe.
Authors
Juan Gérvas, retired rural doctor, Equipo CESCA, Madrid, Spain.
jjgervas@gmail.com www.equipocesca.org @JuanGrvas @juangrvas.bsky.social
Mercedes Pérez-Fernández, retired rural doctor, Equipo CESCA, Madrid, Spain.
