With Great Power Comes Great Error
- Nitish Kannan

The Illusion of Progress
We like to assume AI is perfecting modern medicine as it churns through pathology slides and scans. But the numbers don’t tell the whole story. Beneath the massive output lies a reliability crisis we are ignoring for the sake of convenience. We have built a healthcare system that relies on “black box” models that can’t explain themselves. That opacity transforms simple glitches into dangerous failures. Three hundred million daily images is an impressive stat (World Health Organization, 2025). It implies progress. But until we solve the trust problem, we are just guessing at scale.
To Outperform
AI dominates where human pattern recognition hits a wall. These models dig through medical images to spot tiny irregularities like lung nodules with a sensitivity that hits 98% in controlled tests, which significantly outperforms the 70% average of human radiologists (Topol, 2019). The software doesn’t need sleep. It processes data instantly.
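For readers who want the arithmetic behind those percentages, here is a minimal Python sketch of how sensitivity (the true-positive rate) is computed. The reader-study counts are hypothetical, chosen only to reproduce the 98% and 70% figures quoted above, not drawn from any specific study.

```python
# Minimal sketch: how sensitivity (true positive rate) is computed.
# The counts below are hypothetical, purely to illustrate the metric
# behind figures like "98% sensitivity" in controlled tests.

def sensitivity(true_positives: int, false_negatives: int) -> float:
    """Fraction of actual positives the model correctly flags."""
    return true_positives / (true_positives + false_negatives)

# Hypothetical reader study: 100 scans that truly contain a nodule.
model_tp, model_fn = 98, 2    # model misses 2 of the 100 nodules
human_tp, human_fn = 70, 30   # readers miss 30 of the 100 nodules

print(f"Model sensitivity: {sensitivity(model_tp, model_fn):.0%}")  # 98%
print(f"Human sensitivity: {sensitivity(human_tp, human_fn):.0%}")  # 70%
```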
The early wins are genuinely impressive. We see algorithms that can grade diabetic eye disease as accurately as top specialists, potentially saving vision for millions (Ardila et al., 2019). For a healthcare system running on fumes, this technology looks like a savior. It offers precision without the burnout.
The Dataset Trap
But the polished marketing hides a messier reality. These systems fail when they leave the lab. A famous Google Health initiative in Thailand boasted 90% accuracy in theory. But when deployed in rural clinics, it rejected over 20% of patient images simply because the lighting was poor or the internet was slow. The model didn’t just work less well; it stopped working entirely. This is the dataset trap. We train algorithms on pristine scans. The moment they meet the messy reality of a busy hospital, the “superhuman” doctor panics (Beam & Kohane, 2018).
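As a rough illustration of why deployment stumbles this way, here is a hedged sketch of a pre-inference quality gate. The `quality_gate` function, thresholds, and array sizes are hypothetical stand-ins, not the Google Health pipeline; the point is that a strict gate rejects messy real-world images outright instead of degrading gracefully.

```python
# Hypothetical pre-inference quality gate, assuming grayscale images
# arrive as NumPy arrays. Thresholds are invented; a gate tuned on
# pristine lab data refuses dim clinic photos rather than guessing.
import numpy as np

BRIGHTNESS_MIN = 40.0  # hypothetical cut-off for mean pixel intensity
CONTRAST_MIN = 15.0    # hypothetical cut-off for pixel standard deviation

def quality_gate(image: np.ndarray) -> bool:
    """Return True only if the image is accepted for inference."""
    brightness = float(image.mean())
    contrast = float(image.std())
    return brightness >= BRIGHTNESS_MIN and contrast >= CONTRAST_MIN

# A dim clinic photo (low mean intensity) is simply refused:
dim_scan = np.random.default_rng(0).integers(0, 60, size=(512, 512)).astype(np.uint8)
if not quality_gate(dim_scan):
    print("Image rejected: retake under better lighting")  # no diagnosis at all
```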
A Step Toward Self-Aware Software
In 2025, researchers at Johns Hopkins Medicine tried to tackle this trust gap with a new framework called MIGHT. The idea is simple but critical: it forces AI to wave a red flag when it sees data it doesn’t recognize, a step toward building self-aware software that understands its own limitations (Johns Hopkins Medicine, 2025).
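The MIGHT method itself is not reproduced here; the sketch below shows only the general idea of confidence-thresholded abstention, with a hypothetical `predict_or_flag` helper and an invented threshold, so that uncertain cases are escalated to a human instead of being silently classified.

```python
# Generic sketch of "wave a red flag" behavior on unfamiliar inputs.
# This is NOT the MIGHT implementation; the threshold and helper are
# hypothetical stand-ins for the underlying abstention idea.
from dataclasses import dataclass

@dataclass
class GuardedPrediction:
    label: str | None
    confidence: float
    flagged: bool

def predict_or_flag(probs: dict[str, float], threshold: float = 0.85) -> GuardedPrediction:
    """Return a label only when confidence is high; otherwise flag for human review."""
    label, confidence = max(probs.items(), key=lambda kv: kv[1])
    if confidence < threshold:
        return GuardedPrediction(label=None, confidence=confidence, flagged=True)
    return GuardedPrediction(label=label, confidence=confidence, flagged=False)

# Familiar-looking scan: confident, so a prediction is returned.
print(predict_or_flag({"nodule": 0.93, "clear": 0.07}))
# Unfamiliar scan: probabilities are spread out, so the case is escalated.
print(predict_or_flag({"nodule": 0.55, "clear": 0.45}))
```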
Even so, these advances remain vulnerable to data drift: gradual changes in medical imaging equipment or population health trends that erode model performance over time (Ross et al., 2022). Reliability in diagnostics, as it turns out, is not a one-time achievement but a continuous process of recalibration.
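One common way to make that recalibration loop concrete is to monitor production data against a training-era reference. The sketch below compares a single image statistic with a two-sample Kolmogorov–Smirnov test; the synthetic numbers and alert threshold are assumptions for illustration, and real monitoring would track many features.

```python
# Minimal drift-monitoring sketch: compare mean pixel intensity per scan
# between the training reference set and a recent production window.
# Numbers are synthetic and the alert threshold is a hypothetical choice.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(42)
reference_intensity = rng.normal(loc=120, scale=10, size=1000)   # training-era scans
production_intensity = rng.normal(loc=112, scale=10, size=1000)  # newer scanner, dimmer output

statistic, p_value = ks_2samp(reference_intensity, production_intensity)
if p_value < 0.01:
    print(f"Drift alert (KS={statistic:.3f}, p={p_value:.2e}): schedule recalibration")
else:
    print("No significant drift detected this window")
```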
Regulators in the Rearview Mirror
Regulators claim they are keeping pace with the tech industry, but the reality looks more like a scramble to stay relevant. The FDA points to a stack of 500 approved AI medical devices as proof of progress, but don’t let the numbers fool you. Most of these systems are frozen in time. They are “locked” algorithms, forbidden from learning or adapting once they hit the market (U.S. Food and Drug Administration, 2025). While this keeps the software predictable, it also guarantees the technology remains stunted.
The alternative is a regulator’s nightmare: autonomous models that evolve on their own. That introduces the risk of “drift,” where a diagnostic tool quietly changes its behavior in ways no one predicted and no one is watching.
Europe’s Red-Tape Response
Across the Atlantic, Europe is trying to strangle the risk with red tape. The Medical Device Regulation (MDR) framework treats many of these AI tools as high-risk devices. It slaps them with labels and demands that developers open up their black boxes to prove the training data isn’t garbage (Minssen & Gerke, 2024).
It sounds like a solid plan for accountability. In practice, however, it’s a slow-moving attempt to police an industry that rewrites its own rules every few weeks. By the time oversight committees finish their reviews, the software they are judging is already ancient history.
The Human Cost of Blind Trust
We are constantly told that trust is a numbers game, but that is a total misconception. Patients never see the technology churning in the background; they only face the consequences when an algorithm decides their fate. When a machine claims it found a tumor on a clear X-ray, it isn’t just making a calculation error; it is actively changing the doctor’s judgment. The clinician stops relying on their eyes and starts deferring to the ghost in the machine.
Ethicists like to throw around big words like “interpretability,” yet the problem is more tangible. It comes down to basic liability. A doctor cannot justify slicing into a patient just because a black box told them to do it. Unless these systems are forced to show their work, they will erode medical confidence. We don’t need perfect code nearly as much as we need a way to keep human accountability from disappearing entirely (Mittelstadt et al., 2025).
Training Tomorrow’s Referees
For medical students, the arrival of AI is both a curriculum update and a warning shot. Tomorrow’s doctors will be stuck refereeing arguments between biological facts and algorithmic guesses. If they don’t learn to sniff out bad math now, they become dangerous in the clinic.
Schools are finally scrambling to catch up, tossing in ethics modules after realizing that sending graduates out without tech literacy is a liability. But sitting in a lecture isn’t enough. We need a generation of doctors who learn to break these tools and demand hard evidence rather than blindly accepting the shiny new toy the hospital admin bought.
Coding Confidence Into the Curriculum
You can see this panic turning into strategy at top institutions. Harvard Medical School is rolling out an AI in Medicine PhD track for “computationally enabled” students who can write the code themselves, an attempt to create specialists who don’t need to trust the black box because they built it.
UVA School of Medicine is taking an equally rigorous but clinically focused approach. Their curriculum integrates AI directly into diagnostic reasoning exercises with fictitious patients. The goal is less about writing code and more about mastering the output. They are teaching students to engineer prompts and scrutinize results, ensuring that the physician remains the pilot rather than a passenger.
The Real Test of Intelligence
The promise of medical AI is not to replace clinicians but rather to improve their judgment. It has the potential to become a second opinion grounded in data rather than instinct. Yet for that partnership to work, trust must be earned and not assumed. Efforts like the MIGHT framework and evolving global regulations show that reliability has to be a continuous commitment.
We keep hearing that AI is just here to “refine” clinical judgment. But a second opinion is only valuable if it isn’t hallucinating. Trust isn’t something we should hand over just because the software is fast. It has to be earned through rigorous, borderline paranoid validation. The real test is whether we can build a system transparent enough that we never have to take its word on blind faith.
Nitish Kannan studies Biology at the University of Virginia and likes to think of science as a kind of translation between cells and stories as well as signals and noise. He writes to make that translation a little clearer, hoping to bridge the gap between how we understand science and how we live it.
References
Ardila, D., Kiraly, A. P., Bharadwaj, S., et al. (2019). End-to-end lung cancer screening with three-dimensional deep learning on low-dose chest CT. Nature Medicine, 25(6), 954–961.
Beam, A. L., & Kohane, I. S. (2018). Big data and machine learning in health care. JAMA, 319(13), 1317–1318.
Johns Hopkins Medicine. (2025, August). New method advances reliability of AI with applications in medical diagnostics. Retrieved from https://www.hopkinsmedicine.org/news/newsroom/news-releases/2025/08/new-method-advances-reliability-of-ai-with-applications-in-medical-diagnostics
Minssen, T., & Gerke, S. (2024). Regulatory frameworks for AI in medical devices: Current gaps and future directions. Journal of Law and the Biosciences, 11(2), lsae007.
Mittelstadt, B. D., et al. (2025). The ethics of trustworthy AI in healthcare. Neurocomputing, 602, 128–140.
Ross, J., et al. (2022). Challenges to implementing artificial intelligence in healthcare. BMC Health Services Research, 22, 1284.
U.S. Food and Drug Administration. (2025). Artificial intelligence and machine learning (AI/ML)-enabled medical devices. Retrieved from https://www.fda.gov/medical-devices/artificial-intelligence-and-machine-learning-software-medical-devices
World Health Organization. (2025). AI in medical imaging: Global update report 2025. Retrieved from https://www.who.int



