Gurpreet Dhaliwal sat onstage in a hotel ballroom in Minneapolis. The gray curtains behind him were illuminated by bright blue lights, giving the slightest hint of performance at an otherwise typical medical conference. The presentation was among the most anticipated at the Society to Improve Diagnosis in Medicine’s 2022 meeting. The attendees were there to watch a kind of showcase: a complex diagnosis in action.
Dhaliwal, a professor of medicine at UC San Francisco, was given the details of a patient he had never seen before. As another physician slowly revealed pieces of the case, Dhaliwal narrated his thinking out loud: why he was considering one possibility and rejecting another, and what each new clue revealed for him. Eventually, he decided that the patient was likely suffering from a dangerous buildup of pressure in her abdomen. Left untreated, she could experience organ failure. It was the correct diagnosis, and the audience responded with applause.
Dhaliwal is regarded as one of the country’s most gifted diagnosticians. Colleagues have praised not only his command of physiology but also his ability to make his reasoning legible—to turn clinical uncertainty into something teachable. “To observe him at work is like watching Steven Spielberg tackle a script or Rory McIlroy a golf course,” a New York Times reporter wrote in 2012.
“I appreciate the designation but sort of reject it, only because of my own philosophical stance, which is that it’s very hard to master the diagnostic process,” Dhaliwal told me when I talked with him for my book about diagnosis. He considers himself a student of diagnosis, committed to getting better. “To me, the concept of the master diagnostician is that you’re never good enough.”
That belief puts Dhaliwal on one side of a core question of medicine: Are some doctors inherently better diagnosticians than others, or is diagnostic excellence a skill that any clinician can achieve? Doctors usually get it right—some estimates suggest about 90 percent of the time. But with roughly 1 billion physician-office visits each year in America, even a low error rate can still affect a large number of people. A 2023 study estimated that 371,000 people die a year and 424,000 are disabled following a misdiagnosis.
In 2015, the National Academies of Sciences, Engineering, and Medicine published a seminal report on diagnostic error with a startling finding: Most people will experience at least one (such as a delayed, wrong, or missed diagnosis) in their lifetime, “sometimes with devastating consequences.” That report prompted a small but vocal group of physicians and other health providers to look inward. They argue that the number of diagnostic errors is unacceptable and must be reduced. Dhaliwal has been part of the movement to figure out how.
Some research suggests that many, if not most, diagnostic errors arise from failures in thinking—cognitive bias, premature closure, insufficient reflection. Accordingly, some researchers frame diagnostic error as largely a problem of clinical judgment: the ability to reason through uncertainty and weigh competing explanations in order to reach the right diagnosis and make decisions about care. “Regrettably, how to think in medicine has been a much‑neglected area for medical educators, who stalled somewhere in the Middle Ages, or a century or two earlier,” Pat Croskerry, a retired professor of emergency medicine at Dalhousie University in Canada who’s known for his work on cognitive errors in diagnosis, told me.
Dhaliwal credits his own abilities to paying close attention to his own thinking. “I do think you can train yourself to be a better diagnostician,” he said. Early in his training, he closely observed the physicians he most admired. Some of them had a knack for identifying rare diseases that evaded their peers. Others mastered the diagnosis of common conditions so thoroughly that they could recognize every permutation of pneumonia. Dhaliwal wanted to excel at both.
But when he asked physicians how to become that kind of doctor, their advice was usually the same: See a lot. Read a lot. It felt unsatisfying. Every physician sees patients. Every physician reads. What, he wondered, truly separates an exceptional diagnostician from a competent one?
He hung on to this question, and about two years after finishing residency in 2003, during a yearlong faculty-development course for medical educators, he encountered a session on clinical reasoning—an emerging field at the time. The physician and medical historian Adam Rodman has described clinical reasoning as “the study of the ability for expert physicians to see what others don’t.” Researchers were beginning to investigate what actually happens in doctors’ minds when they make diagnoses: how they organize their knowledge and put it into practice. Dhaliwal quickly recognized this as the quality he had seen in his role models, even though “they didn’t have a term for it, and neither did I.” The idea of clinical reasoning helped clarify the process; the next question was how to get better at it.
Dhaliwal laid out the key steps of a doctor’s reasoning process: collecting data from a patient; synthesizing that information; accessing “files” in the mind, including the details about diseases and how they present; listing possible diagnoses; and choosing one over others. He also began studying the science of expertise and how people—whether Nobel laureates, Olympic swimmers, or mechanics—become exceptional in their field. “They seek out challenges, whereas most of us instinctively try to minimize challenges after we’re competent,” he said.
They also learn from their mistakes. In a 2017 paper, Dhaliwal wrote that ordinary people develop “extraordinary judgment by extracting as much wisdom as possible from their inevitable errors,” a lesson he drew from Philip Tetlock and Dan Gardner’s book, Superforecasting: The Art and Science of Prediction. But medicine doesn’t make that easy for doctors, who may treat a patient once and never see them again. If the patient’s condition worsens, or they receive a different diagnosis later on from someone else, that information may never make its way back to the first doctor. With these ideas in mind, Dhaliwal set out to sharpen his skills. Today, he works in the San Francisco VA Medical Center’s emergency room, where he sees a variety of illnesses and necessarily follows that early advice to see a lot of patients. But, crucially, he also started keeping track of his own cases so that he could follow up on what happened. When he discovers he was wrong, he tries to figure out why. Did he miss something important? Was he exhausted at the end of a long shift? Did he anchor himself to a particular conclusion too quickly?
“I started to get kind of addicted to it,” he said. He explained that the mind wants closure; without knowing the outcome, people tend to assume that things turned out well. His habit of tracking down a patient’s outcome echoes advice delivered more than a century ago by William Osler, one of modern medicine’s founding figures: “Learn to play the game fair, no self-deception, no shrinking from the truth; mercy and consideration for the other man, but none for yourself, upon whom you have to keep an incessant watch.” Diagnostic mastery, Dhaliwal illustrates, is not a mysterious gift bestowed on a talented few. It is the result of examining one’s own thinking and practice without mercy.
But the reasoning that goes into diagnosis may start to look very different. Since his third year of medical school, Dhaliwal has read The New England Journal of Medicine’s Clinicopathological Conference, or CPC. The CPC is a teaching exercise in which doctors are presented with a real patient’s case and asked to reason aloud toward a diagnosis, similar to Dhaliwal’s Minneapolis presentation. Last fall, Dhaliwal participated in a CPC that put him in competition with an AI agent called Dr. CaBot, a medical-education tool developed by researchers at Harvard Medical School.
Both Dhaliwal and Dr. CaBot reached the correct diagnosis and explained their reasoning step by step. They correctly concluded that the patient had a problem in the upper part of his digestive system, which caused a bacterial infection to trigger sepsis, among other complications. Dr. CaBot didn’t identify the cause of the problem, whereas Dhaliwal deduced, correctly, that the man had swallowed a toothpick, which poked through his gut and caused the infection. He had seen that kind of case before.
That Dr. CaBot’s problem-solving came as close as it did to Dhaliwal’s is both promising and disconcerting: It suggests that machines may be able to match the performance of elite diagnosticians. More formal evidence also indicates that large language models may be able to approximate the kind of clinical reasoning expected of physicians. One study published in July 2024 found that when OpenAI’s GPT‑4 examined the medical information of 100 patients in an emergency room, the AI was able to diagnose them with 97 percent accuracy, outperforming resident physicians. (OpenAI’s models have advanced since then.) Another study found that ChatGPT scored higher on a clinical-reasoning measure than internal-medicine residents and attending physicians at two academic medical centers. Other studies have been more mixed.
Serious concerns about reliability, sycophancy, and hallucinations remain. But in some ways, what a diagnostician does is not so different from what AI claims to do. Both use enormous amounts of information to recognize patterns in symptoms and diagnoses that tend to appear together. A doctor does this through medical education and personal experience; AI does it by predicting plausible explanations based on statistical patterns it has learned from its training materials.
“This is an electric moment in medicine,” Mark Graber, a physician and co-founder of the nonprofit Community Improving Diagnosis in Medicine, told me. “If you can come up with an AI agent that’s as good as Gurpreet Dhaliwal, that is an amazing accomplishment that will surpass the abilities of 99.9 percent of doctors.”
How medicine embraces any of this is an open question. Perhaps AI will strengthen clinicians’ reasoning and close the gap between the Dhaliwals and everyone else. Or it could become a crutch, leading clinicians to lose skills. A 2025 study found that after just three months of using an AI tool to find precancerous growths during colonoscopies, doctors were less likely to identify the growths on their own.
For his part, Dhaliwal is equanimous. “I think AI is going to transform health care radically. I don’t think it’s going to change doctoring radically,” he said. He believes that AI is likely to perform best at the extremes of diagnosis: the very simple cases (such as a poison-ivy rash) and the very complex ones (rare or novel diseases). In the not-so-distant future, people may be able to get answers to routine medical questions at home—What’s this spot? Is my cough concerning? How’s my blood pressure?—without ever needing to see a physician. That may be entirely appropriate, because attending to these everyday concerns usually does not require sophisticated clinical judgment or nuanced decision making.
AI could also prove valuable in identifying conditions that a physician may never encounter in their career, or in helping diagnose patients who have stumped multiple clinicians. These cases tend to hinge on how encyclopedic a doctor’s knowledge of the medical literature is; AI can recognize obscure patterns across millions of cases and publications, and surface possibilities that may lie outside any single physician’s experience.
“What I think is less likely to change is sort of the muddy middle, which is what I think the vast majority of medical practice is,” Dhaliwal said. Much of medicine involves choosing between possibilities: Does a person have an infection, an allergic reaction, or an autoimmune disease? Is it a psychiatric or medical issue? AI could certainly help parse through the options. But medical judgment goes beyond identifying what’s most likely; it involves deciding what the diagnosis means for a particular patient. Two people diagnosed with the same cancer may desire different futures. One may want the most aggressive treatment available, whereas the other may decline interventions that would trade quality of life for longevity. These are value-laden decisions that, at least for now, still require something irreducibly human to navigate. An LLM can recite treatment options and survival rates, but it cannot share responsibility for the choices that follow.
Relying on AI for certain aspects of diagnosis could help free doctors to focus on those more human parts of the job. In the United States, more than 100 million people don’t have a primary-care provider, and the profession itself is dwindling. “If in some form AI is able to beat us, or help us improve our ability to do clinical reasoning, you don’t have to be the smartest person in the room to be a physician, which I think is better for the community,” Jeffrey Goddard, a medical student at the University of Iowa who uses chatbots in his training, told me. A diagnosis, most simply, is an answer to the question What is making me ill? But it can offer much more than that—reassurance, coherence, and, ultimately, relief. Not all of that can be outsourced.
This essay was adapted from Alexandra Sifferlin’s book, The Elusive Body: Patients, Doctors, and the Diagnosis Crisis, published today.