ChatGPT Shows 'Impressive' Accuracy in Clinical Decision Making

A new study led by investigators from Mass General Brigham has found that ChatGPT was about 72 percent accurate in overall clinical decision making, from coming up with possible diagnoses to making final diagnoses and care management decisions. The large language model (LLM) artificial intelligence chatbot performed equally well in both primary care and emergency settings across all medical specialties. The research team’s results are published in the Journal of Medical Internet Research.

"Our paper comprehensively assesses decision support via ChatGPT from the very beginning of working with a patient through the entire care scenario, from differential diagnosis all the way through testing, diagnosis, and management," said corresponding author Marc Succi, MD, associate chair of innovation and commercialization and strategic innovation leader at Mass General Brigham and executive director of the MESH Incubator. "No real benchmarks exist, but we estimate this performance to be at the level of someone who has just graduated from medical school, such as an intern or resident. This tells us that LLMs in general have the potential to be an augmenting tool for the practice of medicine and support clinical decision making with impressive accuracy."

Changes in artificial intelligence technology are occurring at a fast pace and transforming many industries, including health care. But the capacity of LLMs to assist in the full scope of clinical care has not yet been studied. In this comprehensive, cross-specialty study of how LLMs could be used in clinical advisement and decision making, Succi and his team tested the hypothesis that ChatGPT would be able to work through an entire clinical encounter with a patient and recommend a diagnostic workup, decide the clinical management course, and ultimately make the final diagnosis.

The study was done by pasting successive portions of 36 standardized, published clinical vignettes into ChatGPT. The tool was first asked to come up with a set of possible, or differential, diagnoses based on the patient's initial information, which included age, gender, symptoms, and whether the case was an emergency. ChatGPT was then given additional pieces of information and asked to make management decisions as well as give a final diagnosis, simulating the entire process of seeing a real patient. The team compared ChatGPT's accuracy on differential diagnosis, diagnostic testing, final diagnosis, and management in a structured blinded process, awarding points for correct answers and using linear regressions to assess the relationship between ChatGPT’s performance and the vignettes’ demographic information.
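The scoring approach described above can be pictured with a minimal sketch. It assumes accuracy is computed as points earned divided by points possible for each question type, and it fits a single-predictor least-squares line of per-question accuracy against patient age. The data, the field layout, and the function names `accuracy_by_type` and `ols_slope_intercept` are all hypothetical and invented for illustration; the article does not specify the study's rubric or regression model in this detail.

```python
from statistics import mean

# Hypothetical scored responses: (question_type, points_earned, points_possible, patient_age).
# These values are invented for illustration; they are NOT data from the study.
SCORES = [
    ("differential", 3, 5, 34),
    ("differential", 2, 4, 61),
    ("testing", 4, 5, 34),
    ("management", 3, 4, 61),
    ("final_diagnosis", 1, 1, 34),
    ("final_diagnosis", 0, 1, 61),
]


def accuracy_by_type(scores):
    """Return points earned / points possible for each question type."""
    earned, possible = {}, {}
    for qtype, got, total, _age in scores:
        earned[qtype] = earned.get(qtype, 0) + got
        possible[qtype] = possible.get(qtype, 0) + total
    return {qtype: earned[qtype] / possible[qtype] for qtype in earned}


def ols_slope_intercept(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


per_type = accuracy_by_type(SCORES)

# Regress per-question accuracy on age (an assumed stand-in for one demographic variable).
ages = [row[3] for row in SCORES]
fractions = [row[1] / row[2] for row in SCORES]
slope, intercept = ols_slope_intercept(ages, fractions)

for qtype, acc in per_type.items():
    print(f"{qtype}: {acc:.2f}")
print(f"slope per year of age: {slope:.4f}")
```

A slope near zero in such a fit would be consistent with performance that does not vary with the demographic variable, which is the kind of relationship the regressions were used to assess.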

The researchers found that overall, ChatGPT was about 72 percent accurate and that it was best at making a final diagnosis, where it was 77 percent accurate. It performed worst at making differential diagnoses, where it was only 60 percent accurate. It was only 68 percent accurate in clinical management decisions, such as figuring out what medications to treat the patient with after arriving at the correct diagnosis. Other notable findings from the study included that ChatGPT's answers did not show gender bias and that its overall performance was steady across both primary and emergency care.

"ChatGPT struggled with differential diagnosis, which is the meat and potatoes of medicine when a physician has to figure out what to do," said Succi. "That is important because it tells us where physicians are truly experts and adding the most value - in the early stages of patient care with little presenting information, when a list of possible diagnoses is needed."

The authors note that before tools like ChatGPT can be considered for integration into clinical care, more benchmark research and regulatory guidance are needed. Next, Succi's team is looking at whether AI tools can improve patient care and outcomes in hospitals’ resource-constrained areas.

The emergence of artificial intelligence tools in health has been groundbreaking and has the potential to positively reshape the continuum of care. Mass General Brigham, as one of the nation's top integrated academic health systems and largest innovation enterprises, is leading the way in conducting rigorous research on new and emerging technologies to inform the responsible incorporation of AI into care delivery, workforce support, and administrative processes.

"Mass General Brigham sees great promise for LLMs to help improve care delivery and clinician experience," said co-author Adam Landman, MD, MS, MIS, MHS, chief information officer and senior vice president of digital at Mass General Brigham. "We are currently evaluating LLM solutions that assist with clinical documentation and draft responses to patient messages with focus on understanding their accuracy, reliability, safety, and equity. Rigorous studies like this one are needed before we integrate LLM tools into clinical care."

Rao A, Pang M, Kim J, Kamineni M, Lie W, Prasad AK, Landman A, Dreyer K, Succi MD.
Assessing the Utility of ChatGPT Throughout the Entire Clinical Workflow: Development and Usability Study.
J Med Internet Res. 2023 Aug 22;25:e48659. doi: 10.2196/48659
