Researchers Test AI-Powered Chatbot's Medical Diagnostic Ability

In a recent experiment published in JAMA, physician-researchers at Beth Israel Deaconess Medical Center (BIDMC) tested a well-known, publicly available chatbot's ability to make accurate diagnoses in challenging medical cases. The team found that the generative AI model, GPT-4, selected the correct diagnosis as its top diagnosis nearly 40 percent of the time and included the correct diagnosis in its list of potential diagnoses in nearly two-thirds of challenging cases.

Generative AI refers to a type of artificial intelligence that uses patterns and information it has been trained on to create new content, rather than simply processing and analyzing existing data. Some of the best-known examples of generative AI are so-called chatbots, which use a branch of artificial intelligence called natural language processing (NLP) that allows computers to understand, interpret and generate human-like language. Generative AI chatbots are powerful tools poised to revolutionize creative industries, education, customer service and more. However, little is known about how they might perform in clinical tasks such as complex diagnostic reasoning.

"Recent advances in artificial intelligence have led to generative AI models that are capable of detailed text-based responses that score highly in standardized medical examinations," said Adam Rodman, MD, MPH, co-director of the Innovations in Media and Education Delivery (iMED) Initiative at BIDMC and an instructor in medicine at Harvard Medical School. "We wanted to know if such a generative model could 'think' like a doctor, so we asked one to solve standardized complex diagnostic cases used for educational purposes. It did really, really well."

To assess the chatbot’s diagnostic skills, Rodman and colleagues used clinicopathological case conferences (CPCs), a series of complex and challenging patient cases including relevant clinical and laboratory data, imaging studies, and histopathological findings published in the New England Journal of Medicine for educational purposes.

Across the 70 CPCs evaluated, the AI's top diagnosis exactly matched the final CPC diagnosis in 27 cases (39 percent). In 64 percent of the cases, the final CPC diagnosis was included in the AI's differential, a list of possible conditions that could account for a patient's symptoms, medical history, clinical findings and laboratory or imaging results.
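For readers who want to see how the two figures above are defined, the following is a minimal sketch of the arithmetic, not the researchers' scoring method. The `CaseResult` type, the function names and the three toy cases are hypothetical, and exact string comparison is a simplifying assumption; the article does not describe how a match was judged. Only the 27-of-70 count comes from the text.

```python
from dataclasses import dataclass


@dataclass
class CaseResult:
    """One scored case (hypothetical structure for illustration)."""
    final_diagnosis: str     # reference diagnosis from the case conference
    differential: list[str]  # model's ranked list, most likely first


def top_match_rate(results: list[CaseResult]) -> float:
    """Fraction of cases where the first listed diagnosis is the final one."""
    hits = sum(
        1 for r in results
        if r.differential and r.differential[0] == r.final_diagnosis
    )
    return hits / len(results)


def inclusion_rate(results: list[CaseResult]) -> float:
    """Fraction of cases where the final diagnosis appears anywhere in the list."""
    hits = sum(1 for r in results if r.final_diagnosis in r.differential)
    return hits / len(results)


# Hypothetical toy data, NOT taken from the study.
toy_cases = [
    CaseResult("sarcoidosis", ["sarcoidosis", "lymphoma"]),
    CaseResult("lymphoma", ["tuberculosis", "lymphoma"]),
    CaseResult("amyloidosis", ["myeloma", "vasculitis"]),
]

# The one count stated in the article: 27 exact top matches out of 70 cases.
reported_top_pct = round(27 / 70 * 100)

if __name__ == "__main__":
    print(f"toy top-match rate: {top_match_rate(toy_cases):.2f}")
    print(f"toy inclusion rate: {inclusion_rate(toy_cases):.2f}")
    print(f"reported top-match percentage: {reported_top_pct}")
```

The inclusion rate can never be lower than the top-match rate, because every top match is also in the list; that is why the article's second figure is the larger of the two.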

"While chatbots cannot replace the expertise and knowledge of a trained medical professional, generative AI is a promising potential adjunct to human cognition in diagnosis," said first author Zahir Kanjee, MD, MPH, a hospitalist at BIDMC and assistant professor of medicine at Harvard Medical School. "It has the potential to help physicians make sense of complex medical data and broaden or refine our diagnostic thinking. We need more research on the optimal uses, benefits and limits of this technology, and a lot of privacy issues need sorting out, but these are exciting findings for the future of diagnosis and patient care."

"Our study adds to a growing body of literature demonstrating the promising capabilities of AI technology," said co-author Byron Crowe, MD, an internal medicine physician at BIDMC and an instructor in medicine at Harvard Medical School. "Further investigation will help us better understand how these new AI models might transform health care delivery."

This work did not receive separate funding or sponsorship. Kanjee reports royalties for books edited and membership of a paid advisory board for medical education products not related to artificial intelligence from Wolters Kluwer, as well as honoraria for CME delivered from Oakstone Publishing. Crowe reports employment by Solera Health outside the submitted work. Rodman reports no conflicts of interest.

Kanjee Z, Crowe B, Rodman A.
Accuracy of a Generative Artificial Intelligence Model in a Complex Diagnostic Challenge.
JAMA. Published online June 15, 2023. doi: 10.1001/jama.2023.8288
