NIH Findings Shed Light on Risks and Benefits of Integrating AI into Medical Decision-Making

Researchers at the National Institutes of Health (NIH) found that an artificial intelligence (AI) model answered medical quiz questions with high accuracy. The questions are designed to test health professionals' ability to diagnose patients based on clinical images and a brief text summary. However, physician-graders found that the AI model made mistakes when describing images and when explaining how its reasoning led to the correct answer. The findings, which shed light on AI's potential in the clinical setting, were published in npj Digital Medicine. The study was led by researchers from NIH's National Library of Medicine (NLM) and Weill Cornell Medicine, New York City.

"Integration of AI into health care holds great promise as a tool to help medical professionals diagnose patients faster, allowing them to start treatment sooner," said NLM Acting Director, Stephen Sherry, Ph.D. "However, as this study shows, AI is not advanced enough yet to replace human experience, which is crucial for accurate diagnosis."

The AI model and human physicians answered questions from the New England Journal of Medicine (NEJM) Image Challenge. The challenge is an online quiz that presents real clinical images and a short text description of the patient's symptoms and presentation, then asks users to choose the correct diagnosis from multiple-choice answers.

The researchers tasked the AI model with answering 207 Image Challenge questions and providing a written rationale to justify each answer. The prompt specified that the rationale should include a description of the image, a summary of relevant medical knowledge, and step-by-step reasoning explaining how the model chose its answer.

The researchers recruited nine physicians from various institutions, each with a different medical specialty. The physicians answered their assigned questions first in a "closed-book" setting (without referring to external materials such as online resources) and then in an "open-book" setting (using external resources). The researchers then gave the physicians the correct answer, along with the AI model's answer and corresponding rationale. Finally, the physicians scored the AI model's ability to describe the image, summarize relevant medical knowledge, and provide step-by-step reasoning.

The researchers found that both the AI model and the physicians scored highly in selecting the correct diagnosis. Interestingly, the AI model selected the correct diagnosis more often than physicians in the closed-book setting, while physicians using open-book tools outperformed the AI model, especially on the questions ranked most difficult.

Importantly, based on physician evaluations, the AI model often made mistakes when describing the medical image and explaining the reasoning behind its diagnosis, even in cases where its final choice was correct. In one example, the AI model was given a photo of a patient's arm with two lesions. A physician would easily recognize that both lesions were caused by the same condition. However, because the lesions were photographed at different angles, which made them appear to differ in color and shape, the AI model failed to recognize that both lesions could be related to the same diagnosis.

The researchers argue that these findings underscore the importance of further evaluating multimodal AI technology before introducing it into the clinical setting.

"This technology has the potential to help clinicians augment their capabilities with data-driven insights that may lead to improved clinical decision-making," said NLM Senior Investigator and corresponding author of the study, Zhiyong Lu, Ph.D. "Understanding the risks and limitations of this technology is essential to harnessing its potential in medicine."

The study used an AI model known as GPT-4V (Generative Pre-trained Transformer 4 with Vision), a "multimodal AI model" that can process combinations of multiple types of data, including text and images. The researchers note that while this is a small study, it sheds light on multimodal AI's potential to aid physicians' medical decision-making. More research is needed to understand how such models compare with physicians' ability to diagnose patients.

Jin Q, Chen F, Zhou Y, Xu Z, Cheung JM, Chen R, Summers RM, Rousseau JF, Ni P, Landsman MJ, Baxter SL, Al'Aref SJ, Li Y, Chen A, Brejt JA, Chiang MF, Peng Y, Lu Z.
Hidden flaws behind expert-level accuracy of multimodal GPT-4 vision in medicine.
NPJ Digit Med. 2024 Jul 23;7(1):190. doi: 10.1038/s41746-024-01185-7
