Fine-Tuned LLMs Boost Error Detection in Radiology Reports

Fine-tuned large language models (LLMs), a type of artificial intelligence (AI), greatly enhance error detection in radiology reports, according to a new study published in Radiology, a journal of the Radiological Society of North America (RSNA). Researchers said the findings point to an important role for this technology in medical proofreading.

Radiology reports are crucial for optimal patient care. However, their accuracy can be compromised by factors such as errors in speech recognition software, variability in perceptual and interpretive processes, and cognitive biases. These errors can lead to incorrect diagnoses or delayed treatments, making the need for accurate reports urgent.

LLMs like ChatGPT are advanced generative AI models that are trained on vast amounts of text to generate human language. While they offer great potential in proofreading, their application in the medical field, particularly in detecting errors within radiology reports, remains underexplored.

To bridge this gap in knowledge, researchers evaluated fine-tuned LLMs for detecting errors in radiology reports during medical proofreading. A fine-tuned LLM is a pre-trained language model that is further trained on domain-specific data.

"Initially, LLMs are trained on large-scale public data to learn general language patterns and knowledge," said study senior author Yifan Peng, Ph.D., from the Department of Population Health Sciences at Weill Cornell Medicine in New York City. "Fine-tuning occurs as the next step, where the model undergoes additional training using smaller, targeted datasets relevant to particular tasks."

To test the model, Dr. Peng and colleagues built a dataset with two parts. The first consisted of 1,656 synthetic reports, including 828 error-free reports and 828 reports with errors. The second part comprised 614 reports, including 307 error-free reports from MIMIC-CXR, a large, publicly available database of chest X-rays, and 307 synthetic reports with errors.
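The composition of the dataset can be tallied in a few lines. The sketch below only restates the counts reported above; the `Subset` class, the subset labels, and the `summarize` helper are illustrative names, not anything taken from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Subset:
    """One part of the dataset (names are illustrative, not from the study)."""
    name: str
    error_free: int
    with_errors: int

    @property
    def total(self) -> int:
        return self.error_free + self.with_errors


# Counts as reported in the article.
SUBSETS = [
    Subset("synthetic", error_free=828, with_errors=828),
    Subset("mimic_cxr_based", error_free=307, with_errors=307),
]


def summarize(subsets):
    """Return the overall size and the share of reports that contain errors."""
    total = sum(s.total for s in subsets)
    with_errors = sum(s.with_errors for s in subsets)
    return {
        "total": total,
        "with_errors": with_errors,
        "error_fraction": with_errors / total,
    }


summary = summarize(SUBSETS)
print(summary)
```

Both parts are evenly split, so the combined set of 2,270 reports is balanced between error-free reports and reports with errors.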

The researchers used the synthetic reports to boost the amount of training data and fulfill the data-hungry needs of LLM fine-tuning.

"Synthetic reports can also increase the coverage and diversity, balance out the cases and reduce the annotation costs," said the study's first author, Cong Sun, Ph.D., from Dr. Peng's lab. "In radiology, or more broadly, the clinical domain, synthetic reports allow safe data-sharing without compromising patient privacy."

The researchers found that the fine-tuned model outperformed both GPT-4 and BiomedBERT, a natural language processing tool for biomedical research.

"The LLM that was fine-tuned on both MIMIC-CXR and synthetic reports demonstrated strong performance in the error detection tasks," Dr. Sun said. "It meets our expectations and highlights the potential for developing lightweight, fine-tuned LLM specifically for medical proofreading applications."

The study provided evidence that LLMs can assist in detecting various types of errors, including transcription errors and left/right errors, which refer to misidentification or misinterpretation of directions or sides in text or images.
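To make the left/right error type concrete, here is a minimal sketch of how such an error could be injected into a report sentence to create a labeled example. This is an assumption for illustration only: the article does not describe how the study generated its synthetic errors, and the function name `inject_side_error` and the sample sentence are made up.

```python
import re

_SWAP = {"left": "right", "right": "left"}
_PATTERN = re.compile(r"\b(left|right)\b", re.IGNORECASE)


def inject_side_error(report: str) -> str:
    """Flip the first 'left'/'right' mention, preserving its capitalization.

    Hypothetical helper: not the method used in the study.
    """
    def flip(match: re.Match) -> str:
        word = match.group(0)
        swapped = _SWAP[word.lower()]
        if word.isupper():
            return swapped.upper()
        if word[0].isupper():
            return swapped.capitalize()
        return swapped

    return _PATTERN.sub(flip, report, count=1)


# Invented sample sentence, not drawn from any real report.
original = "Small left pleural effusion. Right lung is clear."
corrupted = inject_side_error(original)

# A labeled pair: 0 = error-free, 1 = contains an error.
examples = [(original, 0), (corrupted, 1)]
print(corrupted)
```

A detector trained on pairs like these would need to flag the corrupted sentence while leaving the original untouched.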

The use of synthetic data in AI model building has raised concerns about bias in the data. Dr. Peng and colleagues took steps to minimize this by using diverse and representative samples of real-world data to generate the synthetic data. However, they acknowledged that synthetic errors may not fully capture the complexity of real-world errors in radiology reports. Future work could include a systematic evaluation of how bias introduced by synthetic errors affects model performance.

The researchers hope to study whether fine-tuned models can reduce radiologists' cognitive load and enhance patient care. They also want to find out whether fine-tuning degrades a model's ability to generate reasoning explanations.

"We are excited to keep exploring innovative strategies to enhance the reasoning capabilities of fine-tuned LLMs in medical proofreading tasks," Dr. Peng said. "Our goal is to develop transparent and understandable models that radiologists can confidently trust and fully embrace."

Sun C, Teichman K, Zhou Y, Critelli B, Nauheim D, Keir G, Wang X, Zhong J, Flanders AE, Shih G, Peng Y.
Generative Large Language Models Trained for Detecting Errors in Radiology Reports.
Radiology. 2025 May;315(2):e242575. doi: 10.1148/radiol.242575
