Evaluating the Performance of AI-Based Large Language Models in Radiation Oncology

A new study evaluates how well artificial intelligence (AI)-based large language models (LLMs) answer questions from a standardized radiation oncology examination, comparing their scores with those of radiation oncology trainees. The study is published in the peer-reviewed journal AI in Precision Oncology.

Nikhil Thaker, from Capital Health and Bayta Systems, and coauthors evaluated the performance of various LLMs, including OpenAI’s GPT-3.5-turbo, GPT-4, GPT-4-turbo, Meta’s Llama-2 models, and Google’s PaLM-2-text-bison. The LLMs were given an exam comprising 300 questions, and their answers were compared to radiation oncology trainee performance.

The results showed that OpenAI’s GPT-4-turbo had the best performance, with 74.2% correct answers, and all three Llama-2 models underperformed. The LLMs tended to excel in statistics but to underperform in clinical areas, with the exception of GPT-4-turbo, which performed comparably to upper-level radiation oncology trainees and better than lower-level trainees.

"Future research will need to evaluate the performance of models that are fine-tune trained in clinical oncology," concluded the investigators. "This study also underscores the need for rigorous validation of LLM-generated information against established medical literature and expert consensus, necessitating expert oversight in their application in medical education and practice."

"The study highlights the potential of generative AI to revolutionize radiation oncology education and practice. OpenAI's GPT-4-turbo demonstrates that AI can complement medical training, suggesting a future where AI aids in improving patient outcomes. It's essential, though, to validate these technologies rigorously and involve experts to ensure their reliable and effective use in healthcare," says Douglas Flora, MD, Editor-in-Chief of AI in Precision Oncology.

Nikhil G. Thaker, Navid Redjal, Arturo Loaiza-Bonilla, David Penberthy, Tim Showalter, Ajay Choudhri, Shirnett Williamson, Gautam Thaker, Chirag Shah, Matthew C. Ward, Mihir Thaker, Michael Arcaro.
Large Language Models Encode Radiation Oncology Domain Knowledge: Performance on the American College of Radiology Standardized Examination.
AI in Precision Oncology, 2024. doi: 10.1089/aipo.2023.0007
