Evaluating the Performance of AI-Based Large Language Models in Radiation Oncology

A new study evaluates how well artificial intelligence (AI)-based large language models (LLMs) answer questions from a standardized radiation oncology examination, comparing their scores with those of radiation oncology trainees. The study is published in the peer-reviewed journal AI in Precision Oncology.

Nikhil Thaker, from Capital Health and Bayta Systems, and coauthors evaluated the performance of various LLMs, including OpenAI’s GPT-3.5-turbo, GPT-4, and GPT-4-turbo, Meta’s Llama-2 models, and Google’s PaLM-2-text-bison. The LLMs were given an exam consisting of 300 questions, and their answers were compared with radiation oncology trainee performance.

The results showed that OpenAI’s GPT-4-turbo had the best performance, with 74.2% correct answers, while all three Llama-2 models underperformed. The LLMs tended to excel in statistics but to underperform in clinical areas, with the exception of GPT-4-turbo, which performed comparably to upper-level radiation oncology trainees and better than lower-level trainees.

"Future research will need to evaluate the performance of models that are fine-tune trained in clinical oncology," concluded the investigators. "This study also underscores the need for rigorous validation of LLM-generated information against established medical literature and expert consensus, necessitating expert oversight in their application in medical education and practice."

"The study highlights the potential of generative AI to revolutionize radiation oncology education and practice. OpenAI's GPT-4-turbo demonstrates that AI can complement medical training, suggesting a future where AI aids in improving patient outcomes. It's essential, though, to validate these technologies rigorously and involve experts to ensure their reliable and effective use in healthcare," says Douglas Flora, MD, Editor-in-Chief of AI in Precision Oncology.

Nikhil G. Thaker, Navid Redjal, Arturo Loaiza-Bonilla, David Penberthy, Tim Showalter, Ajay Choudhri, Shirnett Williamson, Gautam Thaker, Chirag Shah, Matthew C. Ward, Mihir Thaker, Michael Arcaro.
Large Language Models Encode Radiation Oncology Domain Knowledge: Performance on the American College of Radiology Standardized Examination.
AI in Precision Oncology, 2024. doi: 10.1089/aipo.2023.0007
