Evaluating the Performance of AI-Based Large Language Models in Radiation Oncology

A new study evaluates how well artificial intelligence (AI)-based large language models (LLMs) answer questions on a standardized radiation oncology examination, and compares their scores with those of radiation oncology trainees. The study is published in the peer-reviewed journal AI in Precision Oncology.

Nikhil Thaker, from Capital Health and Bayta Systems, and coauthors evaluated the performance of various LLMs, including OpenAI’s GPT-3.5-turbo, GPT-4, and GPT-4-turbo, Meta’s Llama-2 models, and Google’s PaLM-2-text-bison. The LLMs were given an exam comprising 300 questions, and their answers were compared with radiation oncology trainee performance.

The results showed that OpenAI’s GPT-4-turbo had the best performance, with 74.2% correct answers, and all three Llama-2 models underperformed. The LLMs tended to excel in statistics but to underperform in clinical areas, with the exception of GPT-4-turbo, which performed comparably to upper-level radiation oncology trainees and better than lower-level trainees.

"Future research will need to evaluate the performance of models that are fine-tune trained in clinical oncology," concluded the investigators. "This study also underscores the need for rigorous validation of LLM-generated information against established medical literature and expert consensus, necessitating expert oversight in their application in medical education and practice."

"The study highlights the potential of generative AI to revolutionize radiation oncology education and practice. OpenAI's GPT-4-turbo demonstrates that AI can complement medical training, suggesting a future where AI aids in improving patient outcomes. It's essential, though, to validate these technologies rigorously and involve experts to ensure their reliable and effective use in healthcare," says Douglas Flora, MD, Editor-in-Chief of AI in Precision Oncology.

Nikhil G. Thaker, Navid Redjal, Arturo Loaiza-Bonilla, David Penberthy, Tim Showalter, Ajay Choudhri, Shirnett Williamson, Gautam Thaker, Chirag Shah, Matthew C. Ward, Mihir Thaker, Michael Arcaro.
Large Language Models Encode Radiation Oncology Domain Knowledge: Performance on the American College of Radiology Standardized Examination.
AI in Precision Oncology, 2024. doi: 10.1089/aipo.2023.0007
