Evaluating the Performance of AI-Based Large Language Models in Radiation Oncology

A new study evaluates how well artificial intelligence (AI)-based large language models perform on a radiation oncology examination, comparing their answers with those of radiation oncology trainees. The study is published in the peer-reviewed journal AI in Precision Oncology.

Nikhil Thaker, from Capital Health and Bayta Systems, and coauthors evaluated the performance of various large language models (LLMs), including OpenAI’s GPT-3.5-turbo, GPT-4, GPT-4-turbo, Meta’s Llama-2 models, and Google’s PaLM-2-text-bison. The LLMs were given an exam composed of 300 questions, and their answers were compared to radiation oncology trainee performance.

The results showed that OpenAI’s GPT-4-turbo had the best performance, with 74.2% correct answers, while all three Llama-2 models underperformed. The LLMs tended to excel in statistics but to underperform in clinical areas, with the exception of GPT-4-turbo, which performed comparably to upper-level radiation oncology trainees and better than lower-level trainees.

"Future research will need to evaluate the performance of models that are fine-tune trained in clinical oncology," concluded the investigators. "This study also underscores the need for rigorous validation of LLM-generated information against established medical literature and expert consensus, necessitating expert oversight in their application in medical education and practice."

"The study highlights the potential of generative AI to revolutionize radiation oncology education and practice. OpenAI's GPT-4-turbo demonstrates that AI can complement medical training, suggesting a future where AI aids in improving patient outcomes. It's essential, though, to validate these technologies rigorously and involve experts to ensure their reliable and effective use in healthcare," says Douglas Flora, MD, Editor-in-Chief of AI in Precision Oncology.

Nikhil G. Thaker, Navid Redjal, Arturo Loaiza-Bonilla, David Penberthy, Tim Showalter, Ajay Choudhri, Shirnett Williamson, Gautam Thaker, Chirag Shah, Matthew C. Ward, Mihir Thaker, Michael Arcaro.
Large Language Models Encode Radiation Oncology Domain Knowledge: Performance on the American College of Radiology Standardized Examination.
AI in Precision Oncology, 2024. doi: 10.1089/aipo.2023.0007
