ChatGPT fails at heart risk assessment

Despite ChatGPT's reported ability to pass medical exams, new research indicates it would be unwise to rely on it for some health assessments, such as whether a patient with chest pain needs to be hospitalized.

In a study involving thousands of simulated cases of patients with chest pain, ChatGPT provided inconsistent conclusions, returning different heart risk assessment levels for the exact same patient data. The generative AI system also failed to match the traditional methods physicians use to judge a patient’s cardiac risk. The findings were published in the journal PLOS ONE.

"ChatGPT was not acting in a consistent manner," said lead author Dr. Thomas Heston, a researcher with Washington State University's Elson S. Floyd College of Medicine. "Given the exact same data, ChatGPT would give a score of low risk, then next time an intermediate risk, and occasionally, it would go as far as giving a high risk."

The authors believe the problem is likely due to the level of randomness built into the current version of the software, ChatGPT4, which helps it vary its responses to simulate natural language. This same randomness, however, does not work well for healthcare uses that require a single, consistent answer, Heston said.
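The article does not say how ChatGPT's randomness is implemented. A common mechanism in language models is temperature-scaled sampling, and the sketch below is a generic illustration of that idea, not ChatGPT's code. It assumes a hypothetical model that has scored three risk labels for one fixed patient record; the label set and scores (`LABELS`, `LOGITS`) are invented. Sampling from those scores can return a different label on each draw, while a greedy rule always returns the same one.

```python
import math
import random

# Hypothetical label set and scores (logits) for ONE fixed patient record.
# The numbers are invented for illustration; they do not come from the study.
LABELS = ["low", "intermediate", "high"]
LOGITS = [2.0, 1.5, 0.5]


def softmax(logits, temperature):
    """Turn raw scores into probabilities; lower temperature sharpens them."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    top = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - top) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_label(logits, temperature, rng):
    """Randomly draw one label in proportion to its temperature-scaled probability."""
    probs = softmax(logits, temperature)
    return rng.choices(LABELS, weights=probs, k=1)[0]


def greedy_label(logits):
    """Deterministic alternative: always return the highest-scoring label."""
    best = max(range(len(logits)), key=lambda i: logits[i])
    return LABELS[best]


if __name__ == "__main__":
    rng = random.Random(0)
    # Identical input, ten draws: the sampled label can change from draw to draw.
    print([sample_label(LOGITS, 1.0, rng) for _ in range(10)])
    # The greedy rule gives the same answer every time.
    print(greedy_label(LOGITS))
```

Always picking the top-scoring label removes this particular source of run-to-run variation; it does not by itself make the chosen label correct.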

"We found there was a lot of variation, and that variation in approach can be dangerous," he said. "It can be a useful tool, but I think the technology is going a lot faster than our understanding of it, so it's critically important that we do a lot of research, especially in these high-stakes clinical situations."

Chest pain is a common complaint in emergency rooms, requiring doctors to rapidly assess the urgency of a patient's condition. Some very serious cases are easy to identify by their symptoms, but lower-risk ones can be trickier, Heston said, especially when determining whether someone should be hospitalized for observation or sent home to receive outpatient care.

Currently, medical professionals often use one of two measures, known by the acronyms TIMI and HEART, to assess heart risk. Heston likened these scales to calculators, each using a handful of variables including symptoms, health history and age. In contrast, an AI neural network like ChatGPT can assess billions of variables quickly, meaning it could potentially analyze a complex situation faster and more thoroughly.
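To make the "calculator" comparison concrete, here is a minimal sketch of the HEART score for illustration. The article only says HEART uses five variables; the point definitions and the low/moderate/high bands used here (0 to 3, 4 to 6, 7 to 10) are the commonly published HEART conventions, not details from this article, and the names `HeartInputs`, `heart_score` and `heart_category` are hypothetical. Four components are assumed to arrive already graded as 0, 1 or 2 points; age is converted from years.

```python
from dataclasses import dataclass


@dataclass
class HeartInputs:
    """The five HEART components; four are pre-graded as 0, 1 or 2 points."""
    history: int       # 0 slightly, 1 moderately, 2 highly suspicious
    ecg: int           # 0 normal, 1 nonspecific repolarization, 2 significant ST deviation
    age_years: int     # converted to points by age_points()
    risk_factors: int  # 0 none known, 1 one or two, 2 three or more or known atherosclerosis
    troponin: int      # 0 at or below normal limit, 1 one to three times, 2 above three times


def age_points(age_years: int) -> int:
    """Age bands: under 45 -> 0, 45 to 64 -> 1, 65 and over -> 2."""
    if age_years < 45:
        return 0
    if age_years < 65:
        return 1
    return 2


def heart_score(p: HeartInputs) -> int:
    """Sum the five component points into a total from 0 to 10."""
    for name in ("history", "ecg", "risk_factors", "troponin"):
        value = getattr(p, name)
        if value not in (0, 1, 2):
            raise ValueError(f"{name} must be 0, 1 or 2, got {value}")
    return p.history + p.ecg + age_points(p.age_years) + p.risk_factors + p.troponin


def heart_category(score: int) -> str:
    """Commonly cited bands: 0-3 low, 4-6 moderate, 7-10 high."""
    if score <= 3:
        return "low"
    if score <= 6:
        return "moderate"
    return "high"


if __name__ == "__main__":
    patient = HeartInputs(history=1, ecg=0, age_years=58, risk_factors=1, troponin=0)
    score = heart_score(patient)
    print(score, heart_category(score))  # 3 low
```

Because the function contains no randomness, identical inputs always yield the identical score and category, which is the consistency the study found ChatGPT lacked.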

For this study, Heston and colleague Dr. Lawrence Lewis of Washington University in St. Louis first generated three datasets of 10,000 randomized, simulated cases each. One dataset had the seven variables of the TIMI scale, the second included the five HEART scale variables, and the third had 44 randomized health variables. On the first two datasets, ChatGPT's risk assessment for individual cases differed from the fixed TIMI or HEART score 45% to 48% of the time. For the last dataset, the researchers ran the cases four times and found ChatGPT often did not agree with itself, returning different assessment levels for the same cases 44% of the time.
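The two percentages above measure different things: agreement with a fixed score, and agreement of ChatGPT with itself across repeated runs. The article does not give the exact formulas, so the sketch below rests on assumptions: a case counts against the model in the first measure when its level differs from the score-derived level, and in the second when its repeated runs did not all return the same level. The function names and toy labels are hypothetical.

```python
def disagreement_with_reference(model_labels, reference_labels):
    """Fraction of cases where the model's level differs from the fixed-score level."""
    if not reference_labels or len(model_labels) != len(reference_labels):
        raise ValueError("need two non-empty label lists of equal length")
    differing = sum(m != r for m, r in zip(model_labels, reference_labels))
    return differing / len(reference_labels)


def self_disagreement(runs):
    """Fraction of cases whose level is not identical across every repeated run.

    `runs` holds one list per run; each list has one label per case, in the same order.
    """
    if not runs or not runs[0]:
        raise ValueError("need at least one non-empty run")
    n_cases = len(runs[0])
    if any(len(run) != n_cases for run in runs):
        raise ValueError("every run must label the same cases")
    # A case is inconsistent if the runs produced more than one distinct label for it.
    inconsistent = sum(len({run[i] for run in runs}) > 1 for i in range(n_cases))
    return inconsistent / n_cases


if __name__ == "__main__":
    # Toy labels, invented for illustration only.
    reference = ["low", "low", "high", "intermediate"]
    model = ["low", "intermediate", "high", "low"]
    print(disagreement_with_reference(model, reference))  # 0.5

    four_runs = [
        ["low", "high", "low"],
        ["low", "intermediate", "low"],
        ["low", "high", "low"],
        ["low", "high", "low"],
    ]
    print(self_disagreement(four_runs))  # 0.3333333333333333
```

In the toy example only the second case changes level across the four runs, so one case in three is counted as inconsistent.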

Despite the negative findings of this study, Heston sees great potential for generative AI in health care, with further development. For instance, assuming privacy standards could be met, entire medical records could be loaded into the program, and in an emergency setting, a doctor could ask ChatGPT to quickly give the most pertinent facts about a patient. Also, for difficult, complex cases, doctors could ask the program to generate several possible diagnoses.

"ChatGPT could be excellent at creating a differential diagnosis and that's probably one of its greatest strengths," said Heston. "If you don’t quite know what's going on with a patient, you could ask it to give the top five diagnoses and the reasoning behind each one. So it could be good at helping you think through a problem, but it’s not good at giving the answer."

Heston TF, Lewis LM.
ChatGPT provides inconsistent risk-stratification of patients with atraumatic chest pain.
PLoS One. 2024 Apr 16;19(4):e0301854. doi: 10.1371/journal.pone.0301854
