Is AI in Medicine Playing Fair?

As artificial intelligence (AI) rapidly integrates into health care, a new study by researchers at the Icahn School of Medicine at Mount Sinai reveals that all generative AI models may recommend different treatments for the same medical condition based solely on a patient's socioeconomic and demographic background.

Their findings, which are detailed in the April 7, 2025, online issue of Nature Medicine, highlight the importance of early detection and intervention to ensure that AI-driven care is safe, effective, and appropriate for all.

As part of their investigation, the researchers stress-tested nine large language models (LLMs) on 1,000 emergency department cases, each replicated with 32 different patient backgrounds, yielding more than 1.7 million AI-generated medical recommendations. Despite identical clinical details, the AI models occasionally altered their decisions based on a patient's socioeconomic and demographic profile, affecting key areas such as triage priority, diagnostic testing, treatment approach, and mental health evaluation.

"Our research provides a framework for AI assurance, helping developers and health care institutions design fair and reliable AI tools," says co-senior author Eyal Klang, MD, Chief of Generative-AI in the Windreich Department of Artificial Intelligence and Human Health at the Icahn School of Medicine at Mount Sinai. "By identifying when AI shifts its recommendations based on background rather than medical need, we inform better model training, prompt design, and oversight. Our rigorous validation process tests AI outputs against clinical standards, incorporating expert feedback to refine performance. This proactive approach not only enhances trust in AI-driven care but also helps shape policies for better health care for all."

One of the study's most striking findings was the tendency of some AI models to escalate care recommendations - particularly for mental health evaluations - based on patient demographics rather than medical necessity. In addition, high-income patients were more often recommended advanced diagnostic tests such as CT scans or MRI, while low-income patients were more frequently advised to undergo no further testing. The scale of these inconsistencies underscores the need for stronger oversight, say the researchers.

While the study provides critical insights, researchers caution that it represents only a snapshot of AI behavior. Future research will continue to include assurance testing to evaluate how AI models perform in real-world clinical settings and whether different prompting techniques can reduce bias. The team also aims to work with other health care institutions to refine AI tools, ensuring they uphold the highest ethical standards and treat all patients fairly.

"I am delighted to partner with Mount Sinai on this critical research to ensure AI-driven medicine benefits patients across the globe," says physician-scientist and first author of the study, Mahmud Omar, MD, who consults with the research team. "As AI becomes more integrated into clinical care, it’s essential to thoroughly evaluate its safety, reliability, and fairness. By identifying where these models may introduce bias, we can work to refine their design, strengthen oversight, and build systems that ensure patients remain at the heart of safe, effective care. This collaboration is an important step toward establishing global best practices for AI assurance in health care."

"AI has the power to revolutionize health care, but only if it’s developed and used responsibly," says co-senior author Girish N. Nadkarni, MD, MPH, Chair of the Windreich Department of Artificial Intelligence and Human Health, Director of the Hasso Plattner Institute for Digital Health, and the Irene and Dr. Arthur M. Fishberg Professor of Medicine, at the Icahn School of Medicine at Mount Sinai. "Through collaboration and rigorous validation, we are refining AI tools to uphold the highest ethical standards and ensure appropriate, patient-centered care. By implementing robust assurance protocols, we not only advance technology but also build the trust essential for transformative health care. With proper testing and safeguards, we can ensure these technologies improve care for everyone - not just certain groups."

Next, the investigators plan to expand their work by simulating multistep clinical conversations and piloting AI models in hospital settings to measure their real-world impact. They hope their findings will guide the development of policies and best practices for AI assurance in health care, fostering trust in these powerful new tools.

Omar M, Soffer S, Agbareia R, Bragazzi NL, Apakama DU, Horowitz CR, Charney AW, Freeman R, Kummer B, Glicksberg BS, Nadkarni GN, Klang E.
Sociodemographic biases in medical decision making by large language models.
Nat Med. 2025 Apr 7. doi: 10.1038/s41591-025-03626-6
