Almost All Leading AI Chatbots Show Signs of Cognitive Decline

Almost all leading large language models or "chatbots" show signs of mild cognitive impairment in tests widely used to spot early signs of dementia, finds a study in the Christmas issue of The BMJ.

The results also show that "older" versions of chatbots, like older patients, tend to perform worse on the tests. The authors say these findings "challenge the assumption that artificial intelligence will soon replace human doctors."

Huge advances in the field of artificial intelligence have led to a flurry of excited and fearful speculation as to whether chatbots can surpass human physicians.

Several studies have shown large language models (LLMs) to be remarkably adept at a range of medical diagnostic tasks, but their susceptibility to human impairments such as cognitive decline has not yet been examined.

To fill this knowledge gap, researchers assessed the cognitive abilities of the leading, publicly available LLMs - ChatGPT versions 4 and 4o (developed by OpenAI), Claude 3.5 "Sonnet" (developed by Anthropic), and Gemini versions 1 and 1.5 (developed by Alphabet) - using the Montreal Cognitive Assessment (MoCA) test.

The MoCA test is widely used to detect cognitive impairment and early signs of dementia, usually in older adults. Through a number of short tasks and questions, it assesses abilities including attention, memory, language, visuospatial skills, and executive functions. The maximum score is 30 points, with a score of 26 or above generally considered normal.

The instructions given to the LLMs for each task were the same as those given to human patients. Scoring followed official guidelines and was evaluated by a practising neurologist.

ChatGPT 4o achieved the highest score on the MoCA test (26 out of 30), followed by ChatGPT 4 and Claude (25 out of 30), with Gemini 1.0 scoring lowest (16 out of 30).

All chatbots showed poor performance in visuospatial skills and executive tasks, such as the trail making task (connecting encircled numbers and letters in ascending order) and the clock drawing test (drawing a clock face showing a specific time). Gemini models failed at the delayed recall task (remembering a five-word sequence).

Most other tasks, including naming, attention, language, and abstraction, were performed well by all chatbots.

But in further visuospatial tests, chatbots were unable to show empathy or accurately interpret complex visual scenes. Only ChatGPT 4o succeeded in the incongruent stage of the Stroop test, which uses combinations of colour names and font colours to measure how interference affects reaction time.

These are observational findings and the authors acknowledge the essential differences between the human brain and large language models.

However, they point out that the uniform failure of all large language models in tasks requiring visual abstraction and executive function highlights a significant area of weakness that could impede their use in clinical settings.

As such, they conclude: "Not only are neurologists unlikely to be replaced by large language models any time soon, but our findings suggest that they may soon find themselves treating new, virtual patients - artificial intelligence models presenting with cognitive impairment."

Dayan R, Uliel B, Koplewitz G.
Age against the machine - susceptibility of large language models to cognitive impairment: cross sectional analysis.
BMJ. 2024 Dec 19;387:e081948. doi: 10.1136/bmj-2024-081948
