Teaching AI to Ask Clinical Questions

Physicians often query a patient's electronic health record for information that helps them make treatment decisions, but the cumbersome nature of these records hampers the process. Research has shown that even when a doctor has been trained to use an electronic health record (EHR), finding an answer to just one question can take, on average, more than eight minutes.

The more time physicians must spend navigating an oftentimes clunky EHR interface, the less time they have to interact with patients and provide treatment.

Researchers have begun developing machine-learning models that can streamline the process by automatically finding information physicians need in an EHR. However, training effective models requires huge datasets of relevant medical questions, which are often hard to come by due to privacy restrictions. Existing models struggle to generate authentic questions (those a human doctor would ask) and are often unable to successfully find correct answers.

To overcome this data shortage, researchers at MIT partnered with medical experts to study the questions physicians ask when reviewing EHRs. Then, they built a publicly available dataset of more than 2,000 clinically relevant questions written by these medical experts.

When they used their dataset to train a machine-learning model to generate clinical questions, they found that, when judged against real questions from medical experts, the model's questions were rated high-quality and authentic more than 60 percent of the time.

With this dataset, they plan to generate vast numbers of authentic medical questions and then use those questions to train a machine-learning model that would help doctors find the information they seek in a patient's record more efficiently.

"Two thousand questions may sound like a lot, but when you look at machine-learning models being trained nowadays, they have so much data, maybe billions of data points. When you train machine-learning models to work in health care settings, you have to be really creative because there is such a lack of data," says lead author Eric Lehman, a graduate student in the Computer Science and Artificial Intelligence Laboratory (CSAIL).

The senior author is Peter Szolovits, a professor in the Department of Electrical Engineering and Computer Science (EECS) who heads the Clinical Decision-Making Group in CSAIL and is also a member of the MIT-IBM Watson AI Lab. The research paper, a collaboration between co-authors at MIT, the MIT-IBM Watson AI Lab, IBM Research, and the doctors and medical experts who helped create questions and participated in the study, will be presented at the annual conference of the North American Chapter of the Association for Computational Linguistics.

"Realistic data is critical for training models that are relevant to the task yet difficult to find or create," Szolovits says. "The value of this work is in carefully collecting questions asked by clinicians about patient cases, from which we are able to develop methods that use these data and general language models to ask further plausible questions."

Data deficiency

The few large datasets of clinical questions the researchers were able to find had a host of issues, Lehman explains. Some were composed of medical questions asked by patients on web forums, which are a far cry from physician questions. Other datasets contained questions produced from templates, so the questions were mostly identical in structure, which made many of them unrealistic.

"Collecting high-quality data is really important for doing machine-learning tasks, especially in a health care context, and we’ve shown that it can be done," Lehman says.

To build their dataset, the MIT researchers worked with practicing physicians and medical students in their last year of training. They gave these medical experts more than 100 EHR discharge summaries and told them to read through a summary and ask any questions they might have. The researchers didn't put any restrictions on question types or structures in an effort to gather natural questions. They also asked the medical experts to identify the “trigger text” in the EHR that led them to ask each question.

For instance, a medical expert might read a note in the EHR that says a patient's past medical history is significant for prostate cancer and hypothyroidism. The trigger text "prostate cancer" could lead the expert to ask questions like "date of diagnosis?" or "any interventions done?"
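As a rough illustration, one entry in such a dataset might pair a trigger phrase with the surrounding context and the questions it prompted. The field names below are assumptions for illustration, not the schema of the released dataset.

```python
# One illustrative dataset entry: a trigger phrase from a discharge summary,
# the sentence it appeared in, and the questions it prompted.
# Field names are hypothetical, not the actual dataset schema.
entry = {
    "trigger_text": "prostate cancer",
    "context": "Past medical history is significant for prostate cancer and hypothyroidism.",
    "questions": [
        "date of diagnosis?",
        "any interventions done?",
    ],
}

def questions_for_trigger(dataset, trigger):
    """Collect every question asked about a given trigger phrase."""
    return [q for e in dataset if e["trigger_text"] == trigger for q in e["questions"]]

print(questions_for_trigger([entry], "prostate cancer"))
# → ['date of diagnosis?', 'any interventions done?']
```

Grouping questions by trigger in this way is also what makes it possible to quantify, as the researchers did, which broad topics the questions cluster around.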

They found that most questions focused on symptoms, treatments, or the patient's test results. While these findings weren't unexpected, quantifying the number of questions about each broad topic will help them build an effective dataset for use in a real, clinical setting, says Lehman.

Once they had compiled their dataset of questions and accompanying trigger text, they used it to train machine-learning models to ask new questions based on the trigger text.

Then the medical experts judged whether those questions were "good" using four criteria: understandability (Does the question make sense to a human physician?), triviality (Is the question too easily answerable from the trigger text?), medical relevance (Does it make sense to ask this question based on the context?), and relevance to the trigger (Is the trigger related to the question?).
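A minimal sketch of that four-part rubric, assuming a question counts as "good" only when it passes all four checks (the all-must-pass aggregation rule is an assumption, not the paper's stated protocol):

```python
from dataclasses import dataclass

@dataclass
class Judgment:
    """One expert's verdict on a generated question, per the four criteria.
    The aggregation rule below (all four must pass) is an assumption."""
    understandable: bool       # makes sense to a human physician
    nontrivial: bool           # not answerable directly from the trigger text
    medically_relevant: bool   # sensible to ask given the clinical context
    trigger_relevant: bool     # the trigger actually relates to the question

    def is_good(self) -> bool:
        return (self.understandable and self.nontrivial
                and self.medically_relevant and self.trigger_relevant)

j = Judgment(understandable=True, nontrivial=True,
             medically_relevant=True, trigger_relevant=False)
print(j.is_good())  # → False
```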

Cause for concern

The researchers found that when a model was given trigger text, it was able to generate a good question 63 percent of the time, whereas a human physician would ask a good question 80 percent of the time.

They also trained models to recover answers to clinical questions using the publicly available datasets they had found at the outset of this project. Then they tested these trained models to see if they could find answers to "good" questions asked by human medical experts.

The models were only able to recover about 25 percent of answers to physician-generated questions.

"That result is really concerning. What people thought were good-performing models were, in practice, just awful because the evaluation questions they were testing on were not good to begin with," Lehman says.

The team is now applying this work toward their initial goal: building a model that can automatically answer physicians' questions in an EHR. For the next step, they will use their dataset to train a machine-learning model that can automatically generate thousands or millions of good clinical questions, which can then be used to train a new model for automatic question answering.

While there is still much work to do before that model could be a reality, Lehman is encouraged by the strong initial results the team demonstrated with this dataset.

Lehman E, Lialin V, Legaspi KY, Sy AJ, Pile PT, Alberto NR, Ragasa RR, Puyat CV, Alberto IR, Alfonso PG, Taliño M.
Learning to Ask Like a Physician.
arXiv preprint arXiv:2206.02696. 2022. doi: 10.48550/arXiv.2206.02696
