AI tool can accurately generate answers to patient EHR queries

In a nationwide trend that has played out during the pandemic, many more NYU Langone Health patients have begun using electronic health record (EHR) tools to ask doctors questions, refill prescriptions, and review test results. Many of those digital inquiries have come through a messaging tool called In Basket, which is built into NYU Langone's EHR system, Epic.

Although physicians have always spent time managing EHR messages, in recent years they have seen a more than 30 percent annual increase in the number of messages received daily, according to an article by Paul A. Testa, MD, chief medical information officer at NYU Langone. Dr. Testa wrote that it is not uncommon for physicians to receive more than 150 In Basket messages per day. Because health care systems were not designed to handle this kind of traffic, physicians have ended up filling the gap by spending long hours after work reviewing messages. This added burden is cited as a contributor to burnout, which roughly half of physicians report.

Now, a novel study led by researchers at NYU Grossman School of Medicine shows that an AI tool can answer patients' EHR queries as accurately as their human health care providers can, and with greater perceived "empathy." The findings underscore the potential for such tools to drastically reduce physicians' In Basket workload while improving their communication with patients, provided that human providers review the AI drafts before they are sent.

NYU Langone is testing the capabilities of generative AI (genAI), in which computer algorithms work out likely options for the next word in any given sentence based on how people have used words in context online. A result of this next-word prediction is that genAI chatbots can answer questions in convincing, human-like language. In 2023, NYU Langone licensed a "private instance" of GPT-4, the latest relative of the famous chatbot ChatGPT, which let doctors experiment with real patient data while still adhering to data privacy rules.
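
To make that next-word mechanism concrete, the sketch below scores possible continuations of a sentence with the openly available GPT-2 model via the Hugging Face transformers library. GPT-2 is purely a stand-in for illustration; the private GPT-4 instance used at NYU Langone is not publicly downloadable.

```python
# Minimal sketch of next-word prediction, the mechanism behind genAI chatbots.
# Uses the open GPT-2 model as a stand-in; requires `pip install torch transformers`.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

prompt = "Your recent lab results show"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # a score for every vocabulary token at each position

# Turn the scores at the final position into probabilities for the *next* token,
# then list the five most likely continuations.
next_token_probs = torch.softmax(logits[0, -1], dim=-1)
top = torch.topk(next_token_probs, k=5)
for prob, token_id in zip(top.values, top.indices):
    print(f"{tokenizer.decode(int(token_id))!r}: {prob:.3f}")
```

A chatbot produces a full reply by repeatedly picking one of these high-probability tokens, appending it to the text, and predicting again.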

Published online July 16 in JAMA Network Open, the new study analyzed draft responses generated by GPT-4 to patients' In Basket queries, asking primary care physicians to compare them with the actual responses to those messages.

Our results suggest that chatbots can reduce the workload of care providers by enabling effective and empathetic responses to patient concerns. We found that EHR-integrated AI chatbots that leverage patient-specific data can produce messages of similar quality to human providers.

William Small, MD, lead author of the study, Clinical Assistant Professor, Department of Medicine, NYU Grossman School of Medicine

For the study, 16 primary care physicians rated 344 randomly assigned pairs of AI and human responses to patient messages on accuracy, relevance, completeness, and tone, and indicated whether they would use the AI response as a first draft or have to start from scratch in writing the patient message. This was a blinded study, so the physicians did not know whether the responses they reviewed were generated by humans or by the AI tool.
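
The paper's exact randomization procedure is not reproduced here, but the blinding idea can be sketched as follows: within each pair, the AI and human responses are shuffled so reviewers rate "Response A" and "Response B" without knowing the source. The blind_pair helper below is hypothetical, for illustration only.

```python
import random

def blind_pair(ai_response: str, human_response: str, rng: random.Random):
    """Shuffle one AI response and one human response into anonymous slots A and B."""
    pair = [("ai", ai_response), ("human", human_response)]
    rng.shuffle(pair)  # reviewers cannot infer the source from position
    blinded = {label: text for label, (_, text) in zip("AB", pair)}
    key = {label: source for label, (source, _) in zip("AB", pair)}
    return blinded, key  # the key is held back for unblinding during analysis

rng = random.Random(0)  # fixed seed so the assignment is reproducible
blinded, key = blind_pair("AI draft reply ...", "Provider reply ...", rng)
print(blinded)  # what the reviewer sees; the source labels stay hidden
```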

The research team found that the accuracy, completeness, and relevance of the generative AI responses and those of the human providers did not differ statistically. The generative AI responses outperformed the human providers in comprehensibility and tone by 9.5 percent. In addition, the AI responses were more than twice as likely (125 percent more likely, that is, 2.25 times as likely) to be perceived as empathetic, and 62 percent more likely to use language that conveyed positivity (potentially related to hope) and belonging ("we're in this together").

On the other hand, the AI's responses were also 38 percent longer and 31 percent more likely to use complex language, so further training of the tool is needed, the researchers said. While the human providers answered queries at a sixth-grade reading level, the AI wrote at an eighth-grade level, according to a standard measure of readability called the Flesch-Kincaid Grade Level.
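
For reference, the Flesch-Kincaid Grade Level is a fixed formula over word, sentence, and syllable counts: 0.39 × (words per sentence) + 11.8 × (syllables per word) − 15.59. The sketch below implements it with a crude vowel-group syllable counter, an approximation rather than whatever text processing the researchers used.

```python
import re

def count_syllables(word: str) -> int:
    # Rough approximation: count groups of consecutive vowels.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_kincaid_grade(text: str) -> float:
    # Flesch-Kincaid Grade Level:
    #   0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * (len(words) / sentences) + 11.8 * (syllables / len(words)) - 15.59

# Short, plain sentences score at a low grade level; long sentences with
# many multi-syllable words push the score toward higher grades.
print(round(flesch_kincaid_grade(
    "Your test results look normal. We will call you if anything changes."
), 1))
```

A score of about 6 corresponds to a sixth-grade reading level, and a score of about 8 to an eighth-grade level.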

The researchers argued that the chatbots' use of private patient information, rather than general internet information, better approximates how the technology would be used in the real world. Further research will be needed to confirm whether private data specifically improves the performance of AI tools.

“This work demonstrates that an AI tool can produce high-quality, workable responses to patient requests,” said corresponding author Devin Mann, MD, senior director of information innovation at NYU Langone’s Medical Center Information Technology (MCIT). “With physician oversight, genAI messages will in the near future equal human-generated responses in quality, communication style, and usability,” added Dr. Mann, who is also a professor in the Departments of Population Health and Medicine.

In addition to Dr. Small and Dr. Mann, the authors of the NYU Langone study were: Beatrix Brandfield-Harvey, BS; Zoe Jonassen, PhD; Soumik Mandal, PhD; Elizabeth R. Stevens, MPH, PhD; Vincent J. Major, PhD; Erin Lostraglio; Adam C. Szerencsy, DO; Simon A. Jones, PhD; Yindalon Aphinyanaphongs, MD, PhD; and Stephen B. Johnson, PhD. Additional authors included Oded Nov, MSc, PhD, of NYU Tandon School of Engineering and Batia Mishan Wiesenfeld, PhD, of NYU Stern School of Business.

The study was funded by the National Science Foundation under grants 1928614 and 2129076 and the Swiss National Science Foundation under grants P500PS_202955 and P5R5PS_217714.
