Chatbot outperformed physicians in clinical reasoning in head-to-head study

submitted by
Style Pass
2024-04-01 21:30:34

ChatGPT-4, an artificial intelligence program designed to understand and generate human-like text, outperformed internal medicine residents and attending physicians at two academic medical centers in processing medical data and demonstrating clinical reasoning. In a research letter published in JAMA Internal Medicine, physician-scientists at Beth Israel Deaconess Medical Center (BIDMC) compared the reasoning abilities of a large language model (LLM) directly against human performance, using standards developed to assess physicians.

"It became clear very early on that LLMs can make diagnoses, but anybody who practices medicine knows there's a lot more to medicine than that," said Adam Rodman, MD, an internal medicine physician and investigator in the department of medicine at BIDMC. "There are multiple steps behind a diagnosis, so we wanted to evaluate whether LLMs are as good as physicians at doing that kind of clinical reasoning. It's a surprising finding that these things are capable of showing the equivalent or better reasoning than people throughout the evolution of clinical case."

Rodman and colleagues used the revised-IDEA (r-IDEA) score, a previously validated tool developed to assess physicians' clinical reasoning. The investigators recruited 21 attending physicians and 18 residents, each of whom worked through one of 20 selected clinical cases composed of four sequential stages of diagnostic reasoning. The authors instructed the physicians to write out and justify their differential diagnoses at each stage. The chatbot GPT-4 was given a prompt with identical instructions and worked through all 20 clinical cases. The answers from both the physicians and the chatbot were then scored for clinical reasoning (r-IDEA score) and on several other measures of reasoning.
