Through systematic analysis of patient cases, we evaluated the clinical utility of open-source large language models (LLMs), such as the DeepSeek models, for implementation in medical applications. Their performance on clinical decision-making tasks was comparable to that of the proprietary model GPT-4o and partly better than that of Gemini-2.0 Flash Thinking Experimental.
Quer, G. & Topol, E. J. The potential for large language models to transform cardiovascular medicine. Lancet Digit. Health 6, e767–e771 (2024). A review article that presents opportunities and limitations of artificial intelligence models in the field of cardiovascular medicine.
de Hond, A. et al. From text to treatment: the crucial role of validation for generative large language models in health care. Lancet Digit. Health 6, e441–e443 (2024). A comment on the challenge of validating LLMs in healthcare, suggesting general, task-specific and clinical validation.
Ong, J. C. L. et al. Medical ethics of large language models in medicine. NEJM AI https://doi.org/10.1056/Aira2400038 (2024). A review article that presents bioethical principles to promote the responsible use of LLMs, enabling their use ethically, equitably and effectively in medicine.