Large language models (LLMs) offer unprecedented opportunities to augment human work in many industries, including healthcare. However, understanding these models' limitations, and how to mitigate them, is essential before deploying them in regulated environments. In recent years, numerous studies have proposed techniques for adapting existing language models to medical tasks, and selecting the most suitable model for a given task requires exploring each model's capabilities and weaknesses. The availability of scientific data on the internet has advanced fine-tuning techniques, producing task-specific models such as SciBERT and BioBERT, which are trained on biomedical and clinical sources to provide focused capabilities for specific tasks. Most medical models used in industry are built by fine-tuning, which requires extensive training data to achieve high-quality results in new domains, so their applicability is limited by data availability. Few-shot learners such as GPT-3 can adapt to a new domain with zero or only a few examples, removing the operational cost of labeling vast amounts of data. Still, they come with issues, such as a tendency to generate non-factual information, and research on them is restricted because GPT-3 is available only behind an API. In response, Meta released open-source few-shot learners that researchers can download, paving the way to understanding these models and preparing them for industry use. A recent alternative to relying solely on pre-trained parameters is retrieval-based models, which can search corpora of trillions of words and may address the privacy, bias, and toxicity concerns of using language models in healthcare systems while maintaining quality.
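To make the few-shot paradigm concrete, the sketch below shows how a prompt for a few-shot learner is typically assembled: an instruction, a handful of labeled examples, and the query to be completed. The task, example sentences, and labels here are invented purely for illustration; a real study would draw them from a labeled biomedical dataset.

```python
def build_few_shot_prompt(examples, query, instruction):
    """Concatenate an instruction, k labeled examples, and an
    unlabeled query into a single few-shot prompt string."""
    lines = [instruction, ""]
    for text, label in examples:
        lines.append(f"Text: {text}")
        lines.append(f"Label: {label}")
        lines.append("")
    # The model is expected to continue the prompt with the label.
    lines.append(f"Text: {query}")
    lines.append("Label:")
    return "\n".join(lines)

# Hypothetical clinical-note classification task (illustrative only).
examples = [
    ("Patient reports chest pain radiating to the left arm.", "cardiology"),
    ("MRI shows a lesion in the temporal lobe.", "neurology"),
]
prompt = build_few_shot_prompt(
    examples,
    query="ECG indicates atrial fibrillation.",
    instruction="Classify each clinical note by medical specialty.",
)
print(prompt)
```

No model parameters are updated in this setting: the labeled examples condition the model only through the prompt, which is what distinguishes few-shot learning from the fine-tuning approach behind models like BioBERT.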
Biomedical and clinical tasks span many research areas, but this review focuses on the tasks and datasets that language models can address. These include, but are not limited to: