

While large language models (LLMs) have been successfully applied to various tasks, they still face challenges with hallucinations, especially for specialized knowledge. We propose GeneGPT, a novel approach to address this challenge by teaching LLMs to exploit biomedical tools, specifically NCBI Web APIs, for answering information-seeking questions. Our approach utilizes in-context learning, coupled with a novel decoding algorithm that can identify and execute API calls. Empirical results show that GeneGPT achieves state-of-the-art (SOTA) performance on eight GeneTuring tasks with an average score of 0.83, largely surpassing the previous SOTA (0.44 by New Bing), biomedical LLMs such as BioGPT (0.04), and ChatGPT (0.12). Further analyses suggest that: (1) API demonstrations are more effective than documentation for in-context tool learning; (2) GeneGPT can generalize to longer chains of API calls and answer multi-hop questions; (3) the unique and task-specific errors made by GeneGPT provide valuable insights for future improvements. Our results underline the potential of integrating domain-specific tools with LLMs for improved access and accuracy in specialized knowledge areas.
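To make the "identify and execute API calls" step concrete, here is a minimal sketch of how detected NCBI E-utilities URLs in model output could be executed and spliced back into the text before generation resumes. The `[CALL]...[END]` markers, the `execute_api_calls` helper, and the truncation length are assumptions for illustration; the abstract does not specify GeneGPT's actual call syntax or decoding details, only that the decoder detects and runs API calls during generation.

```python
import re
import urllib.request

# Hypothetical markers: the paper's exact call syntax is not given in the
# abstract, so we assume the model wraps API requests in [CALL] ... [END].
CALL_PATTERN = re.compile(r"\[CALL\](https?://\S+?)\[END\]")


def execute_api_calls(generated_text: str) -> str:
    """Scan model output for API URLs, execute them, and splice the raw
    responses back into the text so the model can condition on them when
    generation resumes."""

    def _fetch(match: re.Match) -> str:
        url = match.group(1)
        with urllib.request.urlopen(url, timeout=30) as resp:
            result = resp.read().decode("utf-8", errors="replace")
        # Truncate long responses to keep the prompt within context limits.
        return f"[CALL]{url}[END] -> {result[:500]}"

    return CALL_PATTERN.sub(_fetch, generated_text)


if __name__ == "__main__":
    # Example: an NCBI E-utilities esearch query for the gene symbol BRCA2.
    sample_output = (
        "To find the gene ID, I will query NCBI: "
        "[CALL]https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"
        "?db=gene&term=BRCA2[END]"
    )
    print(execute_api_calls(sample_output))
```

In practice such a loop would alternate between the LLM and this executor: generation pauses when a call marker is emitted, the URL is fetched, and the response is appended to the context before decoding continues.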

This work was supported by the Intramural Research Programs of the National Institutes of Health, National Library of Medicine.
