Tx-LLM: Supporting therapeutic development with large language models

submitted by
Style Pass
2024-10-10 00:00:05

We strive to create an environment conducive to many different types of research across many different time scales and levels of risk.

Introducing Tx-LLM, a language model fine-tuned to predict properties of biological entities across the therapeutic development pipeline, from early-stage target discovery to late-stage clinical trial approval.

Most candidates for therapeutic drugs fail clinical trials. Even if successful, they typically require 10–15 years and $1–2 billion to develop. A major reason for this is that the development pipeline contains many steps and many independent criteria that a therapeutic must satisfy. For example, a therapeutic should interact with its specific target but not other entities, producing the desired functional improvement but not off-target toxicity. Additionally, it should be able to travel to its desired destination, be cleared out of the body in an appropriate amount of time, and be suitable for manufacturing at scale. As one can imagine, measuring these properties experimentally is expensive and takes a long time, leaving an opportunity for an alternative approach: using machine learning (ML) to predict these properties quickly and efficiently.

To that end, we introduce Tx-LLM, a large language model (LLM) fine-tuned from PaLM-2 to predict properties of many entities (e.g., small molecules, proteins, nucleic acids, cell lines, diseases) that are relevant to therapeutic development. Tx-LLM is trained on 66 drug discovery datasets ranging from early-stage target gene identification to late-stage clinical trial approval, and is therefore best suited to research on therapeutic applications. With a single set of weights, Tx-LLM achieved competitive performance with state-of-the-art models on 43 out of the 66 tasks and exceeded them on 22. Interestingly, we also observed that Tx-LLM could combine molecular information with textual information and transfer capabilities between tasks involving diverse types of therapeutics. Overall, Tx-LLM is a single model that may be useful throughout the therapeutic drug development pipeline.
