Protein Optimization 101: Insights from the literature

Submitted by Style Pass, 2024-10-08 20:30:16

Imagine you have 20 or so experimentally validated EGFR-binding proteins, maybe from our recent protein design competition. None of them are fit for your application yet, so you want to further optimize them before you submit them to round two. Or maybe you got some decent binders, but need to re-optimize to find a trade-off between their binding strength and their expression in a high-yield bioreactor for commercialization.

You can of course re-use tools like RFdiffusion or the newly released AlphaProteo: have the model propose hundreds of promising variants of your designs, then use heuristics (for example, other ML models) to select candidates for another round of validation. However, you might not have the budget of DeepMind or the capacity of the Baker lab to filter and screen hundreds of candidates, playing this “de novo slot machine” in the hope of improvement.

In this blog post, we give a brief introduction to optimization techniques that aim to minimize the number of real-world measurements required, that is, to maximize "sample efficiency", to use the technical term. We will use a hypothetical protein engineering scenario with a constrained lab budget, characteristic of an individual designer or a small company that just got some angel investment to build a proof of concept. The same techniques can also be used purely in silico, for example to optimize the score function used in your favourite EGFR binding competition.