The genetic architecture of protein stability

submitted by
Style Pass
2024-10-05 23:30:09

There are more ways to synthesize a 100-amino acid (aa) protein (20^100) than there are atoms in the universe. Only a very small fraction of such a vast sequence space can ever be experimentally or computationally surveyed. Deep neural networks are increasingly being used to navigate high-dimensional sequence spaces (ref. 1). However, these models are extremely complicated. Here, by experimentally sampling from sequence spaces larger than 10^10, we show that the genetic architecture of at least some proteins is remarkably simple, allowing accurate genetic prediction in high-dimensional sequence spaces with fully interpretable energy models. These models capture the nonlinear relationships between free energies and phenotypes but otherwise consist of additive free energy changes with a small contribution from pairwise energetic couplings. These energetic couplings are sparse and associated with structural contacts and backbone proximity. Our results indicate that protein genetics is actually both rather simple and intelligible.
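To make the shape of such an energy model concrete, here is a minimal sketch in Python. It assumes a two-state folding equilibrium, so that the fraction of folded protein follows a Boltzmann sigmoid of the folding free energy; that functional form, the temperature, and every name below (`folding_energy`, `fraction_folded`, `ddg`, `couplings`) are illustrative assumptions rather than details taken from the text above.

```python
import math

R = 0.001987  # gas constant in kcal/(mol*K)
T = 303.0     # assumed temperature in kelvin (hypothetical choice)


def folding_energy(genotype, dg_wt, ddg, couplings):
    """Free energy of folding for one genotype.

    genotype  -- tuple of 0/1 flags, one per site (1 = mutated)
    dg_wt     -- free energy of the wild type
    ddg       -- additive free energy change of each single mutation
    couplings -- sparse dict {(i, j): energy} of pairwise couplings
    """
    # Additive part: sum the single-mutant effects that are present.
    dg = dg_wt + sum(d for g, d in zip(genotype, ddg) if g)
    # Pairwise part: a coupling counts only when both sites are mutated.
    for (i, j), c in couplings.items():
        if genotype[i] and genotype[j]:
            dg += c
    return dg


def fraction_folded(dg, temperature=T):
    """Nonlinear map from free energy to phenotype (assumed two-state model)."""
    return 1.0 / (1.0 + math.exp(dg / (R * temperature)))


# Hypothetical three-site example with one coupling between sites 0 and 2.
dg = folding_energy((1, 0, 1), dg_wt=-1.0, ddg=[0.5, 0.2, 0.3],
                    couplings={(0, 2): 0.1})
print(round(dg, 3), round(fraction_folded(dg), 3))
```

Because the energies combine additively and only the last step is nonlinear, each parameter stays directly interpretable as a free energy change or coupling.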

Massively parallel experiments allow the effects of single aa changes in proteins to be comprehensively quantified (refs. 2,3). Similarly, experimental analysis of double mutants is feasible, at least for small proteins (refs. 4,5). The analysis of higher-order mutants, however, quickly becomes infeasible owing to the combinatorial explosion of possible genotypes. For example, the number of ways to combine one mutation at 34 different sites in a protein is 2^34 ≈ 1.7 × 10^10. Experimental exploration of such a large number of genotypes is extremely challenging (ref. 6) given current technology, which, so far, has experimentally analysed sequence spaces up to about 10^6 (refs. 4,7).
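The counts quoted above can be checked with a few lines of Python, whose integers have arbitrary precision. The figure of 10^80 atoms in the observable universe is a commonly quoted rough estimate and is an assumption here, not a number given in the text.

```python
# Each of 34 sites is either wild type or mutated: 2 states per site.
n_sites = 34
n_genotypes = 2 ** n_sites

# A 100-residue protein with 20 possible amino acids at each position.
protein_space = 20 ** 100

# Rough, commonly quoted estimate (assumption, not from the text).
ATOMS_IN_UNIVERSE = 10 ** 80

print(f"{n_genotypes:.2e}")                # size of the 34-site space
print(len(str(protein_space)))             # number of decimal digits in 20^100
print(protein_space > ATOMS_IN_UNIVERSE)   # comparison with the atom estimate
```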
