Nature Communications volume 12, Article number: 3168 (2021)

Structure-based protein function prediction using graph convolutional networks

The rapid increase in the number of proteins in sequence databases and the diversity of their functions challenge computational approaches for automated function prediction. Here, we introduce DeepFRI, a Graph Convolutional Network for predicting protein functions by leveraging sequence features extracted from a protein language model and protein structures. It outperforms current leading methods and sequence-based Convolutional Neural Networks and scales to the size of current sequence repositories. Augmenting the training set of experimental structures with homology models allows us to significantly expand the number of predictable functions. DeepFRI has significant de-noising capability, with only a minor drop in performance when experimental structures are replaced by protein models. Class activation mapping allows function predictions at an unprecedented resolution, allowing site-specific annotations at the residue-level in an automated manner. We show the utility and high performance of our method by annotating structures from the PDB and SWISS-MODEL, making several new confident function predictions. DeepFRI is available as a webserver at https://beta.deepfri.flatironinstitute.org/.
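To make the idea of a graph convolution over a protein structure concrete, the following is a minimal NumPy sketch, not the DeepFRI implementation. It assumes a binary residue contact map built with an illustrative 10 Å distance cutoff, a single layer with symmetric adjacency normalization, random vectors standing in for the language-model sequence features, and sum pooling followed by independent sigmoid outputs. All function names, dimensions, and the cutoff are hypothetical choices made for illustration.

```python
import numpy as np

def contact_map(coords, threshold=10.0):
    """Binary residue contact map from (L, 3) coordinates.

    The threshold (in angstroms) is an assumed, illustrative value.
    """
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # The diagonal distance is 0, so every residue is in contact with itself.
    return (dist < threshold).astype(float)

def normalize_adjacency(adj):
    """Symmetric normalization D^{-1/2} A D^{-1/2} (adj includes self-loops)."""
    inv_sqrt_deg = 1.0 / np.sqrt(adj.sum(axis=1))
    return adj * inv_sqrt_deg[:, None] * inv_sqrt_deg[None, :]

def graph_conv(features, adj_norm, weights):
    """One graph convolution layer: ReLU(A_norm @ X @ W)."""
    return np.maximum(adj_norm @ features @ weights, 0.0)

rng = np.random.default_rng(0)
n_residues, n_in, n_hidden, n_functions = 6, 8, 4, 3

# Toy chain of residue coordinates (random walk), purely illustrative.
coords = np.cumsum(rng.normal(scale=3.8, size=(n_residues, 3)), axis=0)
# Random per-residue features standing in for language-model embeddings.
features = rng.normal(size=(n_residues, n_in))

adj_norm = normalize_adjacency(contact_map(coords))
hidden = graph_conv(features, adj_norm, rng.normal(size=(n_in, n_hidden)))

# Pool residue-level features into one protein-level vector, then score
# each function label independently (multi-label prediction).
pooled = hidden.sum(axis=0)
logits = pooled @ rng.normal(size=(n_hidden, n_functions))
probs = 1.0 / (1.0 + np.exp(-logits))
```

Each row of `hidden` mixes a residue's features with those of its spatial neighbours, which is how structural context can enter the prediction even for residues that are far apart in sequence.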

Proteins fold into 3-dimensional structures to carry out a wide variety of functions within the cell [1]. Even though many functional regions of proteins are disordered, the majority of domains fold into specific and ordered three-dimensional conformations [2,3,4,5,6]. In turn, the structural features of proteins determine a wide range of functions: from binding specificity and conferring mechanical stability, to catalysis of biochemical reactions, transport, and signal transduction. There are several widely used classification schemes that organize these myriad protein functions including the Gene Ontology (GO) Consortium [7], Enzyme Commission (EC) numbers [8], Kyoto Encyclopedia of Genes and Genomes (KEGG) [9], and others. For example, GO classifies proteins into hierarchically related functional classes organized into three different ontologies: Molecular Function (MF), Biological Process (BP), and Cellular Component (CC), to describe different aspects of protein functions.
