Multiscale differential geometry learning for protein flexibility analysis
Protein flexibility is crucial for understanding protein structures, functions, and dynamics, and it can be measured through experimental methods such as X-ray crystallography. Theoretical approaches have also been developed to predict B-factor values, which reflect protein flexibility. Previous models have made significant strides in analyzing B-factors by fitting experimental data. In this study, we propose a novel approach for B-factor prediction using differential geometry theory, based on the assumption that the intrinsic properties of proteins reside on a family of low-dimensional manifolds embedded within the high-dimensional space of protein structures. By analyzing the mean and Gaussian curvatures of a set of kernel-function-defined low-dimensional manifolds, we develop effective and robust multiscale differential geometry (mDG) models. Our mDG model demonstrates a 27\% increase in accuracy compared to the classical Gaussian network model (GNM) in predicting B-factors for a dataset of 364 proteins. Additionally, by incorporating both global and local protein features, we construct a highly effective machine learning model for the blind prediction of B-factors. Extensive least-squares approximations and machine learning-based blind predictions validate the effectiveness of the mDG modeling approach for B-factor prediction.
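For orientation, the GNM baseline mentioned above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the implementation used in this study: it assumes one node per residue (typically the Cα atom), a 7 Å contact cutoff (a common choice that is not specified here), and toy coordinates in place of a real structure. The function names `gnm_bfactors` and `pearson` are hypothetical. In the GNM, predicted B-factors are taken to be proportional to the diagonal of the pseudo-inverse of the Kirchhoff (connectivity) matrix, and agreement with experimental values is commonly scored with the Pearson correlation coefficient.

```python
import numpy as np


def gnm_bfactors(coords, cutoff=7.0):
    """Relative B-factors from a Gaussian network model (GNM).

    coords : (N, 3) array of node positions in angstroms (assumed C-alpha atoms).
    cutoff : contact distance in angstroms (7.0 is an assumed, common default).
    Returns an (N,) array proportional to the predicted B-factors.
    """
    coords = np.asarray(coords, dtype=float)

    # Pairwise distances between all nodes.
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)

    # Kirchhoff matrix: -1 for each contact within the cutoff,
    # node degree on the diagonal so that every row sums to zero.
    kirchhoff = -(dist <= cutoff).astype(float)
    np.fill_diagonal(kirchhoff, 0.0)
    np.fill_diagonal(kirchhoff, -kirchhoff.sum(axis=1))

    # Pseudo-inverse via eigendecomposition, discarding the zero mode(s)
    # that correspond to rigid-body translation of the whole network.
    vals, vecs = np.linalg.eigh(kirchhoff)
    keep = vals > 1e-8
    return (vecs[:, keep] ** 2 / vals[keep]).sum(axis=1)


def pearson(x, y):
    """Pearson correlation coefficient between two 1-D arrays."""
    return float(np.corrcoef(x, y)[0, 1])


# Toy example: ten nodes on a straight line, 3.8 angstroms apart, so only
# sequential neighbours fall within the 7 angstrom cutoff.
coords = np.array([[3.8 * i, 0.0, 0.0] for i in range(10)])
predicted = gnm_bfactors(coords)
```

With a real structure, `predicted` would be compared with the experimental B-factors via `pearson(predicted, experimental)`. The scale of the prediction is arbitrary, so only the correlation, or a least-squares fit of the scale, is meaningful.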