1
Su Z, Tong Y, Wei GW. Hodge Decomposition of Single-Cell RNA Velocity. J Chem Inf Model 2024; 64:3558-3568. [PMID: 38572676 PMCID: PMC11035094 DOI: 10.1021/acs.jcim.4c00132] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/24/2024] [Revised: 03/21/2024] [Accepted: 03/22/2024] [Indexed: 04/05/2024]
Abstract
RNA velocity has the ability to capture the cell dynamic information in the biological processes; yet, a comprehensive analysis of the cell state transitions and their associated chemical and biological processes remains a gap. In this work, we provide the Hodge decomposition, coupled with discrete exterior calculus (DEC), to unveil cell dynamics by examining the decomposed curl-free, divergence-free, and harmonic components of the RNA velocity field in a low dimensional representation, such as a UMAP or a t-SNE representation. Decomposition results show that the decomposed components distinctly reveal key cell dynamic features such as cell cycle, bifurcation, and cell lineage differentiation, regardless of the choice of the low-dimensional representations. The consistency across different representations demonstrates that the Hodge decomposition is a reliable and robust way to extract these cell dynamic features, offering unique analysis and insightful visualization of single-cell RNA velocity fields.
Affiliation(s)
- Zhe Su
  - Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
- Yiying Tong
  - Department of Computer Science and Engineering, Michigan State University, East Lansing, Michigan 48824, United States
- Guo-Wei Wei
  - Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
  - Department of Electrical and Computer Engineering, Michigan State University, East Lansing, Michigan 48824, United States
  - Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, Michigan 48824, United States
2
Pandey A, Liu E, Graham J, Chen W, Keten S. B-factor prediction in proteins using a sequence-based deep learning model. Patterns (N Y) 2023; 4:100805. [PMID: 37720331 PMCID: PMC10499862 DOI: 10.1016/j.patter.2023.100805] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/28/2023] [Revised: 05/23/2023] [Accepted: 07/07/2023] [Indexed: 09/19/2023]
Abstract
B factors provide critical insight into protein dynamics. Predicting the B factor of an atom in a new protein remains challenging because it is affected by the atom's neighbors in Euclidean space. Previously developed learning methods yield low Pearson correlation coefficients beyond the training set because of their limited ability to capture the effect of neighboring atoms. Building on advances in deep learning, we develop a sequence-based model that is tested on 2,442 proteins and outperforms the state-of-the-art models by 30%. We find that the model learns that the B factor of a site is prominently affected by atoms within a 12-15 Å radius, which is in excellent agreement with cutoffs from protein network models. The ablation study revealed that the B factor can largely be predicted from the primary sequence alone. Our model thus lays a foundation for predicting other properties that are correlated with the B factor.
Affiliation(s)
- Akash Pandey
  - Department of Mechanical Engineering, Northwestern University, Evanston, IL, USA
- Elaine Liu
  - Department of Mechanical Engineering, Northwestern University, Evanston, IL, USA
- Jacob Graham
  - Department of Mechanical Engineering, Northwestern University, Evanston, IL, USA
- Wei Chen
  - Department of Mechanical Engineering, Northwestern University, Evanston, IL, USA
- Sinan Keten
  - Department of Mechanical Engineering, Northwestern University, Evanston, IL, USA
  - Department of Civil and Environmental Engineering, Northwestern University, Evanston, IL, USA
3
Rana MM, Nguyen DD. Geometric graph learning with extended atom-types features for protein-ligand binding affinity prediction. Comput Biol Med 2023; 164:107250. [PMID: 37515872 DOI: 10.1016/j.compbiomed.2023.107250] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/12/2023] [Revised: 06/12/2023] [Accepted: 07/07/2023] [Indexed: 07/31/2023]
Abstract
Understanding and accurately predicting protein-ligand binding affinity are essential in the drug design and discovery process. At present, machine learning-based methodologies are gaining popularity as a means of predicting binding affinity due to their efficiency and accuracy, as well as the increasing availability of structural and binding affinity data for protein-ligand complexes. In biomolecular studies, graph theory has been widely applied since graphs can be used to model molecules or molecular complexes in a natural manner. In the present work, we upgrade the graph-based learners for the study of protein-ligand interactions by integrating extensive atom types such as SYBYL and extended connectivity interactive features (ECIF) into multiscale weighted colored graphs (MWCG). By pairing with the gradient boosting decision tree (GBDT) machine learning algorithm, our approach results in two different methods, namely sybylGGL-Score and ecifGGL-Score. Both of our models are extensively validated in their scoring power using three commonly used benchmark datasets in the drug design area, namely CASF-2007, CASF-2013, and CASF-2016. The performance of our best model sybylGGL-Score is compared with other state-of-the-art models in the binding affinity prediction for each benchmark. While both of our models achieve state-of-the-art results, the SYBYL atom-type model sybylGGL-Score outperforms other methods by a wide margin in all benchmarks. Finally, the best-performing SYBYL atom-type model is evaluated on two test sets that are independent of CASF benchmarks.
Affiliation(s)
- Md Masud Rana
  - Department of Mathematics, University of Kentucky, Lexington, KY 40506, USA
- Duc Duy Nguyen
  - Department of Mathematics, University of Kentucky, Lexington, KY 40506, USA
4
Merkurjev E, Nguyen DD, Wei GW. Multiscale Laplacian Learning. Appl Intell 2023; 53:15727-15746. [PMID: 38031564 PMCID: PMC10686291 DOI: 10.1007/s10489-022-04333-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 11/08/2022] [Indexed: 11/29/2022]
Abstract
Machine learning has greatly influenced many fields, including science. However, despite the tremendous accomplishments of machine learning, one of the key limitations of most existing machine learning approaches is their reliance on large labeled sets, and thus, data with limited labeled samples remains a challenge. Moreover, the performance of machine learning methods is often severely hindered in the case of diverse data, usually associated with smaller data sets or data associated with areas of study where the size of the data sets is constrained by high experimental cost and/or ethics. These challenges call for innovative strategies for dealing with these types of data. In this work, the aforementioned challenges are addressed by integrating graph-based frameworks, semi-supervised techniques, multiscale structures, and modified and adapted optimization procedures. This results in two innovative multiscale Laplacian learning (MLL) approaches for machine learning tasks, such as data classification, and for tackling data with limited samples, diverse data, and small data sets. The first approach, multikernel manifold learning (MML), integrates manifold learning with multikernel information and incorporates a warped kernel regularizer using multiscale graph Laplacians. The second approach, the multiscale MBO (MMBO) method, introduces multiscale Laplacians into a modification of the classical Merriman-Bence-Osher (MBO) scheme and makes use of fast solvers. We demonstrate the performance of our algorithms experimentally on a variety of benchmark data sets, and compare them favorably to the state-of-the-art approaches.
Affiliation(s)
- Duc Duy Nguyen
  - Department of Mathematics, University of Kentucky, KY 40506, USA
- Guo-Wei Wei
  - Department of Mathematics, Department of Biochemistry and Molecular Biology, Department of Electrical and Computer Engineering, Michigan State University, MI 48824, USA
5
Ma S, Zheng S, Zhang W, Chen D, Pan F. Algebraic Graph-Based Machine Learning Model for Li-Cluster Prediction. J Phys Chem A 2023; 127:2051-2059. [PMID: 36808983 DOI: 10.1021/acs.jpca.3c00272] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/23/2023]
Abstract
In cluster research, determining the ground-state structure of medium-sized clusters is hindered by the large number of local minima on potential energy surfaces. Global optimization heuristic algorithms are time-consuming due to the use of DFT to compare cluster energies. Although machine learning (ML) has proved to be a promising way to reduce DFT computational costs, finding a suitable method to represent clusters as input vectors is one of the bottlenecks in the application of ML to cluster research. In this work, we propose a multiscale weighted spectral subgraph (MWSS) as an effective low-dimensional representation of clusters and build an MWSS-based ML model to discover the structure-energy relationships in lithium clusters. We combine this model with the particle swarm optimization algorithm and DFT calculations to search for globally stable structures of clusters. We have successfully predicted the ground-state structure of Li20.
Affiliation(s)
- Shengming Ma
  - School of Advanced Materials, Peking University Shenzhen Graduate School, Shenzhen 518055, People's Republic of China
- Shisheng Zheng
  - School of Advanced Materials, Peking University Shenzhen Graduate School, Shenzhen 518055, People's Republic of China
- Wentao Zhang
  - School of Advanced Materials, Peking University Shenzhen Graduate School, Shenzhen 518055, People's Republic of China
- Dong Chen
  - School of Advanced Materials, Peking University Shenzhen Graduate School, Shenzhen 518055, People's Republic of China
- Feng Pan
  - School of Advanced Materials, Peking University Shenzhen Graduate School, Shenzhen 518055, People's Republic of China
6
González-Durruthy M, Rial R, Liu Z, Ruso JM. Lysozyme allosteric interactions with β-blocker drugs. J Mol Liq 2022. [DOI: 10.1016/j.molliq.2022.120370] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2022]
7
Gao K, Wang R, Chen J, Cheng L, Frishcosy J, Huzumi Y, Qiu Y, Schluckbier T, Wei X, Wei GW. Methodology-Centered Review of Molecular Modeling, Simulation, and Prediction of SARS-CoV-2. Chem Rev 2022; 122:11287-11368. [PMID: 35594413 PMCID: PMC9159519 DOI: 10.1021/acs.chemrev.1c00965] [Citation(s) in RCA: 30] [Impact Index Per Article: 15.0] [Indexed: 02/06/2023]
Abstract
Despite tremendous efforts in the past two years, our understanding of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), virus-host interactions, immune response, virulence, transmission, and evolution is still very limited. This limitation calls for further in-depth investigation. Computational studies have become an indispensable component in combating coronavirus disease 2019 (COVID-19) due to their low cost, their efficiency, and the fact that they are free from safety and ethical constraints. Additionally, the mechanism that governs the global evolution and transmission of SARS-CoV-2 cannot be revealed from individual experiments and was discovered by integrating genotyping of massive viral sequences, biophysical modeling of protein-protein interactions, deep mutational data, deep learning, and advanced mathematics. There exists a tsunami of literature on the molecular modeling, simulations, and predictions of SARS-CoV-2 and related developments of drugs, vaccines, antibodies, and diagnostics. To provide readers with a quick update about this literature, we present a comprehensive and systematic methodology-centered review. Aspects such as molecular biophysics, bioinformatics, cheminformatics, machine learning, and mathematics are discussed. This review will be beneficial to researchers who are looking for ways to contribute to SARS-CoV-2 studies and those who are interested in the status of the field.
Affiliation(s)
- Kaifu Gao
  - Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
- Rui Wang
  - Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
- Jiahui Chen
  - Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
- Limei Cheng
  - Clinical Pharmacology and Pharmacometrics, Bristol Myers Squibb, Princeton, New Jersey 08536, United States
- Jaclyn Frishcosy
  - Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
- Yuta Huzumi
  - Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
- Yuchi Qiu
  - Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
- Tom Schluckbier
  - Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
- Xiaoqi Wei
  - Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
- Guo-Wei Wei
  - Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
  - Department of Electrical and Computer Engineering, Michigan State University, East Lansing, Michigan 48824, United States
  - Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, Michigan 48824, United States
8
Chen J, Zhao R, Tong Y, Wei GW. Evolutionary de Rham-Hodge method. Discrete Continuous Dyn Syst Ser B 2021; 26:3785-3821. [PMID: 34675756 DOI: 10.3934/dcdsb.2020257] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Indexed: 12/29/2022]
Abstract
The de Rham-Hodge theory is a landmark of 20th-century mathematics and has had a great impact on mathematics, physics, computer science, and engineering. This work introduces an evolutionary de Rham-Hodge method to provide a unified paradigm for the multiscale geometric and topological analysis of evolving manifolds constructed from a filtration, which induces a family of evolutionary de Rham complexes. While the present method can be easily applied to closed manifolds, the emphasis is given to more challenging compact manifolds with 2-manifold boundaries, which require appropriate analysis and treatment of boundary conditions on differential forms to maintain proper topological properties. Three sets of unique evolutionary Hodge Laplacians are proposed to generate three sets of topology-preserving singular spectra, for which the multiplicities of zero eigenvalues correspond to exactly the persistent Betti numbers of dimensions 0, 1 and 2. Additionally, three sets of non-zero eigenvalues further reveal both topological persistence and geometric progression during the manifold evolution. Extensive numerical experiments are carried out via the discrete exterior calculus to demonstrate the potential of the proposed paradigm for data representation and shape analysis of both point cloud data and density maps. To demonstrate the utility of the proposed method, it is applied to protein B-factor predictions for a few challenging cases in which existing biophysical models break down.
Affiliation(s)
- Jiahui Chen
  - Department of Mathematics, Michigan State University, MI 48824, USA
- Rundong Zhao
  - Department of Computer Science and Engineering, Michigan State University, MI 48824, USA
- Yiying Tong
  - Department of Computer Science and Engineering, Michigan State University, MI 48824, USA
- Guo-Wei Wei
  - Department of Mathematics, Michigan State University, MI 48824, USA
9
Deng X, Wang S, Han Z, Gong W, Liu Y, Li C. Dynamics of binding interactions of TDP-43 and RNA: An equally weighted multiscale elastic network model study. Proteins 2021; 90:589-600. [PMID: 34599611 DOI: 10.1002/prot.26255] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/03/2021] [Revised: 09/15/2021] [Accepted: 09/21/2021] [Indexed: 01/03/2023]
Abstract
Transactive response DNA binding protein 43 (TDP-43), an alternative-splicing regulator, can specifically bind long UG-rich RNAs, associated with a range of neurodegenerative diseases. Upon binding RNA, TDP-43 undergoes a large conformational change in which its two RNA recognition motifs (RRMs), connected by a long linker, are rearranged, strengthening the binding affinity of TDP-43 for RNA. We apply the equally weighted multiscale elastic network model (ewmENM), including its Gaussian network model (ewmGNM) and anisotropic network model (ewmANM) variants, which account for the multiscale effect of interactions, to characterize the dynamics of the binding interactions of TDP-43 and RNA. The results reveal that, upon RNA binding, flexibility is lost in TDP-43's loop3 segments, which are rich in positively charged residues, and in its highly flexible C-terminal, suggesting their roles in anchoring RNA, induced fit, and conformational adjustment during RNA recognition. Additionally, based on movement coupling analyses, we find that RNA binding strengthens the interactions among intra-RRM β-sheets and between RRMs, partially through the linker's mediating role, which stabilizes the RNA binding interface and improves RNA binding efficiency. In addition, using our proposed thermodynamic cycle method combined with ewmGNM, we identify the key residues for RNA binding whose perturbations induce a large change in binding free energy. We identify not only the residues important for specific binding, but also the ones critical for the conformational rearrangement between RRMs. Furthermore, molecular dynamics simulations are performed to validate and further interpret the ENM-based results. The study demonstrates that ewmENM is a useful tool for investigating the dynamics of protein-RNA interactions.
Affiliation(s)
- Xueqing Deng
  - Faculty of Environmental and Life Sciences, Beijing University of Technology, Beijing, China
- Shihao Wang
  - Faculty of Environmental and Life Sciences, Beijing University of Technology, Beijing, China
- Zhongjie Han
  - Faculty of Environmental and Life Sciences, Beijing University of Technology, Beijing, China
- Weikang Gong
  - Faculty of Environmental and Life Sciences, Beijing University of Technology, Beijing, China
- Yang Liu
  - Faculty of Environmental and Life Sciences, Beijing University of Technology, Beijing, China
- Chunhua Li
  - Faculty of Environmental and Life Sciences, Beijing University of Technology, Beijing, China
10
Abstract
In the global health emergency caused by coronavirus disease 2019 (COVID-19), efficient and specific therapies are urgently needed. Compared with traditional small-molecular drugs, antibody therapies are relatively easy to develop; they are as specific as vaccines in targeting severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2); and they have thus attracted much attention in the past few months. This article reviews seven existing antibodies for neutralizing SARS-CoV-2 with 3D structures deposited in the Protein Data Bank (PDB). Five 3D antibody structures associated with the SARS-CoV spike (S) protein are also evaluated for their potential in neutralizing SARS-CoV-2. The interactions of these antibodies with the S protein receptor-binding domain (RBD) are compared with those between angiotensin-converting enzyme 2 and RBD complexes. Due to the orders of magnitude in the discrepancies of experimental binding affinities, we introduce topological data analysis, a variety of network models, and deep learning to analyze the binding strength and therapeutic potential of the 14 antibody-antigen complexes. The current COVID-19 antibody clinical trials, which are not limited to the S protein target, are also reviewed.
Affiliation(s)
- Jiahui Chen
  - Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, USA
- Kaifu Gao
  - Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, USA
- Rui Wang
  - Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, USA
- Duc Duy Nguyen
  - Department of Mathematics, University of Kentucky, Lexington, Kentucky 40506, USA
- Guo-Wei Wei
  - Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, USA
  - Department of Electrical and Computer Engineering, Michigan State University, East Lansing, Michigan 48824, USA
  - Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, Michigan 48824, USA
11
Wang R, Chen J, Gao K, Hozumi Y, Yin C, Wei GW. Analysis of SARS-CoV-2 mutations in the United States suggests presence of four substrains and novel variants. Commun Biol 2021; 4:228. [PMID: 33589648 PMCID: PMC7884689 DOI: 10.1038/s42003-021-01754-6] [Citation(s) in RCA: 91] [Impact Index Per Article: 30.3] [Received: 07/27/2020] [Accepted: 11/13/2020] [Indexed: 02/07/2023]
Abstract
SARS-CoV-2 has been mutating since it was first sequenced in early January 2020. Here, we analyze 45,494 complete SARS-CoV-2 genome sequences from around the world to understand their mutations. Among them, 12,754 sequences are from the United States. Our analysis suggests the presence of four substrains and eleven top mutations in the United States. These eleven top mutations belong to 3 disconnected groups. The first and second groups, consisting of 5 and 8 concurrent mutations, are prevailing, while the other group with three concurrent mutations gradually fades out. Moreover, we reveal that female immune systems are more active than those of males in responding to SARS-CoV-2 infections. One of the top mutations, 27964C>T-(S24L) on ORF8, has an unusually strong gender dependence. Based on the analysis of all mutations on the spike protein, we uncover that two of the four SARS-CoV-2 substrains in the United States have become potentially more infectious.
Affiliation(s)
- Rui Wang
  - Department of Mathematics, Michigan State University, East Lansing, MI 48824, USA
- Jiahui Chen
  - Department of Mathematics, Michigan State University, East Lansing, MI 48824, USA
- Kaifu Gao
  - Department of Mathematics, Michigan State University, East Lansing, MI 48824, USA
- Yuta Hozumi
  - Department of Mathematics, Michigan State University, East Lansing, MI 48824, USA
- Changchuan Yin
  - Department of Mathematics, Statistics, and Computer Science, University of Illinois at Chicago, Chicago, IL 60607, USA
- Guo-Wei Wei
  - Department of Mathematics, Michigan State University, East Lansing, MI 48824, USA
  - Department of Electrical and Computer Engineering, Michigan State University, East Lansing, MI 48824, USA
  - Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, MI 48824, USA
12
Wang R, Chen J, Gao K, Hozumi Y, Yin C, Wei GW. Characterizing SARS-CoV-2 mutations in the United States. Res Sq 2020:rs.3.rs-49671. [PMID: 32818213 PMCID: PMC7430589 DOI: 10.21203/rs.3.rs-49671/v1] [Citation(s) in RCA: 24] [Impact Index Per Article: 6.0] [Indexed: 12/28/2022]
Abstract
The severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has been mutating since it was first sequenced in early January 2020. The genetic variants have developed into a few distinct clusters with different properties. Since the United States (US) has the highest number of virus-infected patients globally, it is essential to understand the US SARS-CoV-2. Using genotyping, sequence-alignment, time-evolution, k-means clustering, protein-folding stability, algebraic topology, and network theory, we reveal that the US SARS-CoV-2 has four substrains and that five top US SARS-CoV-2 mutations were first detected in China (2 cases), Singapore (2 cases), and the United Kingdom (1 case). The next three top US SARS-CoV-2 mutations were first detected in the US. These eight top mutations belong to two disconnected groups. The first group, consisting of 5 concurrent mutations, is prevailing, while the other group with three concurrent mutations gradually fades out. We identify that one of the top mutations, 27964C>T-(S24L) on ORF8, has an unusually strong gender dependence. Based on the analysis of all mutations on the spike protein, we further uncover that three of the four US SARS-CoV-2 substrains have become more infectious. Our study calls for effective viral control and containment strategies in the US.
Affiliation(s)
- Rui Wang
  - Department of Mathematics, Michigan State University, MI 48824, USA
- Jiahui Chen
  - Department of Mathematics, Michigan State University, MI 48824, USA
- Kaifu Gao
  - Department of Mathematics, Michigan State University, MI 48824, USA
- Yuta Hozumi
  - Department of Mathematics, Michigan State University, MI 48824, USA
- Changchuan Yin
  - Department of Mathematics, Statistics, and Computer Science, University of Illinois at Chicago, Chicago, IL 60607, USA
- Guo-Wei Wei
  - Department of Mathematics, Michigan State University, MI 48824, USA
  - Department of Electrical and Computer Engineering, Michigan State University, MI 48824, USA
  - Department of Biochemistry and Molecular Biology, Michigan State University, MI 48824, USA
13
Zhao R, Wang M, Chen J, Tong Y, Wei GW. The de Rham-Hodge Analysis and Modeling of Biomolecules. Bull Math Biol 2020; 82:108. [PMID: 32770408 PMCID: PMC8137271 DOI: 10.1007/s11538-020-00783-2] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 10/27/2019] [Accepted: 07/20/2020] [Indexed: 12/18/2022]
Abstract
Biological macromolecules have intricate structures that underpin their biological functions. Understanding their structure-function relationships remains a challenge due to their structural complexity and functional variability. Although de Rham-Hodge theory, a landmark of twentieth-century mathematics, has had a tremendous impact on mathematics and physics, it has not been devised for macromolecular modeling and analysis. In this work, we introduce de Rham-Hodge theory as a unified paradigm for analyzing the geometry, topology, flexibility, and Hodge mode analysis of biological macromolecules. Geometric characteristics and topological invariants are obtained either from the Helmholtz-Hodge decomposition of the scalar, vector, and/or tensor fields of a macromolecule or from the spectral analysis of various Laplace-de Rham operators defined on the molecular manifolds. We propose Laplace-de Rham spectral-based models for predicting macromolecular flexibility. We further construct a Laplace-de Rham-Helfrich operator for revealing cryo-EM natural frequencies. Extensive experiments are carried out to demonstrate that the proposed de Rham-Hodge paradigm is one of the most versatile tools for the multiscale modeling and analysis of biological macromolecules and subcellular organelles. Accurate, reliable, and topological structure-preserving algorithms for implementing discrete exterior calculus (DEC) have been developed to facilitate the aforementioned modeling and analysis of biological macromolecules. The proposed de Rham-Hodge paradigm has potential applications to subcellular organelles and the structure construction from medium- or low-resolution cryo-EM maps, and functional predictions from massive biomolecular datasets.
Affiliation(s)
- Rundong Zhao
  - Department of Computer Science and Engineering, Michigan State University, East Lansing, MI 48824, USA
- Menglun Wang
  - Department of Mathematics, Michigan State University, East Lansing, MI 48824, USA
- Jiahui Chen
  - Department of Mathematics, Michigan State University, East Lansing, MI 48824, USA
- Yiying Tong
  - Department of Computer Science and Engineering, Michigan State University, East Lansing, MI 48824, USA
- Guo-Wei Wei
  - Department of Mathematics, Michigan State University, East Lansing, MI 48824, USA
  - Department of Electrical and Computer Engineering, Michigan State University, East Lansing, MI 48824, USA
  - Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, MI 48824, USA
14
Abstract
Recently, machine learning (ML) has established itself in various worldwide benchmarking competitions in computational biology, including Critical Assessment of Structure Prediction (CASP) and Drug Design Data Resource (D3R) Grand Challenges. However, the intricate structural complexity and high ML dimensionality of biomolecular datasets obstruct the efficient application of ML algorithms in the field. In addition to data and algorithm, an efficient ML machinery for biomolecular predictions must include structural representation as an indispensable component. Mathematical representations that simplify the biomolecular structural complexity and reduce ML dimensionality have emerged as a prime winner in D3R Grand Challenges. This review is devoted to the recent advances in developing low-dimensional and scalable mathematical representations of biomolecules in our laboratory. We discuss three classes of mathematical approaches, including algebraic topology, differential geometry, and graph theory. We elucidate how the physical and biological challenges have guided the evolution and development of these mathematical apparatuses for massive and diverse biomolecular data. We focus the performance analysis on protein-ligand binding predictions in this review although these methods have had tremendous success in many other applications, such as protein classification, virtual screening, and the predictions of solubility, solvation free energies, toxicity, partition coefficients, protein folding stability changes upon mutation, etc.
Affiliation(s)
- Duc Duy Nguyen
- Department of Mathematics, Michigan State University, MI 48824, USA.
- Zixuan Cang
- Department of Mathematics, Michigan State University, MI 48824, USA.
- Guo-Wei Wei
- Department of Mathematics, Michigan State University, MI 48824, USA.
- Department of Biochemistry and Molecular Biology, Michigan State University, MI 48824, USA.
- Department of Electrical and Computer Engineering, Michigan State University, MI 48824, USA.
|
15
|
Nguyen DD, Gao K, Wang M, Wei GW. MathDL: mathematical deep learning for D3R Grand Challenge 4. J Comput Aided Mol Des 2020; 34:131-147. [PMID: 31734815 PMCID: PMC7376411 DOI: 10.1007/s10822-019-00237-5] [Citation(s) in RCA: 45] [Impact Index Per Article: 11.3] [Received: 06/10/2019] [Accepted: 10/14/2019] [Indexed: 12/17/2022]
Abstract
We present the performance of our mathematical deep learning (MathDL) models for D3R Grand Challenge 4 (GC4). This challenge involves pose prediction, affinity ranking, and free energy estimation for beta secretase 1 (BACE) as well as affinity ranking and free energy estimation for Cathepsin S (CatS). We have developed advanced mathematics, namely differential geometry, algebraic graph, and/or algebraic topology, to accurately and efficiently encode high dimensional physical/chemical interactions into scalable low-dimensional rotational and translational invariant representations. These representations are integrated with deep learning models, such as generative adversarial networks (GAN) and convolutional neural networks (CNN) for pose prediction and energy evaluation, respectively. Overall, our MathDL models achieved the top place in pose prediction for BACE ligands in Stage 1a. Moreover, our submissions obtained the highest Spearman correlation coefficient on the affinity ranking of 460 CatS compounds, and the smallest centered root mean square error on the free energy set of 39 CatS molecules. It is worth mentioning that our docking pose predictions have improved significantly over our previous ones.
Affiliation(s)
- Duc Duy Nguyen
- Department of Mathematics, Michigan State University, East Lansing, MI, 48824, USA.
- Kaifu Gao
- Department of Mathematics, Michigan State University, East Lansing, MI, 48824, USA.
- Menglun Wang
- Department of Mathematics, Michigan State University, East Lansing, MI, 48824, USA.
- Guo-Wei Wei
- Department of Mathematics, Michigan State University, East Lansing, MI, 48824, USA.
- Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, MI, 48824, USA.
- Department of Electrical and Computer Engineering, Michigan State University, East Lansing, MI, 48824, USA.
|
16
|
Nguyen DD, Wei GW. AGL-Score: Algebraic Graph Learning Score for Protein-Ligand Binding Scoring, Ranking, Docking, and Screening. J Chem Inf Model 2019; 59:3291-3304. [PMID: 31257871 PMCID: PMC6664294 DOI: 10.1021/acs.jcim.9b00334] [Citation(s) in RCA: 121] [Impact Index Per Article: 24.2] [Indexed: 12/13/2022]
Abstract
Although algebraic graph theory-based models have been widely applied in physical modeling and molecular studies, they are typically incompetent in the analysis and prediction of biomolecular properties, confirming the common belief that "one cannot hear the shape of a drum". A new development in the century-old question about the spectrum-geometry relationship is provided. Novel algebraic graph learning score (AGL-Score) models are proposed to encode high-dimensional physical and biological information into intrinsically low-dimensional representations. The proposed AGL-Score models employ multiscale weighted colored subgraphs to describe crucial molecular and biomolecular interactions in terms of graph invariants derived from graph Laplacian, its pseudo-inverse, and adjacency matrices. Additionally, AGL-Score models are integrated with an advanced machine learning algorithm to predict biomolecular macroscopic properties from the low-dimensional graph representation of biomolecular structures. The proposed AGL-Score models are extensively validated for their scoring power, ranking power, docking power, and screening power via a number of benchmark datasets, namely CASF-2007, CASF-2013, and CASF-2016. Numerical results indicate that the proposed AGL-Score models are able to outperform other state-of-the-art scoring functions in protein-ligand binding scoring, ranking, docking, and screening. This study indicates that machine learning methods are powerful tools for molecular docking and virtual screening. It also indicates that spectral geometry or spectral graph theory has the ability to infer geometric properties.
Affiliation(s)
- Duc Duy Nguyen
- Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
- Guo-Wei Wei
- Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
- Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, Michigan 48824, United States
- Department of Electrical and Computer Engineering, Michigan State University, East Lansing, Michigan 48824, United States
|
17
|
Nguyen DD, Wei GW. DG-GL: Differential geometry-based geometric learning of molecular datasets. International Journal for Numerical Methods in Biomedical Engineering 2019; 35:e3179. [PMID: 30693661 PMCID: PMC6598676 DOI: 10.1002/cnm.3179] [Citation(s) in RCA: 33] [Impact Index Per Article: 6.6] [Received: 09/25/2018] [Revised: 11/21/2018] [Accepted: 12/06/2018] [Indexed: 05/11/2023]
Abstract
MOTIVATION: Despite its great success in various physical modeling, differential geometry (DG) has rarely been devised as a versatile tool for analyzing large, diverse, and complex molecular and biomolecular datasets because of the limited understanding of its potential power in dimensionality reduction and its ability to encode essential chemical and biological information in differentiable manifolds.
RESULTS: We put forward a differential geometry-based geometric learning (DG-GL) hypothesis that the intrinsic physics of three-dimensional (3D) molecular structures lies on a family of low-dimensional manifolds embedded in a high-dimensional data space. We encode crucial chemical, physical, and biological information into 2D element interactive manifolds, extracted from a high-dimensional structural data space via a multiscale discrete-to-continuum mapping using differentiable density estimators. Differential geometry apparatuses are utilized to construct element interactive curvatures in analytical forms for certain analytically differentiable density estimators. These low-dimensional differential geometry representations are paired with a robust machine learning algorithm to showcase their descriptive and predictive powers for large, diverse, and complex molecular and biomolecular datasets. Extensive numerical experiments are carried out to demonstrate that the proposed DG-GL strategy outperforms other advanced methods in the predictions of drug discovery-related protein-ligand binding affinity, drug toxicity, and molecular solvation free energy.
AVAILABILITY AND IMPLEMENTATION: http://weilab.math.msu.edu/DG-GL/
CONTACT: wei@math.msu.edu
Affiliation(s)
- Duc Duy Nguyen
- Department of Mathematics, Michigan State University, East Lansing, Michigan 48824
- Guo-Wei Wei
- Department of Mathematics, Michigan State University, East Lansing, Michigan 48824
- Department of Electrical and Computer Engineering, Michigan State University, MI 48824, USA
- Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, MI 48824
|
18
|
Sun Z, Liu Q, Qu G, Feng Y, Reetz MT. Utility of B-Factors in Protein Science: Interpreting Rigidity, Flexibility, and Internal Motion and Engineering Thermostability. Chem Rev 2019; 119:1626-1665. [PMID: 30698416 DOI: 10.1021/acs.chemrev.8b00290] [Citation(s) in RCA: 280] [Impact Index Per Article: 56.0] [Indexed: 12/24/2022]
Affiliation(s)
- Zhoutong Sun
- Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, 32 West Seventh Avenue, Tianjin Airport Economic Area, Tianjin 300308, China
- Qian Liu
- State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China
- Ge Qu
- Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, 32 West Seventh Avenue, Tianjin Airport Economic Area, Tianjin 300308, China
- Yan Feng
- State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China
- Manfred T. Reetz
- Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, 32 West Seventh Avenue, Tianjin Airport Economic Area, Tianjin 300308, China
- Max-Planck-Institut für Kohlenforschung, Kaiser-Wilhelm-Platz 1, 45470 Mülheim an der Ruhr, Germany
- Chemistry Department, Philipps-University, Hans-Meerwein-Strasse 4, 35032 Marburg, Germany
|
19
|
Abstract
The Debye-Waller factor, a measure of X-ray attenuation, can be experimentally observed in protein X-ray crystallography. Previous theoretical models have made strong inroads in the analysis of beta (B)-factors by linearly fitting protein B-factors from experimental data. However, the blind prediction of B-factors for unknown proteins is an unsolved problem. This work integrates machine learning and advanced graph theory, namely, multiscale weighted colored graphs (MWCGs), to blindly predict B-factors of unknown proteins. MWCGs are local features that measure the intrinsic flexibility due to a protein structure. Global features that connect the B-factors of different proteins, e.g., the resolution of X-ray crystallography, are introduced to enable the cross-protein B-factor predictions. Several machine learning approaches, including ensemble methods and deep learning, are considered in the present work. The proposed method is validated with hundreds of thousands of experimental B-factors. Extensive numerical results indicate that the blind B-factor predictions obtained from the present method are more accurate than the least squares fittings using traditional methods.
Affiliation(s)
- David Bramer
- Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, USA
- Guo-Wei Wei
- Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, USA
|
20
|
Agajanian S, Odeyemi O, Bischoff N, Ratra S, Verkhivker GM. Machine Learning Classification and Structure–Functional Analysis of Cancer Mutations Reveal Unique Dynamic and Network Signatures of Driver Sites in Oncogenes and Tumor Suppressor Genes. J Chem Inf Model 2018; 58:2131-2150. [DOI: 10.1021/acs.jcim.8b00414] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Indexed: 12/29/2022]
Affiliation(s)
- Steve Agajanian
- Graduate Program in Computational and Data Sciences, Department of Computational Sciences, Schmid College of Science and Technology, Chapman University, One University Drive, Orange, California 92866, United States
- Oluyemi Odeyemi
- Graduate Program in Computational and Data Sciences, Department of Computational Sciences, Schmid College of Science and Technology, Chapman University, One University Drive, Orange, California 92866, United States
- Nathaniel Bischoff
- Graduate Program in Computational and Data Sciences, Department of Computational Sciences, Schmid College of Science and Technology, Chapman University, One University Drive, Orange, California 92866, United States
- Simrath Ratra
- Graduate Program in Computational and Data Sciences, Department of Computational Sciences, Schmid College of Science and Technology, Chapman University, One University Drive, Orange, California 92866, United States
- Gennady M. Verkhivker
- Graduate Program in Computational and Data Sciences, Department of Computational Sciences, Schmid College of Science and Technology, Chapman University, One University Drive, Orange, California 92866, United States
- Chapman University, School of Pharmacy, Irvine, California 92618, United States
|
21
|
Bramer D, Wei GW. Multiscale weighted colored graphs for protein flexibility and rigidity analysis. J Chem Phys 2018; 148:054103. [PMID: 29421884 DOI: 10.1063/1.5016562] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.7] [Indexed: 01/01/2023]
Abstract
Protein structural fluctuation, measured by Debye-Waller factors or B-factors, is known to correlate to protein flexibility and function. A variety of methods has been developed for protein Debye-Waller factor prediction and related applications to domain separation, docking pose ranking, entropy calculation, hinge detection, stability analysis, etc. Nevertheless, none of the current methodologies are able to deliver an accuracy of 0.7 in terms of the Pearson correlation coefficients averaged over a large set of proteins. In this work, we introduce a paradigm-shifting geometric graph model, multiscale weighted colored graph (MWCG), to provide a new generation of computational algorithms to significantly change the current status of protein structural fluctuation analysis. Our MWCG model divides a protein graph into multiple subgraphs based on interaction types between graph nodes and represents the protein rigidity by generalized centralities of subgraphs. MWCGs not only predict the B-factors of protein residues but also accurately analyze the flexibility of all atoms in a protein. The MWCG model is validated over a number of protein test sets and compared with many standard methods. An extensive numerical study indicates that the proposed MWCG offers an accuracy of over 0.8 and thus provides perhaps the first reliable method for estimating protein flexibility and B-factors. It also simultaneously predicts all-atom flexibility in a molecule.
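The accuracy figures quoted in this abstract (0.7 and 0.8) are Pearson correlation coefficients between predicted and experimental B-factors, averaged over a set of proteins. A minimal sketch of that evaluation is given here; the function name `mean_pearson` and the toy numbers are illustrative assumptions, not code or data from the paper.

```python
import numpy as np

def mean_pearson(predicted, experimental):
    """Average the per-protein Pearson correlation between two B-factor lists.

    `predicted` and `experimental` are sequences of 1-D arrays, one per protein.
    """
    scores = []
    for pred, expt in zip(predicted, experimental):
        # np.corrcoef returns the 2x2 correlation matrix; [0, 1] is r(pred, expt).
        scores.append(np.corrcoef(pred, expt)[0, 1])
    return float(np.mean(scores))

# Toy values, made up for illustration only (two "proteins" of four atoms each).
predicted = [np.array([1.0, 2.0, 3.0, 4.0]), np.array([2.0, 1.0, 4.0, 3.0])]
experimental = [np.array([2.0, 4.0, 6.0, 8.0]), np.array([1.0, 2.0, 3.0, 4.0])]
score = mean_pearson(predicted, experimental)
```

Averaging per-protein coefficients, rather than pooling all atoms into one correlation, keeps large proteins from dominating the score.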
Affiliation(s)
- David Bramer
- Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, USA
- Guo-Wei Wei
- Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, USA
|
22
|
Carugo O. Atomic displacement parameters in structural biology. Amino Acids 2018; 50:775-786. [DOI: 10.1007/s00726-018-2574-y] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.0] [Received: 11/29/2017] [Accepted: 04/19/2018] [Indexed: 01/14/2023]
|
23
|
Stetz G, Verkhivker GM. Functional Role and Hierarchy of the Intermolecular Interactions in Binding of Protein Kinase Clients to the Hsp90–Cdc37 Chaperone: Structure-Based Network Modeling of Allosteric Regulation. J Chem Inf Model 2018; 58:405-421. [DOI: 10.1021/acs.jcim.7b00638] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Indexed: 02/05/2023]
Affiliation(s)
- Gabrielle Stetz
- Graduate Program in Computational and Data Sciences, Department of Computational Sciences, Schmid College of Science and Technology, Chapman University, One University Drive, Orange, California 92866, United States
- Gennady M. Verkhivker
- Graduate Program in Computational and Data Sciences, Department of Computational Sciences, Schmid College of Science and Technology, Chapman University, One University Drive, Orange, California 92866, United States
- Chapman University School of Pharmacy, Irvine, California 92618, United States
|
24
|
Xia K. Multiscale virtual particle based elastic network model (MVP-ENM) for normal mode analysis of large-sized biomolecules. Phys Chem Chem Phys 2018; 20:658-669. [PMID: 29227479 DOI: 10.1039/c7cp07177a] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Indexed: 01/20/2023]
Abstract
In this paper, a multiscale virtual particle based elastic network model (MVP-ENM) is proposed for the normal mode analysis of large-sized biomolecules. The multiscale virtual particle (MVP) model is proposed for the discretization of biomolecular density data. With this model, large-sized biomolecular structures can be coarse-grained into virtual particles such that a balance between model accuracy and computational cost can be achieved. An elastic network is constructed by assuming "connections" between virtual particles. The connection is described by a special harmonic potential function, which considers the influence from both the mass distributions and distance relations of the virtual particles. Two independent models, i.e., the multiscale virtual particle based Gaussian network model (MVP-GNM) and the multiscale virtual particle based anisotropic network model (MVP-ANM), are proposed. It has been found that in the Debye-Waller factor (B-factor) prediction, the results from our MVP-GNM with a high resolution are as good as the ones from GNM. Even with low resolutions, our MVP-GNM can still capture the global behavior of the B-factor very well with mismatches predominantly from the regions with large B-factor values. Further, it has been demonstrated that the low-frequency eigenmodes from our MVP-ANM are highly consistent with the ones from ANM even with very low resolutions and a coarse grid. Finally, the great advantage of the MVP-ANM model for large-sized biomolecules has been demonstrated by using two poliovirus structures. The paper ends with a conclusion.
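For orientation, the baseline GNM that this abstract compares against predicts B-factors, up to a scale factor, from the diagonal of the pseudo-inverse of a contact (Kirchhoff) matrix. The sketch below shows that standard construction only, not the MVP-GNM of the paper; the function name, the 7 Å default cutoff, and the three-point toy chain are assumptions chosen for illustration.

```python
import numpy as np

def gnm_flexibility(coords, cutoff=7.0):
    """Relative flexibility per node from a standard Gaussian network model.

    coords: (N, 3) array of node positions (e.g. C-alpha atoms), in angstroms.
    Returns the diagonal of the Kirchhoff pseudo-inverse, which is
    proportional to the predicted B-factors.
    """
    coords = np.asarray(coords, dtype=float)
    n = len(coords)
    # Pairwise distances, then contacts within the cutoff (no self-contacts).
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    contact = (dist <= cutoff) & ~np.eye(n, dtype=bool)
    # Kirchhoff matrix: -1 for each contact, node degree on the diagonal.
    kirchhoff = -contact.astype(float)
    np.fill_diagonal(kirchhoff, contact.sum(axis=1))
    # Pseudo-inverse diagonal via the eigen-decomposition, skipping zero modes.
    evals, evecs = np.linalg.eigh(kirchhoff)
    keep = evals > 1e-8
    return (evecs[:, keep] ** 2 / evals[keep]).sum(axis=1)

# Toy three-node chain; with a 4 A cutoff only neighbours are in contact.
chain = [[0.0, 0.0, 0.0], [3.8, 0.0, 0.0], [7.6, 0.0, 0.0]]
flex = gnm_flexibility(chain, cutoff=4.0)
```

In this toy chain the two end nodes come out more flexible than the middle one, which matches the usual picture of chain termini having larger B-factors.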
Affiliation(s)
- Kelin Xia
- Division of Mathematical Sciences, School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore 637371.
|
25
|
Multiscale Persistent Functions for Biomolecular Structure Characterization. Bull Math Biol 2017; 80:1-31. [PMID: 29098540 DOI: 10.1007/s11538-017-0362-6] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Received: 11/15/2016] [Accepted: 10/19/2017] [Indexed: 10/18/2022]
Abstract
In this paper, we introduce multiscale persistent functions for biomolecular structure characterization. The essential idea is to combine our multiscale rigidity functions (MRFs) with persistent homology analysis, so as to construct a series of multiscale persistent functions, particularly multiscale persistent entropies, for structure characterization. To clarify the fundamental idea of our method, the multiscale persistent entropy (MPE) model is discussed in great detail. Mathematically, unlike the previous persistent entropy (Chintakunta et al. in Pattern Recognit 48(2):391-401, 2015; Merelli et al. in Entropy 17(10):6872-6892, 2015; Rucco et al. in: Proceedings of ECCS 2014, Springer, pp 117-128, 2016), a special resolution parameter is incorporated into our model. Various scales can be achieved by tuning its value. Physically, our MPE can be used in conformational entropy evaluation. More specifically, it is found that our method incorporates a natural classification scheme. This is achieved through a density filtration of an MRF built from angular distributions. To further validate our model, a systematic comparison with the traditional entropy evaluation model is done. It is found that our model is able to preserve the intrinsic topological features of biomolecular data much better than traditional approaches, particularly for resolutions in the intermediate range. Moreover, by comparing with traditional entropies from various grid sizes, bond angle-based methods and a persistent homology-based support vector machine method (Cang et al. in Mol Based Math Biol 3:140-162, 2015), we find that our MPE method gives the best results in terms of average true positive rate in a classic protein structure classification test. More interestingly, all-alpha and all-beta protein classes can be clearly separated from each other with zero error only in our model.
Finally, a special protein structure index (PSI) is proposed, for the first time, to describe the "regularity" of protein structures. Basically, a protein structure is deemed regular if it has a consistent and orderly configuration. Our PSI model is tested on a database of 110 proteins; we find that structures with larger portions of loops and intrinsically disordered regions are always associated with larger PSI, meaning an irregular configuration, while proteins with larger portions of secondary structures, i.e., alpha-helix or beta-sheet, have smaller PSI. Essentially, PSI can be used to describe the "regularity" information in any system.
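As background for the entropy discussed in this abstract, the earlier persistent entropy it builds on is the Shannon entropy of the normalized bar lengths of a persistence barcode. The sketch below covers only that plain form; it does not include the resolution parameter or the rigidity-function filtration of the multiscale model, and the function name and toy barcodes are assumptions made for illustration.

```python
import math

def persistent_entropy(bars):
    """Shannon entropy of the normalized bar lengths of a persistence barcode.

    `bars` is a list of (birth, death) pairs with finite death > birth.
    """
    lengths = [death - birth for birth, death in bars]
    total = sum(lengths)
    # Each bar contributes a probability proportional to its length.
    probs = [length / total for length in lengths]
    return -sum(p * math.log(p) for p in probs)

# Toy barcodes, made up for illustration only.
equal_bars = persistent_entropy([(0.0, 1.0), (0.5, 1.5)])  # two bars of equal length
single_bar = persistent_entropy([(0.0, 2.0)])              # one bar only
```

A barcode with many bars of similar length gives a high entropy, while one dominated by a single long bar gives a value near zero.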
|