1
|
Pun CS, Lee SX, Xia K. Persistent-homology-based machine learning: a survey and a comparative study. Artif Intell Rev 2022. [DOI: 10.1007/s10462-022-10146-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/25/2022]
|
2
|
Liu J, Chen D, Li J, Wu J. Neighborhood hypergraph model for topological data analysis. COMPUTATIONAL AND MATHEMATICAL BIOPHYSICS 2022. [DOI: 10.1515/cmb-2022-0142] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/23/2022] Open
Abstract
Hypergraph, as a generalization of the notions of graph and simplicial complex, has gained a lot of attention in many fields. It is a relatively new mathematical model to describe the high-dimensional structure and geometric shapes of data sets. In this paper, we introduce the neighborhood hypergraph model for graphs and combine the neighborhood hypergraph model with the persistent (embedded) homology of hypergraphs. Given a graph, we can obtain a neighborhood complex introduced by L. Lovász and a filtration of hypergraphs parameterized by a weight function on the power set of the vertex set of the graph. The weight function can be obtained by the construction from the geometric structure of graphs or the weights on the vertices of the graph. We show the persistent theory of such filtrations of hypergraphs. One typical application of the persistent neighborhood hypergraph is to distinguish the planar square structure of cisplatin and transplatin. Another application of persistent neighborhood hypergraph is to describe the structure of small fullerenes such as C20. The bond length and the number of adjacent carbon atoms of a carbon atom can be derived from the persistence diagram. Moreover, our method gives a highly matched stability prediction (with a correlation coefficient 0.9976) of small fullerene molecules.
Affiliation(s)
- Jian Liu
- School of Mathematical Sciences, Hebei Normal University, China; Yanqi Lake Beijing Institute of Mathematical Sciences and Applications, China
| | - Dong Chen
- School of Advanced Materials, Peking University, Shenzhen Graduate School, Shenzhen, China; Department of Mathematics, Michigan State University, USA; Yanqi Lake Beijing Institute of Mathematical Sciences and Applications, China
| | - Jingyan Li
- Yanqi Lake Beijing Institute of Mathematical Sciences and Applications, China
| | - Jie Wu
- Yanqi Lake Beijing Institute of Mathematical Sciences and Applications, China
| |
|
3
|
Li S, Liu Y, Chen D, Jiang Y, Nie Z, Pan F. Encoding the atomic structure for machine learning in materials science. WIRES COMPUTATIONAL MOLECULAR SCIENCE 2022. [DOI: 10.1002/wcms.1558] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Indexed: 12/12/2022]
Affiliation(s)
- Shunning Li
- School of Advanced Materials Peking University, Shenzhen Graduate School Shenzhen China
| | - Yuanji Liu
- School of Advanced Materials Peking University, Shenzhen Graduate School Shenzhen China
| | - Dong Chen
- School of Advanced Materials Peking University, Shenzhen Graduate School Shenzhen China
| | - Yi Jiang
- School of Advanced Materials Peking University, Shenzhen Graduate School Shenzhen China
| | - Zhiwei Nie
- School of Advanced Materials Peking University, Shenzhen Graduate School Shenzhen China
| | - Feng Pan
- School of Advanced Materials Peking University, Shenzhen Graduate School Shenzhen China
| |
|
4
|
Stenseke J. Persistent homology and the shape of evolutionary games. J Theor Biol 2021; 531:110903. [PMID: 34534569 DOI: 10.1016/j.jtbi.2021.110903] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/13/2021] [Revised: 09/08/2021] [Accepted: 09/09/2021] [Indexed: 11/17/2022]
Abstract
For nearly three decades, spatial games have produced a wealth of insights to the study of behavior and its relation to population structure. However, as different rules and factors are added or altered, the dynamics of spatial models often become increasingly complicated to interpret. To tackle this problem, we introduce persistent homology as a rigorous framework that can be used to both define and compute higher-order features of data in a manner which is invariant to parameter choices, robust to noise, and independent of human observation. Our work demonstrates its relevance for spatial games by showing how topological features of simulation data that persist over different spatial scales reflect the stability of strategies in 2D lattice games. To do so, we analyze the persistent homology of scenarios from two games: a Prisoner's Dilemma and a SIRS epidemic model. The experimental results show how the method accurately detects features that correspond to real aspects of the game dynamics. Unlike other tools that study dynamics of spatial systems, persistent homology can tell us something meaningful about population structure while remaining neutral about the underlying structure itself. Regardless of game complexity, since strategies either succeed or fail to conform to shapes of a certain topology there is much potential for the method to provide novel insights for a wide variety of spatially extended systems in biology, social science, and physics.
Affiliation(s)
- Jakob Stenseke
- Department of Philosophy, Lund University, Helgonavagen 3, Lund 221 00, Sweden.
| |
|
5
|
Chen J, Zhao R, Tong Y, Wei GW. Evolutionary de Rham-Hodge method. DISCRETE AND CONTINUOUS DYNAMICAL SYSTEMS. SERIES B 2021; 26:3785-3821. [PMID: 34675756 DOI: 10.3934/dcdsb.2020257] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Indexed: 12/29/2022]
Abstract
The de Rham-Hodge theory is a landmark of 20th-century mathematics and has had a great impact on mathematics, physics, computer science, and engineering. This work introduces an evolutionary de Rham-Hodge method to provide a unified paradigm for the multiscale geometric and topological analysis of evolving manifolds constructed from a filtration, which induces a family of evolutionary de Rham complexes. While the present method can be easily applied to closed manifolds, the emphasis is given to more challenging compact manifolds with 2-manifold boundaries, which require appropriate analysis and treatment of boundary conditions on differential forms to maintain proper topological properties. Three sets of unique evolutionary Hodge Laplacians are proposed to generate three sets of topology-preserving singular spectra, for which the multiplicities of zero eigenvalues correspond to exactly the persistent Betti numbers of dimensions 0, 1 and 2. Additionally, three sets of non-zero eigenvalues further reveal both topological persistence and geometric progression during the manifold evolution. Extensive numerical experiments are carried out via the discrete exterior calculus to demonstrate the potential of the proposed paradigm for data representation and shape analysis of both point cloud data and density maps. To demonstrate the utility of the proposed method, it is applied to the protein B-factor predictions of a few challenging cases for which existing biophysical models break down.
Affiliation(s)
- Jiahui Chen
- Department of Mathematics, Michigan State University, MI 48824, USA
| | - Rundong Zhao
- Department of Computer Science and Engineering, Michigan State University, MI 48824, USA
| | - Yiying Tong
- Department of Computer Science and Engineering, Michigan State University, MI 48824, USA
| | - Guo-Wei Wei
- Department of Mathematics, Michigan State University, MI 48824, USA
| |
|
6
|
Padellini T, Brutti P. Supervised learning with indefinite topological Kernels. STATISTICS-ABINGDON 2021. [DOI: 10.1080/02331888.2021.1976777] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022]
Affiliation(s)
- Tullia Padellini
- Dipartimento di Scienze Statistiche, Sapienza Università di Roma, Rome, Italy
| | - Pierpaolo Brutti
- Dipartimento di Scienze Statistiche, Sapienza Università di Roma, Rome, Italy
| |
|
7
|
Yen PTW, Xia K, Cheong SA. Understanding Changes in the Topology and Geometry of Financial Market Correlations during a Market Crash. ENTROPY (BASEL, SWITZERLAND) 2021; 23:1211. [PMID: 34573837 PMCID: PMC8467365 DOI: 10.3390/e23091211] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 07/19/2021] [Revised: 09/05/2021] [Accepted: 09/06/2021] [Indexed: 12/24/2022]
Abstract
In econophysics, the achievements of information filtering methods over the past 20 years, such as the minimal spanning tree (MST) by Mantegna and the planar maximally filtered graph (PMFG) by Tumminello et al., should be celebrated. Here, we show how one can systematically improve upon this paradigm along two separate directions. First, we used topological data analysis (TDA) to extend the notions of nodes and links in networks to faces, tetrahedrons, or k-simplices in simplicial complexes. Second, we used the Ollivier-Ricci curvature (ORC) to acquire geometric information that cannot be provided by simple information filtering. In this sense, MSTs and PMFGs are but first steps to revealing the topological backbones of financial networks. This is something that TDA can elucidate more fully, following which the ORC can help us flesh out the geometry of financial networks. We applied these two approaches to a recent stock market crash in Taiwan and found that, beyond fusions and fissions, other non-fusion/fission processes such as cavitation, annihilation, rupture, healing, and puncture might also be important. We also successfully identified neck regions that emerged during the crash, based on their negative ORCs, and performed a case study on one such neck region.
Affiliation(s)
- Peter Tsung-Wen Yen
- Center for Crystal Researches, National Sun Yat-sen University, No. 70, Lien-hai Rd., Kaohsiung 80424, Taiwan;
| | - Kelin Xia
- Division of Mathematical Sciences, School of Physical and Mathematical Sciences, Nanyang Technological University, 21 Nanyang Link, Singapore 637371, Singapore;
| | - Siew Ann Cheong
- Division of Physics and Applied Physics, School of Physical and Mathematical Sciences, Nanyang Technological University, 21 Nanyang Link, Singapore 637371, Singapore
| |
|
8
|
Terebus A, Manuchehrfar F, Cao Y, Liang J. Exact Probability Landscapes of Stochastic Phenotype Switching in Feed-Forward Loops: Phase Diagrams of Multimodality. Front Genet 2021; 12:645640. [PMID: 34306004 PMCID: PMC8297706 DOI: 10.3389/fgene.2021.645640] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 12/23/2020] [Accepted: 04/26/2021] [Indexed: 11/13/2022] Open
Abstract
Feed-forward loops (FFLs) are among the most ubiquitously found motifs of reaction networks in nature. However, little is known about their stochastic behavior and the variety of network phenotypes they can exhibit. In this study, we provide full characterizations of the properties of stochastic multimodality of FFLs, and how switching between different network phenotypes is controlled. We have computed the exact steady-state probability landscapes of all eight types of coherent and incoherent FFLs using the finite-buffer Accurate Chemical Master Equation (ACME) algorithm, and quantified the exact topological features of their high-dimensional probability landscapes using persistent homology. Through analysis of the degree of multimodality for each of a set of 10,812 probability landscapes, where each landscape resides over 10^5–10^6 microstates, we have constructed comprehensive phase diagrams of all relevant behavior of FFL multimodality over broad ranges of input and regulation intensities, as well as different regimes of promoter binding dynamics. In addition, we have quantified the topological sensitivity of the multimodality of the landscapes to regulation intensities. Our results show that with slow binding and unbinding dynamics of transcription factor to promoter, FFLs exhibit strong stochastic behavior that is very different from what would be inferred from deterministic models. In addition, input intensity plays a major role in the phenotypes of FFLs: at weak input intensity, FFLs exhibit monomodality, but strong input intensity may result in up to 6 stable phenotypes. Furthermore, we found that gene duplication can enlarge stable regions of specific multimodalities and enrich the phenotypic diversity of FFL networks, providing means for cells toward better adaptation to changing environment.
Our results are directly applicable to analysis of behavior of FFLs in biological processes such as stem cell differentiation and for design of synthetic networks when certain phenotypic behavior is desired.
Affiliation(s)
- Anna Terebus
- Center for Bioinformatics and Quantitative Biology, Richard and Loan Hill Department of Bioengineering, University of Illinois at Chicago, Chicago, IL, United States.,Constellation, Baltimore, MD, United States
| | - Farid Manuchehrfar
- Center for Bioinformatics and Quantitative Biology, Richard and Loan Hill Department of Bioengineering, University of Illinois at Chicago, Chicago, IL, United States
| | - Youfang Cao
- Center for Bioinformatics and Quantitative Biology, Richard and Loan Hill Department of Bioengineering, University of Illinois at Chicago, Chicago, IL, United States.,Merck & Co., Inc., Kenilworth, NJ, United States
| | - Jie Liang
- Center for Bioinformatics and Quantitative Biology, Richard and Loan Hill Department of Bioengineering, University of Illinois at Chicago, Chicago, IL, United States
| |
|
9
|
Manuchehrfar F, Li H, Tian W, Ma A, Liang J. Exact Topology of the Dynamic Probability Surface of an Activated Process by Persistent Homology. J Phys Chem B 2021; 125:4667-4680. [PMID: 33938737 DOI: 10.1021/acs.jpcb.1c00904] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Indexed: 11/29/2022]
Abstract
To gain insight into the reaction mechanism of activated processes, we introduce an exact approach for quantifying the topology of high-dimensional probability surfaces of the underlying dynamic processes. Instead of Morse indexes, we study the homology groups of a sequence of superlevel sets of the probability surface over high-dimensional configuration spaces using persistent homology. For alanine-dipeptide isomerization, a prototype of activated processes, we identify locations of probability peaks and connecting ridges, along with measures of their global prominence. Instead of a saddle point, the transition state ensemble (TSE) of conformations is at the most prominent probability peak after reactants/products, when proper reaction coordinates are included. Intuition-based models, even those exhibiting a double-well, fail to capture the dynamics of the activated process. Peak occurrence, prominence, and locations can be distorted upon subspace projection. While principal component analysis accounts for conformational variance, it inflates the complexity of the surface topology and destroys the dynamic properties of the topological features. In contrast, TSE emerges naturally as the most prominent peak beyond the reactant/product basins, when projected to a subspace of minimum dimension containing the reaction coordinates. Our approach is general and can be applied to investigate the topology of high-dimensional probability surfaces of other activated processes.
Affiliation(s)
- Farid Manuchehrfar
- Center for Bioinformatics and Quantitative Biology and Department of Bioengineering, University of Illinois at Chicago, Chicago, Illinois 60607, United States
| | - Huiyu Li
- Center for Bioinformatics and Quantitative Biology and Department of Bioengineering, University of Illinois at Chicago, Chicago, Illinois 60607, United States
| | - Wei Tian
- Center for Bioinformatics and Quantitative Biology and Department of Bioengineering, University of Illinois at Chicago, Chicago, Illinois 60607, United States
| | - Ao Ma
- Center for Bioinformatics and Quantitative Biology and Department of Bioengineering, University of Illinois at Chicago, Chicago, Illinois 60607, United States
| | - Jie Liang
- Center for Bioinformatics and Quantitative Biology and Department of Bioengineering, University of Illinois at Chicago, Chicago, Illinois 60607, United States
| |
|
10
|
Ormrod Morley D, Salmon PS, Wilson M. Persistent homology in two-dimensional atomic networks. J Chem Phys 2021; 154:124109. [PMID: 33810685 DOI: 10.1063/5.0040393] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 11/14/2022] Open
Abstract
The topology of two-dimensional network materials is investigated by persistent homology analysis. The constraint of two dimensions allows for a direct comparison of key persistent homology metrics (persistence diagrams, cycles, and Betti numbers) with more traditional metrics such as the ring-size distributions. Two different types of networks are employed in which the topology is manipulated systematically. In the first, comparatively rigid networks are generated for a triangle-raft model, which are representative of materials such as silica bilayers. In the second, more flexible networks are generated using a bond-switching algorithm, which are representative of materials such as graphene. Bands are identified in the persistence diagrams by reference to the length scales associated with distorted polygons. The triangle-raft models with the largest ordering allow specific bands Bn (n = 1, 2, 3, …) to be allocated to configurations of atoms separated by n bonds. The persistence diagrams for the more disordered network models also display bands albeit less pronounced. The persistent homology method thereby provides information on n-body correlations that is not accessible from structure factors or radial distribution functions. An analysis of the persistent cycles gives the primitive ring statistics, provided the level of disorder is not too large. The method also gives information on the regularity of rings that is unavailable from a ring-statistics analysis. The utility of the persistent homology method is demonstrated by its application to experimentally-obtained configurations of silica bilayers and graphene.
Affiliation(s)
- David Ormrod Morley
- Department of Chemistry, Physical and Theoretical Chemistry Laboratory, University of Oxford, South Parks Road, Oxford OX1 3QZ, United Kingdom
| | - Philip S Salmon
- Department of Physics, University of Bath, Bath BA2 7AY, United Kingdom
| | - Mark Wilson
- Department of Chemistry, Physical and Theoretical Chemistry Laboratory, University of Oxford, South Parks Road, Oxford OX1 3QZ, United Kingdom
| |
|
11
|
Mirth J, Zhai Y, Bush J, Alvarado EG, Jordan H, Heim M, Krishnamoorthy B, Pflaum M, Clark A, Z Y, Adams H. Representations of energy landscapes by sublevelset persistent homology: An example with n-alkanes. J Chem Phys 2021; 154:114114. [PMID: 33752361 DOI: 10.1063/5.0036747] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 12/12/2022] Open
Abstract
Encoding the complex features of an energy landscape is a challenging task, and often, chemists pursue the most salient features (minima and barriers) along a highly reduced space, i.e., two- or three-dimensions. Even though disconnectivity graphs or merge trees summarize the connectivity of the local minima of an energy landscape via the lowest-barrier pathways, there is much information to be gained by also considering the topology of each connected component at different energy thresholds (or sublevelsets). We propose sublevelset persistent homology as an appropriate tool for this purpose. Our computations on the configuration phase space of n-alkanes from butane to octane allow us to conjecture, and then prove, a complete characterization of the sublevelset persistent homology of the alkane CmH2m+2 Potential Energy Landscapes (PELs), for all m, in all homological dimensions. We further compare both the analytical configurational PELs and sampled data from molecular dynamics simulation using the united and all-atom descriptions of the intramolecular interactions. In turn, this supports the application of distance metrics to quantify sampling fidelity and lays the foundation for future work regarding new metrics that quantify differences between the topological features of high-dimensional energy landscapes.
Affiliation(s)
- Joshua Mirth
- Department of Mathematics, Colorado State University, Fort Collins, Colorado 80524, USA
| | - Yanqin Zhai
- Department of Nuclear, Plasma, and Radiological Engineering, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, USA
| | - Johnathan Bush
- Department of Mathematics, Colorado State University, Fort Collins, Colorado 80524, USA
| | - Enrique G Alvarado
- Department of Mathematics and Statistics, Washington State University, Pullman, Washington 99164, USA
| | - Howie Jordan
- Department of Mathematics, University of Colorado, Boulder, Colorado 80309, USA
| | - Mark Heim
- Department of Mathematics, Colorado State University, Fort Collins, Colorado 80524, USA
| | - Bala Krishnamoorthy
- Department of Mathematics and Statistics, Washington State University, Vancouver, Washington 98686, USA
| | - Markus Pflaum
- Department of Mathematics, University of Colorado, Boulder, Colorado 80309, USA
| | - Aurora Clark
- Department of Chemistry, Washington State University, Pullman, Washington 99164, USA
| | - Y Z
- Department of Nuclear, Plasma, and Radiological Engineering, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, USA
| | - Henry Adams
- Department of Mathematics, Colorado State University, Fort Collins, Colorado 80524, USA
| |
|
12
|
|
13
|
Wang R, Nguyen DD, Wei GW. Persistent spectral graph. INTERNATIONAL JOURNAL FOR NUMERICAL METHODS IN BIOMEDICAL ENGINEERING 2020; 36:e3376. [PMID: 32515170 PMCID: PMC7719081 DOI: 10.1002/cnm.3376] [Citation(s) in RCA: 39] [Impact Index Per Article: 9.8] [Received: 03/29/2020] [Revised: 05/15/2020] [Accepted: 05/31/2020] [Indexed: 05/25/2023]
Abstract
Persistent homology is constrained to purely topological persistence, while multiscale graphs account only for geometric information. This work introduces persistent spectral theory to create a unified low-dimensional multiscale paradigm for revealing topological persistence and extracting geometric shapes from high-dimensional datasets. For a point-cloud dataset, a filtration procedure is used to generate a sequence of chain complexes and associated families of simplicial complexes and chains, from which we construct persistent combinatorial Laplacian matrices. We show that a full set of topological persistence can be completely recovered from the harmonic persistent spectra, that is, the spectra that have zero eigenvalues, of the persistent combinatorial Laplacian matrices. However, non-harmonic spectra of the Laplacian matrices induced by the filtration offer another powerful tool for data analysis, modeling, and prediction. In this work, fullerene stability is predicted by using both harmonic spectra and non-harmonic persistent spectra, while the latter spectra are successfully devised to analyze the structure of fullerenes and model protein flexibility, which cannot be straightforwardly extracted from the current persistent homology. The proposed method is found to provide excellent predictions of the protein B-factors for which current popular biophysical models break down.
Affiliation(s)
- Rui Wang
- Department of Mathematics, Michigan State University, MI 48824, USA
| | - Duc Duy Nguyen
- Department of Mathematics, Michigan State University, MI 48824, USA
| | - Guo-Wei Wei
- Department of Mathematics, Michigan State University, MI 48824, USA
- Department of Biochemistry and Molecular Biology, Michigan State University, MI 48824, USA
- Department of Electrical and Computer Engineering, Michigan State University, MI 48824, USA
| |
|
14
|
Zhao R, Wang M, Chen J, Tong Y, Wei GW. The de Rham-Hodge Analysis and Modeling of Biomolecules. Bull Math Biol 2020; 82:108. [PMID: 32770408 PMCID: PMC8137271 DOI: 10.1007/s11538-020-00783-2] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 10/27/2019] [Accepted: 07/20/2020] [Indexed: 12/18/2022]
Abstract
Biological macromolecules have intricate structures that underpin their biological functions. Understanding their structure-function relationships remains a challenge due to their structural complexity and functional variability. Although de Rham-Hodge theory, a landmark of twentieth-century mathematics, has had a tremendous impact on mathematics and physics, it has not been devised for macromolecular modeling and analysis. In this work, we introduce de Rham-Hodge theory as a unified paradigm for analyzing the geometry, topology, flexibility, and Hodge mode analysis of biological macromolecules. Geometric characteristics and topological invariants are obtained either from the Helmholtz-Hodge decomposition of the scalar, vector, and/or tensor fields of a macromolecule or from the spectral analysis of various Laplace-de Rham operators defined on the molecular manifolds. We propose Laplace-de Rham spectral-based models for predicting macromolecular flexibility. We further construct a Laplace-de Rham-Helfrich operator for revealing cryo-EM natural frequencies. Extensive experiments are carried out to demonstrate that the proposed de Rham-Hodge paradigm is one of the most versatile tools for the multiscale modeling and analysis of biological macromolecules and subcellular organelles. Accurate, reliable, and topological structure-preserving algorithms for implementing discrete exterior calculus (DEC) have been developed to facilitate the aforementioned modeling and analysis of biological macromolecules. The proposed de Rham-Hodge paradigm has potential applications to subcellular organelles and the structure construction from medium- or low-resolution cryo-EM maps, and functional predictions from massive biomolecular datasets.
Affiliation(s)
- Rundong Zhao
- Department of Computer Science and Engineering, Michigan State University, East Lansing, MI, 48824, USA
| | - Menglun Wang
- Department of Mathematics, Michigan State University, East Lansing, MI, 48824, USA
| | - Jiahui Chen
- Department of Mathematics, Michigan State University, East Lansing, MI, 48824, USA
| | - Yiying Tong
- Department of Computer Science and Engineering, Michigan State University, East Lansing, MI, 48824, USA.
| | - Guo-Wei Wei
- Department of Mathematics, Michigan State University, East Lansing, MI, 48824, USA.
- Department of Electrical and Computer Engineering, Michigan State University, East Lansing, MI, 48824, USA.
- Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, MI, 48824, USA.
| |
|
15
|
Cang Z, Munch E, Wei GW. Evolutionary homology on coupled dynamical systems with applications to protein flexibility analysis. J Appl Comput Topol 2020; 4:481-507. [PMID: 34179350 DOI: 10.1007/s41468-020-00057-9] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 11/29/2022]
Abstract
While the spatial topological persistence is naturally constructed from a radius-based filtration, it has hardly been derived from a temporal filtration. Most topological models are designed for the global topology of a given object as a whole. There is no method reported in the literature for the topology of an individual component in an object to the best of our knowledge. For many problems in science and engineering, the topology of an individual component is important for describing its properties. We propose evolutionary homology (EH) constructed via a time evolution-based filtration and topological persistence. Our approach couples a set of dynamical systems or chaotic oscillators by the interactions of a physical system, such as a macromolecule. The interactions are approximated by weighted graph Laplacians. Simplices, simplicial complexes, algebraic groups and topological persistence are defined on the coupled trajectories of the chaotic oscillators. The resulting EH gives rise to time-dependent topological invariants or evolutionary barcodes for an individual component of the physical system, revealing its topology-function relationship. In conjunction with Wasserstein metrics, the proposed EH is applied to protein flexibility analysis, an important problem in computational biophysics. Numerical results for the B-factor prediction of a benchmark set of 364 proteins indicate that the proposed EH outperforms all the other state-of-the-art methods in the field.
Affiliation(s)
- Zixuan Cang
- Department of Mathematics, Michigan State University
| | - Elizabeth Munch
- Department of Computational Mathematics, Science and Engineering, Michigan State University
| | - Guo-Wei Wei
- Department of Mathematics, Michigan State University
| |
|
16
|
Weighted persistent homology for osmolyte molecular aggregation and hydrogen-bonding network analysis. Sci Rep 2020; 10:9685. [PMID: 32546801 PMCID: PMC7297731 DOI: 10.1038/s41598-020-66710-6] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 12/04/2019] [Accepted: 05/20/2020] [Indexed: 12/24/2022] Open
Abstract
It has long been observed that trimethylamine N-oxide (TMAO) and urea demonstrate dramatically different properties in a protein folding process. Even with the enormous theoretical and experimental research work on these two osmolytes, various aspects of their underlying mechanisms still remain largely elusive. In this paper, we propose to use the weighted persistent homology to systematically study the osmolytes molecular aggregation and their hydrogen-bonding network from a local topological perspective. We consider two weighted models, i.e., localized persistent homology (LPH) and interactive persistent homology (IPH). Boltzmann persistent entropy (BPE) is proposed to quantitatively characterize the topological features from LPH and IPH, together with persistent Betti number (PBN). More specifically, from the localized persistent homology models, we have found that TMAO and urea have very different local topology. TMAO is found to exhibit a local network structure. With the concentration increase, the circle elements in these networks show a clear increase in their total numbers and a decrease in their relative sizes. In contrast, urea shows two types of local topological patterns, i.e., local clusters around 6 Å and a few global circle elements at around 12 Å. From the interactive persistent homology models, it has been found that our persistent radial distribution function (PRDF) from the global-scale IPH has same physical properties as the traditional radial distribution function. Moreover, PRDFs from the local-scale IPH can also be generated and used to characterize the local interaction information. Other than the clear difference of the first peak value of PRDFs at filtration size 4 Å, TMAO and urea also shows very different behaviors at the second peak region from filtration size 5 Å to 10 Å. These differences are also reflected in the PBNs and BPEs of the local-scale IPH. These localized topological information has never been revealed before. 
Since graphs can be transferred into simplicial complexes by the clique complex, our weighted persistent homology models can be used in the analysis of various networks and graphs from any molecular structures and aggregation systems.
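The clique-complex construction mentioned at the end of this abstract can be illustrated with a minimal sketch. It is not the authors' code: the function name `clique_complex`, the `max_dim` cutoff, and the brute-force enumeration are assumptions chosen for readability. Every set of pairwise adjacent vertices becomes a simplex.

```python
from itertools import combinations


def clique_complex(vertices, edges, max_dim=2):
    """Return {dimension: [simplices]} for the clique complex of a graph.

    Hypothetical helper for illustration: a (k+1)-vertex subset is a
    k-simplex exactly when all of its vertex pairs are edges of the graph.
    Brute-force enumeration, so only suitable for small graphs.
    """
    adjacency = {v: set() for v in vertices}
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)

    ordered = sorted(vertices)
    simplices = {0: [(v,) for v in ordered]}
    for dim in range(1, max_dim + 1):
        simplices[dim] = [
            candidate
            for candidate in combinations(ordered, dim + 1)
            if all(b in adjacency[a] for a, b in combinations(candidate, 2))
        ]
    return simplices
```

For a square with one diagonal, the sketch yields five edges and the two triangles that share the diagonal. A weighted filtration would additionally assign each simplex a filtration value, which is outside the scope of this sketch.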
|
17
|
Chen X, Chen D, Weng M, Jiang Y, Wei GW, Pan F. Topology-Based Machine Learning Strategy for Cluster Structure Prediction. J Phys Chem Lett 2020; 11:4392-4401. [PMID: 32320253 PMCID: PMC7351018 DOI: 10.1021/acs.jpclett.0c00974] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Indexed: 05/10/2023]
Abstract
In cluster physics, the determination of the ground-state structure of medium-sized and large-sized clusters is a challenge because the number of local minima on the potential energy surface grows exponentially with cluster size. Although machine learning approaches have had much success in materials sciences, their applications in clusters are often hindered by the geometric complexity of clusters. Persistent homology provides a new topological strategy to simplify geometric complexity while retaining important chemical and physical information without having to "downgrade" the original data. We further propose persistent pairwise independence (PPI) to enhance the predictive power of persistent homology. We construct topology-based machine learning models to reveal hidden structure-energy relationships in lithium (Li) clusters. We integrate the topology-based machine learning models, a particle swarm optimization algorithm, and density functional theory calculations to accelerate the search for the globally stable structure of clusters.
Affiliation(s)
- Xin Chen: School of Advanced Materials, Shenzhen Graduate School, Peking University, Shenzhen 518055, People's Republic of China
- Dong Chen: School of Advanced Materials, Shenzhen Graduate School, Peking University, Shenzhen 518055, People's Republic of China
- Mouyi Weng: School of Advanced Materials, Shenzhen Graduate School, Peking University, Shenzhen 518055, People's Republic of China
- Yi Jiang: School of Advanced Materials, Shenzhen Graduate School, Peking University, Shenzhen 518055, People's Republic of China
- Guo-Wei Wei: Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
- Feng Pan: School of Advanced Materials, Shenzhen Graduate School, Peking University, Shenzhen 518055, People's Republic of China
|
18
|
Abstract
Recently, machine learning (ML) has established itself in various worldwide benchmarking competitions in computational biology, including Critical Assessment of Structure Prediction (CASP) and Drug Design Data Resource (D3R) Grand Challenges. However, the intricate structural complexity and high ML dimensionality of biomolecular datasets obstruct the efficient application of ML algorithms in the field. In addition to data and algorithm, an efficient ML machinery for biomolecular predictions must include structural representation as an indispensable component. Mathematical representations that simplify the biomolecular structural complexity and reduce ML dimensionality have emerged as a prime winner in D3R Grand Challenges. This review is devoted to the recent advances in developing low-dimensional and scalable mathematical representations of biomolecules in our laboratory. We discuss three classes of mathematical approaches, including algebraic topology, differential geometry, and graph theory. We elucidate how the physical and biological challenges have guided the evolution and development of these mathematical apparatuses for massive and diverse biomolecular data. We focus the performance analysis on protein-ligand binding predictions in this review although these methods have had tremendous success in many other applications, such as protein classification, virtual screening, and the predictions of solubility, solvation free energies, toxicity, partition coefficients, protein folding stability changes upon mutation, etc.
Affiliation(s)
- Duc Duy Nguyen: Department of Mathematics, Michigan State University, MI 48824, USA
- Zixuan Cang: Department of Mathematics, Michigan State University, MI 48824, USA
- Guo-Wei Wei: Department of Mathematics, Michigan State University, MI 48824, USA; Department of Biochemistry and Molecular Biology, Michigan State University, MI 48824, USA; Department of Electrical and Computer Engineering, Michigan State University, MI 48824, USA
|
19
|
Weighted persistent homology for biomolecular data analysis. Sci Rep 2020; 10:2079. [PMID: 32034168 PMCID: PMC7005716 DOI: 10.1038/s41598-019-55660-3] [Citation(s) in RCA: 24] [Impact Index Per Article: 6.0] [Received: 09/30/2019] [Accepted: 11/29/2019] [Indexed: 11/08/2022] Open
Abstract
In this paper, we systematically review weighted persistent homology (WPH) models and their applications in biomolecular data analysis. Essentially, the weight value, which reflects physical, chemical and biological properties, can be assigned to vertices (atom centers), edges (bonds), or higher order simplexes (clusters of atoms), depending on the biomolecular structure, function, and dynamics properties. Further, we propose the first localized weighted persistent homology (LWPH). Inspired by the great success of element specific persistent homology (ESPH), we do not treat biomolecules as an inseparable system like all previous weighted models. Instead, we decompose them into a series of local domains, which may overlap with each other. The general persistent homology or weighted persistent homology analysis is then applied to each of these local domains. In this way, functional properties that are embedded in local structures can be revealed. Our model has been applied to systematically study DNA structures. It has been found that our LWPH based features can be used to successfully discriminate the A-, B-, and Z-types of DNA. More importantly, our LWPH based principal component analysis (PCA) model can identify two configurational states of DNA structures in ion liquid environment, which can be revealed only by the complicated helical coordinate system. The close consistency with the helical-coordinate model demonstrates that our model captures local structure variations so well that it is comparable with geometric models. Moreover, geometric measurements are usually defined in local regions. For instance, the helical-coordinate system is limited to one or two basepairs. However, our LWPH can quantitatively characterize structure information in regions or domains with arbitrary sizes and shapes, where traditional geometrical measurements fail.
|
20
|
Nguyen DD, Gao K, Wang M, Wei GW. MathDL: mathematical deep learning for D3R Grand Challenge 4. J Comput Aided Mol Des 2020; 34:131-147. [PMID: 31734815 PMCID: PMC7376411 DOI: 10.1007/s10822-019-00237-5] [Citation(s) in RCA: 45] [Impact Index Per Article: 11.3] [Received: 06/10/2019] [Accepted: 10/14/2019] [Indexed: 12/17/2022]
Abstract
We present the performance of our mathematical deep learning (MathDL) models for D3R Grand Challenge 4 (GC4). This challenge involves pose prediction, affinity ranking, and free energy estimation for beta secretase 1 (BACE) as well as affinity ranking and free energy estimation for Cathepsin S (CatS). We have developed advanced mathematics, namely differential geometry, algebraic graph, and/or algebraic topology, to accurately and efficiently encode high dimensional physical/chemical interactions into scalable low-dimensional rotational and translational invariant representations. These representations are integrated with deep learning models, such as generative adversarial networks (GAN) and convolutional neural networks (CNN) for pose prediction and energy evaluation, respectively. Overall, our MathDL models achieved the top place in pose prediction for BACE ligands in Stage 1a. Moreover, our submissions obtained the highest Spearman correlation coefficient on the affinity ranking of 460 CatS compounds, and the smallest centered root mean square error on the free energy set of 39 CatS molecules. It is worth mentioning that our docking pose prediction method has improved significantly over our previous ones.
Affiliation(s)
- Duc Duy Nguyen: Department of Mathematics, Michigan State University, East Lansing, MI, 48824, USA
- Kaifu Gao: Department of Mathematics, Michigan State University, East Lansing, MI, 48824, USA
- Menglun Wang: Department of Mathematics, Michigan State University, East Lansing, MI, 48824, USA
- Guo-Wei Wei: Department of Mathematics, Michigan State University, East Lansing, MI, 48824, USA; Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, MI, 48824, USA; Department of Electrical and Computer Engineering, Michigan State University, East Lansing, MI, 48824, USA
|
21
|
Chevyrev I, Nanda V, Oberhauser H. Persistence Paths and Signature Features in Topological Data Analysis. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2020; 42:192-202. [PMID: 30530312 DOI: 10.1109/tpami.2018.2885516] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Indexed: 06/09/2023]
Abstract
We introduce a new feature map for barcodes as they arise in persistent homology computation. The main idea is to first realize each barcode as a path in a convenient vector space, and to then compute its path signature, which takes values in the tensor algebra of that vector space. The composition of these two operations (barcode to path, path to tensor series) results in a feature map that has several desirable properties for statistical learning, such as universality and characteristicness, and achieves state-of-the-art results on common classification benchmarks.
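The two-step idea in this abstract (barcode to path, path to signature) can be sketched as follows. This is only an illustration under stated assumptions: the embedding in `barcode_to_path` (sort the finite bars and walk through their (birth, death) points, starting at the origin) is a hypothetical stand-in and not necessarily one of the embeddings used by the authors, and the signature is truncated at level 2. For a piecewise linear path, each segment with increment Δ contributes Δ at level 1 and Δ⊗Δ/2 at level 2, and segments are combined with Chen's identity.

```python
def barcode_to_path(bars):
    """Hypothetical embedding: visit the (birth, death) points of the
    finite bars in sorted order, starting from the origin."""
    return [(0.0, 0.0)] + [(float(b), float(d)) for b, d in sorted(bars)]


def signature_level2(path):
    """Levels 1 and 2 of the signature of a piecewise linear path.

    Returns (s1, s2), where s1 is the total increment and s2[i][j] is the
    iterated integral of coordinate i against coordinate j.
    """
    dim = len(path[0])
    s1 = [0.0] * dim
    s2 = [[0.0] * dim for _ in range(dim)]
    for start, end in zip(path, path[1:]):
        delta = [e - s for s, e in zip(start, end)]
        # Chen's identity at level 2: previous level-1 term tensor the new
        # increment, plus the new segment's own level-2 term.
        for i in range(dim):
            for j in range(dim):
                s2[i][j] += s1[i] * delta[j] + 0.5 * delta[i] * delta[j]
        for i in range(dim):
            s1[i] += delta[i]
    return s1, s2
```

A quick sanity check on any output is the shuffle identity `s2[i][j] + s2[j][i] == s1[i] * s1[j]`, which holds for every path up to floating-point error.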
|
22
|
Shi Q, Chen W, Huang S, Wang Y, Xue Z. Deep learning for mining protein data. Brief Bioinform 2019; 22:194-218. [PMID: 31867611 DOI: 10.1093/bib/bbz156] [Citation(s) in RCA: 31] [Impact Index Per Article: 6.2] [Received: 08/16/2019] [Revised: 10/21/2019] [Accepted: 11/07/2019] [Indexed: 01/16/2023] Open
Abstract
The recent emergence of deep learning to characterize complex patterns of protein big data reveals its potential to address the classic challenges in the field of protein data mining. Much research has revealed the promise of deep learning as a powerful tool to transform protein big data into valuable knowledge, leading to scientific discoveries and practical solutions. In this review, we summarize recent publications on deep learning predictive approaches in the field of mining protein data. The application architectures of these methods include multilayer perceptrons, stacked autoencoders, deep belief networks, two- or three-dimensional convolutional neural networks, recurrent neural networks, graph neural networks, and complex neural networks and are described from five perspectives: residue-level prediction, sequence-level prediction, three-dimensional structural analysis, interaction prediction, and mass spectrometry data mining. The advantages and deficiencies of these architectures are presented in relation to various tasks in protein data mining. Additionally, some practical issues and their future directions are discussed, such as robust deep learning for protein noisy data, architecture optimization for specific tasks, efficient deep learning for limited protein data, multimodal deep learning for heterogeneous protein data, and interpretable deep learning for protein understanding. This review provides comprehensive perspectives on general deep learning techniques for protein data analysis.
Affiliation(s)
- Qiang Shi: School of Software Engineering, Huazhong University of Science and Technology. His main interests cover machine learning, especially deep learning, protein data analysis, and big data mining.
- Weiya Chen: School of Software Engineering, Huazhong University of Science & Technology, Wuhan, China. His research interests cover bioinformatics, virtual reality, and data visualization.
- Siqi Huang: Software Engineering, Huazhong University of Science and Technology, focusing on machine learning and data mining.
- Yan Wang: School of life, University of Science & Technology. Her main interests cover protein structure and function prediction and big data mining.
- Zhidong Xue: School of Software Engineering, Huazhong University of Science & Technology, Wuhan, China. His research interests cover bioinformatics, machine learning, and image processing.
|
23
|
Zhao R, Cang Z, Tong Y, Wei GW. Protein pocket detection via convex hull surface evolution and associated Reeb graph. Bioinformatics 2019; 34:i830-i837. [PMID: 30423105 DOI: 10.1093/bioinformatics/bty598] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Indexed: 12/20/2022] Open
Abstract
Motivation: Protein pocket information is invaluable for drug target identification, agonist design, virtual screening and receptor-ligand binding analysis. A recent study indicates that about half of holoproteins can simultaneously bind multiple interacting ligands in a large pocket containing structured sub-pockets. Although this hierarchical pocket and sub-pocket structure has a significant impact on multi-ligand synergistic interactions in the protein binding site, there is no method available for this analysis. This work introduces a computational tool based on differential geometry, algebraic topology and physics-based simulation to address this pressing issue. Results: We propose to detect protein pockets by evolving the convex hull surface inwards until it touches the protein surface everywhere. The governing partial differential equations (PDEs) include the mean curvature flow combined with the eikonal equation commonly used in the fast marching algorithm in the Eulerian representation. The surface evolution induced Morse function and Reeb graph are utilized to characterize the hierarchical pocket and sub-pocket structure in controllable detail. The proposed method is validated on PDBbind refined sets of 4414 protein-ligand complexes. Extensive numerical tests indicate that the proposed method not only provides a unique description of pocket-sub-pocket relations, but also offers efficient estimations of pocket surface area, pocket volume and pocket depth. Availability and implementation: Source code available at https://github.com/rdzhao/ProteinPocketDetection. Webserver available at http://weilab.math.msu.edu/PPD/.
Affiliation(s)
- Rundong Zhao: Department of Computer Science and Engineering, Michigan State University, East Lansing, MI, USA
- Zixuan Cang: Department of Mathematics, Michigan State University, East Lansing, MI, USA
- Yiying Tong: Department of Computer Science and Engineering, Michigan State University, East Lansing, MI, USA
- Guo-Wei Wei: Department of Mathematics, Michigan State University, East Lansing, MI, USA
|
24
|
A Primer on Persistent Homology of Finite Metric Spaces. Bull Math Biol 2019; 81:2074-2116. [PMID: 31140053 DOI: 10.1007/s11538-019-00614-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 12/13/2018] [Accepted: 05/10/2019] [Indexed: 10/26/2022]
Abstract
Topological data analysis (TDA) is a relatively new area of research that imports classical ideas from topology into the realm of data analysis. The umbrella term TDA covers, in particular, the notion of persistent homology (PH), which can be described in a nutshell as the study of scale-dependent homological invariants of datasets. In these notes, we provide a terse, self-contained description of the main ideas behind the construction of persistent homology as an invariant feature of datasets, and its stability to perturbations.
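As a concrete instance of a "scale-dependent homological invariant", the sketch below computes the degree-0 barcode (connected components) of a finite point set under the Vietoris-Rips filtration. It is a minimal illustration and not taken from the primer: the name `h0_barcode` is hypothetical, Euclidean distance is assumed, and the filtration value of an edge is taken to be the distance between its endpoints (some conventions use half of that). Every component is born at scale 0 and dies when an edge first merges it into another component, which is exactly what a union-find pass over the edges sorted by length records.

```python
import math
from itertools import combinations


def h0_barcode(points):
    """Degree-0 persistence bars (birth, death) of a finite point set."""
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )
    bars = []
    for length, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            # Two components merge at this scale, so one of them dies.
            parent[root_i] = root_j
            bars.append((0.0, length))
    # The component that survives all merges never dies.
    bars.append((0.0, math.inf))
    return bars
```

For the collinear points (0, 0), (1, 0), (5, 0), the finite bars end at 1 and 4, reflecting that the two nearby points merge early while the distant point stays isolated until scale 4. Higher-degree bars (loops, cavities) require a boundary-matrix reduction that this sketch does not attempt.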
|
25
|
Grow C, Gao K, Nguyen DD, Wei GW. Generative network complex (GNC) for drug discovery. COMMUNICATIONS IN INFORMATION AND SYSTEMS 2019; 19:241-277. [PMID: 34257523 PMCID: PMC8274326 DOI: 10.4310/cis.2019.v19.n3.a2] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Indexed: 11/22/2022]
Abstract
It remains a challenging task to generate a vast variety of novel compounds with desirable pharmacological properties. In this work, a generative network complex (GNC) is proposed as a new platform for designing novel compounds, predicting their physical and chemical properties, and selecting potential drug candidates that fulfill various druggable criteria such as binding affinity, solubility, partition coefficient, etc. We combine a SMILES string generator, which consists of an encoder, a drug-property controlled or regulated latent space, and a decoder, with verification deep neural networks, a target-specific three-dimensional (3D) pose generator, and mathematical deep learning networks to generate new compounds, predict their drug properties, construct 3D poses associated with target proteins, and reevaluate druggability, respectively. New compounds were generated in the latent space by either randomized output, controlled output, or optimized output. In our demonstration, 2.08 million and 2.8 million novel compounds are generated respectively for Cathepsin S and BACE targets. These new compounds are very different from the seeds and cover a larger chemical space. For potentially active compounds, their 3D poses are generated using a state-of-the-art method. The resulting 3D complexes are further evaluated for druggability by a championing deep learning algorithm based on algebraic topology, differential geometry, and algebraic graph theories. Performed on supercomputers, the whole process took less than one week. Therefore, our GNC is an efficient new paradigm for discovering new drug candidates.
Affiliation(s)
- Christopher Grow: Department of Mathematics, Michigan State University, East Lansing, MI 48824, USA
- Kaifu Gao: Department of Mathematics, Michigan State University, East Lansing, MI 48824, USA
- Duc Duy Nguyen: Department of Mathematics, Michigan State University, East Lansing, MI 48824, USA
- Guo-Wei Wei: Department of Mathematics, Michigan State University, East Lansing, MI 48824, USA
|
26
|
Xia K, Anand DV, Shikhar S, Mu Y. Persistent homology analysis of osmolyte molecular aggregation and their hydrogen-bonding networks. Phys Chem Chem Phys 2019; 21:21038-21048. [DOI: 10.1039/c9cp03009c] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Indexed: 12/13/2022]
Abstract
Dramatically different patterns can be observed in the topological fingerprints for hydrogen-bonding networks from two types of osmolyte systems.
Affiliation(s)
- Kelin Xia: Division of Mathematical Sciences, School of Physical and Mathematical Sciences, School of Biological Sciences, Nanyang Technological University, Singapore
- D. Vijay Anand: Division of Mathematical Sciences, School of Physical and Mathematical Sciences, School of Biological Sciences, Nanyang Technological University, Singapore
- Saxena Shikhar: School of Biological Sciences, Nanyang Technological University, Singapore
- Yuguang Mu: School of Biological Sciences, Nanyang Technological University, Singapore
|
27
|
Pirashvili M, Steinberg L, Belchi Guillamon F, Niranjan M, Frey JG, Brodzki J. Improved understanding of aqueous solubility modeling through topological data analysis. J Cheminform 2018; 10:54. [PMID: 30460426 PMCID: PMC6755597 DOI: 10.1186/s13321-018-0308-5] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 05/23/2018] [Accepted: 11/08/2018] [Indexed: 11/10/2022] Open
Abstract
Topological data analysis is a family of recent mathematical techniques seeking to understand the 'shape' of data, and has been used to understand the structure of the descriptor space produced by standard chemical informatics software from the point of view of solubility. We have used the mapper algorithm, a TDA method that creates low-dimensional representations of data, to create a network visualization of the solubility space. While descriptors with clear chemical implications are prominent features in this space, reflecting their importance to the chemical properties, an unexpected and interesting correlation between chlorine content and rings and their implication for solubility prediction is revealed. A parallel representation of the chemical space was generated using persistent homology applied to molecular graphs. Links between this chemical space and the descriptor space were shown to be in agreement with chemical heuristics. The use of persistent homology on molecular graphs, extended by the use of norms on the associated persistence landscapes, allows the conversion of discrete shape descriptors to continuous ones, and a perspective of the application of these descriptors to quantitative structure property relations is presented.
Affiliation(s)
- Lee Steinberg: Department of Chemistry, University of Southampton, Southampton, UK
- Francisco Belchi Guillamon: Mathematical Sciences, University of Southampton, Southampton, UK; Institut de Robòtica i Informàtica industrial, CSIC-UPC, Llorens i Artigas 4-6, 08028, Barcelona, Spain
- Jeremy G Frey: Department of Chemistry, University of Southampton, Southampton, UK
- Jacek Brodzki: Mathematical Sciences, University of Southampton, Southampton, UK
|
28
|
Mathematical deep learning for pose and binding affinity prediction and ranking in D3R Grand Challenges. J Comput Aided Mol Des 2018; 33:71-82. [PMID: 30116918 DOI: 10.1007/s10822-018-0146-6] [Citation(s) in RCA: 99] [Impact Index Per Article: 16.5] [Received: 04/28/2018] [Accepted: 08/03/2018] [Indexed: 12/18/2022]
Abstract
Advanced mathematics, such as multiscale weighted colored subgraph and element specific persistent homology, and machine learning including deep neural networks were integrated to construct mathematical deep learning models for pose and binding affinity prediction and ranking in the last two D3R Grand Challenges in computer-aided drug design and discovery. D3R Grand Challenge 2 focused on the pose prediction, binding affinity ranking and free energy prediction for Farnesoid X receptor ligands. Our models obtained the top place in absolute free energy prediction for free energy set 1 in stage 2. The latest competition, D3R Grand Challenge 3 (GC3), is considered as the most difficult challenge so far. It has five subchallenges involving Cathepsin S and five other kinase targets, namely VEGFR2, JAK2, p38-α, TIE2, and ABL1. There is a total of 26 official competitive tasks for GC3. Our predictions were ranked 1st in 10 out of these 26 tasks.
|
29
|
Xia K. Persistent homology analysis of ion aggregations and hydrogen-bonding networks. Phys Chem Chem Phys 2018; 20:13448-13460. [PMID: 29722784 DOI: 10.1039/c8cp01552j] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Indexed: 11/21/2022]
Abstract
Despite the great advancement of experimental tools and theoretical models, a quantitative characterization of the microscopic structures of ion aggregates and their associated water hydrogen-bonding networks still remains a challenging problem. In this paper, a newly-invented mathematical method called persistent homology is introduced, for the first time, to quantitatively analyze the intrinsic topological properties of ion aggregation systems and hydrogen-bonding networks. The two most distinguishable properties of persistent homology analysis of assembly systems are as follows. First, it does not require a predefined bond length to construct the ion or hydrogen-bonding network. Persistent homology results are determined by the morphological structure of the data only. Second, it can directly measure the size of circles or holes in ion aggregates and hydrogen-bonding networks. To validate our model, we consider two well-studied systems, i.e., NaCl and KSCN solutions, generated from molecular dynamics simulations. They are believed to represent two morphological types of aggregation, i.e., local clusters and extended ion networks. It has been found that the two aggregation types have distinguishable topological features and can be characterized by our topological model very well. Further, we construct two types of networks, i.e., O-networks and H2O-networks, for analyzing the topological properties of hydrogen-bonding networks. It is found that for both models, KSCN systems demonstrate much more dramatic variations in their local circle structures with a concentration increase. A consistent increase of large-sized local circle structures is observed and the sizes of these circles become more and more diverse. In contrast, NaCl systems show no obvious increase of large-sized circles. Instead a consistent decline of the average size of the circle structures is observed and the sizes of these circles become more and more uniform with a concentration increase. 
As far as we know, these unique intrinsic topological features in ion aggregation systems have never been pointed out before. More importantly, our models can be directly used to quantitatively analyze the intrinsic topological invariants, including circles, loops, holes, and cavities, of any network-like structures, such as nanomaterials, colloidal systems, biomolecular assemblies, among others. These topological invariants cannot be described by traditional graph and network models.
Affiliation(s)
- Kelin Xia: Division of Mathematical Sciences, School of Physical and Mathematical Sciences, School of Biological Sciences, Nanyang Technological University, 637371, Singapore
|
30
|
TopP-S: Persistent homology-based multi-task deep neural networks for simultaneous predictions of partition coefficient and aqueous solubility. J Comput Chem 2018; 39:1444-1454. [DOI: 10.1002/jcc.25213] [Citation(s) in RCA: 39] [Impact Index Per Article: 6.5] [Received: 12/06/2017] [Revised: 01/15/2018] [Accepted: 02/25/2018] [Indexed: 01/09/2023]
|
31
|
Kimura M, Obayashi I, Takeichi Y, Murao R, Hiraoka Y. Non-empirical identification of trigger sites in heterogeneous processes using persistent homology. Sci Rep 2018; 8:3553. [PMID: 29476108 PMCID: PMC5824834 DOI: 10.1038/s41598-018-21867-z] [Citation(s) in RCA: 26] [Impact Index Per Article: 4.3] [Received: 09/18/2017] [Accepted: 02/12/2018] [Indexed: 11/09/2022] Open
Abstract
Macroscopic phenomena, such as fracture, corrosion, and degradation of materials, are associated with various reactions which progress heterogeneously. Thus, material properties are generally determined not by their averaged characteristics but by specific features in heterogeneity (or 'trigger sites') of phases, chemical states, etc., where the key reactions that dictate macroscopic properties initiate and propagate. Therefore, the identification of trigger sites is crucial for controlling macroscopic properties. However, this is a challenging task. Previous studies have attempted to identify trigger sites based on the knowledge of materials science derived from experimental data ('empirical approach'). However, this approach becomes impractical when little is known about the reaction or when large multi-dimensional datasets, such as those with multiscale heterogeneities in time and/or space, are considered. Here, we introduce a new persistent homology approach for identifying trigger sites and apply it to the heterogeneous reduction of iron ore sinters. Four types of trigger sites, 'hourglass'-shaped calcium ferrites and 'island'-shaped iron oxides, were determined to initiate crack formation using only mapping data depicting the heterogeneities of phases and cracks without prior mechanistic information. The identification of these trigger sites can provide a design rule for reducing mechanical degradation during reduction.
Affiliation(s)
- Masao Kimura: Photon Factory, Institute of Materials Structure Science, High Energy Accelerator Research Organization (KEK), Tsukuba, Ibaraki, 305-0801, Japan; Department of Materials Structure Science, School of High Energy Accelerator Science, SOKENDAI (The Graduate University for Advanced Studies), Tsukuba, Ibaraki, 305-0801, Japan
- Ippei Obayashi: Advanced Institute for Materials Research (AIMR), Tohoku University, Sendai, Miyagi, 980-8577, Japan
- Yasuo Takeichi: Photon Factory, Institute of Materials Structure Science, High Energy Accelerator Research Organization (KEK), Tsukuba, Ibaraki, 305-0801, Japan; Department of Materials Structure Science, School of High Energy Accelerator Science, SOKENDAI (The Graduate University for Advanced Studies), Tsukuba, Ibaraki, 305-0801, Japan
- Reiko Murao: Advanced Technology Research Laboratories, Nippon Steel & Sumitomo Metal Co., Futtsu, Chiba, 293-8511, Japan
- Yasuaki Hiraoka: Advanced Institute for Materials Research (AIMR), Tohoku University, Sendai, Miyagi, 980-8577, Japan; Center for Materials research by Information Integration (CMI2), Research and Services Division of Materials Data and Integrated System (MaDIS), National Institute for Materials Science (NIMS), Tsukuba, Ibaraki, 305-0047, Japan; Center for Advanced Intelligence Project, RIKEN, Tokyo, 103-0027, Japan
|
32
|
Cang Z, Wei GW. Integration of element specific persistent homology and machine learning for protein-ligand binding affinity prediction. INTERNATIONAL JOURNAL FOR NUMERICAL METHODS IN BIOMEDICAL ENGINEERING 2018; 34. [PMID: 28677268 DOI: 10.1002/cnm.2914] [Citation(s) in RCA: 93] [Impact Index Per Article: 15.5] [Received: 05/14/2017] [Revised: 06/27/2017] [Accepted: 06/29/2017] [Indexed: 05/17/2023]
Abstract
Protein-ligand binding is a fundamental biological process that is paramount to many other biological processes, such as signal transduction, metabolic pathways, enzyme construction, cell secretion, and gene expression. Accurate prediction of protein-ligand binding affinities is vital to rational drug design and the understanding of protein-ligand binding and binding induced function. Existing binding affinity prediction methods are inundated with geometric detail and involve excessively high dimensions, which undermines their predictive power for massive binding data. Topology provides the ultimate level of abstraction and thus incurs too much reduction in geometric information. Persistent homology embeds geometric information into topological invariants and bridges the gap between complex geometry and abstract topology. However, it oversimplifies biological information. This work introduces element specific persistent homology (ESPH) or multicomponent persistent homology to retain crucial biological information during topological simplification. The combination of ESPH and machine learning gives rise to a powerful paradigm for macromolecular analysis. Tests on 2 large data sets indicate that the proposed topology-based machine-learning paradigm outperforms other existing methods in protein-ligand binding affinity predictions. ESPH reveals protein-ligand binding mechanisms that cannot be attained from other conventional techniques. The present approach reveals that protein-ligand hydrophobic interactions extend to 40 Å away from the binding site, which has significant ramifications for drug and protein design.
Affiliation(s)
- Zixuan Cang
- Department of Mathematics, Michigan State University, MI 48824, USA
| | - Guo-Wei Wei
- Department of Mathematics, Michigan State University, MI 48824, USA
- Department of Biochemistry and Molecular Biology, Michigan State University, MI 48824, USA
- Department of Electrical and Computer Engineering, Michigan State University, MI 48824, USA
| |
|
33
|
Wu K, Wei GW. Quantitative Toxicity Prediction Using Topology Based Multitask Deep Neural Networks. J Chem Inf Model 2018; 58:520-531. [DOI: 10.1021/acs.jcim.7b00558] [Citation(s) in RCA: 75] [Impact Index Per Article: 12.5] [Indexed: 11/29/2022]
Affiliation(s)
- Kedi Wu
- Department of Mathematics, Department of Electrical and Computer Engineering, and Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, Michigan 48824, United States
| | - Guo-Wei Wei
- Department of Mathematics, Department of Electrical and Computer Engineering, and Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, Michigan 48824, United States
| |
|
34
|
Cang Z, Mu L, Wei GW. Representability of algebraic topology for biomolecules in machine learning based scoring and virtual screening. PLoS Comput Biol 2018; 14:e1005929. [PMID: 29309403 PMCID: PMC5774846 DOI: 10.1371/journal.pcbi.1005929] [Citation(s) in RCA: 141] [Impact Index Per Article: 23.5] [Received: 09/01/2017] [Revised: 01/19/2018] [Accepted: 12/15/2017] [Indexed: 12/05/2022]
Abstract
This work introduces a number of algebraic topology approaches, including multi-component persistent homology, multi-level persistent homology, and electrostatic persistence for the representation, characterization, and description of small molecules and biomolecular complexes. In contrast to the conventional persistent homology, multi-component persistent homology retains critical chemical and biological information during the topological simplification of biomolecular geometric complexity. Multi-level persistent homology enables a tailored topological description of inter- and/or intra-molecular interactions of interest. Electrostatic persistence incorporates partial charge information into topological invariants. These topological methods are paired with Wasserstein distance to characterize similarities between molecules and are further integrated with a variety of machine learning algorithms, including k-nearest neighbors, ensemble of trees, and deep convolutional neural networks, to manifest their descriptive and predictive powers for protein-ligand binding analysis and virtual screening of small molecules. Extensive numerical experiments involving 4,414 protein-ligand complexes from the PDBBind database and 128,374 ligand-target and decoy-target pairs in the DUD database are performed to test respectively the scoring power and the discriminatory power of the proposed topological learning strategies. It is demonstrated that the present topological learning outperforms other existing methods in protein-ligand binding affinity prediction and ligand-decoy discrimination.
Affiliation(s)
- Zixuan Cang
- Department of Mathematics, Michigan State University, East Lansing, Michigan, United States of America
| | - Lin Mu
- Computer Science and Mathematics Division, Oak Ridge National Laboratory, Oak Ridge, Tennessee, United States of America
| | - Guo-Wei Wei
- Department of Mathematics, Michigan State University, East Lansing, Michigan, United States of America
- Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, Michigan, United States of America
- Department of Electrical and Computer Engineering, Michigan State University, East Lansing, Michigan, United States of America
| |
|
35
|
Multiscale Persistent Functions for Biomolecular Structure Characterization. Bull Math Biol 2017; 80:1-31. [PMID: 29098540 DOI: 10.1007/s11538-017-0362-6] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Received: 11/15/2016] [Accepted: 10/19/2017] [Indexed: 10/18/2022]
Abstract
In this paper, we introduce multiscale persistent functions for biomolecular structure characterization. The essential idea is to combine our multiscale rigidity functions (MRFs) with persistent homology analysis, so as to construct a series of multiscale persistent functions, particularly multiscale persistent entropies, for structure characterization. To clarify the fundamental idea of our method, the multiscale persistent entropy (MPE) model is discussed in great detail. Mathematically, unlike the previous persistent entropy (Chintakunta et al. in Pattern Recognit 48(2):391-401, 2015; Merelli et al. in Entropy 17(10):6872-6892, 2015; Rucco et al. in: Proceedings of ECCS 2014, Springer, pp 117-128, 2016), a special resolution parameter is incorporated into our model. Various scales can be achieved by tuning its value. Physically, our MPE can be used in conformational entropy evaluation. More specifically, it is found that our method incorporates in it a natural classification scheme. This is achieved through a density filtration of an MRF built from angular distributions. To further validate our model, a systematical comparison with the traditional entropy evaluation model is done. It is found that our model is able to preserve the intrinsic topological features of biomolecular data much better than traditional approaches, particularly for resolutions in the intermediate range. Moreover, by comparing with traditional entropies from various grid sizes, bond angle-based methods and a persistent homology-based support vector machine method (Cang et al. in Mol Based Math Biol 3:140-162, 2015), we find that our MPE method gives the best results in terms of average true positive rate in a classic protein structure classification test. More interestingly, all-alpha and all-beta protein classes can be clearly separated from each other with zero error only in our model. 
Finally, a special protein structure index (PSI) is proposed, for the first time, to describe the "regularity" of protein structures. Basically, a protein structure is deemed regular if it has a consistent and orderly configuration. Our PSI model is tested on a database of 110 proteins; we find that structures with larger portions of loops and intrinsically disordered regions are always associated with larger PSI, meaning an irregular configuration, while proteins with larger portions of secondary structures, i.e., alpha-helix or beta-sheet, have smaller PSI. Essentially, PSI can be used to describe the "regularity" information in any system.
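As a point of reference for the abstract above, the earlier persistent entropy it cites (Chintakunta et al.; Merelli et al.; Rucco et al.) is commonly defined as the Shannon entropy of the normalized bar lengths of a barcode. The following is a minimal Python sketch of that standard definition only. It does not implement the multiscale persistent entropy (MPE) or its resolution parameter, and the function name `persistent_entropy` and the `(birth, death)` tuple input are assumptions made for illustration.

```python
import math


def persistent_entropy(bars):
    """Shannon entropy of normalized bar lengths in a barcode.

    `bars` is an iterable of (birth, death) pairs with finite values.
    Bars of zero or negative length are ignored (an assumption of this sketch).
    """
    lengths = [death - birth for birth, death in bars if death > birth]
    total = sum(lengths)
    if total == 0:
        # Empty barcode: define the entropy as zero.
        return 0.0
    # p_i = l_i / L, E = -sum_i p_i * log(p_i)
    return -sum((l / total) * math.log(l / total) for l in lengths)
```

Under this definition a barcode with a single bar has entropy zero, and a barcode with k bars of equal length has entropy log(k), the maximum for k bars.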
|
36
|
Otter N, Porter MA, Tillmann U, Grindrod P, Harrington HA. A roadmap for the computation of persistent homology. EPJ DATA SCIENCE 2017; 6:17. [PMID: 32025466 PMCID: PMC6979512 DOI: 10.1140/epjds/s13688-017-0109-5] [Citation(s) in RCA: 124] [Impact Index Per Article: 17.7] [Received: 01/30/2017] [Accepted: 06/07/2017] [Indexed: 05/21/2023]
Abstract
Persistent homology (PH) is a method used in topological data analysis (TDA) to study qualitative features of data that persist across multiple scales. It is robust to perturbations of input data, independent of dimensions and coordinates, and provides a compact representation of the qualitative features of the input. The computation of PH is an open area with numerous important and fascinating challenges. The field of PH computation is evolving rapidly, and new algorithms and software implementations are being updated and released at a rapid pace. The purposes of our article are to (1) introduce theory and computational methods for PH to a broad range of computational scientists and (2) provide benchmarks of state-of-the-art implementations for the computation of PH. We give a friendly introduction to PH, navigate the pipeline for the computation of PH with an eye towards applications, and use a range of synthetic and real-world data sets to evaluate currently available open-source implementations for the computation of PH. Based on our benchmarking, we indicate which algorithms and implementations are best suited to different types of data sets. In an accompanying tutorial, we provide guidelines for the computation of PH. We make publicly available all scripts that we wrote for the tutorial, and we make available the processed version of the data sets used in the benchmarking. ELECTRONIC SUPPLEMENTARY MATERIAL The online version of this article (doi:10.1140/epjds/s13688-017-0109-5) contains supplementary material.
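To make the idea of features that "persist across multiple scales" concrete, the simplest case can be sketched in a few lines of Python: the dimension-0 barcode (connected components) of a Vietoris-Rips filtration on a point cloud, computed with a union-find structure. This is an illustrative sketch, not one of the implementations benchmarked in the article. The function name `h0_barcode` is hypothetical, and the convention that an edge enters the filtration at the full pairwise distance (rather than half of it) is an assumption.

```python
import math
from itertools import combinations


def h0_barcode(points):
    """Dimension-0 persistence bars of a Vietoris-Rips filtration.

    Every point is born at scale 0. A component dies at the length of the
    edge that first merges it into another component. One component never
    dies and gets an infinite bar.
    """
    n = len(points)
    if n == 0:
        return []
    parent = list(range(n))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Process all edges in order of increasing length (filtration order).
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )
    bars = []
    for d, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj          # merge two components
            bars.append((0.0, d))    # one of them dies at scale d
    bars.append((0.0, math.inf))     # the surviving component
    return bars
```

For three collinear points at x = 0, 1, and 5, this returns finite bars ending at 1 and 4 plus one infinite bar. The long bar ending at 4 reflects the isolated point, which is the kind of multiscale feature PH is designed to expose. Higher-dimensional features (loops, cavities) require a full boundary-matrix reduction and are outside the scope of this sketch.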
Affiliation(s)
- Nina Otter
- Mathematical Institute, University of Oxford, Oxford, OX2 6GG UK
- The Alan Turing Institute, 96 Euston Road, London, NW1 2DB UK
| | - Mason A Porter
- Mathematical Institute, University of Oxford, Oxford, OX2 6GG UK
- CABDyN Complexity Centre, University of Oxford, Oxford, OX1 1HP UK
- Department of Mathematics, UCLA, Los Angeles, CA 90095 USA
| | - Ulrike Tillmann
- Mathematical Institute, University of Oxford, Oxford, OX2 6GG UK
- The Alan Turing Institute, 96 Euston Road, London, NW1 2DB UK
| | - Peter Grindrod
- Mathematical Institute, University of Oxford, Oxford, OX2 6GG UK
| | | |
|
37
|
Cang Z, Wei GW. TopologyNet: Topology based deep convolutional and multi-task neural networks for biomolecular property predictions. PLoS Comput Biol 2017; 13:e1005690. [PMID: 28749969 PMCID: PMC5549771 DOI: 10.1371/journal.pcbi.1005690] [Citation(s) in RCA: 161] [Impact Index Per Article: 23.0] [Received: 04/06/2017] [Revised: 08/08/2017] [Accepted: 07/18/2017] [Indexed: 11/18/2022]
Abstract
Although deep learning approaches have had tremendous success in image, video and audio processing, computer vision, and speech recognition, their applications to three-dimensional (3D) biomolecular structural data sets have been hindered by the geometric and biological complexity. To address this problem we introduce the element-specific persistent homology (ESPH) method. ESPH represents 3D complex geometry by one-dimensional (1D) topological invariants and retains important biological information via a multichannel image-like representation. This representation reveals hidden structure-function relationships in biomolecules. We further integrate ESPH and deep convolutional neural networks to construct a multichannel topological neural network (TopologyNet) for the predictions of protein-ligand binding affinities and protein stability changes upon mutation. To overcome the deep learning limitations from small and noisy training sets, we propose a multi-task multichannel topological convolutional neural network (MM-TCNN). We demonstrate that TopologyNet outperforms the latest methods in the prediction of protein-ligand binding affinities, mutation induced globular protein folding free energy changes, and mutation induced membrane protein folding free energy changes. AVAILABILITY weilab.math.msu.edu/TDL/.
Affiliation(s)
- Zixuan Cang
- Department of Mathematics, Michigan State University, East Lansing, MI 48824, USA
| | - Guo-Wei Wei
- Department of Mathematics, Michigan State University, East Lansing, MI 48824, USA
- Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, MI 48824, USA
- Department of Electrical and Computer Engineering, Michigan State University, East Lansing, MI 48824, USA
| |
|
38
|
Liu B, Wang B, Zhao R, Tong Y, Wei GW. ESES: Software for Eulerian solvent excluded surface. J Comput Chem 2017; 38:446-466. [PMID: 28052350 DOI: 10.1002/jcc.24682] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.9] [Received: 07/06/2016] [Revised: 11/02/2016] [Accepted: 11/09/2016] [Indexed: 12/17/2022]
Abstract
Solvent excluded surface (SES) is one of the most popular surface definitions in biophysics and molecular biology. In addition to its usage in biomolecular visualization, it has been widely used in implicit solvent models, in which SES is usually immersed in a Cartesian mesh. Therefore, it is important to construct SESs in the Eulerian representation for biophysical modeling and computation. This work describes a software package called Eulerian solvent excluded surface (ESES) for the generation of accurate SESs in Cartesian grids. ESES offers the description of the solvent and solute domains by specifying all the intersection points between the SES and the Cartesian grid lines. Additionally, the interface normal at each intersection point is evaluated. Furthermore, for a given biomolecule, the ESES software not only provides the whole surface area, but also partitions the surface area according to atomic types. Homology theory is utilized to detect topological features, such as loops and cavities, on the complex formed by the SES. The sizes of loops and cavities are measured based on persistent homology with an evolutionary partial differential equation-based filtration. ESES is extensively validated by surface visualization, electrostatic solvation free energy computation, surface area and volume calculations, and loop and cavity detection and their size estimation. We used the Amber PBSA test set in our electrostatic solvation energy, area, and volume validations. Our results are either calibrated by analytical values or compared with those from the MSMS software. © 2017 Wiley Periodicals, Inc.
Affiliation(s)
- Beibei Liu
- Department of Computer Science and Engineering, Michigan State University, East Lansing, Michigan, 48824
| | - Bao Wang
- Department of Mathematics, Michigan State University, East Lansing, Michigan, 48824
| | - Rundong Zhao
- Department of Computer Science and Engineering, Michigan State University, East Lansing, Michigan, 48824
| | - Yiying Tong
- Department of Computer Science and Engineering, Michigan State University, East Lansing, Michigan, 48824
| | - Guo-Wei Wei
- Department of Mathematics, Michigan State University, East Lansing, Michigan, 48824
- Department of Electrical and Computer Engineering, Michigan State University, East Lansing, Michigan, 48824
- Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, Michigan, 48824
| |
|
39
|
Sanderson N, Shugerman E, Molnar S, Meiss JD, Bradley E. Computational Topology Techniques for Characterizing Time-Series Data. ADVANCES IN INTELLIGENT DATA ANALYSIS XVI 2017. [DOI: 10.1007/978-3-319-68765-0_24] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Indexed: 01/24/2023]
|
40
|
Nguyen DD, Wei GW. The impact of surface area, volume, curvature, and Lennard-Jones potential to solvation modeling. J Comput Chem 2016; 38:24-36. [PMID: 27718270 DOI: 10.1002/jcc.24512] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Received: 05/20/2016] [Revised: 08/17/2016] [Accepted: 08/30/2016] [Indexed: 12/24/2022]
Abstract
This article explores the impact of surface area, volume, curvature, and Lennard-Jones (LJ) potential on solvation free energy predictions. Rigidity surfaces are utilized to generate robust analytical expressions for maximum, minimum, mean, and Gaussian curvatures of solvent-solute interfaces, and define a generalized Poisson-Boltzmann (GPB) equation with a smooth dielectric profile. Extensive correlation analysis is performed to examine the linear dependence of surface area, surface enclosed volume, maximum curvature, minimum curvature, mean curvature, and Gaussian curvature for solvation modeling. It is found that surface area and surface enclosed volume are highly correlated with each other, and poorly correlated with various curvatures for six test sets of molecules. Different curvatures are weakly correlated with each other for six test sets of molecules, but are strongly correlated with each other within each test set of molecules. Based on correlation analysis, we construct twenty-six nontrivial nonpolar solvation models. Our numerical results reveal that the LJ potential plays a vital role in nonpolar solvation modeling, especially for molecules involving strong van der Waals interactions. It is found that curvatures are at least as important as surface area or surface enclosed volume in nonpolar solvation modeling. In conjunction with the GPB model, various curvature-based nonpolar solvation models are shown to offer some of the best solvation free energy predictions for a wide range of test sets. For example, root mean square errors from a model constituting surface area, volume, mean curvature, and LJ potential are less than 0.42 kcal/mol for all test sets. © 2016 Wiley Periodicals, Inc.
Affiliation(s)
- Duc D Nguyen
- Department of Mathematics, Michigan State University, Michigan, 48824
| | - Guo-Wei Wei
- Department of Mathematics, Michigan State University, Michigan, 48824
- Department of Electrical and Computer Engineering, Michigan State University, Michigan, 48824
- Department of Biochemistry and Molecular Biology, Michigan State University, Michigan, 48824
| |
|
41
|
Giusti C, Ghrist R, Bassett DS. Two's company, three (or more) is a simplex: Algebraic-topological tools for understanding higher-order structure in neural data. J Comput Neurosci 2016; 41:1-14. [PMID: 27287487 PMCID: PMC4927616 DOI: 10.1007/s10827-016-0608-6] [Citation(s) in RCA: 147] [Impact Index Per Article: 18.4] [Received: 01/13/2016] [Revised: 03/25/2016] [Accepted: 05/16/2016] [Indexed: 12/11/2022]
Abstract
The language of graph theory, or network science, has proven to be an exceptional tool for addressing myriad problems in neuroscience. Yet, the use of networks is predicated on a critical simplifying assumption: that the quintessential unit of interest in a brain is a dyad - two nodes (neurons or brain regions) connected by an edge. While rarely mentioned, this fundamental assumption inherently limits the types of neural structure and function that graphs can be used to model. Here, we describe a generalization of graphs that overcomes these limitations, thereby offering a broad range of new possibilities in terms of modeling and measuring neural phenomena. Specifically, we explore the use of simplicial complexes: a structure developed in the field of mathematics known as algebraic topology, of increasing applicability to real data due to a rapidly growing computational toolset. We review the underlying mathematical formalism as well as the budding literature applying simplicial complexes to neural data, from electrophysiological recordings in animal models to hemodynamic fluctuations in humans. Based on the exceptional flexibility of the tools and recent ground-breaking insights into neural function, we posit that this framework has the potential to eclipse graph theory in unraveling the fundamental mysteries of cognition.
Affiliation(s)
- Chad Giusti
- Department of Mathematics, University of Pennsylvania, Philadelphia, PA, 19104, USA
- Department of Bioengineering, University of Pennsylvania, Philadelphia, PA, 19104, USA
| | - Robert Ghrist
- Department of Mathematics, University of Pennsylvania, Philadelphia, PA, 19104, USA
- Department of Electrical & Systems Engineering, University of Pennsylvania, Philadelphia, PA, 19104, USA
| | - Danielle S Bassett
- Department of Bioengineering, University of Pennsylvania, Philadelphia, PA, 19104, USA.
- Department of Electrical & Systems Engineering, University of Pennsylvania, Philadelphia, PA, 19104, USA.
| |
|
42
|
Opron K, Xia K, Burton Z, Wei GW. Flexibility-rigidity index for protein-nucleic acid flexibility and fluctuation analysis. J Comput Chem 2016; 37:1283-95. [PMID: 26927815 PMCID: PMC5844491 DOI: 10.1002/jcc.24320] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Received: 06/18/2015] [Revised: 12/02/2015] [Accepted: 01/17/2016] [Indexed: 12/29/2022]
Abstract
Protein-nucleic acid complexes are important for many cellular processes including the most essential functions such as transcription and translation. For many protein-nucleic acid complexes, flexibility of both macromolecules has been shown to be critical for specificity and/or function. The flexibility-rigidity index (FRI) has been proposed as an accurate and efficient approach for protein flexibility analysis. In this article, we introduce FRI for the flexibility analysis of protein-nucleic acid complexes. We demonstrate that a multiscale strategy, which incorporates multiple kernels to capture various length scales in biomolecular collective motions, is able to significantly improve the state of the art in the flexibility analysis of protein-nucleic acid complexes. We take advantage of the high accuracy and O(N) computational complexity of our multiscale FRI method to investigate the flexibility of ribosomal subunits, which are difficult to analyze by alternative approaches. An anisotropic FRI approach, which involves localized Hessian matrices, is utilized to study the translocation dynamics in an RNA polymerase.
Affiliation(s)
- Kristopher Opron
- Department of Biochemistry and Molecular Biology, Michigan State University, MI 48824, USA
| | - Kelin Xia
- Department of Mathematics, Michigan State University, MI 48824, USA
| | - Zach Burton
- Department of Biochemistry and Molecular Biology, Michigan State University, MI 48824, USA
| | - Guo-Wei Wei
- Mathematical Biosciences Institute, The Ohio State University, Columbus, Ohio 43210, USA
| |
|
43
|
Abstract
Persistent homology provides a new approach for the topological simplification of big data via measuring the life time of intrinsic topological features in a filtration process and has found its success in scientific and engineering applications. However, such a success is essentially limited to qualitative data classification and analysis. Indeed, persistent homology has rarely been employed for quantitative modeling and prediction. Additionally, the present persistent homology is a passive tool, rather than a proactive technique, for classification and analysis. In this work, we outline a general protocol to construct object-oriented persistent homology methods. By means of differential geometry theory of surfaces, we construct an objective functional, namely, a surface free energy defined on the data of interest. The minimization of the objective functional leads to a Laplace-Beltrami operator which generates a multiscale representation of the initial data and offers an objective oriented filtration process. The resulting differential geometry based object-oriented persistent homology is able to preserve desirable geometric features in the evolutionary filtration and enhances the corresponding topological persistence. The cubical complex based homology algorithm is employed in the present work to be compatible with the Cartesian representation of the Laplace-Beltrami flow. The proposed Laplace-Beltrami flow based persistent homology method is extensively validated. The consistency between Laplace-Beltrami flow based filtration and Euclidean distance based filtration is confirmed on the Vietoris-Rips complex for a large number of numerical tests. The convergence and reliability of the present Laplace-Beltrami flow based cubical complex filtration approach are analyzed over various spatial and temporal mesh sizes. The Laplace-Beltrami flow based persistent homology approach is utilized to study the intrinsic topology of proteins and fullerene molecules. 
Based on a quantitative model which correlates the topological persistence of fullerene central cavity with the total curvature energy of the fullerene structure, the proposed method is used for the prediction of fullerene isomer stability. The efficiency and robustness of the present method are verified by more than 500 fullerene molecules. It is shown that the proposed persistent homology based quantitative model offers good predictions of total curvature energies for ten types of fullerene isomers. The present work offers the first example to design object-oriented persistent homology to enhance or preserve desirable features in the original data during the filtration process and then automatically detect or extract the corresponding topological traits from the data.
Affiliation(s)
- Bao Wang
- Department of Mathematics, Michigan State University, MI 48824, USA
| | - Guo-Wei Wei
- Mathematical Biosciences Institute, The Ohio State University, Columbus, Ohio 43210, USA
| |
|
44
|
Xia K, Zhao Z, Wei GW. Multiresolution persistent homology for excessively large biomolecular datasets. J Chem Phys 2015; 143:134103. [PMID: 26450288 PMCID: PMC4592433 DOI: 10.1063/1.4931733] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.1] [Received: 06/06/2015] [Accepted: 09/08/2015] [Indexed: 12/21/2022]
Abstract
Although persistent homology has emerged as a promising tool for the topological simplification of complex data, it is computationally intractable for large datasets. We introduce multiresolution persistent homology to handle excessively large datasets. We match the resolution with the scale of interest so as to represent large scale datasets with appropriate resolution. We utilize flexibility-rigidity index to access the topological connectivity of the data set and define a rigidity density for the filtration analysis. By appropriately tuning the resolution of the rigidity density, we are able to focus the topological lens on the scale of interest. The proposed multiresolution topological analysis is validated by a hexagonal fractal image which has three distinct scales. We further demonstrate the proposed method for extracting topological fingerprints from DNA molecules. In particular, the topological persistence of a virus capsid with 273 780 atoms is successfully analyzed which would otherwise be inaccessible to the normal point cloud method and unreliable by using coarse-grained multiscale persistent homology. The proposed method has also been successfully applied to the protein domain classification, which is the first time that persistent homology is used for practical protein domain analysis, to our knowledge. The proposed multiresolution topological method has potential applications in arbitrary data sets, such as social networks, biological networks, and graphs.
Affiliation(s)
- Kelin Xia
- Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, USA
| | - Zhixiong Zhao
- Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, USA
| | - Guo-Wei Wei
- Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, USA
| |
|
45
|
Xia K, Wei GW. Persistent topology for cryo-EM data analysis. INTERNATIONAL JOURNAL FOR NUMERICAL METHODS IN BIOMEDICAL ENGINEERING 2015; 31:n/a-n/a. [PMID: 25851063 DOI: 10.1002/cnm.2719] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.2] [Received: 12/23/2014] [Revised: 03/13/2015] [Accepted: 03/31/2015] [Indexed: 06/04/2023]
Abstract
In this work, we introduce persistent homology for the analysis of cryo-electron microscopy (cryo-EM) density maps. We identify the topological fingerprint or topological signature of noise, which is widespread in cryo-EM data. For low signal-to-noise ratio (SNR) volumetric data, intrinsic topological features of biomolecular structures are indistinguishable from noise. To remove noise, we employ geometric flows that are found to preserve the intrinsic topological fingerprints of cryo-EM structures and diminish the topological signature of noise. In particular, persistent homology enables us to visualize the gradual separation of the topological fingerprints of cryo-EM structures from those of noise during the denoising process, which gives rise to a practical procedure for prescribing a noise threshold to extract cryo-EM structure information from noise contaminated data after certain iterations of the geometric flow equation. To further demonstrate the utility of persistent homology for cryo-EM data analysis, we consider a microtubule intermediate structure Electron Microscopy Data (EMD 1129). Three helix models, an alpha-tubulin monomer model, an alpha-tubulin and beta-tubulin model, and an alpha-tubulin and beta-tubulin dimer model, are constructed to fit the cryo-EM data. The least square fitting leads to similarly high correlation coefficients, which indicates that structure determination via optimization is an ill-posed inverse problem. However, these models have dramatically different topological fingerprints. In particular, linkages or connectivities that discriminate one model from another play little role in traditional density fitting or optimization but are very sensitive and crucial to topological fingerprints. The intrinsic topological features of the microtubule data are identified after topological denoising. 
By a comparison of the topological fingerprints of the original data and those of three models, we found that the third model is topologically favored. The present work offers persistent homology based new strategies for topological denoising and for resolving ill-posed inverse problems.
Affiliation(s)
- Kelin Xia
- Department of Mathematics, Michigan State University, MI 48824, USA
| | - Guo-Wei Wei
- Department of Mathematics, Michigan State University, MI 48824, USA
- Department of Electrical and Computer Engineering, Michigan State University, MI 48824, USA
- Department of Biochemistry and Molecular Biology, Michigan State University, MI 48824, USA
| |
|
46
|
Xia K, Wei GW. Multidimensional persistence in biomolecular data. J Comput Chem 2015; 36:1502-20. [PMID: 26032339 PMCID: PMC4485576 DOI: 10.1002/jcc.23953] [Citation(s) in RCA: 38] [Impact Index Per Article: 4.2] [Received: 12/24/2014] [Revised: 04/02/2015] [Accepted: 04/19/2015] [Indexed: 12/24/2022]
Abstract
Persistent homology has emerged as a popular technique for the topological simplification of big data, including biomolecular data. Multidimensional persistence bears considerable promise to bridge the gap between geometry and topology. However, its practical and robust construction has been a challenge. We introduce two families of multidimensional persistence, namely pseudomultidimensional persistence and multiscale multidimensional persistence. The former is generated via the repeated applications of persistent homology filtration to high-dimensional data, such as results from molecular dynamics or partial differential equations. The latter is constructed via isotropic and anisotropic scales that create new simplicial complexes and associated topological spaces. The utility, robustness, and efficiency of the proposed topological methods are demonstrated via protein folding, protein flexibility analysis, the topological denoising of cryoelectron microscopy data, and the scale dependence of nanoparticles. Topological transition between partially folded and unfolded proteins has been observed in multidimensional persistence. The separation between noise topological signatures and molecular topological fingerprints is achieved by the Laplace-Beltrami flow. The multiscale multidimensional persistent homology reveals relative local features in Betti-0 invariants and the relatively global characteristics of Betti-1 and Betti-2 invariants.
Affiliation(s)
- Kelin Xia
- Department of Mathematics, Michigan State University, MI 48824, USA
| | - Guo-Wei Wei
- Department of Mathematics, Michigan State University, MI 48824, USA
- Department of Electrical and Computer Engineering, Michigan State University, MI 48824, USA
- Department of Biochemistry and Molecular Biology, Michigan State University, MI 48824, USA
| |
|
47
|
Abstract
Persistent homology has been advocated as a new strategy for the topological simplification of complex data. However, it is computationally intractable for large data sets. In this work, we introduce multiresolution persistent homology for tackling large datasets. Our basic idea is to match the resolution with the scale of interest so as to create a topological microscopy for the underlying data. We adjust the resolution via a rigidity density-based filtration. The proposed multiresolution topological analysis is validated by the study of a complex RNA molecule.
Affiliation(s)
- Kelin Xia
- Department of Mathematics, Michigan State University, East Lansing, Michigan
| | - Zhixiong Zhao
- Department of Mathematics, Michigan State University, East Lansing, Michigan
| | - Guo-Wei Wei
- Department of Mathematics, Michigan State University, East Lansing, Michigan
- Department of Electrical and Computer Engineering, Michigan State University, East Lansing, Michigan
- Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, Michigan
| |
|