1
|
Miotto M, Warner N, Ruocco G, Tartaglia GG, Scherman OA, Milanetti E. Osmolyte-induced protein stability changes explained by graph theory. Comput Struct Biotechnol J 2024; 23:4077-4087. [PMID: 39660214 PMCID: PMC11630646 DOI: 10.1016/j.csbj.2024.10.014] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/14/2024] [Revised: 10/07/2024] [Accepted: 10/08/2024] [Indexed: 12/12/2024] Open
Abstract
Enhanced stabilization of protein structures via the presence of inert osmolytes is a key mechanism adopted both by physiological systems and in biotechnological applications. While the intrinsic stability of proteins is ultimately fixed by their amino acid composition and organization, the interactions between osmolytes and proteins together with their concentrations introduce an additional layer of complexity and in turn, a method of modulating protein stability. Here, we combined experimental measurements with molecular dynamics simulations and graph-theory-based analyses to predict the stabilizing/destabilizing effects of different kinds of osmolytes on proteins during heat-mediated denaturation. We found that (i) proteins in solution with stability-enhancing osmolytes tend to have more compact interaction networks than those assumed in the presence of destabilizing osmolytes; (ii) a strong negative correlation (R = -0.85) characterizes the relationship between the melting temperatureT m and the preferential interaction coefficient defined by the radial distribution functions of osmolytes and water around the protein and (iii) a positive correlation exists between osmolyte-osmolyte clustering and the extent of preferential exclusion from the local domain of the protein, suggesting that exclusion may be driven by enhanced steric hindrance of aggregated osmolytes.
Collapse
Affiliation(s)
- Mattia Miotto
- Center for Life Nano & Neuro Science, Istituto Italiano di Tecnologia, Viale Regina Elena 291, 00161, Rome, Italy
| | - Nina Warner
- Melville Laboratory for Polymer Synthesis, Yusuf Hamied Department of Chemistry, Lensfield Road, Cambridge CB2 1EW, United Kingdom
| | - Giancarlo Ruocco
- Center for Life Nano & Neuro Science, Istituto Italiano di Tecnologia, Viale Regina Elena 291, 00161, Rome, Italy
- Department of Physics, Sapienza University, Piazzale Aldo Moro 5, 00185, Rome, Italy
| | - Gian Gaetano Tartaglia
- Center for Life Nano & Neuro Science, Istituto Italiano di Tecnologia, Viale Regina Elena 291, 00161, Rome, Italy
- Department of Biology, Sapienza University, Piazzale Aldo Moro 5, 00185, Rome, Italy
| | - Oren A. Scherman
- Melville Laboratory for Polymer Synthesis, Yusuf Hamied Department of Chemistry, Lensfield Road, Cambridge CB2 1EW, United Kingdom
| | - Edoardo Milanetti
- Center for Life Nano & Neuro Science, Istituto Italiano di Tecnologia, Viale Regina Elena 291, 00161, Rome, Italy
- Department of Physics, Sapienza University, Piazzale Aldo Moro 5, 00185, Rome, Italy
| |
Collapse
|
2
|
Long Y, Donald BR. Predicting Affinity Through Homology (PATH): Interpretable Binding Affinity Prediction with Persistent Homology. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2023.11.16.567384. [PMID: 38014181 PMCID: PMC10680814 DOI: 10.1101/2023.11.16.567384] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/29/2023]
Abstract
Accurate binding affinity prediction is crucial to structure-based drug design. Recent work used computational topology to obtain an effective representation of protein-ligand interactions. While algorithms using algebraic topology have proven useful in predicting properties of biomolecules, previous algorithms employed uninterpretable machine learning models which failed to explain the underlying geometric and topological features that drive accurate binding affinity prediction. Moreover, they had high computational complexity which made them intractable for large proteins. We present the fastest known algorithm to compute persistent homology features for protein-ligand complexes using opposition distance, with a runtime that is independent of the protein size. Then, we exploit these features in a novel, interpretable algorithm to predict protein-ligand binding affinity. Our algorithm achieves interpretability through an effective embedding of distances across bipartite matchings of the protein and ligand atoms into real-valued functions by summing Gaussians centered at features constructed by persistent homology. We name these functions internuclear persistent contours (IPCs) . Next, we introduce persistence fingerprints , a vector with 10 components that sketches the distances of different bipartite matching between protein and ligand atoms, refined from IPCs. Let the number of protein atoms in the protein-ligand complex be n , number of ligand atoms be m , and ω ≈ 2.4 be the matrix multiplication exponent. We show that for any 0 < ε < 1, after an 𝒪 ( mn log( mn )) preprocessing procedure, we can compute an ε -accurate approximation to the persistence fingerprint in 𝒪 ( m log 6 ω ( m/ε )) time, independent of protein size. This is an improvement in time complexity by a factor of 𝒪 (( m + n ) 3 ) over any previous binding affinity prediction that uses persistent homology. We show that the representational power of persistence fingerprint generalizes to protein-ligand binding datasets beyond the training dataset. Then, we introduce PATH , Predicting Affinity Through Homology, a two-part algorithm consisting of PATH + and PATH - . PATH + is an interpretable, small ensemble of shallow regression trees for binding affinity prediction from persistence fingerprints. We show that despite using 1,400-fold fewer features, PATH + has comparable performance to a previous state-of-the-art binding affinity prediction algorithm that uses persistent homology. Moreover, PATH + has the advantage of being interpretable. We visualize the features captured by persistence fingerprint for variant HIV-1 protease complexes and show that persistence fingerprint captures binding-relevant structural mutations. PATH - , in turn, uses regression trees over IPCs to differentiate between binding and decoy complexes. Finally, we benchmarked PATH versus established binding affinity prediction algorithms spanning physics-based, knowledge-based, and deep learning methods, revealing that PATH has comparable or better performance with less overfitting, compared to these state-of-the-art methods. The source code for PATH is released open-source as part of the osprey protein design software package.
Collapse
|
3
|
Arango AS, Park H, Tajkhorshid E. Topological Learning Approach to Characterizing Biological Membranes. J Chem Inf Model 2024; 64:5242-5252. [PMID: 38912752 DOI: 10.1021/acs.jcim.4c00552] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 06/25/2024]
Abstract
Biological membranes play key roles in cellular compartmentalization, structure, and its signaling pathways. At varying temperatures, individual membrane lipids sample from different configurations, a process that frequently leads to higher-order phase behavior and phenomena. Here, we present a persistent homology (PH)-based method for quantifying the structural features of individual and bulk lipids, providing local and contextual information on lipid tail organization. Our method leverages the mathematical machinery of algebraic topology and machine learning to infer temperature-dependent structural information on lipids from static coordinates. To train our model, we generated multiple molecular dynamics trajectories of dipalmitoyl-phosphatidylcholine membranes at varying temperatures. A fingerprint was then constructed for each set of lipid coordinates by PH filtration, in which interaction spheres were grown around the lipid atoms while tracking their intersections. The sphere filtration formed a simplicial complex that captures enduring key topological features of the configuration landscape using homology, yielding persistence data. Following fingerprint extraction for physiologically relevant temperatures, the persistence data were used to train an attention-based neural network for assignment of effective temperature values to selected membrane regions. Our persistence homology-based method captures the local structural effects, via effective temperature, of lipids adjacent to other membrane constituents, e.g., sterols and proteins. This topological learning approach can predict lipid effective temperatures from static coordinates across multiple spatial resolutions. The tool, called MembTDA, can be accessed at https://github.com/hyunp2/Memb-TDA.
Collapse
Affiliation(s)
- Andres S Arango
- Theoretical and Computational Biophysics Group, NIH Resource Center for Macromolecular Modeling and Visualization, Beckman Institute for Advanced Science and Technology, Department of Biochemistry, and Center for Biophysics and Quantitative Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States
| | - Hyun Park
- Theoretical and Computational Biophysics Group, NIH Resource Center for Macromolecular Modeling and Visualization, Beckman Institute for Advanced Science and Technology, Department of Biochemistry, and Center for Biophysics and Quantitative Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States
| | - Emad Tajkhorshid
- Theoretical and Computational Biophysics Group, NIH Resource Center for Macromolecular Modeling and Visualization, Beckman Institute for Advanced Science and Technology, Department of Biochemistry, and Center for Biophysics and Quantitative Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States
| |
Collapse
|
4
|
Sun X, Yang S, Wu Z, Su J, Hu F, Chang F, Li C. PMSPcnn: Predicting protein stability changes upon single point mutations with convolutional neural network. Structure 2024; 32:838-848.e3. [PMID: 38508191 DOI: 10.1016/j.str.2024.02.016] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/11/2023] [Revised: 12/19/2023] [Accepted: 02/22/2024] [Indexed: 03/22/2024]
Abstract
Protein missense mutations and resulting protein stability changes are important causes for many human genetic diseases. However, the accurate prediction of stability changes due to mutations remains a challenging problem. To address this problem, we have developed an unbiased effective model: PMSPcnn that is based on a convolutional neural network. We have included an anti-symmetry property to build a balanced training dataset, which improves the prediction, in particular for stabilizing mutations. Persistent homology, which is an effective approach for characterizing protein structures, is used to obtain topological features. Additionally, a regression stratification cross-validation scheme has been proposed to improve the prediction for mutations with extreme ΔΔG. For three test datasets: Ssym, p53, and myoglobin, PMSPcnn achieves a better performance than currently existing predictors. PMSPcnn also outperforms currently available methods for membrane proteins. Overall, PMSPcnn is a promising method for the prediction of protein stability changes caused by single point mutations.
Collapse
Affiliation(s)
- Xiaohan Sun
- College of Chemistry and Life Science, Beijing University of Technology, Beijing 100124, China
| | - Shuang Yang
- College of Chemistry and Life Science, Beijing University of Technology, Beijing 100124, China
| | - Zhixiang Wu
- College of Chemistry and Life Science, Beijing University of Technology, Beijing 100124, China
| | - Jingjie Su
- College of Chemistry and Life Science, Beijing University of Technology, Beijing 100124, China
| | - Fangrui Hu
- College of Chemistry and Life Science, Beijing University of Technology, Beijing 100124, China
| | - Fubin Chang
- College of Chemistry and Life Science, Beijing University of Technology, Beijing 100124, China
| | - Chunhua Li
- College of Chemistry and Life Science, Beijing University of Technology, Beijing 100124, China.
| |
Collapse
|
5
|
Maurya A, Stanley RJ, Lama N, Nambisan AK, Patel G, Saeed D, Swinfard S, Smith C, Jagannathan S, Hagerty JR, Stoecker WV. Hybrid Topological Data Analysis and Deep Learning for Basal Cell Carcinoma Diagnosis. JOURNAL OF IMAGING INFORMATICS IN MEDICINE 2024; 37:92-106. [PMID: 38343238 PMCID: PMC10976904 DOI: 10.1007/s10278-023-00924-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 06/15/2023] [Revised: 10/04/2023] [Accepted: 10/05/2023] [Indexed: 03/02/2024]
Abstract
A critical clinical indicator for basal cell carcinoma (BCC) is the presence of telangiectasia (narrow, arborizing blood vessels) within the skin lesions. Many skin cancer imaging processes today exploit deep learning (DL) models for diagnosis, segmentation of features, and feature analysis. To extend automated diagnosis, recent computational intelligence research has also explored the field of Topological Data Analysis (TDA), a branch of mathematics that uses topology to extract meaningful information from highly complex data. This study combines TDA and DL with ensemble learning to create a hybrid TDA-DL BCC diagnostic model. Persistence homology (a TDA technique) is implemented to extract topological features from automatically segmented telangiectasia as well as skin lesions, and DL features are generated by fine-tuning a pre-trained EfficientNet-B5 model. The final hybrid TDA-DL model achieves state-of-the-art accuracy of 97.4% and an AUC of 0.995 on a holdout test of 395 skin lesions for BCC diagnosis. This study demonstrates that telangiectasia features improve BCC diagnosis, and TDA techniques hold the potential to improve DL performance.
Collapse
Affiliation(s)
- Akanksha Maurya
- Missouri University of Science &Technology, Rolla, MO, 65209, USA
| | - R Joe Stanley
- Missouri University of Science &Technology, Rolla, MO, 65209, USA.
| | - Norsang Lama
- Missouri University of Science &Technology, Rolla, MO, 65209, USA
| | - Anand K Nambisan
- Missouri University of Science &Technology, Rolla, MO, 65209, USA
| | | | | | | | - Colin Smith
- A.T. Still, University of Health Sciences, Kirksville, MO, USA
| | | | | | | |
Collapse
|
6
|
Seo J, Singh R, Ryu J, Choi JH. Molecular Aggregation Behavior and Microscopic Heterogeneity in Binary Osmolyte-Water Solutions. J Chem Inf Model 2024; 64:138-149. [PMID: 37983534 DOI: 10.1021/acs.jcim.3c01382] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/22/2023]
Abstract
Osmolytes, small organic compounds, play a key role in modulating the protein stability in aqueous solutions, but the operating mechanism of the osmolyte remains inconclusive. Here, we attempt to clarify the mode of osmolyte action by quantitatively estimating the microheterogeneity of osmolyte-water mixtures with the aid of molecular dynamics simulation, graph theoretical analysis, and spatial distribution measurement in the four osmolyte solutions of trimethylamine-N-oxide (TMAO), tetramethylurea (TMU), dimethyl sulfoxide, and urea. TMAO, acting as a protecting osmolyte, tends to remain isolated with no formation of osmolyte aggregates while preferentially interacting with water, but there is a strong aggregation propensity in the denaturant TMU solution, characterized by favored hydrophobic interactions between TMU molecules. Taken together, the mechanism of osmolyte action on protein stability is proposed as a comprehensive one that encompasses the direct interactions between osmolytes and proteins and indirect interactions through the regulation of water properties in the osmolyte-water mixtures.
Collapse
Affiliation(s)
- Jiwon Seo
- Department of Chemistry, Gwangju Institute of Science and Technology (GIST), 123 Cheomdangwagi-ro, Buk-gu, Gwangju 61005, Republic of Korea
| | - Ravi Singh
- Department of Chemistry, Gwangju Institute of Science and Technology (GIST), 123 Cheomdangwagi-ro, Buk-gu, Gwangju 61005, Republic of Korea
| | - Jonghyuk Ryu
- Department of Chemistry, Gwangju Institute of Science and Technology (GIST), 123 Cheomdangwagi-ro, Buk-gu, Gwangju 61005, Republic of Korea
| | - Jun-Ho Choi
- Department of Chemistry, Gwangju Institute of Science and Technology (GIST), 123 Cheomdangwagi-ro, Buk-gu, Gwangju 61005, Republic of Korea
| |
Collapse
|
7
|
Wee J, Bianconi G, Xia K. Persistent Dirac for molecular representation. Sci Rep 2023; 13:11183. [PMID: 37433870 DOI: 10.1038/s41598-023-37853-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/20/2023] [Accepted: 06/28/2023] [Indexed: 07/13/2023] Open
Abstract
Molecular representations are of fundamental importance for the modeling and analysing molecular systems. The successes in drug design and materials discovery have been greatly contributed by molecular representation models. In this paper, we present a computational framework for molecular representation that is mathematically rigorous and based on the persistent Dirac operator. The properties of the discrete weighted and unweighted Dirac matrix are systematically discussed, and the biological meanings of both homological and non-homological eigenvectors are studied. We also evaluate the impact of various weighting schemes on the weighted Dirac matrix. Additionally, a set of physical persistent attributes that characterize the persistence and variation of spectrum properties of Dirac matrices during a filtration process is proposed to be molecular fingerprints. Our persistent attributes are used to classify molecular configurations of nine different types of organic-inorganic halide perovskites. The combination of persistent attributes with gradient boosting tree model has achieved great success in molecular solvation free energy prediction. The results show that our model is effective in characterizing the molecular structures, demonstrating the power of our molecular representation and featurization approach.
Collapse
Affiliation(s)
- Junjie Wee
- Division of Mathematical Sciences, School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore, 637371, Singapore.
| | - Ginestra Bianconi
- School of Mathematical Sciences, Queen Mary University of London, London, E1 4NS, UK
- The Alan Turing Institute, London, NW1 2DB, UK
| | - Kelin Xia
- Division of Mathematical Sciences, School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore, 637371, Singapore
| |
Collapse
|
8
|
Xia K, Liu X, Wee J. Persistent Homology for RNA Data Analysis. Methods Mol Biol 2023; 2627:211-229. [PMID: 36959450 DOI: 10.1007/978-1-0716-2974-1_12] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 03/25/2023]
Abstract
Molecular representations are of great importance for machine learning models in RNA data analysis. Essentially, efficient molecular descriptors or fingerprints that characterize the intrinsic structural and interactional information of RNAs can significantly boost the performance of all learning modeling. In this paper, we introduce two persistent models, including persistent homology and persistent spectral, for RNA structure and interaction representations and their applications in RNA data analysis. Different from traditional geometric and graph representations, persistent homology is built on simplicial complex, which is a generalization of graph models to higher-dimensional situations. Hypergraph is a further generalization of simplicial complexes and hypergraph-based embedded persistent homology has been proposed recently. Moreover, persistent spectral models, which combine filtration process with spectral models, including spectral graph, spectral simplicial complex, and spectral hypergraph, are proposed for molecular representation. The persistent attributes for RNAs can be obtained from these two persistent models and further combined with machine learning models for RNA structure, flexibility, dynamics, and function analysis.
Collapse
Affiliation(s)
- Kelin Xia
- Division of Mathematical Sciences, School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore, Singapore.
| | - Xiang Liu
- Division of Mathematical Sciences, School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore, Singapore
- Chern Institute of Mathematics and LPMC, Nankai University, Tianjin, China
| | - JunJie Wee
- Division of Mathematical Sciences, School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore, Singapore
| |
Collapse
|
9
|
Seo J, Choi S, Singh R, Choi JH. Spatial Inhomogeneity and Molecular Aggregation behavior in Aqueous Binary Liquid Mixtures. J Mol Liq 2022. [DOI: 10.1016/j.molliq.2022.120949] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/05/2022]
|
10
|
Gao K, Wang R, Chen J, Cheng L, Frishcosy J, Huzumi Y, Qiu Y, Schluckbier T, Wei X, Wei GW. Methodology-Centered Review of Molecular Modeling, Simulation, and Prediction of SARS-CoV-2. Chem Rev 2022; 122:11287-11368. [PMID: 35594413 PMCID: PMC9159519 DOI: 10.1021/acs.chemrev.1c00965] [Citation(s) in RCA: 33] [Impact Index Per Article: 11.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/06/2023]
Abstract
Despite tremendous efforts in the past two years, our understanding of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), virus-host interactions, immune response, virulence, transmission, and evolution is still very limited. This limitation calls for further in-depth investigation. Computational studies have become an indispensable component in combating coronavirus disease 2019 (COVID-19) due to their low cost, their efficiency, and the fact that they are free from safety and ethical constraints. Additionally, the mechanism that governs the global evolution and transmission of SARS-CoV-2 cannot be revealed from individual experiments and was discovered by integrating genotyping of massive viral sequences, biophysical modeling of protein-protein interactions, deep mutational data, deep learning, and advanced mathematics. There exists a tsunami of literature on the molecular modeling, simulations, and predictions of SARS-CoV-2 and related developments of drugs, vaccines, antibodies, and diagnostics. To provide readers with a quick update about this literature, we present a comprehensive and systematic methodology-centered review. Aspects such as molecular biophysics, bioinformatics, cheminformatics, machine learning, and mathematics are discussed. This review will be beneficial to researchers who are looking for ways to contribute to SARS-CoV-2 studies and those who are interested in the status of the field.
Collapse
Affiliation(s)
- Kaifu Gao
- Department
of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
| | - Rui Wang
- Department
of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
| | - Jiahui Chen
- Department
of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
| | - Limei Cheng
- Clinical
Pharmacology and Pharmacometrics, Bristol
Myers Squibb, Princeton, New Jersey 08536, United States
| | - Jaclyn Frishcosy
- Department
of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
| | - Yuta Huzumi
- Department
of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
| | - Yuchi Qiu
- Department
of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
| | - Tom Schluckbier
- Department
of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
| | - Xiaoqi Wei
- Department
of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
| | - Guo-Wei Wei
- Department
of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
- Department
of Electrical and Computer Engineering, Michigan State University, East Lansing, Michigan 48824, United States
- Department
of Biochemistry and Molecular Biology, Michigan
State University, East Lansing, Michigan 48824, United States
| |
Collapse
|
11
|
Tavanti F, Calzolari A. Multi-technique Approach to Unravel the (Dis)order in Amorphous Materials. ACS OMEGA 2022; 7:23255-23264. [PMID: 35847340 PMCID: PMC9280971 DOI: 10.1021/acsomega.2c01359] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Indexed: 06/15/2023]
Abstract
The concept of order in disordered materials is the key to controlling the mechanical, electrical, and chemical properties of amorphous compounds widely exploited in industrial applications and daily life. Rather, it is far from being understood. Here, we propose a multi-technique numerical approach to study the order/disorder of amorphous materials on both the short- and the medium-range scale. We combine the analysis of the disorder level based on chemical and physical features with their geometrical and topological properties, defining a previously unexplored interplay between the different techniques and the different order scales. We applied this scheme to amorphous GeSe and GeSeTe chalcogenides, showing a modulation of the internal disorder as a function of the stoichiometry and composition: Se-rich systems are less ordered than Ge-rich systems at the short- and medium-range length scales. The present approach can be easily applied to more complex systems containing three or more atom types without any a priori knowledge about the system chemical-physical features, giving a deep insight into the understanding of complex systems.
Collapse
|
12
|
Servis MJ, Sadhu B, Soderholm L, Clark AE. Amphiphile conformation impacts aggregate morphology and solution structure across multiple lengthscales. J Mol Liq 2022. [DOI: 10.1016/j.molliq.2021.117743] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/20/2022]
|
13
|
Wee J, Xia K. Forman persistent Ricci curvature (FPRC)-based machine learning models for protein-ligand binding affinity prediction. Brief Bioinform 2021; 22:6262241. [PMID: 33940588 DOI: 10.1093/bib/bbab136] [Citation(s) in RCA: 29] [Impact Index Per Article: 7.3] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/17/2021] [Revised: 03/14/2021] [Accepted: 03/23/2021] [Indexed: 01/01/2023] Open
Abstract
Artificial intelligence (AI) techniques have already been gradually applied to the entire drug design process, from target discovery, lead discovery, lead optimization and preclinical development to the final three phases of clinical trials. Currently, one of the central challenges for AI-based drug design is molecular featurization, which is to identify or design appropriate molecular descriptors or fingerprints. Efficient and transferable molecular descriptors are key to the success of all AI-based drug design models. Here we propose Forman persistent Ricci curvature (FPRC)-based molecular featurization and feature engineering, for the first time. Molecular structures and interactions are modeled as simplicial complexes, which are generalization of graphs to their higher dimensional counterparts. Further, a multiscale representation is achieved through a filtration process, during which a series of nested simplicial complexes at different scales are generated. Forman Ricci curvatures (FRCs) are calculated on the series of simplicial complexes, and the persistence and variation of FRCs during the filtration process is defined as FPRC. Moreover, persistent attributes, which are FPRC-based functions and properties, are employed as molecular descriptors, and combined with machine learning models, in particular, gradient boosting tree (GBT). Our FPRC-GBT models are extensively trained and tested on three most commonly-used datasets, including PDBbind-2007, PDBbind-2013 and PDBbind-2016. It has been found that our results are better than the ones from machine learning models with traditional molecular descriptors.
Collapse
Affiliation(s)
- JunJie Wee
- Division of Mathematical Sciences, School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore 637371
| | - Kelin Xia
- Division of Mathematical Sciences, School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore 637371
| |
Collapse
|
14
|
Using persistent homology as preprocessing of early warning signals for critical transition in flood. Sci Rep 2021; 11:7234. [PMID: 33790391 PMCID: PMC8012601 DOI: 10.1038/s41598-021-86739-5] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/19/2020] [Accepted: 03/19/2021] [Indexed: 11/09/2022] Open
Abstract
Flood early warning systems (FLEWSs) contribute remarkably to reducing economic and life losses during a flood. The theory of critical slowing down (CSD) has been successfully used as a generic indicator of early warning signals in various fields. A new tool called persistent homology (PH) was recently introduced for data analysis. PH employs a qualitative approach to assess a data set and provide new information on the topological features of the data set. In the present paper, we propose the use of PH as a preprocessing step to achieve a FLEWS through CSD. We test our proposal on water level data of the Kelantan River, which tends to flood nearly every year. The results suggest that the new information obtained by PH exhibits CSD and, therefore, can be used as a signal for a FLEWS. Further analysis of the signal, we manage to establish an early warning signal for ten of the twelve flood events recorded in the river; the two other events are detected on the first day of the flood. Finally, we compare our results with those of a FLEWS constructed directly from water level data and find that FLEWS via PH creates fewer false alarms than the conventional technique.
Collapse
|
15
|
Wee J, Xia K. Ollivier Persistent Ricci Curvature-Based Machine Learning for the Protein-Ligand Binding Affinity Prediction. J Chem Inf Model 2021; 61:1617-1626. [PMID: 33724038 DOI: 10.1021/acs.jcim.0c01415] [Citation(s) in RCA: 28] [Impact Index Per Article: 7.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2022]
Abstract
Efficient molecular featurization is one of the major issues for machine learning models in drug design. Here, we propose a persistent Ricci curvature (PRC), in particular, Ollivier PRC (OPRC), for the molecular featurization and feature engineering, for the first time. The filtration process proposed in the persistent homology is employed to generate a series of nested molecular graphs. Persistence and variation of Ollivier Ricci curvatures on these nested graphs are defined as OPRC. Moreover, persistent attributes, which are statistical and combinatorial properties of OPRCs during the filtration process, are used as molecular descriptors and further combined with machine learning models, in particular, gradient boosting tree (GBT). Our OPRC-GBT model is used in the prediction of the protein-ligand binding affinity, which is one of the key steps in drug design. Based on three of the most commonly used data sets from the well-established protein-ligand binding databank, that is, PDBbind, we intensively test our model and compare with existing models. It has been found that our model can achieve the state-of-the-art results and has advantages over traditional molecular descriptors.
Collapse
Affiliation(s)
- JunJie Wee
- Division of Mathematical Sciences, School of Physical and Mathematical Sciences, Nanyang Technological University, 637371, Singapore
| | - Kelin Xia
- Division of Mathematical Sciences, School of Physical and Mathematical Sciences, Nanyang Technological University, 637371, Singapore
| |
Collapse
|
16
|
Higher-order structure of polymer melt described by persistent homology. Sci Rep 2021; 11:2274. [PMID: 33500448 PMCID: PMC7838420 DOI: 10.1038/s41598-021-80975-5] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/26/2020] [Accepted: 12/21/2020] [Indexed: 11/21/2022] Open
Abstract
The optimal method of the polymer Materials Informatics (MI) has not been developed because the amorphous nature of the higher-order structure affects these properties. We have now tried to develop the polymer MI’s descriptor of the higher-order structure using persistent homology as the topological method. We have experimentally studied the influence of the MD simulation cell size as the higher-order structure of the polymer on its electrical properties important for a soft material sensor or actuator device. The all-atom MD simulation of the polymer has been calculated and the obtained atomic coordinate has been analyzed by the persistent homology. The change in the higher-order structure by different cell size simulations affects the dielectric constant, although these changes are not described by a radial distribution function (RDF). On the other hand, using the 2nd order persistent diagram (PD), it was found that when the cell size is small, the island-shaped distribution become smoother as the cell size increased. There is the same tendency for the condition of change in the monomer ratio, the polymer chain length or temperature. As a result, the persistent homology may express the higher-order structure generated by the MD simulation as a descriptor of the polymer MI.
Collapse
|
17
|
Pun CS, Yong BYS, Xia K. Weighted-persistent-homology-based machine learning for RNA flexibility analysis. PLoS One 2020; 15:e0237747. [PMID: 32822369 PMCID: PMC7446851 DOI: 10.1371/journal.pone.0237747] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/03/2020] [Accepted: 08/01/2020] [Indexed: 12/22/2022] Open
Abstract
With the great significance of biomolecular flexibility in biomolecular dynamics and functional analysis, various experimental and theoretical models are developed. Experimentally, Debye-Waller factor, also known as B-factor, measures atomic mean-square displacement and is usually considered as an important measurement for flexibility. Theoretically, elastic network models, Gaussian network model, flexibility-rigidity model, and other computational models have been proposed for flexibility analysis by shedding light on the biomolecular inner topological structures. Recently, a topology-based machine learning model has been proposed. By using the features from persistent homology, this model achieves a remarkable high Pearson correlation coefficient (PCC) in protein B-factor prediction. Motivated by its success, we propose weighted-persistent-homology (WPH)-based machine learning (WPHML) models for RNA flexibility analysis. Our WPH is a newly-proposed model, which incorporate physical, chemical and biological information into topological measurements using a weight function. In particular, we use local persistent homology (LPH) to focus on the topological information of local regions. Our WPHML model is validated on a well-established RNA dataset, and numerical experiments show that our model can achieve a PCC of up to 0.5822. The comparison with the previous sequence-information-based learning models shows that a consistent improvement in performance by at least 10% is achieved in our current model.
Collapse
Affiliation(s)
- Chi Seng Pun
- Division of Mathematical Sciences, School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore, Singapore
- * E-mail: (CSP); (KX)
| | - Brandon Yung Sin Yong
- Division of Mathematical Sciences, School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore, Singapore
| | - Kelin Xia
- Division of Mathematical Sciences, School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore, Singapore
- School of Biological Sciences, Nanyang Technological University, Singapore, Singapore
- * E-mail: (CSP); (KX)
| |
Collapse
|