1
|
Zhang Y, Shen C, Xia K. Multi-Cover Persistence (MCP)-based machine learning for polymer property prediction. Brief Bioinform 2024; 25:bbae465. [PMID: 39323091 PMCID: PMC11424509 DOI: 10.1093/bib/bbae465] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/29/2024] [Revised: 08/07/2024] [Accepted: 09/05/2024] [Indexed: 09/27/2024] Open
Abstract
Accurate and efficient prediction of polymers properties is crucial for polymer design. Recently, data-driven artificial intelligence (AI) models have demonstrated great promise in polymers property analysis. Even with the great progresses, a pivotal challenge in all the AI-driven models remains to be the effective representation of molecules. Here we introduce Multi-Cover Persistence (MCP)-based molecular representation and featurization for the first time. Our MCP-based polymer descriptors are combined with machine learning models, in particular, Gradient Boosting Tree (GBT) models, for polymers property prediction. Different from all previous molecular representation, polymer molecular structure and interactions are represented as MCP, which utilizes Delaunay slices at different dimensions and Rhomboid tiling to characterize the complicated geometric and topological information within the data. Statistic features from the generated persistent barcodes are used as polymer descriptors, and further combined with GBT model. Our model has been extensively validated on polymer benchmark datasets. It has been found that our models can outperform traditional fingerprint-based models and has similar accuracy with geometric deep learning models. In particular, our model tends to be more effective on large-sized monomer structures, demonstrating the great potential of MCP in characterizing more complicated polymer data. This work underscores the potential of MCP in polymer informatics, presenting a novel perspective on molecular representation and its application in polymer science.
Collapse
Affiliation(s)
- Yipeng Zhang
- Division of Mathematical Sciences, School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore 637371, Singapore
| | - Cong Shen
- Department of Mathematics, National University of Singapore, Singapore 119076, Singapore
| | - Kelin Xia
- Division of Mathematical Sciences, School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore 637371, Singapore
| |
Collapse
|
2
|
Karki R, Gadiya Y, Gribbon P, Zaliani A. Pharmacophore-Based Machine Learning Model To Predict Ligand Selectivity for E3 Ligase Binders. ACS OMEGA 2023; 8:30177-30185. [PMID: 37636935 PMCID: PMC10448689 DOI: 10.1021/acsomega.3c02803] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 04/26/2023] [Accepted: 07/06/2023] [Indexed: 08/29/2023]
Abstract
E3 ligases are enzymes that play a critical role in ubiquitin-mediated protein degradation and are involved in various cellular processes. Pharmacophore analysis is a useful approach for predicting E3 ligase binding selectivity, which involves identifying key chemical features necessary for a ligand to interact with a specific protein target cavity. While pharmacophore analysis is not always sufficient to accurately predict ligand binding affinity, it can be a valuable tool for filtering and/or designing focused libraries for screening campaigns. In this study, we present a fast and an inexpensive approach using a pharmacophore fingerprinting scheme known as ErG, which is used in a multi-class machine learning classification model. This model can assign the correct E3 ligase binder to its known E3 ligase and predict the probability of each molecule to bind to different E3 ligases. Practical applications of this approach are demonstrated on commercial libraries such as Asinex for the rational design of E3 ligase binders. The scripts and data associated with this study can be found on GitHub at https://github.com/Fraunhofer-ITMP/E3_binder_Model.
Collapse
Affiliation(s)
- Reagon Karki
- Fraunhofer
Institute for Translational Medicine and Pharmacology (ITMP), Schnackenburgallee 114, 22525 Hamburg, Germany
- Fraunhofer
Cluster of Excellence for Immune-Mediated Diseases (CIMD), Theodor Stern Kai 7, 60590 Frankfurt, Germany
| | - Yojana Gadiya
- Fraunhofer
Institute for Translational Medicine and Pharmacology (ITMP), Schnackenburgallee 114, 22525 Hamburg, Germany
- Fraunhofer
Cluster of Excellence for Immune-Mediated Diseases (CIMD), Theodor Stern Kai 7, 60590 Frankfurt, Germany
- Bonn-Aachen
International Center for Information Technology (B-IT), University of Bonn, 53113 Bonn, Germany
| | - Philip Gribbon
- Fraunhofer
Institute for Translational Medicine and Pharmacology (ITMP), Schnackenburgallee 114, 22525 Hamburg, Germany
- Fraunhofer
Cluster of Excellence for Immune-Mediated Diseases (CIMD), Theodor Stern Kai 7, 60590 Frankfurt, Germany
| | - Andrea Zaliani
- Fraunhofer
Institute for Translational Medicine and Pharmacology (ITMP), Schnackenburgallee 114, 22525 Hamburg, Germany
- Fraunhofer
Cluster of Excellence for Immune-Mediated Diseases (CIMD), Theodor Stern Kai 7, 60590 Frankfurt, Germany
| |
Collapse
|
3
|
Wee J, Bianconi G, Xia K. Persistent Dirac for molecular representation. Sci Rep 2023; 13:11183. [PMID: 37433870 DOI: 10.1038/s41598-023-37853-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/20/2023] [Accepted: 06/28/2023] [Indexed: 07/13/2023] Open
Abstract
Molecular representations are of fundamental importance for the modeling and analysing molecular systems. The successes in drug design and materials discovery have been greatly contributed by molecular representation models. In this paper, we present a computational framework for molecular representation that is mathematically rigorous and based on the persistent Dirac operator. The properties of the discrete weighted and unweighted Dirac matrix are systematically discussed, and the biological meanings of both homological and non-homological eigenvectors are studied. We also evaluate the impact of various weighting schemes on the weighted Dirac matrix. Additionally, a set of physical persistent attributes that characterize the persistence and variation of spectrum properties of Dirac matrices during a filtration process is proposed to be molecular fingerprints. Our persistent attributes are used to classify molecular configurations of nine different types of organic-inorganic halide perovskites. The combination of persistent attributes with gradient boosting tree model has achieved great success in molecular solvation free energy prediction. The results show that our model is effective in characterizing the molecular structures, demonstrating the power of our molecular representation and featurization approach.
Collapse
Affiliation(s)
- Junjie Wee
- Division of Mathematical Sciences, School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore, 637371, Singapore.
| | - Ginestra Bianconi
- School of Mathematical Sciences, Queen Mary University of London, London, E1 4NS, UK
- The Alan Turing Institute, London, NW1 2DB, UK
| | - Kelin Xia
- Division of Mathematical Sciences, School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore, 637371, Singapore
| |
Collapse
|
4
|
Choo HY, Wee J, Shen C, Xia K. Fingerprint-Enhanced Graph Attention Network (FinGAT) Model for Antibiotic Discovery. J Chem Inf Model 2023; 63:2928-2935. [PMID: 37167016 DOI: 10.1021/acs.jcim.3c00045] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 05/13/2023]
Abstract
Artificial Intelligence (AI) techniques are of great potential to fundamentally change antibiotic discovery industries. Efficient and effective molecular featurization is key to all highly accurate learning models for antibiotic discovery. In this paper, we propose a fingerprint-enhanced graph attention network (FinGAT) model by the combination of sequence-based 2D fingerprints and structure-based graph representation. In our feature learning process, sequence information is transformed into a fingerprint vector, and structural information is encoded through a GAT module into another vector. These two vectors are concatenated and input into a multilayer perceptron (MLP) for antibiotic activity classification. Our model is extensively tested and compared with existing models. It has been found that our FinGAT can outperform various state-of-the-art GNN models in antibiotic discovery.
Collapse
Affiliation(s)
- Hou Yee Choo
- Division of Mathematical Sciences, School of Physical and Mathematical Sciences Nanyang Technological University, Singapore 637371
| | - JunJie Wee
- Division of Mathematical Sciences, School of Physical and Mathematical Sciences Nanyang Technological University, Singapore 637371
| | - Cong Shen
- Division of Mathematical Sciences, School of Physical and Mathematical Sciences Nanyang Technological University, Singapore 637371
- College of Computer Science and Electronic Engineering, Hunan University, Changsha 410083, China
| | - Kelin Xia
- Division of Mathematical Sciences, School of Physical and Mathematical Sciences Nanyang Technological University, Singapore 637371
| |
Collapse
|
5
|
Kumar S, Choudhary M. Structural and theoretical investigations, Hirshfeld surface analysis and anti-SARS CoV-2 of nickel (II) coordination complex. J Biomol Struct Dyn 2023; 41:402-422. [PMID: 34842499 DOI: 10.1080/07391102.2021.2006089] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/04/2023]
Abstract
A nickel(II) Schiff base complex, [Ni(L)(DMF)](1), was synthesized by treating NiCl2.6H2O with an ONS-donor Schiff base ligand(H2L) derived from the condensation 3,5-Dichlorosalicylaldehyde and 4,4-Dimethyl-3-thiosemicarbazide in DMF. The geometry around the center metal ion in [Ni(L)(DMF)](1) was square planar as revealed by the data collection from diffraction studies. DFT calculations were performed on the complex to get a structure-property relationship. Hirshfeld surface analysis was also carried out in the crystal structure of nickel (II) Schiff base complex. Additionally, inspiring from recent developments to find a potential inhibitor for SARS-CoV-2 virus, we have also performed molecular docking study of [Ni(L)(DMF)](1) to see if our novel complex show affinity for main protease (Mpro) of SARS-CoV-2 Mpro (PDB ID: 6LZE). Interestingly, the results are found quite encouraging where the binding affinity and inhibition constant was found to be -6.6 kcal/mol and 2.358 µM, respectively, for the best docked confirmation of complex [Ni(L)(DMF)](1) with Mpro protein. This binding affinity is reasonably well as compared to recently known antiviral drugs. For instance, the binding affinity of complex [Ni(L)(DMF)](1) is found to be better than that of recently docking results of anti-SARS-CoV-2 drugs like chloroquine (-6.293 kcal/mol), hydroxychloroquine (-5.573 kcal/mol) and remdesivir (-6.352 kcal/mol) when targeted to the active-site of SARS-CoV-2 Mpro. Besides this, molecular docking against G25K GTP-nucleotide binding protein (PDB ID: 1A4R) was also studied. We believe that current results can intrigue not only for the biomedical community but also for the materials chemists who are engaged to explore the application coordination complexes. Communicated by Ramaswamy H. Sarma.
Collapse
Affiliation(s)
- Sunil Kumar
- Department of Chemistry, National Institute of Technology Patna, Patna, Bihar, India
| | - Mukesh Choudhary
- Department of Chemistry, National Institute of Technology Patna, Patna, Bihar, India
| |
Collapse
|
6
|
Liu J, Xia KL, Wu J, Yau SST, Wei GW. Biomolecular Topology: Modelling and Analysis. ACTA MATHEMATICA SINICA, ENGLISH SERIES 2022; 38:1901-1938. [PMID: 36407804 PMCID: PMC9640850 DOI: 10.1007/s10114-022-2326-5] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 05/27/2022] [Accepted: 07/12/2022] [Indexed: 05/25/2023]
Abstract
With the great advancement of experimental tools, a tremendous amount of biomolecular data has been generated and accumulated in various databases. The high dimensionality, structural complexity, the nonlinearity, and entanglements of biomolecular data, ranging from DNA knots, RNA secondary structures, protein folding configurations, chromosomes, DNA origami, molecular assembly, to others at the macromolecular level, pose a severe challenge in their analysis and characterization. In the past few decades, mathematical concepts, models, algorithms, and tools from algebraic topology, combinatorial topology, computational topology, and topological data analysis, have demonstrated great power and begun to play an essential role in tackling the biomolecular data challenge. In this work, we introduce biomolecular topology, which concerns the topological problems and models originated from the biomolecular systems. More specifically, the biomolecular topology encompasses topological structures, properties and relations that are emerged from biomolecular structures, dynamics, interactions, and functions. We discuss the various types of biomolecular topology from structures (of proteins, DNAs, and RNAs), protein folding, and protein assembly. A brief discussion of databanks (and databases), theoretical models, and computational algorithms, is presented. Further, we systematically review related topological models, including graphs, simplicial complexes, persistent homology, persistent Laplacians, de Rham-Hodge theory, Yau-Hausdorff distance, and the topology-based machine learning models.
Collapse
Affiliation(s)
- Jian Liu
- School of Mathematical Sciences, Hebei Normal University, Shijiazhuang, 050024 P. R. China
- Yanqi Lake Beijing Institute of Mathematical Sciences and Applications, Beijing, 101408 P. R. China
| | - Ke-Lin Xia
- School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore, 639798 Singapore
| | - Jie Wu
- Yanqi Lake Beijing Institute of Mathematical Sciences and Applications, Beijing, 101408 P. R. China
- Department of Mathematical Sciences, Tsinghua University, Beijing, 100084 P. R. China
| | - Stephen Shing-Toung Yau
- Yanqi Lake Beijing Institute of Mathematical Sciences and Applications, Beijing, 101408 P. R. China
- Department of Mathematical Sciences, Tsinghua University, Beijing, 100084 P. R. China
| | - Guo-Wei Wei
- Department of Mathematics & Department of Biochemistry and Molecular Biology & Department of Electrical and Computer Engineering, Michigan State University, Wells Hall 619 Red Cedar Road, East Lansing, MI 48824-1027 USA
| |
Collapse
|
7
|
Liu X, Feng H, Wu J, Xia K. Hom-Complex-Based Machine Learning (HCML) for the Prediction of Protein-Protein Binding Affinity Changes upon Mutation. J Chem Inf Model 2022; 62:3961-3969. [PMID: 36040839 DOI: 10.1021/acs.jcim.2c00580] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/28/2022]
Abstract
Protein-protein interactions (PPIs) are involved in almost all biological processes in the cell. Understanding protein-protein interactions holds the key for the understanding of biological functions, diseases and the development of therapeutics. Recently, artificial intelligence (AI) models have demonstrated great power in PPIs. However, a key issue for all AI-based PPI models is efficient molecular representations and featurization. Here, we propose Hom-complex-based PPI representation, and Hom-complex-based machine learning models for the prediction of PPI binding affinity changes upon mutation, for the first time. In our model, various Hom complexes Hom(G1, G) can be generated for the graph representation G of protein-protein complex by using different graphs G1, which reveal G1-related inner connections within the graph representation G of protein-protein complex. Further, for a specific graph G1, a series of nested Hom complexes are generated to give a multiscale characterization of the PPIs. Its persistent homology and persistent Euler characteristic are used as molecular descriptors and further combined with the machine learning model, in particular, gradient boosting tree (GBT). We systematically test our model on the two most-commonly used data sets, that is, SKEMPI and AB-Bind. It has been found that our model outperforms all the existing models as far as we know, which demonstrates the great potential of our model for the analysis of PPIs. Our model can be used for the analysis and design of efficient antibodies for SARS-CoV-2.
Collapse
Affiliation(s)
- Xiang Liu
- Chern Institute of Mathematics and LPMC, Nankai University, Tianjin, China, 300071.,Division of Mathematical Sciences, School of Physical and Mathematical Sciences Nanyang Technological University, Singapore 637371
| | - Huitao Feng
- Division of Mathematical Sciences, School of Physical and Mathematical Sciences Nanyang Technological University, Singapore 637371.,Mathematical Science Research Center, Chongqing University of Technology, Chongqing, China, 400054
| | - Jie Wu
- Yanqi Lake Beijing Institute of Mathematical Sciences and Applications (BIMSA), Beijing, China,101408
| | - Kelin Xia
- Division of Mathematical Sciences, School of Physical and Mathematical Sciences Nanyang Technological University, Singapore 637371
| |
Collapse
|
8
|
Gao K, Wang R, Chen J, Cheng L, Frishcosy J, Huzumi Y, Qiu Y, Schluckbier T, Wei X, Wei GW. Methodology-Centered Review of Molecular Modeling, Simulation, and Prediction of SARS-CoV-2. Chem Rev 2022; 122:11287-11368. [PMID: 35594413 PMCID: PMC9159519 DOI: 10.1021/acs.chemrev.1c00965] [Citation(s) in RCA: 33] [Impact Index Per Article: 16.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/06/2023]
Abstract
Despite tremendous efforts in the past two years, our understanding of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), virus-host interactions, immune response, virulence, transmission, and evolution is still very limited. This limitation calls for further in-depth investigation. Computational studies have become an indispensable component in combating coronavirus disease 2019 (COVID-19) due to their low cost, their efficiency, and the fact that they are free from safety and ethical constraints. Additionally, the mechanism that governs the global evolution and transmission of SARS-CoV-2 cannot be revealed from individual experiments and was discovered by integrating genotyping of massive viral sequences, biophysical modeling of protein-protein interactions, deep mutational data, deep learning, and advanced mathematics. There exists a tsunami of literature on the molecular modeling, simulations, and predictions of SARS-CoV-2 and related developments of drugs, vaccines, antibodies, and diagnostics. To provide readers with a quick update about this literature, we present a comprehensive and systematic methodology-centered review. Aspects such as molecular biophysics, bioinformatics, cheminformatics, machine learning, and mathematics are discussed. This review will be beneficial to researchers who are looking for ways to contribute to SARS-CoV-2 studies and those who are interested in the status of the field.
Collapse
Affiliation(s)
- Kaifu Gao
- Department
of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
| | - Rui Wang
- Department
of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
| | - Jiahui Chen
- Department
of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
| | - Limei Cheng
- Clinical
Pharmacology and Pharmacometrics, Bristol
Myers Squibb, Princeton, New Jersey 08536, United States
| | - Jaclyn Frishcosy
- Department
of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
| | - Yuta Huzumi
- Department
of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
| | - Yuchi Qiu
- Department
of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
| | - Tom Schluckbier
- Department
of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
| | - Xiaoqi Wei
- Department
of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
| | - Guo-Wei Wei
- Department
of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
- Department
of Electrical and Computer Engineering, Michigan State University, East Lansing, Michigan 48824, United States
- Department
of Biochemistry and Molecular Biology, Michigan
State University, East Lansing, Michigan 48824, United States
| |
Collapse
|
9
|
Schauperl M, Denny RA. AI-Based Protein Structure Prediction in Drug Discovery: Impacts and Challenges. J Chem Inf Model 2022; 62:3142-3156. [PMID: 35727311 DOI: 10.1021/acs.jcim.2c00026] [Citation(s) in RCA: 36] [Impact Index Per Article: 18.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/12/2022]
Abstract
Proteins are the molecular machinery of the human body, and their malfunctioning is often responsible for diseases, making them crucial targets for drug discovery. The three-dimensional structure of a protein determines its biological function, its conformational state determines substrates, cofactors, and protein binding. Rational drug discovery employs engineered small molecules to selectively interact with proteins to modulate their function. To selectively target a protein and to design small molecules, knowing the protein structure with all its specific conformation is critical. Unfortunately, for a large number of proteins relevant for drug discovery, the three-dimensional structure has not yet been experimentally solved. Therefore, accurately predicting their structure based on their amino acid sequence is one of the grant challenges in biology. Recently, AlphaFold2, a machine learning application based on a deep neural network, was able to predict unknown structures of proteins with an unprecedented accuracy. Despite the impressive progress made by AlphaFold2, nature still challenges the field of structure prediction. In this Perspective, we explore how AlphaFold2 and related methods help make drug design more efficient. Furthermore, we discuss the roles of predicting domain-domain orientations, all relevant conformational states, the influence of posttranslational modifications, and conformational changes due to protein binding partners. We highlight where further improvements are needed for advanced machine learning methods to be successfully and frequently used in the pharmaceutical industry.
Collapse
Affiliation(s)
- Michael Schauperl
- Department of Computational Sciences HotSpot Therapeutics 50 Milk Street, Boston, Massachusetts 02110, United States
| | - Rajiah Aldrin Denny
- Department of Computational Sciences HotSpot Therapeutics 50 Milk Street, Boston, Massachusetts 02110, United States
| |
Collapse
|
10
|
Lu Q. Identifying molecular structural features by pattern recognition methods. RSC Adv 2022; 12:17559-17569. [PMID: 35765452 PMCID: PMC9192268 DOI: 10.1039/d2ra00764a] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/05/2022] [Accepted: 06/06/2022] [Indexed: 11/21/2022] Open
Abstract
Identification of molecular structural features is a central part of computational chemistry. It would be beneficial if pattern recognition techniques could be incorporated to facilitate the identification. Currently, the quantification of the structural dissimilarity is mainly carried out by root-mean-square-deviation (RMSD) calculations such as in molecular dynamics simulations. However, the RMSD calculation underperforms for large molecules, showing the so-called "curse of dimensionality" problem. Also, it requires consistent ordering of atoms in two comparing structures, which needs nontrivial effort to fulfill. In this work, we propose to take advantage of the point cloud recognition using convex hulls as the basis to recognize molecular structural features. Two advantages of the method can be highlighted. First, the dimension of the input data structure is largely reduced from the number of atoms of molecules to the number of atoms of convex hulls. Therefore, the dimensionality curse problem is avoided, and the atom ordering process is saved. Second, the construction of convex hulls can be used to define new molecular descriptors, such as the contact area of molecular interactions. These new molecular descriptors have different properties from existing ones, therefore they are expected to exhibit different behaviors for certain machine learning studies. Several illustrative applications have been carried out, which provide promising results for structure-activity studies.
Collapse
Affiliation(s)
- Qing Lu
- Beijing National Laboratory for Molecular Sciences, Institute of Chemistry, Chinese Academy of Sciences Beijing 100190 China
| |
Collapse
|
11
|
Alfarraj A, Wei GW. Geometric algebra generation of molecular surfaces. J R Soc Interface 2022; 19:20220117. [PMID: 35414214 PMCID: PMC9006026 DOI: 10.1098/rsif.2022.0117] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022] Open
Abstract
Geometric algebra is a powerful framework that unifies mathematics and physics. Since its revival in the 1960s, it has attracted great attention and has been exploited in fields like physics, computer science and engineering. This work introduces a geometric algebra method for the molecular surface generation that uses the Clifford-Fourier transform (CFT) which is a generalization of the classical Fourier transform. Notably, the classical Fourier transform and CFT differ in the derivative property in [Formula: see text] for k even. This distinction is due to the non-commutativity of geometric product of pseudoscalars with multivectors and has significant consequences in applications. We use the CFT in [Formula: see text] to benefit from the derivative property in solving partial differential equations (PDEs). The CFT is used to solve the mode decomposition process in PDE transform. Two different initial cases are proposed to make the initial shapes in the present method. The proposed method is applied first to small molecules and proteins. To validate the method, the molecular surfaces generated are compared to surfaces of other definitions. Applications are considered to protein electrostatic surface potentials and solvation free energy. This work opens the door for further applications of geometric algebra and CFT in biological sciences.
Collapse
Affiliation(s)
- Azzam Alfarraj
- Department of Mathematics, Michigan State University, East Lansing, MI 48824, USA.,Department of Mathematics, King Fahd University of Petroleum and Minerals, Dhahran 31261, KSA
| | - Guo-Wei Wei
- Department of Mathematics, Michigan State University, East Lansing, MI 48824, USA.,Department of Electrical and Computer Engineering, Michigan State University, East Lansing, MI 48824, USA.,Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, MI 48824, USA
| |
Collapse
|
12
|
Liu X, Feng H, Wu J, Xia K. Dowker complex based machine learning (DCML) models for protein-ligand binding affinity prediction. PLoS Comput Biol 2022; 18:e1009943. [PMID: 35385478 PMCID: PMC8985993 DOI: 10.1371/journal.pcbi.1009943] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/14/2021] [Accepted: 02/21/2022] [Indexed: 11/19/2022] Open
Abstract
With the great advancements in experimental data, computational power and learning algorithms, artificial intelligence (AI) based drug design has begun to gain momentum recently. AI-based drug design has great promise to revolutionize pharmaceutical industries by significantly reducing the time and cost in drug discovery processes. However, a major issue remains for all AI-based learning model that is efficient molecular representations. Here we propose Dowker complex (DC) based molecular interaction representations and Riemann Zeta function based molecular featurization, for the first time. Molecular interactions between proteins and ligands (or others) are modeled as Dowker complexes. A multiscale representation is generated by using a filtration process, during which a series of DCs are generated at different scales. Combinatorial (Hodge) Laplacian matrices are constructed from these DCs, and the Riemann zeta functions from their spectral information can be used as molecular descriptors. To validate our models, we consider protein-ligand binding affinity prediction. Our DC-based machine learning (DCML) models, in particular, DC-based gradient boosting tree (DC-GBT), are tested on three most-commonly used datasets, i.e., including PDBbind-2007, PDBbind-2013 and PDBbind-2016, and extensively compared with other existing state-of-the-art models. It has been found that our DC-based descriptors can achieve the state-of-the-art results and have better performance than all machine learning models with traditional molecular descriptors. Our Dowker complex based machine learning models can be used in other tasks in AI-based drug design and molecular data analysis. With the ever-increasing accumulation of chemical and biomolecular data, data-driven artificial intelligence (AI) models will usher in an era of faster, cheaper and more-efficient drug design and drug discovery. However, unlike image, text, video, audio data, molecular data from chemistry and biology, have much complicated three-dimensional structures, as well as physical and chemical properties. Efficient molecular representations and descriptors are key to the success of machine learning models in drug design. Here, we propose Dowker complex based molecular representation and Riemann Zeta function based molecular featurization, for the first time. To characterize the complicated molecular structures and interactions at the atomic level, Dowker complexes are constructed. Based on them, intrinsic mathematical invariants are derived and used as molecular descriptors, which can be further combined with machine learning and deep learning models. Our model has achieved state-of-the-art results in protein-ligand binding affinity prediction, demonstrating its great potential for other drug design and discovery problems.
Collapse
Affiliation(s)
- Xiang Liu
- Division of Mathematical Sciences, School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore
- Chern Institute of Mathematics and LPMC, Nankai University, Tianjin, China
- Center for Topology and Geometry Based Technology, Hebei Normal University, Hebei, China
| | - Huitao Feng
- Chern Institute of Mathematics and LPMC, Nankai University, Tianjin, China
- Mathematical Science Research Center, Chongqing University of Technology, Chongqing, China
| | - Jie Wu
- Center for Topology and Geometry Based Technology, Hebei Normal University, Hebei, China
- School of Mathematical Sciences, Hebei Normal University, Hebei, China
| | - Kelin Xia
- Division of Mathematical Sciences, School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore
- * E-mail:
| |
Collapse
|
13
|
Wee J, Xia K. Persistent spectral based ensemble learning (PerSpect-EL) for protein-protein binding affinity prediction. Brief Bioinform 2022; 23:6533501. [PMID: 35189639 DOI: 10.1093/bib/bbac024] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/15/2021] [Revised: 12/30/2021] [Accepted: 01/17/2022] [Indexed: 12/14/2022] Open
Abstract
Protein-protein interactions (PPIs) play a significant role in nearly all cellular and biological activities. Data-driven machine learning models have demonstrated great power in PPIs. However, the design of efficient molecular featurization poses a great challenge for all learning models for PPIs. Here, we propose persistent spectral (PerSpect) based PPI representation and featurization, and PerSpect-based ensemble learning (PerSpect-EL) models for PPI binding affinity prediction, for the first time. In our model, a sequence of Hodge (or combinatorial) Laplacian (HL) matrices at various different scales are generated from a specially designed filtration process. PerSpect attributes, which are statistical and combinatorial properties of spectrum information from these HL matrices, are used as features for PPI characterization. Each PerSpect attribute is input into a 1D convolutional neural network (CNN), and these CNN networks are stacked together in our PerSpect-based ensemble learning models. We systematically test our model on the two most commonly used datasets, i.e. SKEMPI and AB-Bind. It has been found that our model can achieve state-of-the-art results and outperform all existing models to the best of our knowledge.
Collapse
Affiliation(s)
- JunJie Wee
- Division of Mathematical Sciences, School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore 637371
| | - Kelin Xia
- Division of Mathematical Sciences, School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore 637371
| |
Collapse
|
14
|
Jiang P, Chi Y, Li XS, Liu X, Hua XS, Xia K. Molecular persistent spectral image (Mol-PSI) representation for machine learning models in drug design. Brief Bioinform 2022; 23:6485012. [PMID: 34958660 DOI: 10.1093/bib/bbab527] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/19/2021] [Revised: 11/01/2021] [Accepted: 11/14/2021] [Indexed: 01/05/2023] Open
Abstract
Artificial intelligence (AI)-based drug design has great promise to fundamentally change the landscape of the pharmaceutical industry. Even though there are great progress from handcrafted feature-based machine learning models, 3D convolutional neural networks (CNNs) and graph neural networks, effective and efficient representations that characterize the structural, physical, chemical and biological properties of molecular structures and interactions remain to be a great challenge. Here, we propose an equal-sized molecular 2D image representation, known as the molecular persistent spectral image (Mol-PSI), and combine it with CNN model for AI-based drug design. Mol-PSI provides a unique one-to-one image representation for molecular structures and interactions. In general, deep models are empowered to achieve better performance with systematically organized representations in image format. A well-designed parallel CNN architecture for adapting Mol-PSIs is developed for protein-ligand binding affinity prediction. Our results, for the three most commonly used databases, including PDBbind-v2007, PDBbind-v2013 and PDBbind-v2016, are better than all traditional machine learning models, as far as we know. Our Mol-PSI model provides a powerful molecular representation that can be widely used in AI-based drug design and molecular data analysis.
Collapse
Affiliation(s)
- Peiran Jiang
- Drug Discovery Intelligence, AI Center, Alibaba Group DAMO Academy, Wen Yi Xi Road, Yuhang District, Hangzhou City , 310000, Zhejiang, China
| | - Ying Chi
- Drug Discovery Intelligence, AI Center, Alibaba Group DAMO Academy, Wen Yi Xi Road, Yuhang District, Hangzhou City , 310000, Zhejiang, China
| | - Xiao-Shuang Li
- Drug Discovery Intelligence, AI Center, Alibaba Group DAMO Academy, Wen Yi Xi Road, Yuhang District, Hangzhou City , 310000, Zhejiang, China
| | - Xiang Liu
- Division of Mathematical Sciences, School of Physical and Mathematical Sciences, Nanyang Technological University, 637371, Singapore
- Chern Institute of Mathematics and LPMC, Nankai University, 300071, Tianjin, China
| | - Xian-Sheng Hua
- Drug Discovery Intelligence, AI Center, Alibaba Group DAMO Academy, Wen Yi Xi Road, Yuhang District, Hangzhou City , 310000, Zhejiang, China
| | - Kelin Xia
- Division of Mathematical Sciences, School of Physical and Mathematical Sciences, Nanyang Technological University, 637371, Singapore
| |
Collapse
|
15
|
Abstract
The biological significance of proteins attracted the scientific community in exploring their characteristics. The studies shed light on the interaction patterns and functions of proteins in a living body. Due to their practical difficulties, reliable experimental techniques pave the way for introducing computational methods in the interaction prediction. Automated methods reduced the difficulties but could not yet replace experimental studies as the field is still evolving. Interaction prediction problem being critical needs highly accurate results, but none of the existing methods could offer reliable performance that can parallel with experimental results yet. This article aims to assess the existing computational docking algorithms, their challenges, and future scope. Blind docking techniques are quite helpful when no information other than the individual structures are available. As more and more complex structures are being added to different databases, information-driven approaches can be a good alternative. Artificial intelligence, ruling over the major fields, is expected to take over this domain very shortly.
Collapse
|
16
|
Aggarwal R, Gupta A, Chelur V, Jawahar CV, Priyakumar UD. DeepPocket: Ligand Binding Site Detection and Segmentation using 3D Convolutional Neural Networks. J Chem Inf Model 2021; 62:5069-5079. [PMID: 34374539 DOI: 10.1021/acs.jcim.1c00799] [Citation(s) in RCA: 35] [Impact Index Per Article: 11.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/06/2023]
Abstract
A structure-based drug design pipeline involves the development of potential drug molecules or ligands that form stable complexes with a given receptor at its binding site. A prerequisite to this is finding druggable and functionally relevant binding sites on the 3D structure of the protein. Although several methods for detecting binding sites have been developed beforehand, a majority of them surprisingly fail in the identification and ranking of binding sites accurately. The rapid adoption and success of deep learning algorithms in various sections of structural biology beckons the usage of such algorithms for accurate binding site detection. As a combination of geometry based software and deep learning, we report a novel framework, DeepPocket that utilizes 3D convolutional neural networks for the rescoring of pockets identified by Fpocket and further segments these identified cavities on the protein surface. Apart from this, we also propose another data set SC6K containing protein structures submitted in the Protein Data Bank (PDB) from January 1st, 2018, until February 28th, 2020, for ligand binding site (LBS) detection. DeepPocket's results on various binding site data sets and SC6K highlight its better performance over current state-of-the-art methods and good generalization ability over novel structures.
Collapse
Affiliation(s)
- Rishal Aggarwal
- International Institute of Information Technology, Hyderabad 500 032, India
| | - Akash Gupta
- International Institute of Information Technology, Hyderabad 500 032, India
| | - Vineeth Chelur
- International Institute of Information Technology, Hyderabad 500 032, India
| | - C V Jawahar
- International Institute of Information Technology, Hyderabad 500 032, India
| | - U Deva Priyakumar
- International Institute of Information Technology, Hyderabad 500 032, India
| |
Collapse
|
17
|
Wee J, Xia K. Forman persistent Ricci curvature (FPRC)-based machine learning models for protein-ligand binding affinity prediction. Brief Bioinform 2021; 22:6262241. [PMID: 33940588 DOI: 10.1093/bib/bbab136] [Citation(s) in RCA: 28] [Impact Index Per Article: 9.3] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/17/2021] [Revised: 03/14/2021] [Accepted: 03/23/2021] [Indexed: 01/01/2023] Open
Abstract
Artificial intelligence (AI) techniques have already been gradually applied to the entire drug design process, from target discovery, lead discovery, lead optimization and preclinical development to the final three phases of clinical trials. Currently, one of the central challenges for AI-based drug design is molecular featurization, which is to identify or design appropriate molecular descriptors or fingerprints. Efficient and transferable molecular descriptors are key to the success of all AI-based drug design models. Here we propose Forman persistent Ricci curvature (FPRC)-based molecular featurization and feature engineering, for the first time. Molecular structures and interactions are modeled as simplicial complexes, which are generalization of graphs to their higher dimensional counterparts. Further, a multiscale representation is achieved through a filtration process, during which a series of nested simplicial complexes at different scales are generated. Forman Ricci curvatures (FRCs) are calculated on the series of simplicial complexes, and the persistence and variation of FRCs during the filtration process is defined as FPRC. Moreover, persistent attributes, which are FPRC-based functions and properties, are employed as molecular descriptors, and combined with machine learning models, in particular, gradient boosting tree (GBT). Our FPRC-GBT models are extensively trained and tested on three most commonly-used datasets, including PDBbind-2007, PDBbind-2013 and PDBbind-2016. It has been found that our results are better than the ones from machine learning models with traditional molecular descriptors.
Collapse
Affiliation(s)
- JunJie Wee
- Division of Mathematical Sciences, School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore 637371
| | - Kelin Xia
- Division of Mathematical Sciences, School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore 637371
| |
Collapse
|
18
|
Liu X, Feng H, Wu J, Xia K. Persistent spectral hypergraph based machine learning (PSH-ML) for protein-ligand binding affinity prediction. Brief Bioinform 2021; 22:6219114. [PMID: 33837771 DOI: 10.1093/bib/bbab127] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/12/2021] [Revised: 03/14/2021] [Accepted: 03/16/2021] [Indexed: 12/21/2022] Open
Abstract
Molecular descriptors are essential to not only quantitative structure activity/property relationship (QSAR/QSPR) models, but also machine learning based chemical and biological data analysis. In this paper, we propose persistent spectral hypergraph (PSH) based molecular descriptors or fingerprints for the first time. Our PSH-based molecular descriptors are used in the characterization of molecular structures and interactions, and further combined with machine learning models, in particular gradient boosting tree (GBT), for protein-ligand binding affinity prediction. Different from traditional molecular descriptors, which are usually based on molecular graph models, a hypergraph-based topological representation is proposed for protein-ligand interaction characterization. Moreover, a filtration process is introduced to generate a series of nested hypergraphs in different scales. For each of these hypergraphs, its eigen spectrum information can be obtained from the corresponding (Hodge) Laplacain matrix. PSH studies the persistence and variation of the eigen spectrum of the nested hypergraphs during the filtration process. Molecular descriptors or fingerprints can be generated from persistent attributes, which are statistical or combinatorial functions of PSH, and combined with machine learning models, in particular, GBT. We test our PSH-GBT model on three most commonly used datasets, including PDBbind-2007, PDBbind-2013 and PDBbind-2016. Our results, for all these databases, are better than all existing machine learning models with traditional molecular descriptors, as far as we know.
Collapse
Affiliation(s)
- Xiang Liu
- Division of Mathematical Sciences, School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore 637371.,Chern Institute of Mathematics and LPMC, Nankai University, Tianjin, China, 300071.,Center for Topology and Geometry Based Technology, Hebei Normal University, Hebei, China, 050024
| | - Huitao Feng
- Chern Institute of Mathematics and LPMC, Nankai University, Tianjin, China, 300071.,Mathematical Science Research Center, Chongqing University of Technology, Chongqing, China, 400054
| | - Jie Wu
- Center for Topology and Geometry Based Technology, Hebei Normal University, Hebei, China, 050024.,School of Mathematical Sciences, Hebei Normal University, Hebei, China, 050024
| | - Kelin Xia
- Division of Mathematical Sciences, School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore 637371
| |
Collapse
|
19
|
Zhao R, Wang M, Chen J, Tong Y, Wei GW. The de Rham-Hodge Analysis and Modeling of Biomolecules. Bull Math Biol 2020; 82:108. [PMID: 32770408 PMCID: PMC8137271 DOI: 10.1007/s11538-020-00783-2] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/27/2019] [Accepted: 07/20/2020] [Indexed: 12/18/2022]
Abstract
Biological macromolecules have intricate structures that underpin their biological functions. Understanding their structure-function relationships remains a challenge due to their structural complexity and functional variability. Although de Rham-Hodge theory, a landmark of twentieth-century mathematics, has had a tremendous impact on mathematics and physics, it has not been devised for macromolecular modeling and analysis. In this work, we introduce de Rham-Hodge theory as a unified paradigm for analyzing the geometry, topology, flexibility, and Hodge mode analysis of biological macromolecules. Geometric characteristics and topological invariants are obtained either from the Helmholtz-Hodge decomposition of the scalar, vector, and/or tensor fields of a macromolecule or from the spectral analysis of various Laplace-de Rham operators defined on the molecular manifolds. We propose Laplace-de Rham spectral-based models for predicting macromolecular flexibility. We further construct a Laplace-de Rham-Helfrich operator for revealing cryo-EM natural frequencies. Extensive experiments are carried out to demonstrate that the proposed de Rham-Hodge paradigm is one of the most versatile tools for the multiscale modeling and analysis of biological macromolecules and subcellular organelles. Accurate, reliable, and topological structure-preserving algorithms for implementing discrete exterior calculus (DEC) have been developed to facilitate the aforementioned modeling and analysis of biological macromolecules. The proposed de Rham-Hodge paradigm has potential applications to subcellular organelles and the structure construction from medium- or low-resolution cryo-EM maps, and functional predictions from massive biomolecular datasets.
Collapse
Affiliation(s)
- Rundong Zhao
- Department of Computer Science and Engineering, Michigan State University, East Lansing, MI, 48824, USA
| | - Menglun Wang
- Department of Mathematics, Michigan State University, East Lansing, MI, 48824, USA
| | - Jiahui Chen
- Department of Mathematics, Michigan State University, East Lansing, MI, 48824, USA
| | - Yiying Tong
- Department of Computer Science and Engineering, Michigan State University, East Lansing, MI, 48824, USA.
| | - Guo-Wei Wei
- Department of Mathematics, Michigan State University, East Lansing, MI, 48824, USA.
- Department of Electrical and Computer Engineering, Michigan State University, East Lansing, MI, 48824, USA.
- Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, MI, 48824, USA.
| |
Collapse
|
20
|
Abstract
Recently, machine learning (ML) has established itself in various worldwide benchmarking competitions in computational biology, including Critical Assessment of Structure Prediction (CASP) and Drug Design Data Resource (D3R) Grand Challenges. However, the intricate structural complexity and high ML dimensionality of biomolecular datasets obstruct the efficient application of ML algorithms in the field. In addition to data and algorithm, an efficient ML machinery for biomolecular predictions must include structural representation as an indispensable component. Mathematical representations that simplify the biomolecular structural complexity and reduce ML dimensionality have emerged as a prime winner in D3R Grand Challenges. This review is devoted to the recent advances in developing low-dimensional and scalable mathematical representations of biomolecules in our laboratory. We discuss three classes of mathematical approaches, including algebraic topology, differential geometry, and graph theory. We elucidate how the physical and biological challenges have guided the evolution and development of these mathematical apparatuses for massive and diverse biomolecular data. We focus the performance analysis on protein-ligand binding predictions in this review although these methods have had tremendous success in many other applications, such as protein classification, virtual screening, and the predictions of solubility, solvation free energies, toxicity, partition coefficients, protein folding stability changes upon mutation, etc.
Collapse
Affiliation(s)
- Duc Duy Nguyen
- Department of Mathematics, Michigan State University, MI 48824, USA.
| | - Zixuan Cang
- Department of Mathematics, Michigan State University, MI 48824, USA.
| | - Guo-Wei Wei
- Department of Mathematics, Michigan State University, MI 48824, USA. and Department of Biochemistry and Molecular Biology, Michigan State University, MI 48824, USA and Department of Electrical and Computer Engineering, Michigan State University, MI 48824, USA
| |
Collapse
|