1
Suwayyid F, Wei GW. Persistent Mayer Dirac. JOURNAL OF PHYSICS. COMPLEXITY 2024; 5:045005. [PMID: 39429974 PMCID: PMC11488505 DOI: 10.1088/2632-072x/ad83a5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/23/2024] [Revised: 09/11/2024] [Accepted: 10/04/2024] [Indexed: 10/22/2024]
Abstract
Topological data analysis (TDA) has made significant progress in developing a new class of fundamental operators known as the Dirac operator, particularly in topological signals and molecular representations. However, the current approaches being used are based on the classical case of chain complexes. The present study establishes Mayer Dirac operators based on N-chain complexes. These operators interconnect an alternating sequence of Mayer Laplacian operators, providing a generalization of the classical result $D^2 = L$. Furthermore, the research presents an explicit formulation of the Laplacian for N-chain complexes induced by vertex sequences on a finite set. Weighted versions of Mayer Laplacian and Dirac operators are introduced to expand the scope and improve applicability, showcasing their effectiveness in capturing physical attributes in various practical scenarios. The study presents a generalized version for factorizing Laplacian operators as an operator's product and its 'adjoint'. Additionally, the proposed persistent Mayer Dirac operators and extensions are applied to biological and chemical domains, particularly in the analysis of molecular structures. The study also highlights the potential applications of persistent Mayer Dirac operators in data science.
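The classical identity that the abstract generalizes can be checked numerically on a very small example. The sketch below is only an illustration of the ordinary chain-complex case (not the N-chain Mayer construction from the paper): it assumes a path graph on three vertices, and the names `B1`, `L0`, `L1`, `D`, and `L` are chosen here for illustration.

```python
import numpy as np

# Hypothetical toy complex: path graph 0 - 1 - 2 with oriented edges (0,1) and (1,2).
# B1 is the vertex-by-edge boundary matrix (rows: vertices, columns: edges).
B1 = np.array([[-1.0,  0.0],
               [ 1.0, -1.0],
               [ 0.0,  1.0]])

# Hodge Laplacians of the classical chain complex (no triangles, so no higher term).
L0 = B1 @ B1.T   # 0-Laplacian (graph Laplacian), shape (3, 3)
L1 = B1.T @ B1   # 1-Laplacian on edges, shape (2, 2)

# Dirac operator: boundary and coboundary assembled into one symmetric block matrix.
D = np.block([[np.zeros((3, 3)), B1],
              [B1.T, np.zeros((2, 2))]])

# Block-diagonal Laplacian collecting all dimensions.
L = np.block([[L0, np.zeros((3, 2))],
              [np.zeros((2, 3)), L1]])

# Squaring the Dirac operator recovers the Laplacian blocks.
print(np.allclose(D @ D, L))
```

In this classical setting the boundary operator squares to zero; the Mayer setting described in the abstract replaces that condition with an N-th power condition, which this sketch does not attempt to reproduce.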
Affiliation(s)
- Faisal Suwayyid
- Department of Mathematics, King Fahd University of Petroleum and Minerals, Dhahran 31261, Saudi Arabia
- Department of Mathematics, Michigan State University, East Lansing, MI 48824, United States of America
- Guo-Wei Wei
- Department of Mathematics, Michigan State University, East Lansing, MI 48824, United States of America
- Department of Electrical and Computer Engineering, Michigan State University, East Lansing, MI 48824, United States of America
- Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, MI 48824, United States of America
2
Fu S, Chen Z, Luo Z, Nie M, Fu T, Zhou Y, Yang Q, Zhu F, Ni F. Chem(Pro)2: the atlas of chemoproteomic probes labelling human proteins. Nucleic Acids Res 2024:gkae943. [PMID: 39436046 DOI: 10.1093/nar/gkae943] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/15/2024] [Revised: 09/25/2024] [Accepted: 10/11/2024] [Indexed: 10/23/2024] Open
Abstract
Chemoproteomic probes (CPPs) are widely regarded as powerful molecular biology tools that enable the highly efficient discovery of both binding proteins and modes of action for the studied compounds. They have been successfully used to validate targets and identify binders. Designing CPPs is considered extremely challenging and calls for generalization from a large body of probe data. However, no existing database provides such data on CPPs. Herein, a database entitled 'Chem(Pro)2' was therefore developed to systematically describe the atlas of diverse types of CPPs labelling human proteins in living cells or lysates. With the booming application of chemoproteomic techniques and artificial intelligence in current chemical biology studies, Chem(Pro)2 is expected to facilitate AI-based learning of interaction patterns among molecules for discovering innovative targets and new drugs. Chem(Pro)2 is open to all users without any login requirement at: https://idrblab.org/chemprosquare/.
Affiliation(s)
- Songsen Fu
- Institute of Drug Discovery Technology, Ningbo University, Ningbo 315211, China
- LeadArt Biotechnologies Ltd., Ningbo 315201, China
- Zhen Chen
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, State Key Laboratory of Advanced Drug Delivery and Release Systems, Zhejiang University, Hangzhou 310058, China
- Zhiming Luo
- LeadArt Biotechnologies Ltd., Ningbo 315201, China
- Meiyun Nie
- LeadArt Biotechnologies Ltd., Ningbo 315201, China
- Tingting Fu
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, State Key Laboratory of Advanced Drug Delivery and Release Systems, Zhejiang University, Hangzhou 310058, China
- Ying Zhou
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, State Key Laboratory of Advanced Drug Delivery and Release Systems, Zhejiang University, Hangzhou 310058, China
- Qingxia Yang
- Zhejiang Provincial Key Laboratory of Precision Diagnosis and Therapy for Major Gynecological Diseases, Women's Hospital, Zhejiang University School of Medicine, Hangzhou 310058, China
- Feng Zhu
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, State Key Laboratory of Advanced Drug Delivery and Release Systems, Zhejiang University, Hangzhou 310058, China
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Alibaba-Zhejiang University Joint Research Center of Future Digital Healthcare, Hangzhou 330110, China
- Feng Ni
- Institute of Drug Discovery Technology, Ningbo University, Ningbo 315211, China
- LeadArt Biotechnologies Ltd., Ningbo 315201, China
3
Shen L, Feng H, Li F, Lei F, Wu J, Wei GW. Knot data analysis using multiscale Gauss link integral. Proc Natl Acad Sci U S A 2024; 121:e2408431121. [PMID: 39392667 PMCID: PMC11494316 DOI: 10.1073/pnas.2408431121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/27/2024] [Accepted: 07/17/2024] [Indexed: 10/12/2024] Open
Abstract
In the past decade, topological data analysis has emerged as a powerful algebraic topology approach in data science. Although knot theory and related subjects are a focus of study in mathematics, their success in practical applications is quite limited due to the lack of localization and quantization. We address these challenges by introducing knot data analysis (KDA), a paradigm that incorporates curve segmentation and multiscale analysis into the Gauss link integral. The resulting multiscale Gauss link integral (mGLI) recovers the global topological properties of knots and links at an appropriate scale and offers a multiscale geometric topology approach to capture the local structures and connectivities in data. By integration with machine learning or deep learning, the proposed mGLI significantly outperforms other state-of-the-art methods across various benchmark problems in 13 intricately complex biological datasets, including protein flexibility analysis, protein-ligand interactions, human Ether-à-go-go-Related Gene potassium channel blockade screening, and quantitative toxicity assessment. Our KDA opens a research area, knot deep learning, in data science.
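For orientation, the classical Gauss link integral that mGLI builds on can be approximated numerically for two discretized closed curves. The sketch below is a plain midpoint-rule discretization, not the segmented, multiscale version described in the abstract; the function names and the test curves (two unit circles forming a Hopf link, plus a distant unlinked circle) are assumptions made for illustration.

```python
import numpy as np

def closed_curve_segments(points):
    """Return segment midpoints and segment vectors of a closed polygonal curve."""
    nxt = np.roll(points, -1, axis=0)
    return 0.5 * (points + nxt), nxt - points

def gauss_linking_number(curve_a, curve_b):
    """Midpoint-rule approximation of (1/4pi) * double integral of
    (r_a - r_b) . (dr_a x dr_b) / |r_a - r_b|^3 over both closed curves."""
    mid_a, d_a = closed_curve_segments(curve_a)
    mid_b, d_b = closed_curve_segments(curve_b)
    diff = mid_a[:, None, :] - mid_b[None, :, :]          # pairwise r_a - r_b
    cross = np.cross(d_a[:, None, :], d_b[None, :, :])    # pairwise dr_a x dr_b
    dist3 = np.linalg.norm(diff, axis=-1) ** 3
    integrand = np.einsum("ijk,ijk->ij", diff, cross) / dist3
    return float(integrand.sum() / (4.0 * np.pi))

t = np.linspace(0.0, 2.0 * np.pi, 400, endpoint=False)
zeros = np.zeros_like(t)
circle_a = np.stack([np.cos(t), np.sin(t), zeros], axis=1)          # xy-plane, centre (0, 0, 0)
circle_b = np.stack([1.0 + np.cos(t), zeros, np.sin(t)], axis=1)    # xz-plane, centre (1, 0, 0)
circle_far = circle_b + np.array([5.0, 0.0, 0.0])                   # same circle moved away

print(gauss_linking_number(circle_a, circle_b))    # magnitude close to 1 (sign depends on orientation)
print(gauss_linking_number(circle_a, circle_far))  # close to 0
```

Restricting the double sum to chosen pairs of curve segments, rather than whole curves, is one way to read the "localization" idea in the abstract, but the exact segmentation and scale scheme used by the authors is not given here.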
Affiliation(s)
- Li Shen
- Department of Mathematics, Michigan State University, East Lansing, MI 48824
- Hongsong Feng
- Department of Mathematics, Michigan State University, East Lansing, MI 48824
- Fengling Li
- School of Mathematical Sciences, Dalian University of Technology, Dalian 116024, China
- Fengchun Lei
- School of Mathematical Sciences, Dalian University of Technology, Dalian 116024, China
- Jie Wu
- Beijing Institute of Mathematical Sciences and Applications, 101408, China
- Guo-Wei Wei
- Department of Mathematics, Michigan State University, East Lansing, MI 48824
- Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, MI 48824
- Department of Electrical and Computer Engineering, Michigan State University, East Lansing, MI 48824
4
Zhang Y, Shen C, Xia K. Multi-Cover Persistence (MCP)-based machine learning for polymer property prediction. Brief Bioinform 2024; 25:bbae465. [PMID: 39323091 PMCID: PMC11424509 DOI: 10.1093/bib/bbae465] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/29/2024] [Revised: 08/07/2024] [Accepted: 09/05/2024] [Indexed: 09/27/2024] Open
Abstract
Accurate and efficient prediction of polymer properties is crucial for polymer design. Recently, data-driven artificial intelligence (AI) models have demonstrated great promise in polymer property analysis. Despite this progress, a pivotal challenge for all AI-driven models remains the effective representation of molecules. Here we introduce Multi-Cover Persistence (MCP)-based molecular representation and featurization for the first time. Our MCP-based polymer descriptors are combined with machine learning models, in particular Gradient Boosting Tree (GBT) models, for polymer property prediction. Unlike all previous molecular representations, polymer molecular structure and interactions are represented as MCP, which utilizes Delaunay slices at different dimensions and Rhomboid tiling to characterize the complicated geometric and topological information within the data. Statistical features from the generated persistent barcodes are used as polymer descriptors and further combined with the GBT model. Our model has been extensively validated on polymer benchmark datasets. It has been found that our models can outperform traditional fingerprint-based models and achieve accuracy similar to that of geometric deep learning models. In particular, our model tends to be more effective on large-sized monomer structures, demonstrating the great potential of MCP in characterizing more complicated polymer data. This work underscores the potential of MCP in polymer informatics, presenting a novel perspective on molecular representation and its application in polymer science.
Affiliation(s)
- Yipeng Zhang
- Division of Mathematical Sciences, School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore 637371, Singapore
- Cong Shen
- Department of Mathematics, National University of Singapore, Singapore 119076, Singapore
- Kelin Xia
- Division of Mathematical Sciences, School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore 637371, Singapore
5
Li B, Tan K, Lao AR, Wang H, Zheng H, Zhang L. A comprehensive review of artificial intelligence for pharmacology research. Front Genet 2024; 15:1450529. [PMID: 39290983 PMCID: PMC11405247 DOI: 10.3389/fgene.2024.1450529] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/17/2024] [Accepted: 08/26/2024] [Indexed: 09/19/2024] Open
Abstract
With the innovation and advancement of artificial intelligence, more and more artificial intelligence techniques are employed in drug research, biomedical frontier research, and clinical medicine practice, especially, in the field of pharmacology research. Thus, this review focuses on the applications of artificial intelligence in drug discovery, compound pharmacokinetic prediction, and clinical pharmacology. We briefly introduced the basic knowledge and development of artificial intelligence, presented a comprehensive review, and then summarized the latest studies and discussed the strengths and limitations of artificial intelligence models. Additionally, we highlighted several important studies and pointed out possible research directions.
Affiliation(s)
- Bing Li
- College of Computer Science, Sichuan University, Chengdu, China
- Kan Tan
- College of Computer Science, Sichuan University, Chengdu, China
- Angelyn R Lao
- Department of Mathematics and Statistics, De La Salle University, Manila, Philippines
- Haiying Wang
- School of Computing, Ulster University, Belfast, United Kingdom
- Huiru Zheng
- School of Computing, Ulster University, Belfast, United Kingdom
- Le Zhang
- College of Computer Science, Sichuan University, Chengdu, China
6
Arango AS, Park H, Tajkhorshid E. Topological Learning Approach to Characterizing Biological Membranes. J Chem Inf Model 2024; 64:5242-5252. [PMID: 38912752 DOI: 10.1021/acs.jcim.4c00552] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/25/2024]
Abstract
Biological membranes play key roles in cellular compartmentalization, structure, and signaling pathways. At varying temperatures, individual membrane lipids sample from different configurations, a process that frequently leads to higher-order phase behavior and phenomena. Here, we present a persistent homology (PH)-based method for quantifying the structural features of individual and bulk lipids, providing local and contextual information on lipid tail organization. Our method leverages the mathematical machinery of algebraic topology and machine learning to infer temperature-dependent structural information on lipids from static coordinates. To train our model, we generated multiple molecular dynamics trajectories of dipalmitoyl-phosphatidylcholine membranes at varying temperatures. A fingerprint was then constructed for each set of lipid coordinates by PH filtration, in which interaction spheres were grown around the lipid atoms while tracking their intersections. The sphere filtration formed a simplicial complex that captures enduring key topological features of the configuration landscape using homology, yielding persistence data. Following fingerprint extraction for physiologically relevant temperatures, the persistence data were used to train an attention-based neural network for assignment of effective temperature values to selected membrane regions. Our persistent homology-based method captures the local structural effects, via effective temperature, of lipids adjacent to other membrane constituents, e.g., sterols and proteins. This topological learning approach can predict lipid effective temperatures from static coordinates across multiple spatial resolutions. The tool, called MembTDA, can be accessed at https://github.com/hyunp2/Memb-TDA.
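The sphere-growing filtration described in the abstract can be illustrated in its simplest form, dimension-0 persistence, where each point starts as its own component and components merge as spheres begin to intersect. The sketch below is a generic union-find computation, not the MembTDA pipeline; the function name `h0_death_scales` and the toy coordinates are hypothetical.

```python
import numpy as np
from itertools import combinations

def h0_death_scales(points):
    """Return the pairwise distances at which connected components merge
    as spheres grow around each point (dimension-0 persistence deaths).
    Two spheres of equal radius first touch when the radius is half the distance."""
    points = np.asarray(points, dtype=float)
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise edges, processed from shortest to longest.
    edges = sorted(
        (float(np.linalg.norm(points[i] - points[j])), i, j)
        for i, j in combinations(range(n), 2)
    )
    deaths = []
    for dist, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:        # this edge merges two components
            parent[root_i] = root_j
            deaths.append(dist)
    return deaths                   # n - 1 values; one component never dies

# Toy example: three collinear "atoms" at x = 0, 1, 3.
print(h0_death_scales([[0.0, 0.0], [1.0, 0.0], [3.0, 0.0]]))
```

Higher-dimensional features (loops, voids) need a full simplicial-complex computation, typically delegated to a dedicated persistent homology library; that step is outside this sketch.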
Affiliation(s)
- Andres S Arango
- Theoretical and Computational Biophysics Group, NIH Resource Center for Macromolecular Modeling and Visualization, Beckman Institute for Advanced Science and Technology, Department of Biochemistry, and Center for Biophysics and Quantitative Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States
- Hyun Park
- Theoretical and Computational Biophysics Group, NIH Resource Center for Macromolecular Modeling and Visualization, Beckman Institute for Advanced Science and Technology, Department of Biochemistry, and Center for Biophysics and Quantitative Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States
- Emad Tajkhorshid
- Theoretical and Computational Biophysics Group, NIH Resource Center for Macromolecular Modeling and Visualization, Beckman Institute for Advanced Science and Technology, Department of Biochemistry, and Center for Biophysics and Quantitative Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States
7
Hong N, Jiang D, Wang Z, Sun H, Luo H, Bao L, Song M, Kang Y, Hou T. TransfIGN: A Structure-Based Deep Learning Method for Modeling the Interaction between HLA-A*02:01 and Antigen Peptides. J Chem Inf Model 2024; 64:5016-5027. [PMID: 38920330 DOI: 10.1021/acs.jcim.4c00678] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/27/2024]
Abstract
The intricate interaction between major histocompatibility complexes (MHCs) and antigen peptides with diverse amino acid sequences plays a pivotal role in immune responses and T cell activity. In recent years, deep learning (DL)-based models have emerged as promising tools for accelerating antigen peptide screening. However, most of these models solely rely on one-dimensional amino acid sequences, overlooking crucial information required for the three-dimensional (3-D) space binding process. In this study, we propose TransfIGN, a structure-based DL model that is inspired by our previously developed framework, Interaction Graph Network (IGN), and incorporates sequence information from transformers to predict the interactions between HLA-A*02:01 and antigen peptides. Our model, trained on a comprehensive data set containing 61,816 sequences with 9051 binding affinity labels and 56,848 eluted ligand labels, achieves an area under the curve (AUC) of 0.893 on the binary data set, better than state-of-the-art sequence-based models trained on larger data sets such as NetMHCpan4.1, ANN, and TransPHLA. Furthermore, when evaluated on the IEDB weekly benchmark data sets, our predictions (AUC = 0.816) are better than those of the recommended methods like the IEDB consensus (AUC = 0.795). Notably, the interaction weight matrices generated by our method highlight the strong interactions at specific positions within peptides, emphasizing the model's ability to provide physical interpretability. This capability to unveil binding mechanisms through intricate structural features holds promise for new immunotherapeutic avenues.
Affiliation(s)
- Nanqi Hong
- College of Computer Science and Technology, Zhejiang University, Hangzhou, Zhejiang 310027, China
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Zhejiang University, Hangzhou, Zhejiang 310058, China
- Dejun Jiang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Zhejiang University, Hangzhou, Zhejiang 310058, China
- Zhe Wang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Zhejiang University, Hangzhou, Zhejiang 310058, China
- Huiyong Sun
- Department of Medicinal Chemistry, China Pharmaceutical University, Nanjing, Jiangsu 210009, China
- Hao Luo
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Zhejiang University, Hangzhou, Zhejiang 310058, China
- Lingjie Bao
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Zhejiang University, Hangzhou, Zhejiang 310058, China
- Mingli Song
- College of Computer Science and Technology, Zhejiang University, Hangzhou, Zhejiang 310027, China
- Yu Kang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Zhejiang University, Hangzhou, Zhejiang 310058, China
- Tingjun Hou
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Zhejiang University, Hangzhou, Zhejiang 310058, China
8
Kairys V, Baranauskiene L, Kazlauskiene M, Zubrienė A, Petrauskas V, Matulis D, Kazlauskas E. Recent advances in computational and experimental protein-ligand affinity determination techniques. Expert Opin Drug Discov 2024; 19:649-670. [PMID: 38715415 DOI: 10.1080/17460441.2024.2349169] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/18/2024] [Accepted: 04/25/2024] [Indexed: 05/22/2024]
Abstract
INTRODUCTION: Modern drug discovery revolves around designing ligands that target the chosen biomolecule, typically a protein. For this, the evaluation of affinities of putative ligands is crucial. This has given rise to a multitude of dedicated computational and experimental methods that are constantly being developed and improved. AREAS COVERED: In this review, the authors reassess both the industry mainstays and the newest trends among the methods for protein-small-molecule affinity determination. They discuss both computational affinity predictions and experimental techniques, describing their basic principles, main limitations, and advantages. Together, this serves as an initial guide to the currently most popular and cutting-edge ligand-binding assays employed in rational drug design. EXPERT OPINION: The affinity determination methods continue to develop toward miniaturization, high throughput, and in-cell application. Moreover, the availability of data analysis tools has been constantly increasing. Nevertheless, cross-verification of data using at least two different techniques and careful result interpretation remain of utmost importance.
Affiliation(s)
- Visvaldas Kairys
- Department of Bioinformatics, Institute of Biotechnology, Life Sciences Center, Vilnius University, Vilnius, Lithuania
- Lina Baranauskiene
- Department of Biothermodynamics and Drug Design, Institute of Biotechnology, Life Sciences Center, Vilnius University, Vilnius, Lithuania
- Asta Zubrienė
- Department of Biothermodynamics and Drug Design, Institute of Biotechnology, Life Sciences Center, Vilnius University, Vilnius, Lithuania
- Vytautas Petrauskas
- Department of Biothermodynamics and Drug Design, Institute of Biotechnology, Life Sciences Center, Vilnius University, Vilnius, Lithuania
- Daumantas Matulis
- Department of Biothermodynamics and Drug Design, Institute of Biotechnology, Life Sciences Center, Vilnius University, Vilnius, Lithuania
- Egidijus Kazlauskas
- Department of Biothermodynamics and Drug Design, Institute of Biotechnology, Life Sciences Center, Vilnius University, Vilnius, Lithuania
9
Ai H, Wu D, Zhou H, Xu J, Gu Q. dMXP: A De Novo Small-Molecule 3D Structure Predictor with Graph Attention Networks. J Chem Inf Model 2024; 64:3744-3755. [PMID: 38662925 DOI: 10.1021/acs.jcim.4c00391] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/14/2024]
Abstract
Generating the three-dimensional (3D) structure of small molecules is crucial in both structure- and ligand-based drug design. Structure-based drug design needs bioactive conformations of compounds for lead identification and optimization. Ligand-based drug design techniques, such as 3D shape similarity search, 3D pharmacophore modeling, and 3D-QSAR, all require high-quality small-molecule ligand conformations to obtain reliable results. Although predicting the bioactive conformer of a small molecule requires information from the receptor, a crystal structure of the molecule is a proper approximation to its bioactive conformer in a specific receptor because the binding pose of a small molecule in its receptor's binding pocket should be energetically close to the crystal structure. This study presents a de novo small-molecule structure predictor (dMXP) with graph attention networks based on crystal data derived from the Cambridge Structural Database (CSD) combined with molecular electrostatic information calculated by density-functional theory (DFT). Two featurization strategies (topological and atomic partial charge features) were employed to explore the relation between these features and the 3D crystal structure of a small molecule. These features were then assembled to construct the holistic 3D crystal structure of a molecule. Molecular graphs were encoded using a graph attention mechanism to deal with inconsistencies in how local substructures contribute to the entire molecular structure. For approximately 80% of the dMXP-predicted structures, the root-mean-square deviation (RMSD) from the native binding pose within the receptor is less than 2.0 Å.
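The 2.0 Å criterion quoted above is an RMSD threshold over matched atoms. As a minimal sketch of that metric only, the code below computes the RMSD between two coordinate sets with the same atom ordering and the fraction of cases under a cutoff; it performs no structural superposition, and the function names and toy coordinates are assumptions for illustration rather than the authors' evaluation code.

```python
import numpy as np

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two (n_atoms, 3) coordinate arrays.
    Assumes identical atom ordering and that any alignment was done beforehand."""
    a = np.asarray(coords_a, dtype=float)
    b = np.asarray(coords_b, dtype=float)
    if a.shape != b.shape:
        raise ValueError("coordinate arrays must have the same shape")
    return float(np.sqrt(np.mean(np.sum((a - b) ** 2, axis=1))))

def fraction_below(values, cutoff=2.0):
    """Fraction of RMSD values strictly below the cutoff (in the same units)."""
    values = np.asarray(values, dtype=float)
    return float(np.mean(values < cutoff))

# Hypothetical two-atom example: the predicted pose is shifted 1.5 Å along z.
reference = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
predicted = [[0.0, 0.0, 1.5], [1.0, 0.0, 1.5]]
print(rmsd(reference, predicted))
print(fraction_below([0.5, 1.9, 2.4, 3.0]))
```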
Affiliation(s)
- Haopeng Ai
- Research Center for Drug Discovery, School of Pharmaceutical Sciences, Sun Yat-Sen University, 132 East Circle at University City, Guangzhou 510006, China
- Deyin Wu
- Research Center for Drug Discovery, School of Pharmaceutical Sciences, Sun Yat-Sen University, 132 East Circle at University City, Guangzhou 510006, China
- Huihao Zhou
- Research Center for Drug Discovery, School of Pharmaceutical Sciences, Sun Yat-Sen University, 132 East Circle at University City, Guangzhou 510006, China
- Jun Xu
- Research Center for Drug Discovery, School of Pharmaceutical Sciences, Sun Yat-Sen University, 132 East Circle at University City, Guangzhou 510006, China
- Qiong Gu
- Research Center for Drug Discovery, School of Pharmaceutical Sciences, Sun Yat-Sen University, 132 East Circle at University City, Guangzhou 510006, China
10
Kumar N, Acharya V. Advances in machine intelligence-driven virtual screening approaches for big-data. Med Res Rev 2024; 44:939-974. [PMID: 38129992 DOI: 10.1002/med.21995] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/12/2022] [Revised: 07/15/2023] [Accepted: 10/29/2023] [Indexed: 12/23/2023]
Abstract
Virtual screening (VS) is an integral and ever-evolving domain of the drug discovery framework. VS is traditionally classified into ligand-based (LB) and structure-based (SB) approaches. Machine intelligence, or artificial intelligence, has wide applications in the drug discovery domain to reduce time and resource consumption. In combination with machine intelligence algorithms, VS has emerged as a revolutionarily progressive technology that learns within robust decision orders for data curation and hit molecule screening from large VS libraries in minutes or hours. The exponential growth of chemical and biological data, which has evolved into "big-data" in the public domain, demands modern and advanced machine intelligence-driven VS approaches to screen hit molecules from ultra-large VS libraries. VS has evolved from individual approaches (LB and SB) to integrated LB and SB techniques that explore various ligand and target protein aspects for an enhanced rate of appropriate hit molecule prediction. Current trends demand advanced and intelligent solutions to handle enormous data in the drug discovery domain for screening and optimizing hits or leads with few or no false-positive hits. This review follows the big-data drift and the tremendous growth in computational architecture. It categorizes and emphasizes individual VS techniques, details the literature on machine learning implementation and modern machine intelligence approaches, and discusses limitations and future prospects.
Affiliation(s)
- Neeraj Kumar
- Artificial Intelligence for Computational Biology Lab (AICoB), Biotechnology Division, CSIR-Institute of Himalayan Bioresource Technology, Palampur, Himachal Pradesh, India
- Academy of Scientific and Innovative Research, Ghaziabad, India
- Vishal Acharya
- Artificial Intelligence for Computational Biology Lab (AICoB), Biotechnology Division, CSIR-Institute of Himalayan Bioresource Technology, Palampur, Himachal Pradesh, India
- Academy of Scientific and Innovative Research, Ghaziabad, India
11
Cottrell S, Wang R, Wei GW. PLPCA: Persistent Laplacian-Enhanced PCA for Microarray Data Analysis. J Chem Inf Model 2024; 64:2405-2420. [PMID: 37738663 PMCID: PMC10999748 DOI: 10.1021/acs.jcim.3c01023] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Indexed: 09/24/2023]
Abstract
Over the years, Principal Component Analysis (PCA) has served as the baseline approach for dimensionality reduction in gene expression data analysis. Its primary objective is to identify a subset of disease-causing genes from a vast pool of thousands of genes. However, PCA possesses inherent limitations that hinder its interpretability, introduce class ambiguity, and fail to capture complex geometric structures in the data. Although these limitations have been partially addressed in the literature by incorporating various regularizers, such as graph Laplacian regularization, existing PCA based methods still face challenges related to multiscale analysis and capturing higher-order interactions in the data. To address these challenges, we propose a novel approach called Persistent Laplacian-enhanced Principal Component Analysis (PLPCA). PLPCA amalgamates the advantages of earlier regularized PCA methods with persistent spectral graph theory, specifically persistent Laplacians derived from algebraic topology. In contrast to graph Laplacians, persistent Laplacians enable multiscale analysis through filtration and can incorporate higher-order simplicial complexes to capture higher-order interactions in the data. We evaluate and validate the performance of PLPCA using ten benchmark microarray data sets that exhibit a wide range of dimensions and data imbalance ratios. Our extensive studies over these data sets demonstrate that PLPCA provides up to 12% improvement to the current state-of-the-art PCA models on five evaluation metrics for classification tasks after dimensionality reduction.
Affiliation(s)
- Sean Cottrell
- Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
- Rui Wang
- Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
- Guo-Wei Wei
- Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
- Department of Electrical and Computer Engineering, Michigan State University, East Lansing, Michigan 48824, United States
- Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, Michigan 48824, United States
12
Qu X, Dong L, Luo D, Si Y, Wang B. Water Network-Augmented Two-State Model for Protein-Ligand Binding Affinity Prediction. J Chem Inf Model 2024; 64:2263-2274. [PMID: 37433009 DOI: 10.1021/acs.jcim.3c00567] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/13/2023]
Abstract
Water network rearrangement from the ligand-unbound state to the ligand-bound state is known to have significant effects on the protein-ligand binding interactions, but most of the current machine learning-based scoring functions overlook these effects. In this study, we endeavor to construct a comprehensive and realistic deep learning model by incorporating water network information into both ligand-unbound and -bound states. In particular, extended connectivity interaction features were integrated into graph representation, and graph transformer operator was employed to extract features of the ligand-unbound and -bound states. Through these efforts, we developed a water network-augmented two-state model called ECIFGraph::HM-Holo-Apo. Our new model exhibits satisfactory performance in terms of scoring, ranking, docking, screening, and reverse screening power tests on the CASF-2016 benchmark. In addition, it can achieve superior performance in large-scale docking-based virtual screening tests on the DEKOIS2.0 data set. Our study highlights that the use of a water network-augmented two-state model can be an effective strategy to bolster the robustness and applicability of machine learning-based scoring functions, particularly for targets with hydrophilic or solvent-exposed binding pockets.
Affiliation(s)
- Xiaoyang Qu
- State Key Laboratory of Physical Chemistry of Solid Surfaces and Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, P. R. China
| | - Lina Dong
- State Key Laboratory of Physical Chemistry of Solid Surfaces and Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, P. R. China
| | - Ding Luo
- State Key Laboratory of Physical Chemistry of Solid Surfaces and Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, P. R. China
| | - Yubing Si
- College of Chemistry, Zhengzhou University, Zhengzhou 450001, P. R. China
| | - Binju Wang
- State Key Laboratory of Physical Chemistry of Solid Surfaces and Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, P. R. China
- Innovation Laboratory for Sciences and Technologies of Energy Materials of Fujian Province (IKKEM), Xiamen 361005, P. R. China
|
13
|
Zhang X, Gao H, Wang H, Chen Z, Zhang Z, Chen X, Li Y, Qi Y, Wang R. PLANET: A Multi-objective Graph Neural Network Model for Protein-Ligand Binding Affinity Prediction. J Chem Inf Model 2024; 64:2205-2220. [PMID: 37319418 DOI: 10.1021/acs.jcim.3c00253] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Indexed: 06/17/2023]
Abstract
Predicting protein-ligand binding affinity is a central issue in drug design. Various deep learning models have been published in recent years, many of which rely on 3D protein-ligand complex structures as input and tend to focus on the single task of reproducing binding affinity. In this study, we have developed a graph neural network model called PLANET (Protein-Ligand Affinity prediction NETwork). This model takes the graph-represented 3D structure of the binding pocket on the target protein and the 2D chemical structure of the ligand molecule as input. It was trained through a multi-objective process with three related tasks: deriving the protein-ligand binding affinity, the protein-ligand contact map, and the ligand distance matrix. Besides the protein-ligand complexes with known binding affinity data retrieved from the PDBbind database, a large number of non-binder decoys were also added to the training data for deriving the final model of PLANET. When tested on the CASF-2016 benchmark, PLANET exhibited a scoring power comparable to the best result yielded by other deep learning models as well as a reasonable ranking power and docking power. In virtual screening trials conducted on the DUD-E benchmark, PLANET's performance was notably better than that of several deep learning and machine learning models. On the LIT-PCBA benchmark, PLANET achieved accuracy comparable to that of the conventional docking program Glide, but it used less than 1% of Glide's computation time to finish the same job because PLANET did not need exhaustive conformational sampling. Considering the decent accuracy and efficiency of PLANET in binding affinity prediction, it may become a useful tool for conducting large-scale virtual screening.
Affiliation(s)
- Xiangying Zhang
- Department of Medicinal Chemistry, School of Pharmacy, Fudan University, 826 Zhangheng Road, Shanghai 201203, People's Republic of China
| | - Haotian Gao
- Department of Medicinal Chemistry, School of Pharmacy, Fudan University, 826 Zhangheng Road, Shanghai 201203, People's Republic of China
| | - Haojie Wang
- Department of Medicinal Chemistry, School of Pharmacy, Fudan University, 826 Zhangheng Road, Shanghai 201203, People's Republic of China
| | - Zhihang Chen
- Department of Medicinal Chemistry, School of Pharmacy, Fudan University, 826 Zhangheng Road, Shanghai 201203, People's Republic of China
| | - Zhe Zhang
- Department of Medicinal Chemistry, School of Pharmacy, Fudan University, 826 Zhangheng Road, Shanghai 201203, People's Republic of China
| | - Xinchong Chen
- Department of Medicinal Chemistry, School of Pharmacy, Fudan University, 826 Zhangheng Road, Shanghai 201203, People's Republic of China
| | - Yan Li
- Department of Medicinal Chemistry, School of Pharmacy, Fudan University, 826 Zhangheng Road, Shanghai 201203, People's Republic of China
| | - Yifei Qi
- Department of Medicinal Chemistry, School of Pharmacy, Fudan University, 826 Zhangheng Road, Shanghai 201203, People's Republic of China
| | - Renxiao Wang
- Department of Medicinal Chemistry, School of Pharmacy, Fudan University, 826 Zhangheng Road, Shanghai 201203, People's Republic of China
|
14
|
Rayka M, Mirzaei M, Mohammad Latifi A. An ensemble-based approach to estimate confidence of predicted protein-ligand binding affinity values. Mol Inform 2024; 43:e202300292. [PMID: 38358080 DOI: 10.1002/minf.202300292] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/25/2023] [Revised: 01/22/2024] [Accepted: 02/02/2024] [Indexed: 02/16/2024]
Abstract
When designing a machine learning-based scoring function, we access a limited number of protein-ligand complexes with experimentally determined binding affinity values, representing only a fraction of all possible protein-ligand complexes. Consequently, it is crucial to report a measure of confidence and quantify the uncertainty in the model's predictions during test time. Here, we adopt the conformal prediction technique to evaluate the confidence of a prediction for each member of the core set of the CASF 2016 benchmark. The conformal prediction technique requires a diverse ensemble of predictors for uncertainty estimation. To this end, we introduce ENS-Score as an ensemble predictor, which includes 30 models with different protein-ligand representation approaches and achieves Pearson's correlation of 0.842 on the core set of the CASF 2016 benchmark. Also, we comprehensively investigate the residual error of each data point to assess the normality behavior of the distribution of the residual errors and their correlation to the structural features of the ligands, such as hydrophobic interactions and halogen bonding. In the end, we provide a local host web application to facilitate the usage of ENS-Score. All codes to repeat results are provided at https://github.com/miladrayka/ENS_Score.
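As a rough illustration of the conformal prediction idea mentioned in this abstract, the sketch below implements plain split-conformal regression intervals from held-out calibration residuals. It is not the ENS-Score procedure (which the abstract says relies on a 30-model ensemble); the function names, the calibration residuals, and the example prediction are all hypothetical.

```python
import math

def conformal_quantile(calib_residuals, alpha=0.1):
    """Split-conformal quantile of absolute residuals from a calibration set.

    Hypothetical helper: residual = measured affinity - predicted affinity.
    """
    scores = sorted(abs(r) for r in calib_residuals)
    n = len(scores)
    # Finite-sample corrected rank: ceil((n + 1) * (1 - alpha)).
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return math.inf  # too few calibration points for this alpha
    return scores[rank - 1]

def prediction_interval(point_prediction, q):
    """Symmetric interval around a point prediction (e.g. an ensemble mean)."""
    return (point_prediction - q, point_prediction + q)

# Made-up calibration residuals in pK units, for illustration only.
calib_residuals = [0.3, -0.5, 0.1, 0.9, -0.2, 0.4, -0.7, 0.6, 0.05]
q = conformal_quantile(calib_residuals, alpha=0.2)
low, high = prediction_interval(6.5, q)
print(f"80% interval: [{low:.2f}, {high:.2f}]")
```

Under the usual exchangeability assumption, intervals built this way cover the true value with probability at least 1 - alpha; a wider interval signals lower confidence in that prediction.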
Affiliation(s)
- Milad Rayka
- Applied Biotechnology Research Center, Baqiyatallah University of Medical Sciences, Tehran, Iran
| | - Morteza Mirzaei
- Applied Biotechnology Research Center, Baqiyatallah University of Medical Sciences, Tehran, Iran
| | - Ali Mohammad Latifi
- Applied Biotechnology Research Center, Baqiyatallah University of Medical Sciences, Tehran, Iran
|
15
|
Wang Z, Wang S, Li Y, Guo J, Wei Y, Mu Y, Zheng L, Li W. A new paradigm for applying deep learning to protein-ligand interaction prediction. Brief Bioinform 2024; 25:bbae145. [PMID: 38581420 PMCID: PMC10998640 DOI: 10.1093/bib/bbae145] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/06/2023] [Revised: 02/21/2024] [Accepted: 03/18/2024] [Indexed: 04/08/2024] Open
Abstract
Protein-ligand interaction prediction presents a significant challenge in drug design. Numerous machine learning and deep learning (DL) models have been developed to accurately identify docking poses of ligands and active compounds against specific targets. However, current models often suffer from inadequate accuracy or lack practical physical significance in their scoring systems. In this research paper, we introduce IGModel, a novel approach that utilizes the geometric information of protein-ligand complexes as input for predicting the root mean square deviation of docking poses and the binding strength (pKd, the negative value of the logarithm of binding affinity) within the same prediction framework. This ensures that the output scores carry intuitive meaning. We extensively evaluate the performance of IGModel on various docking power test sets, including the CASF-2016 benchmark, PDBbind-CrossDocked-Core and DISCO set, consistently achieving state-of-the-art accuracies. Furthermore, we assess IGModel's generalizability and robustness by evaluating it on unbiased test sets and sets containing target structures generated by AlphaFold2. The exceptional performance of IGModel on these sets demonstrates its efficacy. Additionally, we visualize the latent space of protein-ligand interactions encoded by IGModel and conduct interpretability analysis, providing valuable insights. This study presents a novel framework for DL-based prediction of protein-ligand interactions, contributing to the advancement of this field. The IGModel is available at GitHub repository https://github.com/zchwang/IGModel.
Affiliation(s)
- Zechen Wang
- School of Physics, Shandong University, South Shanda Road, 250100 Shandong, China
| | - Sheng Wang
- Shanghai Zelixir Biotech, Xiangke Road, 200030, Shanghai, China
| | - Yangyang Li
- School of Physics, Shandong University, South Shanda Road, 250100 Shandong, China
| | - Jingjing Guo
- Centre in Artificial Intelligence Driven Drug Discovery, Faculty of Applied Sciences, Macao Polytechnic University, Rua de Luís Gonzaga Gomes, Macao, China
| | - Yanjie Wei
- Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Xueyuan Road 1068, Shenzhen, 518055 Guang Dong, China
| | - Yuguang Mu
- School of Biological Sciences, Nanyang Technological University, Singapore
| | - Liangzhen Zheng
- Shanghai Zelixir Biotech, Xiangke Road, 200030, Shanghai, China
- Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Xueyuan Road 1068, Shenzhen, 518055 Guang Dong, China
| | - Weifeng Li
- School of Physics, Shandong University, South Shanda Road, 250100 Shandong, China
|
16
|
Guo L, Wang J. GSScore: a novel Graphormer-based shell-like scoring method for protein-ligand docking. Brief Bioinform 2024; 25:bbae201. [PMID: 38706316 PMCID: PMC11070652 DOI: 10.1093/bib/bbae201] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/08/2023] [Revised: 02/05/2024] [Accepted: 04/16/2024] [Indexed: 05/07/2024] Open
Abstract
Protein-ligand interactions (PLIs) are essential for cellular activities and drug discovery. However, due to the complexity and high cost of experimental methods, there is a great demand for computational approaches to recognize PLI patterns, such as protein-ligand docking. In recent years, more and more models based on machine learning have been developed to directly predict the root mean square deviation (RMSD) of a ligand docking pose with reference to its native binding pose. However, new scoring methods are still needed for more accurate RMSD prediction. We present a new deep learning-based scoring method for RMSD prediction of protein-ligand docking poses based on a Graphormer method and Shell-like graph architecture, named GSScore. To recognize near-native conformations from a set of poses, GSScore takes atoms as nodes and then establishes the docking interface of protein-ligand into multiple bipartite graphs within different shell ranges. Benefiting from the Graphormer and Shell-like graph architecture, GSScore can effectively capture the subtle differences between energetically favorable near-native conformations and unfavorable non-native poses without extra information. GSScore was extensively evaluated on diverse test sets including a subset of PDBBind version 2019, CASF2016 as well as DUD-E, and obtained significant improvements over existing methods in terms of RMSE, Pearson correlation coefficient (R), Spearman correlation coefficient and docking power.
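For orientation, the quantity that such models are trained to predict can be computed directly when the native pose is known. The sketch below is a minimal, assumption-laden version: it compares atoms in matching order within a shared receptor frame, with no superposition and no symmetry correction, and the function name and coordinates are hypothetical.

```python
import math

def rmsd(pose, reference):
    """Root mean square deviation between two equally ordered atom lists.

    Each argument is a list of (x, y, z) tuples in the same coordinate frame.
    """
    if len(pose) != len(reference) or not pose:
        raise ValueError("poses must be non-empty and have the same atom count")
    squared = sum(
        (p - r) ** 2
        for atom_p, atom_r in zip(pose, reference)
        for p, r in zip(atom_p, atom_r)
    )
    return math.sqrt(squared / len(pose))

# Toy two-atom ligand: the docked pose is the native pose shifted 1 Å along z.
native = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
docked = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
print(rmsd(docked, native))
```

Docking-power evaluations commonly treat a pose within about 2 Å RMSD of the native pose as near-native.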
Affiliation(s)
- Linyuan Guo
- School of Computer Science and Engineering, Central South University, Rd. Lu Shan Nan, 410083, Changsha, P.R. China
- Hunan Provincial Key Lab on Bioinformatics, Central South University, Rd. Lu Shan Nan, 410083, Changsha, P.R. China
| | - Jianxin Wang
- School of Computer Science and Engineering, Central South University, Rd. Lu Shan Nan, 410083, Changsha, P.R. China
- Hunan Provincial Key Lab on Bioinformatics, Central South University, Rd. Lu Shan Nan, 410083, Changsha, P.R. China
|
17
|
Metcalf D, Glick ZL, Bortolato A, Jiang A, Cheney DL, Sherrill CD. Directional ΔG Neural Network (DrΔG-Net): A Modular Neural Network Approach to Binding Free Energy Prediction. J Chem Inf Model 2024; 64:1907-1918. [PMID: 38470995 PMCID: PMC10966643 DOI: 10.1021/acs.jcim.3c02054] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/22/2023] [Revised: 02/23/2024] [Accepted: 02/26/2024] [Indexed: 03/14/2024]
Abstract
The protein-ligand binding free energy is a central quantity in structure-based computational drug discovery efforts. Although popular alchemical methods provide sound statistical means of computing the binding free energy of a large breadth of systems, they are generally too costly to be applied at the same frequency as end point or ligand-based methods. By contrast, these data-driven approaches are typically fast enough to address thousands of systems but with reduced transferability to unseen systems. We introduce DrΔG-Net (or simply Dragnet), an equivariant graph neural network that can blend ligand-based and protein-ligand data-driven approaches. It is based on a 3D fingerprint representation of the ligand alone and in complex with the protein target. Dragnet is a global scoring function to predict the binding affinity of arbitrary protein-ligand complexes, but can be easily tuned via transfer learning to specific systems or end points, performing similarly to common 2D ligand-based approaches in these tasks. Dragnet is evaluated on a total of 28 validation proteins with a set of congeneric ligands derived from the Binding DB and one custom set extracted from the ChEMBL Database. In general, a handful of experimental binding affinities are sufficient to optimize the scoring function for a particular protein and ligand scaffold. When not available, predictions from physics-based methods such as absolute free energy perturbation can be used for the transfer learning tuning of Dragnet. Furthermore, we use our data to illustrate the present limitations of data-driven modeling of binding free energy predictions.
Affiliation(s)
- Derek P. Metcalf
- Center for Computational Molecular Science and Technology, School of Chemistry and Biochemistry and School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, Georgia 30332-0400, United States
| | - Zachary L. Glick
- Center for Computational Molecular Science and Technology, School of Chemistry and Biochemistry and School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, Georgia 30332-0400, United States
| | - Andrea Bortolato
- Molecular Structure and Design, Bristol-Myers Squibb Company, P.O. Box 5400, Princeton, New Jersey 08543, United States
| | - Andy Jiang
- Center for Computational Molecular Science and Technology, School of Chemistry and Biochemistry and School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, Georgia 30332-0400, United States
| | - Daniel L. Cheney
- Molecular Structure and Design, Bristol-Myers Squibb Company, P.O. Box 5400, Princeton, New Jersey 08543, United States
| | - C. David Sherrill
- Center for Computational Molecular Science and Technology, School of Chemistry and Biochemistry and School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, Georgia 30332-0400, United States
|
18
|
Luo D, Liu D, Qu X, Dong L, Wang B. Enhancing Generalizability in Protein-Ligand Binding Affinity Prediction with Multimodal Contrastive Learning. J Chem Inf Model 2024; 64:1892-1906. [PMID: 38441880 DOI: 10.1021/acs.jcim.3c01961] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/26/2024]
Abstract
Improving the generalization ability of scoring functions remains a major challenge in protein-ligand binding affinity prediction. Many machine learning methods are limited by their reliance on single-modal representations, hindering a comprehensive understanding of protein-ligand interactions. We introduce a graph-neural-network-based scoring function that utilizes a triplet contrastive learning loss to improve protein-ligand representations. In this model, three-dimensional complex representations and the fusion of two-dimensional ligand and coarse-grained pocket representations converge while distancing from decoy representations in latent space. After rigorous validation on multiple external data sets, our model exhibits commendable generalization capabilities compared to those of other deep learning-based scoring functions, marking it as a promising tool in the realm of drug discovery. In the future, our training framework can be extended to other biophysical- and biochemical-related problems such as protein-protein interaction and protein mutation prediction.
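The triplet contrastive objective described in this abstract can be pictured with the generic margin-based triplet loss below. This is only a sketch of the general technique, assuming Euclidean distance and a margin of 1.0; the paper's actual distance measure, margin, and embedding networks are not given in the abstract, and the toy embeddings are hypothetical.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two embedding vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Margin-based triplet loss: pull anchor/positive together, push negative away."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)

# Toy 2D embeddings (assumed): the 3D complex representation is the anchor, the
# fused 2D-ligand/pocket representation the positive, and a decoy the negative.
complex_3d = [0.0, 0.0]
fused_2d = [0.0, 1.0]
far_decoy = [3.0, 4.0]
near_decoy = [0.0, 1.5]

loss_far = triplet_loss(complex_3d, fused_2d, far_decoy)    # decoy already far away
loss_near = triplet_loss(complex_3d, fused_2d, near_decoy)  # decoy violates the margin
print(loss_far, loss_near)
```

The loss vanishes once the decoy sits at least one margin farther from the anchor than the positive does, so training effort concentrates on decoys that are still confusable with true binders.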
Affiliation(s)
- Ding Luo
- State Key Laboratory of Physical Chemistry of Solid Surfaces and Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, P. R. China
| | - Dandan Liu
- State Key Laboratory of Physical Chemistry of Solid Surfaces and Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, P. R. China
| | - Xiaoyang Qu
- School of Pharmacy and Medical Technology, Putian University, Putian 351100, P. R. China
- Key Laboratory of Pharmaceutical Analysis and Laboratory Medicine (Putian University), Fujian Province University, Putian 351100, P. R. China
| | - Lina Dong
- State Key Laboratory of Physical Chemistry of Solid Surfaces and Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, P. R. China
| | - Binju Wang
- State Key Laboratory of Physical Chemistry of Solid Surfaces and Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, P. R. China
- Innovation Laboratory for Sciences and Technologies of Energy Materials of Fujian Province (IKKEM), Xiamen 361005, P. R. China
|
19
|
Kaffash M, Tolou-Shikhzadeh-Yazdi S, Soleimani S, Hoseinpoor S, Saberi MR, Chamani J. Spectroscopy and molecular simulation on the interaction of Nano-Kaempferol prepared by oil-in-water with two carrier proteins: An investigation of protein-protein interaction. SPECTROCHIMICA ACTA. PART A, MOLECULAR AND BIOMOLECULAR SPECTROSCOPY 2024; 309:123815. [PMID: 38154302 DOI: 10.1016/j.saa.2023.123815] [Citation(s) in RCA: 23] [Impact Index Per Article: 23.0] [Received: 10/03/2023] [Revised: 11/28/2023] [Accepted: 12/23/2023] [Indexed: 12/30/2023]
Abstract
In this work, the interaction of human serum albumin (HSA) and human holo-transferrin (HTF) with Nano-Kaempferol (Nano-KMP) prepared through an oil-in-water procedure was investigated in binary and ternary systems using different spectroscopy techniques along with molecular simulation and cancer cell experiments. According to the fluorescence spectroscopy outcomes, Nano-KMP is capable of quenching both proteins in the binary systems by a static mechanism, while in the (HSA-HTF) Nano-KMP ternary system, a nonlinear Stern-Volmer plot was observed, with both dynamic and static fluorescence quenching mechanisms occurring in the binding interaction. In addition, the two Ksv values acquired in the ternary system signified the existence of two sets of binding sites with two different interaction behaviors. The binding constant values for the formation of the HSA-Nano KMP, HTF-Nano-KMP, and (HSA-HTF) Nano-KMP complexes were (2.54 ± 0.03) × 10^4, (2.15 ± 0.02) × 10^4, and (1.43 ± 0.04) × 10^4 M^-1 at the first set of binding sites and (4.68 ± 0.05) × 10^4 M^-1 at the second set of binding sites, respectively. The thermodynamic parameters confirmed the major roles of hydrogen bonding and van der Waals forces in the formation of the HSA-Nano KMP and HTF-Nano KMP complexes. The thermodynamic parameter values of (HSA-HTF) Nano KMP revealed the dominance of hydrogen bonding and van der Waals forces in the first set of binding sites and of hydrophobic forces in the second set of binding sites. Resonance light scattering (RLS) analysis displayed a different interaction behavior for the HSA-HTF complex in the presence of Nano-KMP as the ternary system. Moreover, the circular dichroism (CD) technique affirmed conformational changes in the secondary structure of the proteins in the binary and ternary systems. Molecular docking and molecular dynamics simulations (for 100 ns) were performed to investigate the mechanism of KMP binding to HSA, HTF, and HSA-HTF. In addition to a concentration- and time-dependent cytotoxicity, down-regulation of the PI3K/AkT/mTOR pathway resulted in cell cycle arrest in SW480 cells.
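For readers less familiar with the quenching analysis mentioned in this abstract, the standard textbook Stern-Volmer relations are reproduced below. They are general forms, not equations taken from the cited paper; $F_0$ and $F$ denote the fluorescence intensities without and with quencher, $[Q]$ the quencher concentration, $K_{SV}$ the Stern-Volmer constant, and $K_D$ and $K_S$ the dynamic and static quenching constants.

```latex
% Single quenching mechanism: linear Stern-Volmer plot
\frac{F_0}{F} = 1 + K_{SV}\,[Q]

% Combined dynamic and static quenching: quadratic in [Q]
\frac{F_0}{F} = \bigl(1 + K_D\,[Q]\bigr)\bigl(1 + K_S\,[Q]\bigr)
             = 1 + (K_D + K_S)\,[Q] + K_D K_S\,[Q]^2
```

The quadratic term in the second form is what bends the plot away from a straight line when both mechanisms act at once, which matches the curved plot the abstract reports for the ternary system.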
Affiliation(s)
- Maryam Kaffash
- Department of Biology, Mashhad Branch, Islamic Azad University, Mashhad, Iran
| | - Samane Soleimani
- Department of Biology, Mashhad Branch, Islamic Azad University, Mashhad, Iran
| | - Saeideh Hoseinpoor
- Department of Biology, Mashhad Branch, Islamic Azad University, Mashhad, Iran
| | - Mohammad Reza Saberi
- Department of Medicinal Chemistry, School of Pharmacy, Mashhad University of Medical Sciences, Mashhad, Iran
| | - Jamshidkhan Chamani
- Department of Biology, Mashhad Branch, Islamic Azad University, Mashhad, Iran.
|
20
|
Zhang Y, Li S, Meng K, Sun S. Machine Learning for Sequence and Structure-Based Protein-Ligand Interaction Prediction. J Chem Inf Model 2024; 64:1456-1472. [PMID: 38385768 DOI: 10.1021/acs.jcim.3c01841] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/23/2024]
Abstract
Developing new drugs is too expensive and time-consuming. Accurately predicting the interaction between drugs and targets will likely change how drugs are discovered. Machine learning-based protein-ligand interaction prediction has demonstrated significant potential. In this paper, computational methods that use sequence and structure to study protein-ligand interactions are examined. The paper starts by presenting an overview of the data sets applied in this area, as well as the various approaches used for representing proteins and ligands. Then, sequence-based and structure-based classification criteria are used to categorize and summarize both the classical machine learning models and the deep learning models employed in protein-ligand interaction studies. Moreover, the evaluation methods and interpretability of these models are discussed. Furthermore, the diverse applications of protein-ligand interaction models in drug research are presented. Lastly, the current challenges and future directions in this field are addressed.
Affiliation(s)
- Yunjiang Zhang
- Beijing Key Laboratory for Green Catalysis and Separation, The Faculty of Environment and Life, Beijing University of Technology, Beijing 100124, P. R. China
| | - Shuyuan Li
- Beijing Key Laboratory for Green Catalysis and Separation, The Faculty of Environment and Life, Beijing University of Technology, Beijing 100124, P. R. China
| | - Kong Meng
- Beijing Key Laboratory for Green Catalysis and Separation, The Faculty of Environment and Life, Beijing University of Technology, Beijing 100124, P. R. China
| | - Shaorui Sun
- Beijing Key Laboratory for Green Catalysis and Separation, The Faculty of Environment and Life, Beijing University of Technology, Beijing 100124, P. R. China
|
21
|
Chen D, Liu J, Wei GW. TopoFormer: Multiscale Topology-enabled Structure-to-Sequence Transformer for Protein-Ligand Interaction Predictions. RESEARCH SQUARE 2024:rs.3.rs-3640878. [PMID: 38405777 PMCID: PMC10889053 DOI: 10.21203/rs.3.rs-3640878/v1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/27/2024]
Abstract
Pre-trained deep Transformers have had tremendous success in a wide variety of disciplines. However, in computational biology, essentially all Transformers are built upon biological sequences, which ignores vital stereochemical information and may result in crucial errors in downstream predictions. On the other hand, three-dimensional (3D) molecular structures are incompatible with the sequential architecture of Transformer and natural language processing (NLP) models in general. This work addresses this foundational challenge with a topological Transformer (TopoFormer). TopoFormer is built by integrating NLP and a multiscale topology technique, the persistent topological hyperdigraph Laplacian (PTHL), which systematically converts intricate 3D protein-ligand complexes at various spatial scales into an NLP-admissible sequence of topological invariants and homotopic shapes. Element-specific PTHLs are further developed to embed crucial physical, chemical, and biological interactions into topological sequences. TopoFormer surges ahead of conventional algorithms and recent deep learning variants and gives rise to exemplary scoring accuracy and superior performance in ranking, docking, and screening tasks in a number of benchmark datasets. The proposed topological sequences can be extracted from all kinds of structural data in data science to facilitate various NLP models, heralding a new era in AI-driven discovery.
Affiliation(s)
- Dong Chen
- Department of Mathematics, Michigan State University, MI, 48824, USA
| | - Jian Liu
- Department of Mathematics, Michigan State University, MI, 48824, USA
- Mathematical Science Research Center, Chongqing University of Technology, Chongqing 400054, China
| | - Guo-Wei Wei
- Department of Mathematics, Michigan State University, MI, 48824, USA
- Department of Electrical and Computer Engineering, Michigan State University, MI 48824, USA
- Department of Biochemistry and Molecular Biology, Michigan State University, MI 48824, USA
|
22
|
Guo J. Improving structure-based protein-ligand affinity prediction by graph representation learning and ensemble learning. PLoS One 2024; 19:e0296676. [PMID: 38232063 DOI: 10.1371/journal.pone.0296676] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/06/2023] [Accepted: 12/15/2023] [Indexed: 01/19/2024] Open
Abstract
Predicting protein-ligand binding affinity presents a viable solution for accelerating the discovery of new lead compounds. The recent widespread application of machine learning approaches, especially graph neural networks, has brought new advancements in this field. However, some existing structure-based methods treat protein macromolecules and ligand small molecules in the same way and ignore the data heterogeneity, potentially leading to incomplete exploration of the biochemical information of ligands. In this work, we propose LGN, a graph neural network-based fusion model with extra ligand feature extraction to effectively capture local features and global features within the protein-ligand complex, and make use of interaction fingerprints. By combining the ligand-based features and interaction fingerprints, LGN achieves Pearson correlation coefficients of up to 0.842 on the PDBbind 2016 core set, compared to 0.807 when using the features of complex graphs alone. Finally, we verify the rationalization and generalization of our model through comprehensive experiments. We also compare our model with state-of-the-art baseline methods, which validates the superiority of our model. To reduce the impact of data similarity, we increase the robustness of the model by incorporating ensemble learning.
Affiliation(s)
- Jia Guo
- Chongqing Institute of Green and Intelligent Technology, Chinese Academy of Sciences, Beijing, P.R. China
- Chongqing School, University of Chinese Academy of Sciences, Chongqing, China
|
23
|
Li Y, Fan Z, Rao J, Chen Z, Chu Q, Zheng M, Li X. An overview of recent advances and challenges in predicting compound-protein interaction (CPI). MEDICAL REVIEW (2021) 2023; 3:465-486. [PMID: 38282802 PMCID: PMC10808869 DOI: 10.1515/mr-2023-0030] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/18/2023] [Accepted: 08/30/2023] [Indexed: 01/30/2024]
Abstract
Compound-protein interactions (CPIs) are critical in drug discovery for identifying therapeutic targets, drug side effects, and repurposing existing drugs. Machine learning (ML) algorithms have emerged as powerful tools for CPI prediction, offering notable advantages in cost-effectiveness and efficiency. This review provides an overview of recent advances in both structure-based and non-structure-based CPI prediction ML models, highlighting their performance and achievements. It also offers insights into CPI prediction-related datasets and evaluation benchmarks. Lastly, the article presents a comprehensive assessment of the current landscape of CPI prediction, elucidating the challenges faced and outlining emerging trends to advance the field.
Affiliation(s)
- Yanbei Li
- School of Pharmaceutical Science and Technology, Hangzhou Institute for Advanced Study, UCAS, Hangzhou, Zhejiang Province, China
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
- University of Chinese Academy of Sciences, Beijing, China
| | - Zhehuan Fan
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
- University of Chinese Academy of Sciences, Beijing, China
| | - Jingxin Rao
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
- University of Chinese Academy of Sciences, Beijing, China
| | - Zhiyi Chen
- School of Pharmaceutical Science and Technology, Hangzhou Institute for Advanced Study, UCAS, Hangzhou, Zhejiang Province, China
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
- University of Chinese Academy of Sciences, Beijing, China
| | - Qinyu Chu
- School of Pharmaceutical Science and Technology, Hangzhou Institute for Advanced Study, UCAS, Hangzhou, Zhejiang Province, China
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
- University of Chinese Academy of Sciences, Beijing, China
| | - Mingyue Zheng
- School of Pharmaceutical Science and Technology, Hangzhou Institute for Advanced Study, UCAS, Hangzhou, Zhejiang Province, China
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
- University of Chinese Academy of Sciences, Beijing, China
| | - Xutong Li
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
- University of Chinese Academy of Sciences, Beijing, China
| |
Collapse
|
24
|
Arango AS, Park H, Tajkhorshid E. Topological Learning Approach to Characterizing Biological Membranes. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.11.28.569053. [PMID: 38076911 PMCID: PMC10705453 DOI: 10.1101/2023.11.28.569053] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/20/2023]
Abstract
Biological membranes play key roles in cellular compartmentalization, structure, and signaling pathways. At varying temperatures, individual membrane lipids sample from different configurations, a process that frequently leads to higher-order phase behavior and phenomena. Here we present a persistent homology-based method for quantifying the structural features of individual and bulk lipids, providing local and contextual information on lipid tail organization. Our method leverages the mathematical machinery of algebraic topology and machine learning to infer temperature-dependent structural information of lipids from static coordinates. To train our model, we generated multiple molecular dynamics trajectories of DPPC membranes at varying temperatures. A fingerprint was then constructed for each set of lipid coordinates by a persistent homology filtration, in which interaction spheres were grown around the lipid atoms while tracking their intersections. The sphere filtration formed a simplicial complex that captures enduring key topological features of the configuration landscape, using homology, yielding persistence data. Following fingerprint extraction for physiologically relevant temperatures, the persistence data were used to train an attention-based neural network for assignment of effective temperature values to selected membrane regions. Our persistent homology-based method captures the local structural effects, via effective temperature, of lipids adjacent to other membrane constituents, e.g. sterols and proteins. This topological learning approach can predict lipid effective temperatures from static coordinates across multiple spatial resolutions. The tool, called MembTDA, can be accessed at https://github.com/hyunp2/Memb-TDA.
Affiliation(s)
- Andres S Arango
  - Theoretical and Computational Biophysics Group, NIH Resource Center for Macromolecular Modeling and Visualization, Beckman Institute for Advanced Science and Technology, Department of Biochemistry, and Center for Biophysics and Quantitative Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, USA
- Hyun Park
  - Theoretical and Computational Biophysics Group, NIH Resource Center for Macromolecular Modeling and Visualization, Beckman Institute for Advanced Science and Technology, Department of Biochemistry, and Center for Biophysics and Quantitative Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, USA
- Emad Tajkhorshid
  - Theoretical and Computational Biophysics Group, NIH Resource Center for Macromolecular Modeling and Visualization, Beckman Institute for Advanced Science and Technology, Department of Biochemistry, and Center for Biophysics and Quantitative Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, USA
|
25
|
Wang Y, Jiao Q, Wang J, Cai X, Zhao W, Cui X. Prediction of protein-ligand binding affinity with deep learning. Comput Struct Biotechnol J 2023; 21:5796-5806. [PMID: 38213884 PMCID: PMC10782002 DOI: 10.1016/j.csbj.2023.11.009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/24/2023] [Revised: 11/03/2023] [Accepted: 11/03/2023] [Indexed: 01/13/2024] Open
Abstract
The prediction of binding affinities between target proteins and small molecule drugs is essential for speeding up the drug research and design process. To attain precise and effective affinity prediction, computer-aided methods are employed in the drug discovery pipeline. In the last decade, a variety of computational methods has been developed, with deep learning being the most commonly used approach. We have gathered several deep learning methods and classified them into convolutional neural networks (CNNs), graph neural networks (GNNs), and Transformers for analysis and discussion. Initially, we conducted an analysis of the different deep learning methods, focusing on their feature construction and model architecture. We discussed the advantages and disadvantages of each model. Subsequently, we conducted experiments using four deep learning methods on the PDBbind v.2016 core set. We evaluated their prediction capabilities in various affinity intervals and statistically and visually analyzed the samples of correct and incorrect predictions for each model. Through visual analysis, we attempted to combine the strengths of the four models to improve the Root Mean Square Error (RMSE) of predicted affinities by 1.6% (reducing the absolute value to 1.101) and the Pearson Correlation Coefficient (R) by 2.9% (increasing the absolute value to 0.894) compared to the current state-of-the-art method. Lastly, we discussed the challenges faced by current deep learning methods in affinity prediction and proposed potential solutions to address these issues.
Affiliation(s)
- Yuxiao Wang
  - School of Computer Science and Technology, Shandong University, Qingdao 266237, Shandong, China
- Qihong Jiao
  - School of Computer Science and Technology, Shandong University, Qingdao 266237, Shandong, China
- Jingxuan Wang
  - School of Computer Science and Technology, Shandong University, Qingdao 266237, Shandong, China
- Xiaojun Cai
  - School of Computer Science and Technology, Shandong University, Qingdao 266237, Shandong, China
- Wei Zhao
  - State Key Laboratory of Microbial Technology, Shandong University, Qingdao 266237, Shandong, China
- Xuefeng Cui
  - School of Computer Science and Technology, Shandong University, Qingdao 266237, Shandong, China
|
26
|
Shiota K, Akutsu T. Multi-shelled ECIF: improved extended connectivity interaction features for accurate binding affinity prediction. BIOINFORMATICS ADVANCES 2023; 3:vbad155. [PMID: 37928345 PMCID: PMC10625475 DOI: 10.1093/bioadv/vbad155] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/01/2023] [Revised: 09/20/2023] [Accepted: 10/19/2023] [Indexed: 11/07/2023]
Abstract
Motivation: Extended connectivity interaction features (ECIF) is a method developed to predict protein-ligand binding affinity, allowing for detailed atomic representation. It performed very well in terms of Comparative Assessment of Scoring Functions 2016 (CASF-2016) scoring power. However, ECIF has the limitation of not being able to adequately account for interatomic distances. Results: To investigate what kind of distance representation is effective for protein-ligand (P-L) binding affinity prediction, we have developed two algorithms that improve ECIF's feature extraction method to take distance into account. One is multi-shelled ECIF, which takes interatomic distance into account by dividing it into multiple layers. The other is weighted ECIF, which weights the importance of interactions according to the distance between atoms. A comparison of these two methods shows that multi-shelled ECIF outperforms weighted ECIF and the original ECIF, achieving a CASF-2016 scoring power Pearson correlation coefficient of 0.877. Availability and implementation: All the codes and data are available on GitHub (https://github.com/koji11235/MSECIFv2).
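The two distance treatments described in this abstract can be illustrated with a minimal Python sketch. It counts protein-ligand atom-type pairs either per distance shell or with a distance-dependent weight. The atom-type labels, the shell boundaries, the Gaussian-style weight, and the function names `multi_shell_counts` and `weighted_counts` are all assumptions chosen for illustration; none of them is taken from the paper or from the MSECIFv2 repository.

```python
import math
from collections import Counter, defaultdict

def pair_distances(protein_atoms, ligand_atoms):
    """Yield (protein_type, ligand_type, distance) for every atom pair.

    Each atom is a (type_label, (x, y, z)) tuple; the labels are hypothetical.
    """
    for p_type, p_xyz in protein_atoms:
        for l_type, l_xyz in ligand_atoms:
            yield p_type, l_type, math.dist(p_xyz, l_xyz)

def multi_shell_counts(protein_atoms, ligand_atoms,
                       shell_edges=(0.0, 4.0, 6.0, 8.0)):
    """Count atom-type pairs separately in each distance shell.

    shell_edges are assumed boundaries in angstroms; pairs beyond the
    last edge are ignored.
    """
    counts = Counter()
    for p_type, l_type, d in pair_distances(protein_atoms, ligand_atoms):
        for shell, (lo, hi) in enumerate(zip(shell_edges, shell_edges[1:])):
            if lo <= d < hi:
                counts[(p_type, l_type, shell)] += 1
                break
    return counts

def weighted_counts(protein_atoms, ligand_atoms, cutoff=8.0, scale=4.0):
    """Sum a distance-dependent weight per atom-type pair.

    The weight exp(-(d / scale)**2) is an illustrative choice only.
    """
    weights = defaultdict(float)
    for p_type, l_type, d in pair_distances(protein_atoms, ligand_atoms):
        if d < cutoff:
            weights[(p_type, l_type)] += math.exp(-(d / scale) ** 2)
    return dict(weights)

# Toy coordinates (angstroms), invented for this example.
protein = [("C", (0.0, 0.0, 0.0)), ("N", (5.0, 0.0, 0.0)), ("C", (0.0, 0.0, 10.0))]
ligand = [("O", (0.0, 0.0, 3.0))]
print(dict(multi_shell_counts(protein, ligand)))
print(weighted_counts(protein, ligand))
```

In the shell variant, one atom-type pair contributes to several separate features depending on distance; in the weighted variant it contributes to a single feature whose value decays with distance. Either feature vector could then be passed to a regressor.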
Affiliation(s)
- Koji Shiota
  - Department of Intelligence Science and Technology, Graduate School of Informatics, Kyoto University, Kyoto, Kyoto 606-8501, Japan
- Tatsuya Akutsu
  - Department of Intelligence Science and Technology, Graduate School of Informatics, Kyoto University, Kyoto, Kyoto 606-8501, Japan
|
27
|
Guo L, Qiu T, Wang J. ViTScore: A Novel Three-Dimensional Vision Transformer Method for Accurate Prediction of Protein-Ligand Docking Poses. IEEE Trans Nanobioscience 2023; 22:734-743. [PMID: 37159314 DOI: 10.1109/tnb.2023.3274640] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/11/2023]
Abstract
Protein-ligand interactions (PLIs) are essential for cellular activities and drug discovery, and due to the complexity and high cost of experimental methods, there is a great demand for computational approaches, such as protein-ligand docking, to decipher PLI patterns. One of the most challenging aspects of protein-ligand docking is to identify near-native conformations from a set of poses, but traditional scoring functions still have limited accuracy. Therefore, new scoring methods are urgently needed for methodological and/or practical implications. We present a novel deep learning-based scoring function for ranking protein-ligand docking poses based on Vision Transformer (ViT), named ViTScore. To recognize near-native poses from a set of poses, ViTScore voxelizes the protein-ligand interactional pocket into a 3D grid labeled by the occupancy contribution of atoms in different physicochemical classes. This allows ViTScore to capture the subtle differences between spatially and energetically favorable near-native poses and unfavorable non-native poses without needing extra information. ViTScore then outputs a prediction of the root mean square deviation (RMSD) of a docking pose with reference to the native binding pose. ViTScore is extensively evaluated on diverse test sets including PDBbind2019 and CASF2016, and obtains significant improvements over existing methods in terms of RMSE, R and docking power. The results suggest that ViTScore is a promising scoring function for protein-ligand docking that can be used to accurately identify near-native poses from a set of poses. Additionally, ViTScore can be used to identify potential drug targets and to design new drugs with improved efficacy and safety.
|
28
|
Challapa-Mamani MR, Tomás-Alvarado E, Espinoza-Baigorria A, León-Figueroa DA, Sah R, Rodriguez-Morales AJ, Barboza JJ. Molecular Docking and Molecular Dynamics Simulations in Related to Leishmania donovani: An Update and Literature Review. Trop Med Infect Dis 2023; 8:457. [PMID: 37888585 PMCID: PMC10610989 DOI: 10.3390/tropicalmed8100457] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 07/25/2023] [Revised: 09/19/2023] [Accepted: 09/20/2023] [Indexed: 10/28/2023] Open
Abstract
Leishmaniasis, a disease caused by Leishmania parasites and transmitted via sandflies, presents in two main forms: cutaneous and visceral, the latter being more severe. With 0.7 to 1 million new cases each year, primarily in Brazil, diagnosis remains challenging due to diverse disease manifestations. Traditionally, the identification of Leishmania species is inferred from clinical and epidemiological data. Advances in disease management depend on technological progress and the improvement of parasite identification programs. Current treatments, despite the high incidence, show limited efficacy due to factors like cost, toxicity, and lengthy regimens causing poor adherence and resistance development. Diagnostic techniques have improved, but a significant gap remains between scientific progress and application in endemic areas. Complete genomic sequence knowledge of Leishmania allows for the identification of therapeutic targets. With the aid of computational tools, testing, searching, and detecting affinity in molecular docking are optimized, and strategies that assess advantages among different options are developed. The review focuses on the use of molecular docking and molecular dynamics (MD) simulation for drug development. It also discusses the limitations and advancements of current treatments, emphasizing the importance of new techniques in improving disease management.
Affiliation(s)
- Mabel R. Challapa-Mamani
  - Escuela de Medicina, Universidad Cesar Vallejo, Trujillo 13007, Peru
  - Sociedad Científica de Estudiantes de Medicina de la Universidad César Vallejo, Trujillo 13007, Peru
- Eduardo Tomás-Alvarado
  - Hospital General Regional 17, Instituto Mexicano del Seguro Social, Cancún 75533, Mexico
- Ranjit Sah
  - Department of Clinical Microbiology, Institute of Medicine, Tribhuvan University Teaching Hospital, Kathmandu 44600, Nepal
  - Department of Microbiology, Dr. D. Y. Patil Medical College, Hospital and Research Centre, Dr. D. Y. Patil Vidyapeeth, Pune 411018, Maharashtra, India
- Alfonso J. Rodriguez-Morales
  - Faculty of Health Sciences, Universidad Científica del Sur, Lima 150152, Peru
  - Gilbert and Rose-Marie Chagoury School of Medicine, Lebanese American University, Beirut 350000, Lebanon
|
29
|
Qiu Y, Wei GW. Artificial intelligence-aided protein engineering: from topological data analysis to deep protein language models. Brief Bioinform 2023; 24:bbad289. [PMID: 37580175 PMCID: PMC10516362 DOI: 10.1093/bib/bbad289] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 05/08/2023] [Revised: 07/14/2023] [Accepted: 07/26/2023] [Indexed: 08/16/2023] Open
Abstract
Protein engineering is an emerging field in biotechnology that has the potential to revolutionize various areas, such as antibody design, drug discovery, food security, ecology, and more. However, the mutational space involved is too vast to be handled through experimental means alone. Leveraging accumulative protein databases, machine learning (ML) models, particularly those based on natural language processing (NLP), have considerably expedited protein engineering. Moreover, advances in topological data analysis (TDA) and artificial intelligence-based protein structure prediction, such as AlphaFold2, have made more powerful structure-based ML-assisted protein engineering strategies possible. This review aims to offer a comprehensive, systematic, and indispensable set of methodological components, including TDA and NLP, for protein engineering and to facilitate their future development.
Affiliation(s)
- Yuchi Qiu
  - Department of Mathematics, Michigan State University, East Lansing, 48824 MI, USA
- Guo-Wei Wei
  - Department of Mathematics, Michigan State University, East Lansing, 48824 MI, USA
  - Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, 48824 MI, USA
  - Department of Electrical and Computer Engineering, Michigan State University, East Lansing, 48824 MI, USA
|
30
|
Dong L, Shi S, Qu X, Luo D, Wang B. Ligand binding affinity prediction with fusion of graph neural networks and 3D structure-based complex graph. Phys Chem Chem Phys 2023; 25:24110-24120. [PMID: 37655493 DOI: 10.1039/d3cp03651k] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/02/2023]
Abstract
Accurate prediction of protein-ligand binding affinity is pivotal for drug design and discovery. Here, we propose a novel deep fusion graph neural network framework named FGNN to learn protein-ligand interactions from the 3D structures of protein-ligand complexes. Unlike 1D sequences for proteins or 2D graphs for ligands, the 3D graph of a protein-ligand complex enables a more accurate representation of the protein-ligand interactions. Benchmark studies have shown that our FGNN fusion models can achieve more accurate prediction of binding affinity than any individual algorithm. The advantages of fusion strategies have been demonstrated in terms of expressive power of data, learning efficiency and model interpretability. Our fusion models show satisfactory performance on diverse data sets, demonstrating their generalization ability. Given their good performance in both binding affinity prediction and virtual screening, our fusion models are expected to be practically applied for drug screening and design. Our work highlights the potential of the fusion graph neural network algorithm in solving complex prediction problems in computational biology and chemistry. The fusion graph neural networks (FGNN) model is freely available at https://github.com/LinaDongXMU/FGNN.
Affiliation(s)
- Lina Dong
  - State Key Laboratory of Physical Chemistry of Solid Surfaces and Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, iChEM, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen, 361005, China
- Shuai Shi
  - Department of Algorithm, TuringQ Co., Ltd., Shanghai, 200240, China
- Xiaoyang Qu
  - State Key Laboratory of Physical Chemistry of Solid Surfaces and Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, iChEM, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen, 361005, China
- Ding Luo
  - State Key Laboratory of Physical Chemistry of Solid Surfaces and Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, iChEM, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen, 361005, China
- Binju Wang
  - State Key Laboratory of Physical Chemistry of Solid Surfaces and Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, iChEM, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen, 361005, China
  - Innovation Laboratory for Sciences and Technologies of Energy Materials of Fujian Province (IKKEM), Xiamen, 361005, China
|
31
|
Zhu Y, Zhao L, Wen N, Wang J, Wang C. DataDTA: a multi-feature and dual-interaction aggregation framework for drug-target binding affinity prediction. Bioinformatics 2023; 39:btad560. [PMID: 37688568 PMCID: PMC10516524 DOI: 10.1093/bioinformatics/btad560] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 12/03/2022] [Revised: 05/09/2023] [Accepted: 09/07/2023] [Indexed: 09/11/2023] Open
Abstract
MOTIVATION: Accurate prediction of drug-target binding affinity (DTA) is crucial for drug discovery. The increase in the publication of large-scale DTA datasets enables the development of various computational methods for DTA prediction. Numerous deep learning-based methods have been proposed to predict affinities, some of which only utilize original sequence information or complex structures, but the effective combination of various information and protein-binding pockets has not been fully mined. Therefore, a new method that integrates available key information is urgently needed to predict DTA and accelerate the drug discovery process. RESULTS: In this study, we propose a novel deep learning-based predictor termed DataDTA to estimate the affinities of drug-target pairs. DataDTA utilizes descriptors of predicted pockets and sequences of proteins, as well as low-dimensional molecular features and SMILES strings of compounds as inputs. Specifically, the pockets were predicted from the three-dimensional structure of proteins and their descriptors were extracted as the partial input features for DTA prediction. The molecular representation of compounds based on algebraic graph features was collected to supplement the input information of targets. Furthermore, to ensure effective learning of multiscale interaction features, a dual-interaction aggregation neural network strategy was developed. DataDTA was compared with state-of-the-art methods on different datasets, and the results showed that DataDTA is a reliable prediction tool for affinity estimation. Specifically, the concordance index (CI) of DataDTA is 0.806 and the Pearson correlation coefficient (R) value is 0.814 on the test dataset, both higher than those of other methods. AVAILABILITY AND IMPLEMENTATION: The codes and datasets of DataDTA are available at https://github.com/YanZhu06/DataDTA.
Affiliation(s)
- Yan Zhu
  - Faculty of Computing, Harbin Institute of Technology, Harbin 150001, China
- Lingling Zhao
  - Faculty of Computing, Harbin Institute of Technology, Harbin 150001, China
- Naifeng Wen
  - School of Mechanical and Electrical Engineering, Dalian Minzu University, Dalian 116600, China
- Junjie Wang
  - Department of Medical Informatics, School of Biomedical Engineering and Informatics, Nanjing Medical University, Nanjing 211166, China
- Chunyu Wang
  - Faculty of Computing, Harbin Institute of Technology, Harbin 150001, China
|
32
|
Rana MM, Nguyen DD. Geometric graph learning with extended atom-types features for protein-ligand binding affinity prediction. Comput Biol Med 2023; 164:107250. [PMID: 37515872 DOI: 10.1016/j.compbiomed.2023.107250] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/12/2023] [Revised: 06/12/2023] [Accepted: 07/07/2023] [Indexed: 07/31/2023]
Abstract
Understanding and accurately predicting protein-ligand binding affinity are essential in the drug design and discovery process. At present, machine learning-based methodologies are gaining popularity as a means of predicting binding affinity due to their efficiency and accuracy, as well as the increasing availability of structural and binding affinity data for protein-ligand complexes. In biomolecular studies, graph theory has been widely applied since graphs can be used to model molecules or molecular complexes in a natural manner. In the present work, we upgrade the graph-based learners for the study of protein-ligand interactions by integrating extensive atom types such as SYBYL and extended connectivity interactive features (ECIF) into multiscale weighted colored graphs (MWCG). By pairing with the gradient boosting decision tree (GBDT) machine learning algorithm, our approach results in two different methods, namely sybylGGL-Score and ecifGGL-Score. Both of our models are extensively validated in their scoring power using three commonly used benchmark datasets in the drug design area, namely CASF-2007, CASF-2013, and CASF-2016. The performance of our best model sybylGGL-Score is compared with other state-of-the-art models in the binding affinity prediction for each benchmark. While both of our models achieve state-of-the-art results, the SYBYL atom-type model sybylGGL-Score outperforms other methods by a wide margin in all benchmarks. Finally, the best-performing SYBYL atom-type model is evaluated on two test sets that are independent of CASF benchmarks.
Affiliation(s)
- Md Masud Rana
  - Department of Mathematics, University of Kentucky, Lexington, 40506, KY, USA
- Duc Duy Nguyen
  - Department of Mathematics, University of Kentucky, Lexington, 40506, KY, USA
|
33
|
Chen P, Shen H, Zhang Y, Wang B, Gu P. SGNet: Sequence-Based Convolution and Ligand Graph Network for Protein Binding Affinity Prediction. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2023; 20:3257-3266. [PMID: 37030867 DOI: 10.1109/tcbb.2023.3262821] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/19/2023]
Abstract
Protein-ligand binding can play an important role in many fields. It is of great importance to accurately predict the binding affinity between molecules by computational methods. Most computational binding affinity methods require molecular structures. However, there are still a large number of protein molecules with known amino acid sequences whose structures have not yet been solved. To address this issue, this paper proposes a sequence-based convolution and ligand graph network, called SGNet, to fuse the molecular graph information and the amino acid sequence information. This method integrates Conjoint Triad (CT) encoding of the amino acid sequence and a one-dimensional convolutional neural network module to extract protein features, develops a graph attention network to extract molecular features of the ligand, and then fuses the two feature sets to predict the binding affinity between molecules through a fully connected layer. As a result, SGNet achieves good prediction performance on both KIKD and IC50 data sets, with prediction error RMSEs of 1.287 and 1.58, and Pearson correlation coefficients (R) of 0.687 and 0.592, respectively. Comparative experimental results under the same conditions showed that SGNet outperformed Kdeep and GraphDTA in predicting binding affinities between protein-ligand molecules.
|
34
|
Li S, Tian T, Zhang Z, Zou Z, Zhao D, Zeng J. PocketAnchor: Learning structure-based pocket representations for protein-ligand interaction prediction. Cell Syst 2023; 14:692-705.e6. [PMID: 37516103 DOI: 10.1016/j.cels.2023.05.005] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 07/07/2022] [Revised: 11/25/2022] [Accepted: 05/19/2023] [Indexed: 07/31/2023]
Abstract
Protein-ligand interactions are essential for cellular activities and drug discovery processes. Appropriately and effectively representing protein features is of vital importance for developing computational approaches, especially data-driven methods, for predicting protein-ligand interactions. However, existing approaches may not fully investigate the features of the ligand-occupying regions in the protein pockets. Here, we design a structure-based protein representation method, named PocketAnchor, for capturing the local environmental and spatial features of protein pockets to facilitate protein-ligand interaction-related learning tasks. We define "anchors" as probe points reaching into the cavities and those located near the surface of proteins, and we design a specific message passing strategy for gathering local information from the atoms and surface neighboring these anchors. Comprehensive evaluation of our method demonstrated its successful applications in pocket detection and binding affinity prediction, which indicated that our anchor-based approach can provide effective protein feature representations for improving the prediction of protein-ligand interactions.
Affiliation(s)
- Shuya Li
  - Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing 100084, China
- Tingzhong Tian
  - Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing 100084, China
- Ziting Zhang
  - Department of Automation, Tsinghua University, Beijing 100084, China; MOE Key Laboratory of Bioinformatics, Tsinghua University, Beijing 100084, China
- Ziheng Zou
  - Silexon AI Technology, Nanjing, Jiangsu Province 210023, China
- Dan Zhao
  - Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing 100084, China
- Jianyang Zeng
  - Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing 100084, China
|
35
|
Liu J, Wan J, Ren Y, Shao X, Xu X, Rao L. DOX_BDW: Incorporating Solvation and Desolvation Effects of Cavity Water into Nonfitting Protein-Ligand Binding Affinity Prediction. J Chem Inf Model 2023; 63:4850-4863. [PMID: 37539963 DOI: 10.1021/acs.jcim.3c00776] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/05/2023]
Abstract
Accurate prediction of the protein-ligand binding affinity (PLBA) at an affordable cost is one of the ultimate goals in the field of structure-based drug design (SBDD), as well as a great challenge in computational and theoretical chemistry. Herein, we have systematically addressed the complicated solvation and desolvation effects on the PLBA caused by differences in the explicit water in the protein cavity before and after ligands bind to the protein-binding site. Based on the new solvation model, a nonfitting method at the first-principles level for PLBA prediction was developed by taking the bridging and displaced water (BDW) molecules into account simultaneously. The newly developed method, DOX_BDW, was validated against a total of 765 noncovalent and covalent protein-ligand binding pairs, including the CASF2016 core set, the Cov_2022 covalent binding testing set, and six testing sets for the hit and lead compound optimization (HLO) simulation. In all of the testing sets, the DOX_BDW method was able to produce PLBA predictions that were strongly correlated with the corresponding experimental data (R = 0.66-0.85). The overall performance of DOX_BDW is better than that of the current empirical scoring functions, which are heavily parameterized. DOX_BDW is particularly outstanding for the covalent binding situation, implying the need for considering electronic structure in covalent drug design. Furthermore, the method is especially recommended for the HLO scenario of SBDD, where hundreds of similar derivatives need to be screened and refined. The computational cost of DOX_BDW is affordable, and its accuracy is remarkable.
Collapse
Affiliation(s)
- Jiaqi Liu
- National Key Laboratory of Green Pesticide, Key Laboratory of Pesticide & Chemical Biology of Ministry of Education, Hubei International Scientific and Technological Cooperation Base of Pesticide and Green Synthesis, College of Chemistry, Central China Normal University, Wuhan 43009, People's Republic of China
| | - Jian Wan
- National Key Laboratory of Green Pesticide, Key Laboratory of Pesticide & Chemical Biology of Ministry of Education, Hubei International Scientific and Technological Cooperation Base of Pesticide and Green Synthesis, College of Chemistry, Central China Normal University, Wuhan 43009, People's Republic of China
| | - Yanliang Ren
- National Key Laboratory of Green Pesticide, Key Laboratory of Pesticide & Chemical Biology of Ministry of Education, Hubei International Scientific and Technological Cooperation Base of Pesticide and Green Synthesis, College of Chemistry, Central China Normal University, Wuhan 43009, People's Republic of China
| | - Xubo Shao
- National Key Laboratory of Green Pesticide, Key Laboratory of Pesticide & Chemical Biology of Ministry of Education, Hubei International Scientific and Technological Cooperation Base of Pesticide and Green Synthesis, College of Chemistry, Central China Normal University, Wuhan 43009, People's Republic of China
| | - Xin Xu
- Shanghai Key Laboratory of Molecular Catalysis and Innovative Materials, Ministry of Education (MOE) Laboratory for Computational Physical Science, Department of Chemistry, Fudan University, Shanghai 200433, People's Republic of China
| | - Li Rao
- National Key Laboratory of Green Pesticide, Key Laboratory of Pesticide & Chemical Biology of Ministry of Education, Hubei International Scientific and Technological Cooperation Base of Pesticide and Green Synthesis, College of Chemistry, Central China Normal University, Wuhan 43009, People's Republic of China
| |
|
36
|
Abdel-Rehim A, Orhobor O, Hang L, Ni H, King RD. Protein-ligand binding affinity prediction exploiting sequence constituent homology. Bioinformatics 2023; 39:btad502. [PMID: 37572302 PMCID: PMC10463547 DOI: 10.1093/bioinformatics/btad502] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/14/2023] [Revised: 07/10/2023] [Accepted: 08/11/2023] [Indexed: 08/14/2023] Open
Abstract
MOTIVATION Molecular docking is a commonly used approach for estimating binding conformations and their resultant binding affinities. Machine learning has been successfully deployed to enhance such affinity estimations. Many methods of varying complexity have been developed making use of some or all of the spatial and categorical information available in these structures. The evaluation of such methods has mainly been carried out using datasets from PDBbind, particularly the Comparative Assessment of Scoring Functions (CASF) 2007, 2013, and 2016 datasets with dedicated test sets. This work demonstrates that only a small number of simple descriptors is necessary to efficiently estimate binding affinity for these complexes without the need to know the exact binding conformation of a ligand. RESULTS The developed approach of using a small number of ligand and protein descriptors in conjunction with gradient boosting trees demonstrates high performance on the CASF datasets. This includes the commonly used benchmark CASF2016, where it appears to perform better than any other approach. This methodology is also useful for datasets where the spatial relationship between the ligand and protein is unknown, as demonstrated using a large ChEMBL-derived dataset. AVAILABILITY AND IMPLEMENTATION Code and data are available at https://github.com/abbiAR/PLBAffinity.
Affiliation(s)
- Abbi Abdel-Rehim
- Department of Chemical Engineering and Biotechnology, University of Cambridge, Cambridge CB3 0AS, United Kingdom
| | | | - Lou Hang
- Department of Mathematics, University College London, London WC1H 0AY, United Kingdom
| | - Hao Ni
- Department of Mathematics, University College London, London WC1H 0AY, United Kingdom
- The Alan Turing Institute, London NW1 2DB, United Kingdom
| | - Ross D King
- Department of Chemical Engineering and Biotechnology, University of Cambridge, Cambridge CB3 0AS, United Kingdom
- The Alan Turing Institute, London NW1 2DB, United Kingdom
- Department of Biology and Biological Engineering, Chalmers University of Technology, Gothenburg 412 96, Sweden
- Department of Computer Science and Engineering, Chalmers University of Technology, Gothenburg 412 96, Sweden
| |
|
37
|
Qiu Y, Wei GW. Artificial intelligence-aided protein engineering: from topological data analysis to deep protein language models. ARXIV 2023:arXiv:2307.14587v1. [PMID: 37547662 PMCID: PMC10402185] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/08/2023]
Abstract
Protein engineering is an emerging field in biotechnology that has the potential to revolutionize various areas, such as antibody design, drug discovery, food security, ecology, and more. However, the mutational space involved is too vast to be handled through experimental means alone. Leveraging accumulative protein databases, machine learning (ML) models, particularly those based on natural language processing (NLP), have considerably expedited protein engineering. Moreover, advances in topological data analysis (TDA) and artificial intelligence-based protein structure prediction, such as AlphaFold2, have made more powerful structure-based ML-assisted protein engineering strategies possible. This review aims to offer a comprehensive, systematic, and indispensable set of methodological components, including TDA and NLP, for protein engineering and to facilitate their future development.
Affiliation(s)
- Yuchi Qiu
- Department of Mathematics, Michigan State University, East Lansing, 48824, MI, USA
| | - Guo-Wei Wei
- Department of Mathematics, Michigan State University, East Lansing, 48824, MI, USA
- Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, 48824, MI, USA
- Department of Electrical and Computer Engineering, Michigan State University, East Lansing, 48824, MI, USA
| |
|
38
|
Wee J, Bianconi G, Xia K. Persistent Dirac for molecular representation. Sci Rep 2023; 13:11183. [PMID: 37433870 DOI: 10.1038/s41598-023-37853-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 03/20/2023] [Accepted: 06/28/2023] [Indexed: 07/13/2023] Open
Abstract
Molecular representations are of fundamental importance for modeling and analyzing molecular systems. Molecular representation models have contributed greatly to successes in drug design and materials discovery. In this paper, we present a computational framework for molecular representation that is mathematically rigorous and based on the persistent Dirac operator. The properties of the discrete weighted and unweighted Dirac matrix are systematically discussed, and the biological meanings of both homological and non-homological eigenvectors are studied. We also evaluate the impact of various weighting schemes on the weighted Dirac matrix. Additionally, a set of physical persistent attributes that characterize the persistence and variation of spectrum properties of Dirac matrices during a filtration process is proposed as molecular fingerprints. Our persistent attributes are used to classify molecular configurations of nine different types of organic-inorganic halide perovskites. The combination of persistent attributes with a gradient boosting tree model has achieved great success in molecular solvation free energy prediction. The results show that our model is effective in characterizing the molecular structures, demonstrating the power of our molecular representation and featurization approach.
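As a rough illustration of the unweighted Dirac matrix mentioned in this abstract, the sketch below assumes the standard block construction from simplicial boundary matrices and checks that its square is block diagonal with the Hodge Laplacians on the diagonal. The toy complex (one filled triangle), the edge ordering, and the names `B1`, `B2`, and `dirac_matrix` are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Boundary matrices of a filled triangle on vertices {0, 1, 2}.
# Edges are ordered [01, 02, 12]; the single 2-simplex is [012].
B1 = np.array([
    [-1.0, -1.0,  0.0],
    [ 1.0,  0.0, -1.0],
    [ 0.0,  1.0,  1.0],
])                                   # maps edges to vertices
B2 = np.array([[1.0], [-1.0], [1.0]])  # maps the triangle to edges


def dirac_matrix(b1, b2):
    """Assemble the block Dirac matrix from two boundary matrices."""
    n0, n1 = b1.shape
    n2 = b2.shape[1]
    return np.block([
        [np.zeros((n0, n0)), b1,                 np.zeros((n0, n2))],
        [b1.T,               np.zeros((n1, n1)), b2],
        [np.zeros((n2, n0)), b2.T,               np.zeros((n2, n2))],
    ])


D = dirac_matrix(B1, B2)

# Hodge Laplacians in dimensions 0, 1, and 2.
L0 = B1 @ B1.T
L1 = B1.T @ B1 + B2 @ B2.T
L2 = B2.T @ B2

# Because B1 @ B2 = 0, the off-diagonal blocks of D @ D vanish.
D_squared = D @ D
```

In a persistent setting one would presumably recompute such a matrix at each filtration value and track how its spectrum changes; that step is omitted here.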
Affiliation(s)
- Junjie Wee
- Division of Mathematical Sciences, School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore, 637371, Singapore
| | - Ginestra Bianconi
- School of Mathematical Sciences, Queen Mary University of London, London, E1 4NS, UK
- The Alan Turing Institute, London, NW1 2DB, UK
| | - Kelin Xia
- Division of Mathematical Sciences, School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore, 637371, Singapore
| |
|
39
|
Zhang S, Jin Y, Liu T, Wang Q, Zhang Z, Zhao S, Shan B. SS-GNN: A Simple-Structured Graph Neural Network for Affinity Prediction. ACS OMEGA 2023; 8:22496-22507. [PMID: 37396234 PMCID: PMC10308598 DOI: 10.1021/acsomega.3c00085] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 02/02/2023] [Accepted: 06/01/2023] [Indexed: 07/04/2023]
Abstract
Efficient and effective drug-target binding affinity (DTBA) prediction is a crucial basis for drug screening, and it is a challenging task because of the limited computational resources in practical applications. Inspired by the good representation ability of graph neural networks (GNNs), we propose a simple-structured GNN model named SS-GNN to accurately predict DTBA. By constructing a single undirected graph based on a distance threshold to represent protein-ligand interactions, the scale of the graph data is greatly reduced. Moreover, ignoring covalent bonds in the protein further reduces the computational cost of the model. The graph neural network-multilayer perceptron (GNN-MLP) module takes the latent feature extraction of atoms and edges in the graph as two mutually independent processes. We also develop an edge-based atom-pair feature aggregation method to represent complex interactions and a graph pooling-based method to predict the binding affinity of the complex. We achieve state-of-the-art prediction performance using a simple model (with only 0.6 M parameters) without introducing complicated geometric feature descriptions. SS-GNN achieves Pearson's Rp = 0.853 on the PDBbind v2016 core set, outperforming state-of-the-art GNN-based methods by 5.2%. Moreover, the simplified model structure and concise data processing procedure improve the prediction efficiency of the model. For a typical protein-ligand complex, affinity prediction takes only 0.2 ms. All code is freely accessible at https://github.com/xianyuco/SS-GNN.
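The distance-threshold graph construction described in this abstract could look roughly like the sketch below. It is only an illustration: the function name `threshold_graph`, the 5.0 angstrom cutoff, the node indexing, and the choice to keep only ligand-protein edges are assumptions made here, not details confirmed by the abstract.

```python
import numpy as np


def threshold_graph(ligand_xyz, protein_xyz, cutoff=5.0):
    """Return ligand-protein edges whose length is below `cutoff`.

    `cutoff` (in angstroms) is a hypothetical value chosen for illustration.
    """
    # Pairwise Euclidean distances, shape (n_ligand, n_protein).
    diff = ligand_xyz[:, None, :] - protein_xyz[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    lig_idx, prot_idx = np.nonzero(dist < cutoff)
    # Protein nodes are indexed after ligand nodes in the merged graph.
    edges = np.stack([lig_idx, prot_idx + len(ligand_xyz)], axis=1)
    return edges, dist[lig_idx, prot_idx]


# Toy coordinates: two ligand atoms and two protein atoms on a line.
ligand = np.array([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0]])
protein = np.array([[3.0, 0.0, 0.0], [10.0, 0.0, 0.0]])
edges, lengths = threshold_graph(ligand, protein, cutoff=5.0)
```

Each row of `edges` is treated as an undirected edge; protein atoms farther than the cutoff from every ligand atom never enter the graph, which is one way the graph size could be reduced.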
Affiliation(s)
- Shuke Zhang
- Software College, Hebei Normal University, Shijiazhuang 050024, China
- Shijiazhuang Xianyu Digital Biotechnology Co., Ltd, Shijiazhuang 050024, China
| | - Yanzhao Jin
- Software College, Hebei Normal University, Shijiazhuang 050024, China
- Shijiazhuang Xianyu Digital Biotechnology Co., Ltd, Shijiazhuang 050024, China
| | - Tianmeng Liu
- Software College, Hebei Normal University, Shijiazhuang 050024, China
- Shijiazhuang Xianyu Digital Biotechnology Co., Ltd, Shijiazhuang 050024, China
| | - Qi Wang
- Software College, Hebei Normal University, Shijiazhuang 050024, China
- Shijiazhuang Xianyu Digital Biotechnology Co., Ltd, Shijiazhuang 050024, China
| | - Zhaohui Zhang
- Software College, Hebei Normal University, Shijiazhuang 050024, China
- College of Computer and Cyber Security, Hebei Normal University, Shijiazhuang 050024, China
| | - Shuliang Zhao
- College of Computer and Cyber Security, Hebei Normal University, Shijiazhuang 050024, China
- Hebei Provincial Key Laboratory of Network and Information Security, Shijiazhuang 050024, China
- Hebei Provincial Engineering Research Center for Supply Chain Big Data Analytics & Data Security, Shijiazhuang 050024, China
| | - Bo Shan
- Software College, Hebei Normal University, Shijiazhuang 050024, China
- Shijiazhuang Xianyu Digital Biotechnology Co., Ltd, Shijiazhuang 050024, China
| |
|
40
|
Afolabi OB, Olasehinde OR, Olanipon DG, Mabayoje SO, Familua OM, Jaiyesimi KF, Agboola EK, Idowu TO, Obafemi OT, Olaoye OA, Oloyede OI. Antioxidant evaluation and computational prediction of prospective drug-like compounds from polyphenolic-rich extract of Hibiscus cannabinus L. seed as antidiabetic and neuroprotective targets: assessment through in vitro and in silico studies. BMC Complement Med Ther 2023; 23:203. [PMID: 37337198 DOI: 10.1186/s12906-023-04023-7] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 01/12/2023] [Accepted: 06/03/2023] [Indexed: 06/21/2023] Open
Abstract
BACKGROUND Reports have implicated diabetes mellitus (DM) and Alzheimer's disease (AD) as some of the persistent global health challenges with no lasting solutions, despite significant inputs from modern-day pharmaceutical firms. This study therefore aimed to appraise the in vitro antioxidant potential and enzyme inhibitory activities of, and to carry out an in silico study on, bioactive compounds from a polyphenolic-rich extract of Hibiscus cannabinus seed (PEHc). METHODS In vitro antioxidant assays were performed on PEHc using standard methods, while the identification of phytoconstituents was carried out with high performance liquid chromatography (HPLC). For the in silico molecular docking using Schrodinger's Grid-based ligand docking with energetics software, seven target proteins were retrieved from the database (https://www.rcsb.org/). RESULTS HPLC identified twelve chemical compounds in PEHc, while antioxidant quantification revealed a higher total phenolic content (243.5 ± 0.71 mg GAE/g) than total flavonoid content (54.06 ± 0.09 mg QE/g), with significant (p < 0.05) inhibition of ABTS (IC50 = 218.30 ± 0.87 µg/ml) and 1,1-diphenyl-2-picrylhydrazyl free radicals (IC50 = 227.79 ± 0.74 µg/ml). Similarly, the extract demonstrated significant (p < 0.05) inhibitory activity against α-amylase (IC50 = 256.88 ± 6.15 µg/ml) and α-glucosidase (IC50 = 183.19 ± 0.23 µg/ml) as well as acetylcholinesterase (IC50 = 262.95 ± 1.47 µg/ml) and butyrylcholinesterase (IC50 = 189.97 ± 0.82 µg/ml). Furthermore, the in silico study showed very strong binding affinities for hibiscetin (a lead) with DPP-4 (PDB ID: 1RWQ) and α-amylase (PDB ID: 1SMD); for gamma-tocopherol with peptide-1 receptor (PDB ID: 3C59), AChE (PDB ID: 4EY7), and BChE (PDB ID: 7B04); for cianidanol with α-glucosidase (PDB ID: 7KBJ); and for kaempferol with poly [ADP-ribose] polymerase 1 (PARP-1; PDB ID: 6BHV). Moreover, ADMET scores revealed drug-like potentials of the lead compounds identified in PEHc. CONCLUSION The findings of this study point to potential druggable compounds in PEHc that could be useful for the management of DM and AD.
Affiliation(s)
- Olakunle Bamikole Afolabi
- Phytomedicine and Toxicology Unit, Biochemistry Programme, Department of Chemical Sciences, College of Sciences, Afe-Babalola University, P.M.B 5454, Ado-Ekiti, Ekiti State, Nigeria
| | - Oluwaseun Ruth Olasehinde
- Department of Medical Biochemistry, College of Medicine and Health Sciences, Afe Babalola University, P.M.B 5454, Ado-Ekiti, Ekiti State, Nigeria
| | - Damilola Grace Olanipon
- Department of Biological Sciences, College of Sciences, Afe Babalola University, P.M.B. 5454, Ado-Ekiti, Ekiti State, Nigeria
| | - Samson Olatunde Mabayoje
- Department of Biological Sciences, College of Sciences, Afe Babalola University, P.M.B. 5454, Ado-Ekiti, Ekiti State, Nigeria
| | - Olufemi Michael Familua
- Department of Pharmacology and Toxicology, College of Pharmacy, Afe Babalola University, P.M.B. 5454, Ado-Ekiti, Ekiti State, Nigeria
| | - Kikelomo Folake Jaiyesimi
- Phytomedicine and Toxicology Unit, Biochemistry Programme, Department of Chemical Sciences, College of Sciences, Afe-Babalola University, P.M.B 5454, Ado-Ekiti, Ekiti State, Nigeria
| | - Esther Kemi Agboola
- Phytomedicine and Toxicology Unit, Biochemistry Programme, Department of Chemical Sciences, College of Sciences, Afe-Babalola University, P.M.B 5454, Ado-Ekiti, Ekiti State, Nigeria
| | - Tolulope Olajumoke Idowu
- Medicinal Plant Unit, Chemistry Programme, Department of Chemical Sciences, College of Sciences, Afe-Babalola University, P.M.B 5454, Ado- Ekiti, Ekiti State, Nigeria
| | - Olabisi Tajudeen Obafemi
- Phytomedicine and Toxicology Unit, Biochemistry Programme, Department of Chemical Sciences, College of Sciences, Afe-Babalola University, P.M.B 5454, Ado-Ekiti, Ekiti State, Nigeria
| | - Oyindamola Adeniyi Olaoye
- Phytomedicine and Toxicology Unit, Biochemistry Programme, Department of Chemical Sciences, College of Sciences, Afe-Babalola University, P.M.B 5454, Ado-Ekiti, Ekiti State, Nigeria
| | - Omotade Ibidun Oloyede
- Department of Biochemistry, Ekiti State University, P.M.B 5363, Ado-Ekiti, Ekiti State, Nigeria
| |
|
41
|
Merkurjev E, Nguyen DD, Wei GW. Multiscale Laplacian Learning. APPL INTELL 2023; 53:15727-15746. [PMID: 38031564 PMCID: PMC10686291 DOI: 10.1007/s10489-022-04333-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 11/08/2022] [Indexed: 11/29/2022]
Abstract
Machine learning has greatly influenced many fields, including science. However, despite the tremendous accomplishments of machine learning, one of the key limitations of most existing machine learning approaches is their reliance on large labeled sets, and thus, data with limited labeled samples remains a challenge. Moreover, the performance of machine learning methods is often severely hindered in the case of diverse data, usually associated with smaller data sets or data associated with areas of study where the size of the data sets is constrained by high experimental cost and/or ethics. These challenges call for innovative strategies for dealing with these types of data. In this work, the aforementioned challenges are addressed by integrating graph-based frameworks, semi-supervised techniques, multiscale structures, and modified and adapted optimization procedures. This results in two innovative multiscale Laplacian learning (MLL) approaches for machine learning tasks, such as data classification, and for tackling data with limited samples, diverse data, and small data sets. The first approach, multikernel manifold learning (MML), integrates manifold learning with multikernel information and incorporates a warped kernel regularizer using multiscale graph Laplacians. The second approach, the multiscale MBO (MMBO) method, introduces multiscale Laplacians to the modification of the famous classical Merriman-Bence-Osher (MBO) scheme, and makes use of fast solvers. We demonstrate the performance of our algorithms experimentally on a variety of benchmark data sets, and compare them favorably to the state-of-the-art approaches.
Affiliation(s)
| | - Duc Duy Nguyen
- Department of Mathematics, University of Kentucky, KY 40506, USA
| | - Guo-Wei Wei
- Department of Mathematics, Department of Biochemistry and Molecular Biology, Department of Electrical and Computer Engineering, Michigan State University, MI 48824, USA
| |
|
42
|
Choo HY, Wee J, Shen C, Xia K. Fingerprint-Enhanced Graph Attention Network (FinGAT) Model for Antibiotic Discovery. J Chem Inf Model 2023; 63:2928-2935. [PMID: 37167016 DOI: 10.1021/acs.jcim.3c00045] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 05/13/2023]
Abstract
Artificial Intelligence (AI) techniques have great potential to fundamentally change the antibiotic discovery industry. Efficient and effective molecular featurization is key to all highly accurate learning models for antibiotic discovery. In this paper, we propose a fingerprint-enhanced graph attention network (FinGAT) model that combines sequence-based 2D fingerprints and structure-based graph representation. In our feature learning process, sequence information is transformed into a fingerprint vector, and structural information is encoded through a GAT module into another vector. These two vectors are concatenated and input into a multilayer perceptron (MLP) for antibiotic activity classification. Our model is extensively tested and compared with existing models. We find that FinGAT can outperform various state-of-the-art GNN models in antibiotic discovery.
Affiliation(s)
- Hou Yee Choo
- Division of Mathematical Sciences, School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore 637371
| | - JunJie Wee
- Division of Mathematical Sciences, School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore 637371
| | - Cong Shen
- Division of Mathematical Sciences, School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore 637371
- College of Computer Science and Electronic Engineering, Hunan University, Changsha 410083, China
| | - Kelin Xia
- Division of Mathematical Sciences, School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore 637371
| |
|
43
|
Zhu Z, Dou B, Cao Y, Jiang J, Zhu Y, Chen D, Feng H, Liu J, Zhang B, Zhou T, Wei GW. TIDAL: Topology-Inferred Drug Addiction Learning. J Chem Inf Model 2023; 63:1472-1489. [PMID: 36826415 DOI: 10.1021/acs.jcim.3c00046] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Indexed: 02/25/2023]
Abstract
Drug addiction is a global public health crisis, and the design of antiaddiction drugs remains a major challenge due to intricate mechanisms. Since experimental drug screening and optimization are too time-consuming and expensive, there is an urgent need to develop innovative artificial intelligence (AI) methods for addressing the challenge. We tackle this challenge with topology-inferred drug addiction learning (TIDAL), built by integrating multiscale topological Laplacians, a deep bidirectional transformer, and ensemble-assisted neural networks (EANNs). Multiscale topological Laplacians are a novel class of algebraic topology tools that embed molecular topological invariants and algebraic invariants into their harmonic spectra and nonharmonic spectra, respectively. These invariants complement sequence information extracted from a bidirectional transformer. We validate the proposed TIDAL framework on 22 drug addiction related, 4 hERG, and 12 DAT data sets; the results suggest that TIDAL is a state-of-the-art framework for the modeling and analysis of drug addiction data. We carry out cross-target analysis of the current drug addiction candidates to flag their side effects and identify their repurposing potentials. Our analysis reveals drug-mediated linear and bilinear target correlations. Finally, TIDAL is applied to shed light on relative efficacy, repurposing potential, and potential side effects of 12 existing antiaddiction medications. Our results suggest that TIDAL provides a new computational strategy for pressingly needed antisubstance addiction drug development.
Affiliation(s)
- Zailiang Zhu
- School of Computer Science and Artificial Intelligence, Wuhan Textile University, Wuhan, 430200, P. R. China
| | - Bozheng Dou
- Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan, 430200, P. R. China
| | - Yukang Cao
- School of Computer Science and Artificial Intelligence, Wuhan Textile University, Wuhan, 430200, P. R. China
| | - Jian Jiang
- Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan, 430200, P. R. China
- Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
| | - Yueying Zhu
- Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan, 430200, P. R. China
| | - Dong Chen
- Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
| | - Hongsong Feng
- Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
| | - Jie Liu
- Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan, 430200, P. R. China
| | - Bengong Zhang
- Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan, 430200, P. R. China
| | - Tianshou Zhou
- Key Laboratory of Computational Mathematics, Guangdong Province, and School of Mathematics, Sun Yat-sen University, Guangzhou, 510006, P. R. China
| | - Guo-Wei Wei
- Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
- Department of Electrical and Computer Engineering, Michigan State University, East Lansing, Michigan 48824, United States
- Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, Michigan 48824, United States
| |
|
44
|
Rayka M, Firouzi R. GB-score: Minimally designed machine learning scoring function based on distance-weighted interatomic contact features. Mol Inform 2023; 42:e2200135. [PMID: 36722733 DOI: 10.1002/minf.202200135] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 06/09/2022] [Revised: 11/24/2022] [Accepted: 11/28/2022] [Indexed: 02/02/2023]
Abstract
In recent years, thanks to advances in computer hardware and dataset availability, data-driven approaches (like machine learning) have become one of the essential parts of the drug design framework to accelerate drug discovery procedures. Constructing a new scoring function based on machine and deep learning, that is, a function that can predict the binding score for a protein-ligand pose generated during a docking procedure or for a crystal complex, has become an active research area in computer-aided drug design. GB-Score is a state-of-the-art machine learning-based scoring function that uses distance-weighted interatomic contact features, the PDBbind-v2019 general set, and the Gradient Boosting Trees algorithm for binding affinity prediction. The distance-weighted interatomic contact featurization method uses the distances between different ligand and protein atom types for numerical representation of the protein-ligand complex. GB-Score attains a Pearson's correlation of 0.862 and an RMSE of 1.190 on the CASF-2016 benchmark test in the scoring power metric. GB-Score's code is freely available on the web at https://github.com/miladrayka/GB_Score.
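One plausible reading of "distance-weighted interatomic contact features" is sketched below: for every (ligand element, protein element) pair, sum an inverse power of the distance over all atom pairs inside a cutoff. The function name `contact_features`, the element list, the 12.0 angstrom cutoff, and the exponent are hypothetical choices made for this illustration and are not taken from the abstract.

```python
from itertools import product

import numpy as np


def contact_features(lig_elems, lig_xyz, prot_elems, prot_xyz,
                     elements=("C", "N", "O"), cutoff=12.0, power=2):
    """Sum 1/d**power over ligand-protein atom pairs, grouped by element pair.

    `elements`, `cutoff`, and `power` are illustrative assumptions.
    Atoms are assumed not to overlap (all distances are nonzero).
    """
    lig_elems = np.asarray(lig_elems)
    prot_elems = np.asarray(prot_elems)
    dist = np.linalg.norm(lig_xyz[:, None, :] - prot_xyz[None, :, :], axis=-1)
    # Inverse-power weight inside the cutoff, zero outside it.
    weights = np.where(dist < cutoff, 1.0 / dist**power, 0.0)
    features = {}
    for le, pe in product(elements, elements):
        mask = (lig_elems[:, None] == le) & (prot_elems[None, :] == pe)
        features[(le, pe)] = float(weights[mask].sum())
    return features


# Toy complex: ligand atoms C and O, protein atoms N and C.
lig_xyz = np.array([[0.0, 0.0, 0.0], [0.0, 0.0, 2.0]])
prot_xyz = np.array([[0.0, 0.0, 4.0], [0.0, 3.0, 0.0]])
feats = contact_features(["C", "O"], lig_xyz, ["N", "C"], prot_xyz)
```

The resulting fixed-length vector (one entry per element pair) is the kind of input a gradient boosting regressor could be trained on.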
Affiliation(s)
- Milad Rayka
- Department of Physical Chemistry, Chemistry and Chemical Engineering Research Center of Iran, Tehran, Iran
| | - Rohoullah Firouzi
- Department of Physical Chemistry, Chemistry and Chemical Engineering Research Center of Iran, Tehran, Iran
| |
|
45
|
Liu R, Liu X, Wu J. Persistent Path-Spectral (PPS) Based Machine Learning for Protein-Ligand Binding Affinity Prediction. J Chem Inf Model 2023; 63:1066-1075. [PMID: 36647267 DOI: 10.1021/acs.jcim.2c01251] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 01/18/2023]
Abstract
Molecular descriptors are essential to quantitative structure activity/property relationship (QSAR/QSPR) models and machine learning models. Here we propose, for the first time, persistent path-spectral (PPS), PPS-based molecular descriptors, and a PPS-based machine learning model for the prediction of protein-ligand binding affinity. For the graph, simplicial complex, and hypergraph representations of molecular structures and interactions, the path-Laplacian can be constructed, and the derived path-spectral naturally gives a quantitative description of molecules. Further, by introducing the filtration process of the representation, the persistent path-spectral can be derived, which gives a multiscale characterization of molecules. Molecular descriptors from the persistent path-spectral attributes are then combined with a machine learning model, in particular the gradient boosting tree, to form our PPS-ML model. We test our model on the three most commonly used data sets, i.e., PDBbind-v2007, PDBbind-v2013, and PDBbind-v2016, and our model achieves competitive results.
Affiliation(s)
- Ran Liu
- Hebei Normal University, Shijiazhuang, Hebei 050024, China
- Yanqi Lake Beijing Institute of Mathematical Sciences and Applications, Beijing, 101408, China
| | - Xiang Liu
- Chern Institute of Mathematics, Nankai University, Tianjin, 300071, China
| | - Jie Wu
- Yanqi Lake Beijing Institute of Mathematical Sciences and Applications, Beijing, 101408, China
| |
|
46
|
Zou Y, Wang R, Du M, Wang X, Xu D. Identifying Protein-Ligand Interactions via a Novel Distance Self-Feedback Biomolecular Interaction Network. J Phys Chem B 2023; 127:899-911. [PMID: 36657025 DOI: 10.1021/acs.jpcb.2c07592] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Indexed: 01/20/2023]
Abstract
Efficient and accurate characterizations of protein-ligand interactions are key to understanding biology at the molecular level. They are particularly useful in pharmaceutical industry applications. However, the widely applied dynamics-based methods for identifying important residues or calculating ligand binding free energy are usually computationally demanding. In this work, we propose a graph deep learning (DL) framework, namely, the distance self-feedback biomolecular interaction network (DSBIN), in which the relationship between the complex structure and binding affinity can be established by means of a carefully designed distance self-feedback module and interaction layer. Our model can directly provide a quantitative evaluation of inhibitor binding affinities (pKd). More importantly, the DSBIN model efficiently identifies key interactions for inhibitor binding and is thus intrinsically interpretable. Its generalization performance was further verified using 1405 unseen structures. The deviations of the predicted binding free energies were calculated to be less than 1.37 kcal/mol for more than 55% of the structures. Moreover, we also compared the DSBIN model with MM/GBSA, a commonly used theoretical method for calculating the substrate binding free energy. Our results show that the current DL model has generally better performance in predicting the binding free energy. For a specific complex system, mannopentaose/TmCBM27, the DSBIN predicted binding free energy is -8.21 kcal/mol, which is very close to the experimentally measured -7.76 kcal/mol and the MM/GBSA calculated -7.16 kcal/mol. Meanwhile, all important aromatic residues around the binding pocket can be identified by our DL model. Considering the accuracy and efficiency of the newly developed DL model, it may be very helpful in the field of drug design and molecular recognition.
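The abstract reports affinities both as pKd and as binding free energies in kcal/mol. The two are linked by the standard thermodynamic relation ΔG = RT ln Kd = -RT ln(10) · pKd (with Kd relative to a 1 M standard state). The sketch below applies that relation; the temperature of 298.15 K and the function name `pkd_to_delta_g` are assumptions made here, since the abstract does not state which temperature or conversion the authors used.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def pkd_to_delta_g(pkd, temperature=298.15):
    """Convert pKd to a binding free energy in kcal/mol.

    Uses dG = -R * T * ln(10) * pKd; `temperature` (kelvin) is an
    assumed value, not one given in the abstract.
    """
    return -R_KCAL * temperature * math.log(10.0) * pkd


# Free energy change per pKd unit at the assumed temperature.
per_unit = pkd_to_delta_g(1.0)
# A hypothetical inhibitor with pKd = 6 (micromolar binder).
example = pkd_to_delta_g(6.0)
```

Under this assumed temperature, one pKd unit corresponds to roughly 1.36 kcal/mol, which gives a sense of scale for the free-energy deviations quoted above.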
Affiliation(s)
- Yurong Zou
- MOE Key Laboratory of Green Chemistry and Technology, College of Chemistry, Sichuan University, Chengdu, Sichuan 610064, PR China
| | - Ruihan Wang
- MOE Key Laboratory of Green Chemistry and Technology, College of Chemistry, Sichuan University, Chengdu, Sichuan 610064, PR China
| | - Meng Du
- MOE Key Laboratory of Green Chemistry and Technology, College of Chemistry, Sichuan University, Chengdu, Sichuan 610064, PR China
| | - Xin Wang
- MOE Key Laboratory of Green Chemistry and Technology, College of Chemistry, Sichuan University, Chengdu, Sichuan 610064, PR China
| | - Dingguo Xu
- MOE Key Laboratory of Green Chemistry and Technology, College of Chemistry, Sichuan University, Chengdu, Sichuan 610064, PR China
- Research Center for Materials Genome Engineering, Sichuan University, Chengdu, Sichuan 610065, PR China
| |
|
47
|
Chen D, Liu J, Wu J, Wei GW, Pan F, Yau ST. Path Topology in Molecular and Materials Sciences. J Phys Chem Lett 2023; 14:954-964. [PMID: 36688834 PMCID: PMC10799224 DOI: 10.1021/acs.jpclett.2c03706] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/17/2023]
Abstract
The structures of molecules and materials determine their functions. Understanding the structure and function relationship is the holy grail of molecular and materials sciences. However, the rational design of molecules and materials with desirable functions remains a grand challenge despite decades of efforts. A major obstacle is the lack of an intrinsic mathematical characteristic that can be attributed to a specific function. This work introduces persistent path topology (PPT) to effectively characterize directed networks extracted from functional units, such as constitutional isomers, cis-trans isomers, chiral molecules, Jahn-Teller isomerism, and high-entropy alloy catalysts. Path homology (PH) theory is utilized to decipher the role of mirror-symmetric sublattices that hinder the formation of periodic unit cells in amorphous solids. Topological perturbation analysis (TPA) is proposed to reveal the critical target in the blood coagulation system. The proposed topological tools can be directly applied to systems biology, omics sciences, topological materials, and machine learning study of molecular and materials sciences.
Affiliation(s)
- Dong Chen
- School of Advanced Materials, Peking University, Shenzhen Graduate School, Shenzhen 518055, China
- Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
| | - Jian Liu
- School of Mathematical Sciences, Hebei Normal University, Hebei 050024, China
- Yanqi Lake Beijing Institute of Mathematical Sciences and Applications, Beijing 101408, China
| | - Jie Wu
- Yanqi Lake Beijing Institute of Mathematical Sciences and Applications, Beijing 101408, China
| | - Guo-Wei Wei
- Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
- Department of Electrical and Computer Engineering, Michigan State University, East Lansing, Michigan 48824, United States
- Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, Michigan 48824, United States
| | - Feng Pan
- School of Advanced Materials, Peking University, Shenzhen Graduate School, Shenzhen 518055, China
| | - Shing-Tung Yau
- Yanqi Lake Beijing Institute of Mathematical Sciences and Applications, Beijing 101408, China
- Yau Mathematical Sciences Center, Tsinghua University, Beijing 100084, China
| |
|
48
|
Feng H, Wei GW. Virtual screening of DrugBank database for hERG blockers using topological Laplacian-assisted AI models. Comput Biol Med 2023; 153:106491. [PMID: 36599209 PMCID: PMC10120853 DOI: 10.1016/j.compbiomed.2022.106491] [Citation(s) in RCA: 10] [Impact Index Per Article: 10.0] [Received: 11/17/2022] [Revised: 11/29/2022] [Accepted: 12/27/2022] [Indexed: 12/29/2022]
Abstract
The human ether-a-go-go (hERG) potassium channel (Kv11.1) plays a critical role in mediating the cardiac action potential. Blockade of this ion channel can potentially lead to fatal disorders and/or long QT syndrome. Many drugs have been withdrawn because of their serious hERG-cardiotoxicity. It is crucial to assess hERG blockade activity in the early stages of drug discovery. We are particularly interested in the hERG-cardiotoxicity of compounds collected in the DrugBank database, considering that many DrugBank compounds have been approved for therapeutic treatments or have high potential to become drugs. Machine learning-based in silico tools offer a rapid and economical platform to virtually screen DrugBank compounds. We design accurate and robust classifiers for blockers/non-blockers and then build regressors to quantitatively analyze the binding potency of the DrugBank compounds on the hERG channel. Molecular sequences are embedded with two natural language processing (NLP) methods, namely, autoencoder and transformer. Complementary three-dimensional (3D) molecular structures are embedded with two advanced mathematical approaches, i.e., topological Laplacians and algebraic graphs. With our state-of-the-art tools, we reveal that 227 out of the 8641 DrugBank compounds are potential hERG blockers, suggesting serious drug safety problems. Our predictions provide guidance for the further experimental interrogation of DrugBank compounds' hERG-cardiotoxicity.
Affiliation(s)
- Hongsong Feng
- Department of Mathematics, Michigan State University, MI 48824, USA
| | - Guo-Wei Wei
- Department of Mathematics, Michigan State University, MI 48824, USA; Department of Electrical and Computer Engineering, Michigan State University, MI 48824, USA; Department of Biochemistry and Molecular Biology, Michigan State University, MI 48824, USA.
| |
|
49
|
Wang Z, Zheng L, Wang S, Lin M, Wang Z, Kong AWK, Mu Y, Wei Y, Li W. A fully differentiable ligand pose optimization framework guided by deep learning and a traditional scoring function. Brief Bioinform 2023; 24:6887112. [PMID: 36502369 DOI: 10.1093/bib/bbac520] [Citation(s) in RCA: 13] [Impact Index Per Article: 13.0] [Received: 08/09/2022] [Revised: 10/17/2022] [Accepted: 10/31/2022] [Indexed: 12/14/2022] Open
Abstract
Recently reported machine learning- and deep learning-based scoring functions (SFs) have shown exciting performance in predicting protein-ligand binding affinities, with promising application prospects. However, differentiating between highly similar ligand conformations, including the native binding pose (the global energy minimum state), remains challenging, although doing so could greatly enhance docking. In this work, we propose a fully differentiable, end-to-end framework for ligand pose optimization based on a hybrid SF called DeepRMSD+Vina, which combines a multi-layer perceptron (DeepRMSD) with the traditional AutoDock Vina SF. DeepRMSD+Vina, which combines (1) the root mean square deviation (RMSD) of the docking pose with respect to the native pose and (2) the AutoDock Vina score, is fully differentiable; it is thus capable of optimizing the ligand binding pose to the lowest-energy conformation. Evaluated on the CASF-2016 docking power dataset, DeepRMSD+Vina reaches a success rate of 94.4%, which outperforms most reported SFs to date. We evaluated the ligand conformation optimization framework in practical molecular docking scenarios (redocking and cross-docking tasks), revealing the high potential of this framework in drug design and discovery. Structural analysis shows that this framework can identify key physical interactions in protein-ligand binding, such as hydrogen bonding. Our work provides a paradigm for optimizing ligand conformations based on deep learning algorithms. The DeepRMSD+Vina model and the optimization framework are available in the GitHub repository https://github.com/zchwang/DeepRMSD-Vina_Optimization.
Affiliation(s)
- Zechen Wang
- School of Physics, Shandong University, Jinan, Shandong 250100, China
| | - Liangzhen Zheng
- Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, Guangdong 518055, China; Shanghai Zelixir Biotech Company Ltd., Shanghai 200030, China
| | - Sheng Wang
- Shanghai Zelixir Biotech Company Ltd., Shanghai 200030, China
| | - Mingzhi Lin
- Shanghai Zelixir Biotech Company Ltd., Shanghai 200030, China
| | - Zhihao Wang
- School of Physics, Shandong University, Jinan, Shandong 250100, China
| | - Adams Wai-Kin Kong
- Rolls-Royce Corporate Lab, Nanyang Technological University, Singapore 637551, Singapore
| | - Yuguang Mu
- School of Biological Sciences, Nanyang Technological University, Singapore 637551, Singapore
| | - Yanjie Wei
- Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, Guangdong 518055, China
| | - Weifeng Li
- School of Physics, Shandong University, Jinan, Shandong 250100, China
| |
|
50
|
Xia C, Feng SH, Xia Y, Pan X, Shen HB. Leveraging scaffold information to predict protein-ligand binding affinity with an empirical graph neural network. Brief Bioinform 2023; 24:6982728. [PMID: 36627113 DOI: 10.1093/bib/bbac603] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 09/05/2022] [Revised: 11/01/2022] [Accepted: 12/08/2022] [Indexed: 01/12/2023] Open
Abstract
Protein-ligand binding affinity prediction is an important task in structural bioinformatics for drug discovery and design. Although various scoring functions (SFs) have been proposed, it remains challenging to accurately evaluate the binding affinity of a protein-ligand complex with a known bound structure because of the potential preference of the scoring system. In recent years, deep learning (DL) techniques have been applied to SFs without sophisticated feature engineering. Nevertheless, existing methods cannot model the differential contribution of atoms in various regions of proteins, and the relationship between atom properties and intermolecular distance has not been fully explored. We propose a novel empirical graph neural network for accurate protein-ligand binding affinity prediction (EGNA). Graphs of the protein, the ligand and their interactions are constructed based on different regions of each bound complex. Proteins and ligands are effectively represented by graph convolutional layers, enabling EGNA to capture interaction patterns precisely by simulating empirical SFs. The contributions of different factors to binding affinity can thus be transparently investigated. EGNA is compared with state-of-the-art machine learning-based SFs on two widely used benchmark data sets. The results demonstrate the superiority of EGNA and its good generalization capability.
Affiliation(s)
- Chunqiu Xia
- Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, 200240 Shanghai, China
| | - Shi-Hao Feng
- Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, 200240 Shanghai, China
| | - Ying Xia
- Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, 200240 Shanghai, China
| | - Xiaoyong Pan
- Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, 200240 Shanghai, China
| | - Hong-Bin Shen
- Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, 200240 Shanghai, China
| |
|