1. Vittorio S, Lunghini F, Morerio P, Gadioli D, Orlandini S, Silva P, Martinovic J, Pedretti A, Bonanni D, Del Bue A, Palermo G, Vistoli G, Beccari AR. Addressing docking pose selection with structure-based deep learning: Recent advances, challenges and opportunities. Comput Struct Biotechnol J 2024; 23:2141-2151. [PMID: 38827235 PMCID: PMC11141151 DOI: 10.1016/j.csbj.2024.05.024] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/23/2024] [Revised: 05/15/2024] [Accepted: 05/15/2024] [Indexed: 06/04/2024] Open
Abstract
Molecular docking is a widely used technique in drug discovery to predict the binding mode of a given ligand to its target. However, identifying the near-native binding pose in docking experiments remains challenging: the scoring functions currently employed by docking programs are parametrized to predict binding affinity and therefore often fail to identify the native binding conformation of the ligand. Selecting the correct binding mode is crucial to obtaining meaningful results and to conveniently optimizing new hit compounds. Deep learning (DL) algorithms have attracted growing interest in this regard because of their capability to extract the relevant information directly from the protein-ligand structure. Our review aims to present the recent advances in the development of DL-based pose selection approaches, discussing limitations and possible future directions. Moreover, we report a comparison between the performance of some classical scoring functions and DL-based methods in selecting the correct binding mode. In this regard, two novel DL-based pose selectors developed by us are presented.
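To make the pose selection problem concrete, the following minimal sketch checks whether the best-scored docking pose is near-native, judged by its RMSD to the native ligand conformation. The 2.0 Å cutoff, the function names, the convention that a lower score is better, and the toy coordinates are all assumptions made for illustration; none of them is taken from the review.

```python
import math

def rmsd(coords_a, coords_b):
    """RMSD between two conformations with identical atom ordering.

    No superposition or symmetry correction is applied (an assumption
    that is reasonable for poses docked into the same receptor frame).
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("conformations must be non-empty and of equal size")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))

def top_pose_is_near_native(poses, scores, native, cutoff=2.0):
    """Return True if the best-scored pose lies within `cutoff` of the native pose.

    Assumes lower scores are better; `cutoff` is a hypothetical threshold.
    """
    best = min(range(len(poses)), key=lambda i: scores[i])
    return rmsd(poses[best], native) <= cutoff

# Toy three-atom ligand (hypothetical coordinates).
native = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
pose_close = [(0.1, 0.0, 0.0), (1.6, 0.0, 0.0), (1.6, 1.5, 0.0)]
pose_far = [(5.0, 0.0, 0.0), (6.5, 0.0, 0.0), (6.5, 1.5, 0.0)]

# The far pose receives the better (lower) score, so selection fails here.
print(top_pose_is_near_native([pose_close, pose_far], [-7.0, -9.0], native))
```

The toy scores are chosen so that the affinity-like score prefers the wrong pose, which is the failure mode the abstract describes.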
Affiliation(s)
- Serena Vittorio
  - Dipartimento di Scienze Farmaceutiche, Università degli Studi di Milano, Via Luigi Mangiagalli 25, I-20133 Milano, Italy
- Filippo Lunghini
  - EXSCALATE, Dompé Farmaceutici SpA, Via Tommaso de Amicis 95, 80123 Naples, Italy
- Pietro Morerio
  - Pattern Analysis and Computer Vision, Fondazione Istituto Italiano di Tecnologia, Via Morego, 30, 16163 Genova, Italy
- Davide Gadioli
  - Dipartimento di Elettronica Informazione e Bioingegneria, Politecnico di Milano, Via Ponzio 34/5, I-20133 Milano, Italy
- Sergio Orlandini
  - SCAI, SuperComputing Applications and Innovation Department, CINECA, Via dei Tizii 6, Rome 00185, Italy
- Paulo Silva
  - IT4Innovations, VSB – Technical University of Ostrava, 17. listopadu 2172/15, 70800 Ostrava-Poruba, Czech Republic
- Jan Martinovic
  - IT4Innovations, VSB – Technical University of Ostrava, 17. listopadu 2172/15, 70800 Ostrava-Poruba, Czech Republic
- Alessandro Pedretti
  - Dipartimento di Scienze Farmaceutiche, Università degli Studi di Milano, Via Luigi Mangiagalli 25, I-20133 Milano, Italy
- Domenico Bonanni
  - Department of Physical and Chemical Sciences, University of L'Aquila, via Vetoio, L'Aquila 67010, Italy
- Alessio Del Bue
  - Pattern Analysis and Computer Vision, Fondazione Istituto Italiano di Tecnologia, Via Morego, 30, 16163 Genova, Italy
- Gianluca Palermo
  - Dipartimento di Elettronica Informazione e Bioingegneria, Politecnico di Milano, Via Ponzio 34/5, I-20133 Milano, Italy
- Giulio Vistoli
  - Dipartimento di Scienze Farmaceutiche, Università degli Studi di Milano, Via Luigi Mangiagalli 25, I-20133 Milano, Italy
- Andrea R. Beccari
  - EXSCALATE, Dompé Farmaceutici SpA, Via Tommaso de Amicis 95, 80123 Naples, Italy
2. Kumar N, Acharya V. Advances in machine intelligence-driven virtual screening approaches for big-data. Med Res Rev 2024; 44:939-974. [PMID: 38129992 DOI: 10.1002/med.21995] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/12/2022] [Revised: 07/15/2023] [Accepted: 10/29/2023] [Indexed: 12/23/2023]
Abstract
Virtual screening (VS) is an integral and ever-evolving domain of the drug discovery framework. VS is traditionally classified into ligand-based (LB) and structure-based (SB) approaches. Machine intelligence, or artificial intelligence, has wide applications in the drug discovery domain to reduce time and resource consumption. In combination with machine intelligence algorithms, VS has emerged as a revolutionarily progressive technology that learns within robust decision orders for data curation and hit molecule screening from large VS libraries in minutes or hours. The exponential growth of chemical and biological data, which has evolved into "big-data" in the public domain, demands modern and advanced machine intelligence-driven VS approaches to screen hit molecules from ultra-large VS libraries. VS has evolved from individual approaches (LB and SB) to integrated LB and SB techniques that explore various ligand and target protein aspects to enhance the rate of appropriate hit molecule prediction. Current trends demand advanced and intelligent solutions to handle enormous data in the drug discovery domain for screening and optimizing hits or leads with fewer or no false positive hits. Following the big-data drift and the tremendous growth in computational architecture, we present this review. Here, the article categorizes and emphasizes individual VS techniques, details the literature on machine learning implementation and modern machine intelligence approaches, and discusses limitations and future prospects.
Affiliation(s)
- Neeraj Kumar
  - Artificial Intelligence for Computational Biology Lab (AICoB), Biotechnology Division, CSIR-Institute of Himalayan Bioresource Technology, Palampur, Himachal Pradesh, India
  - Academy of Scientific and Innovative Research, Ghaziabad, India
- Vishal Acharya
  - Artificial Intelligence for Computational Biology Lab (AICoB), Biotechnology Division, CSIR-Institute of Himalayan Bioresource Technology, Palampur, Himachal Pradesh, India
  - Academy of Scientific and Innovative Research, Ghaziabad, India
3. Dong T, Yang Z, Zhou J, Chen CYC. Equivariant Flexible Modeling of the Protein-Ligand Binding Pose with Geometric Deep Learning. J Chem Theory Comput 2023; 19:8446-8459. [PMID: 37938978 DOI: 10.1021/acs.jctc.3c00273] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 11/10/2023]
Abstract
Flexible modeling of the protein-ligand complex structure is a fundamental challenge for in silico drug development. Recent studies have improved commonly used docking tools by incorporating extra deep learning-based steps. However, such strategies remain limited in accuracy and efficiency because they retain massive sampling pressure and lack consideration for flexible biomolecular changes. In this study, we propose FlexPose, a geometric graph network capable of direct flexible modeling of complex structures in Euclidean space without following conventional sampling and scoring strategies. Our model adopts two key designs, a scalar-vector dual feature representation and an SE(3)-equivariant network, to manage dynamic structural changes, as well as two strategies, conformation-aware pretraining and weakly supervised learning, to boost model generalizability in unseen chemical space. Benefiting from these paradigms, our model dramatically outperforms all tested popular docking tools and recently advanced deep learning methods, especially in tasks involving protein conformation changes. We further investigate the impact of protein and ligand similarity on the model performance with two conformation-aware strategies. Moreover, FlexPose provides an affinity estimation and model confidence for postanalysis.
Affiliation(s)
- Tiejun Dong
  - Intelligent Medical Research Center, School of Intelligent Systems Engineering, Sun Yat-sen University, Shenzhen, Guangdong 510275, China
- Ziduo Yang
  - Intelligent Medical Research Center, School of Intelligent Systems Engineering, Sun Yat-sen University, Shenzhen, Guangdong 510275, China
- Jun Zhou
  - Intelligent Medical Research Center, School of Intelligent Systems Engineering, Sun Yat-sen University, Shenzhen, Guangdong 510275, China
- Calvin Yu-Chian Chen
  - Intelligent Medical Research Center, School of Intelligent Systems Engineering, Sun Yat-sen University, Shenzhen, Guangdong 510275, China
  - AI for Science (AI4S)-Preferred Program, Peking University Shenzhen Graduate School, Shenzhen, Guangdong 518055, China
  - School of Electronic and Computer Engineering, Peking University Shenzhen Graduate School, Shenzhen, Guangdong 518055, China
  - Department of Medical Research, China Medical University Hospital, Taichung 40447, Taiwan
  - Department of Bioinformatics and Medical Engineering, Asia University, Taichung 41354, Taiwan
4. Weber JK, Morrone JA, Kang SG, Zhang L, Lang L, Chowell D, Krishna C, Huynh T, Parthasarathy P, Luan B, Alban TJ, Cornell WD, Chan TA. Unsupervised and supervised AI on molecular dynamics simulations reveals complex characteristics of HLA-A2-peptide immunogenicity. Brief Bioinform 2023; 25:bbad504. [PMID: 38233090 PMCID: PMC10793977 DOI: 10.1093/bib/bbad504] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/25/2023] [Revised: 11/03/2023] [Accepted: 12/03/2023] [Indexed: 01/19/2024] Open
Abstract
Immunologic recognition of peptide antigens bound to class I major histocompatibility complex (MHC) molecules is essential to both novel immunotherapeutic development and human health at large. Current methods for predicting antigen peptide immunogenicity rely primarily on simple sequence representations, which allow for some understanding of immunogenic features but provide inadequate consideration of the full scale of molecular mechanisms tied to peptide recognition. We here characterize contributions that unsupervised and supervised artificial intelligence (AI) methods can make toward understanding and predicting MHC(HLA-A2)-peptide complex immunogenicity when applied to large ensembles of molecular dynamics simulations. We first show that an unsupervised AI method allows us to identify subtle features that drive immunogenicity differences between a cancer neoantigen and its wild-type peptide counterpart. Next, we demonstrate that a supervised AI method for class I MHC(HLA-A2)-peptide complex classification significantly outperforms a sequence model on small datasets corrected for trivial sequence correlations. Furthermore, we show that both unsupervised and supervised approaches reveal determinants of immunogenicity based on time-dependent molecular fluctuations and anchor position dynamics outside the MHC binding groove. We discuss implications of these structural and dynamic immunogenicity correlates for the induction of T cell responses and therapeutic T cell receptor design.
Affiliation(s)
- Jeffrey K Weber
  - IBM Thomas J. Watson Research Center, Yorktown Heights, NY 10598, USA
- Joseph A Morrone
  - IBM Thomas J. Watson Research Center, Yorktown Heights, NY 10598, USA
- Seung-gu Kang
  - IBM Thomas J. Watson Research Center, Yorktown Heights, NY 10598, USA
- Leili Zhang
  - IBM Thomas J. Watson Research Center, Yorktown Heights, NY 10598, USA
- Lijun Lang
  - IBM Thomas J. Watson Research Center, Yorktown Heights, NY 10598, USA
- Diego Chowell
  - Precision Immunology Institute, Icahn School of Medicine at Mount Sinai, New York, NY 10029
- Chirag Krishna
  - Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Tien Huynh
  - IBM Thomas J. Watson Research Center, Yorktown Heights, NY 10598, USA
- Prerana Parthasarathy
  - Center for Immunotherapy and Precision Immuno-Oncology, Cleveland Clinic, Cleveland, OH 44195, USA
  - Lerner Research Institute, Cleveland Clinic, Cleveland, OH 44015, USA
- Binquan Luan
  - IBM Thomas J. Watson Research Center, Yorktown Heights, NY 10598, USA
- Tyler J Alban
  - Center for Immunotherapy and Precision Immuno-Oncology, Cleveland Clinic, Cleveland, OH 44195, USA
  - Lerner Research Institute, Cleveland Clinic, Cleveland, OH 44015, USA
- Wendy D Cornell
  - IBM Thomas J. Watson Research Center, Yorktown Heights, NY 10598, USA
- Timothy A Chan
  - Center for Immunotherapy and Precision Immuno-Oncology, Cleveland Clinic, Cleveland, OH 44195, USA
  - Lerner Research Institute, Cleveland Clinic, Cleveland, OH 44015, USA
  - Department of Radiation Oncology, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
  - Taussig Cancer Institute, Cleveland Clinic, Cleveland, OH 44015, USA
  - National Center for Regenerative Medicine, Cleveland Clinic, Cleveland, OH 44015, USA
5. Torres F, Stadler G, Kwiatkowski W, Orts J. A Benchmark Study of Protein-Fragment Complex Structure Calculations with NMR2. Int J Mol Sci 2023; 24:14329. [PMID: 37762631 PMCID: PMC10531959 DOI: 10.3390/ijms241814329] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/04/2023] [Revised: 09/05/2023] [Accepted: 09/07/2023] [Indexed: 09/29/2023] Open
Abstract
Protein-fragment complex structures are particularly sought after in medicinal chemistry to rationally design lead molecules. These structures are usually derived using X-ray crystallography, but the failure rate is non-negligible. NMR is a possible alternative for the calculation of weakly interacting complexes. Nevertheless, the time-consuming protein signal assignment step remains a barrier to its routine application. NMR Molecular Replacement (NMR2) is a versatile and rapid method that enables the elucidation of a protein-ligand complex structure. It has been successfully applied to peptides, drug-like molecules, and more recently to fragments. Due to the small size of the fragments, ca. <300 Da, solving the structures of the protein-fragment complexes is particularly challenging. Here, we present the expected performance of NMR2 when applied to protein-fragment complexes. The NMR2 approach has been benchmarked with the SERAPhic fragment library to identify the technical challenges in protein-fragment NMR structure calculation. A straightforward strategy is proposed to increase the method's success rate further. The presented work confirms that NMR2 is an alternative method to X-ray crystallography for solving protein-fragment complex structures.
Affiliation(s)
- Felix Torres
  - Institute of Molecular Physical Science, Swiss Federal Institute of Technology, ETH-Hönggerberg, 8093 Zurich, Switzerland
- Gabriela Stadler
  - Institute of Molecular Physical Science, Swiss Federal Institute of Technology, ETH-Hönggerberg, 8093 Zurich, Switzerland
- Witek Kwiatkowski
  - Institute of Molecular Physical Science, Swiss Federal Institute of Technology, ETH-Hönggerberg, 8093 Zurich, Switzerland
- Julien Orts
  - Department of Pharmaceutical Sciences, Faculty of Life Sciences, University of Vienna, Josef-Holaubek-Platz 2, 1090 Vienna, Austria
6.
Abstract
Drug development is a wide scientific field that faces many challenges these days. Among them are extremely high development costs, long development times, and a small number of new drugs that are approved each year. New and innovative technologies are needed to solve these problems that make the drug discovery process of small molecules more time and cost efficient, and that allow previously undruggable receptor classes to be targeted, such as protein-protein interactions. Structure-based virtual screenings (SBVSs) have become a leading contender in this context. In this review, we give an introduction to the foundations of SBVSs and survey their progress in the past few years with a focus on ultralarge virtual screenings (ULVSs). We outline key principles of SBVSs, recent success stories, new screening techniques, available deep learning-based docking methods, and promising future research directions. ULVSs have an enormous potential for the development of new small-molecule drugs and are already starting to transform early-stage drug discovery.
Affiliation(s)
- Christoph Gorgulla
  - Harvard Medical School and Physics Department, Harvard University, Boston, Massachusetts, USA
  - Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, Massachusetts, USA
  - Current affiliation: Department of Structural Biology, St. Jude Children's Research Hospital, Memphis, Tennessee, USA
7. Shen C, Zhang X, Hsieh CY, Deng Y, Wang D, Xu L, Wu J, Li D, Kang Y, Hou T, Pan P. A generalized protein-ligand scoring framework with balanced scoring, docking, ranking and screening powers. Chem Sci 2023; 14:8129-8146. [PMID: 37538816 PMCID: PMC10395315 DOI: 10.1039/d3sc02044d] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 04/20/2023] [Accepted: 07/03/2023] [Indexed: 08/05/2023] Open
Abstract
Applying machine learning algorithms to protein-ligand scoring functions has attracted widespread attention in recent years due to their high predictive accuracy and affordable computational cost. Nevertheless, most machine learning-based scoring functions are only applicable to a specific task, e.g., binding affinity prediction, binding pose prediction or virtual screening, suggesting that the development of a scoring function with balanced performance in all critical tasks remains a grand challenge. To this end, we propose a novel parameterization strategy by introducing an adjustable binding affinity term, which represents the correlation between the predicted outcomes and experimental data, into the training of a mixture density network. The resulting residue-atom distance likelihood potential not only retains superior docking and screening power over all the other state-of-the-art approaches, but also achieves a remarkable improvement in scoring and ranking performance. We explore in depth the impact of several key elements on prediction accuracy as well as on task preference, and demonstrate that the performance of a given model on scoring/ranking and docking/screening tasks can be well balanced in an appropriate manner. Overall, our study highlights the potential utility of our innovative parameterization strategy as well as the resulting scoring framework in future structure-based drug design.
Affiliation(s)
- Chao Shen
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
  - State Key Lab of CAD&CG, Zhejiang University, Hangzhou 310058, Zhejiang, China
  - School of Public Health, Zhejiang University, Hangzhou 310058, Zhejiang, China
  - CarbonSilicon AI Technology Co., Ltd, Hangzhou 310018, Zhejiang, China
- Xujun Zhang
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Chang-Yu Hsieh
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Yafeng Deng
  - CarbonSilicon AI Technology Co., Ltd, Hangzhou 310018, Zhejiang, China
- Dong Wang
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Lei Xu
  - Institute of Bioinformatics and Medical Engineering, School of Electrical and Information Engineering, Jiangsu University of Technology, Changzhou 213001, China
- Jian Wu
  - School of Public Health, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Dan Li
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Yu Kang
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Tingjun Hou
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
  - State Key Lab of CAD&CG, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Peichen Pan
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
8. Park H, Hong S, Lee M, Kang S, Brahma R, Cho KH, Shin JM. AiKPro: deep learning model for kinome-wide bioactivity profiling using structure-based sequence alignments and molecular 3D conformer ensemble descriptors. Sci Rep 2023; 13:10268. [PMID: 37355672 PMCID: PMC10290719 DOI: 10.1038/s41598-023-37456-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/10/2023] [Accepted: 06/22/2023] [Indexed: 06/26/2023] Open
Abstract
The discovery of selective and potent kinase inhibitors is crucial for the treatment of various diseases, but the process is challenging due to the high structural similarity among kinases. Efficient kinome-wide bioactivity profiling is essential for understanding kinase function and identifying selective inhibitors. In this study, we propose AiKPro, a deep learning model that combines structure-validated multiple sequence alignments and molecular 3D conformer ensemble descriptors to predict kinase-ligand binding affinities. Our deep learning model uses an attention-based mechanism to capture complex patterns in the interactions between the kinase and the ligand. To assess the performance of AiKPro, we evaluated the impact of descriptors, the predictability for untrained kinases and compounds, and kinase activity profiling based on odds ratios. AiKPro achieves Pearson's correlation coefficients of 0.88 for the test set and 0.87 for the untrained sets of compounds, which also indicates the robustness of the model. AiKPro shows good kinase-activity profiles across the kinome, potentially facilitating the discovery of novel interactions and selective inhibitors. Our approach holds potential implications for the discovery of novel, selective kinase inhibitors and for guiding rational drug design.
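For readers unfamiliar with the metric quoted in this abstract, Pearson's correlation coefficient between predicted values $\hat{y}_i$ and measured values $y_i$ is $r = \sum_i (\hat{y}_i - \bar{\hat{y}})(y_i - \bar{y}) / \sqrt{\sum_i (\hat{y}_i - \bar{\hat{y}})^2 \sum_i (y_i - \bar{y})^2}$. The sketch below computes it in plain Python; the function name and the toy affinity values are hypothetical and are not data from the AiKPro study.

```python
import math

def pearson_r(predicted, measured):
    """Pearson's correlation coefficient between two equal-length sequences."""
    if len(predicted) != len(measured) or len(predicted) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_p = sum(predicted) / len(predicted)
    mean_m = sum(measured) / len(measured)
    # Covariance and variances, left unnormalised because the factors cancel.
    cov = sum((p - mean_p) * (m - mean_m) for p, m in zip(predicted, measured))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_m = sum((m - mean_m) ** 2 for m in measured)
    if var_p == 0 or var_m == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_p * var_m)

# Hypothetical predicted vs. measured binding affinities (e.g. pKd units).
predicted = [6.1, 7.4, 5.2, 8.0, 6.9]
measured = [6.0, 7.9, 5.5, 7.6, 6.5]
print(round(pearson_r(predicted, measured), 3))
```

A value near 1 means predictions and measurements rise and fall together, but it does not by itself show that the predictions are well calibrated in absolute terms.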
Affiliation(s)
- Hyejin Park
  - AZothBio Inc., Rm. DA724 Hyundai Knowledge Industry Center, Hanam-si, Gyeonggi-do, Republic of Korea
- Sujeong Hong
  - AZothBio Inc., Rm. DA724 Hyundai Knowledge Industry Center, Hanam-si, Gyeonggi-do, Republic of Korea
- Myeonghun Lee
  - AZothBio Inc., Rm. DA724 Hyundai Knowledge Industry Center, Hanam-si, Gyeonggi-do, Republic of Korea
- Sungil Kang
  - AZothBio Inc., Rm. DA724 Hyundai Knowledge Industry Center, Hanam-si, Gyeonggi-do, Republic of Korea
- Rahul Brahma
  - School of Systems Biomedical Science, Soongsil University, Seoul, Republic of Korea
- Kwang-Hwi Cho
  - School of Systems Biomedical Science, Soongsil University, Seoul, Republic of Korea
- Jae-Min Shin
  - AZothBio Inc., Rm. DA724 Hyundai Knowledge Industry Center, Hanam-si, Gyeonggi-do, Republic of Korea
9. Hu J, Yu W, Pang C, Jin J, Pham NT, Manavalan B, Wei L. DrugormerDTI: Drug Graphormer for drug-target interaction prediction. Comput Biol Med 2023; 161:106946. [PMID: 37244151 DOI: 10.1016/j.compbiomed.2023.106946] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 02/19/2023] [Revised: 03/29/2023] [Accepted: 04/15/2023] [Indexed: 05/29/2023]
Abstract
Drug-target interaction (DTI) prediction is a crucial task in drug discovery. Existing computational methods accelerate drug discovery in this respect. However, most of them suffer from low feature representation ability, which significantly affects predictive performance. To address the problem, we propose a novel neural network architecture named DrugormerDTI, which uses Graph Transformer to learn both sequential and topological information from the input molecule graph and Resudual2vec to learn the underlying relations between protein residues. By conducting ablation experiments, we verify the importance of each part of DrugormerDTI. We also demonstrate the good feature extraction and expression capabilities of our model by comparing the mapping results of the attention layer with molecular docking results. Experimental results show that our proposed model performs better than baseline methods on four benchmarks. We demonstrate that the introduction of Graph Transformer and the design of residue are appropriate for drug-target prediction.
Affiliation(s)
- Jiayue Hu
  - School of Software, Shandong University, Jinan, China; Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan, China
- Wang Yu
  - School of Software, Shandong University, Jinan, China; Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan, China
- Chao Pang
  - School of Software, Shandong University, Jinan, China; Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan, China
- Junru Jin
  - School of Software, Shandong University, Jinan, China; Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan, China
- Nhat Truong Pham
  - Computational Biology and Bioinformatics Laboratory, Department of Integrative Biotechnology, College of Biotechnology and Bioengineering, Sungkyunkwan University, Suwon, 16419, Gyeonggi-do, South Korea
- Balachandran Manavalan
  - Computational Biology and Bioinformatics Laboratory, Department of Integrative Biotechnology, College of Biotechnology and Bioengineering, Sungkyunkwan University, Suwon, 16419, Gyeonggi-do, South Korea
- Leyi Wei
  - School of Software, Shandong University, Jinan, China; Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan, China
10. Cavasotto CN, Di Filippo JI. The Impact of Supervised Learning Methods in Ultralarge High-Throughput Docking. J Chem Inf Model 2023; 63:2267-2280. [PMID: 37036491 DOI: 10.1021/acs.jcim.2c01471] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Indexed: 04/11/2023]
Abstract
Structure-based virtual screening methods are now one of the key pillars of computational drug discovery. In recent years, a series of studies have reported docking-based virtual screening campaigns of large databases ranging from hundreds to thousands of millions of compounds, further identifying novel hits after experimental validation. As these large-scale efforts are not generally accessible, machine learning-based protocols have emerged to accelerate the identification of virtual hits within an ultralarge chemical space, reaching impressive reductions in computational time. Herein, we illustrate the motivation and the problem behind the screening of large databases, providing an overview of key concepts and essential applications of machine learning-accelerated protocols, specifically concerning supervised learning methods. We also discuss where the field stands with these novel developments, highlighting possible insights for future studies.
Affiliation(s)
- Claudio N Cavasotto
  - Computational Drug Design and Biomedical Informatics Laboratory, Instituto de Investigaciones en Medicina Traslacional (IIMT), CONICET-Universidad Austral, Av. Juan Domingo Perón 1500, B1629AHJ Pilar, Argentina
  - Facultad de Ciencias Biomédicas, and Facultad de Ingeniería, Universidad Austral, Av. Juan Domingo Perón 1500, B1629AHJ Pilar, Argentina
  - Austral Institute for Applied Artificial Intelligence, Universidad Austral, Av. Juan Domingo Perón 1500, B1629AHJ Pilar, Argentina
- Juan I Di Filippo
  - Computational Drug Design and Biomedical Informatics Laboratory, Instituto de Investigaciones en Medicina Traslacional (IIMT), CONICET-Universidad Austral, Av. Juan Domingo Perón 1500, B1629AHJ Pilar, Argentina
  - Facultad de Ciencias Biomédicas, and Facultad de Ingeniería, Universidad Austral, Av. Juan Domingo Perón 1500, B1629AHJ Pilar, Argentina
  - Austral Institute for Applied Artificial Intelligence, Universidad Austral, Av. Juan Domingo Perón 1500, B1629AHJ Pilar, Argentina
11. Yang Z, Zhong W, Lv Q, Dong T, Chen CYC. Geometric Interaction Graph Neural Network for Predicting Protein-Ligand Binding Affinities from 3D Structures (GIGN). J Phys Chem Lett 2023; 14:2020-2033. [PMID: 36794930 DOI: 10.1021/acs.jpclett.2c03906] [Citation(s) in RCA: 17] [Impact Index Per Article: 17.0] [Indexed: 06/18/2023]
Abstract
Predicting protein-ligand binding affinities (PLAs) is a core problem in drug discovery. Recent advances have shown great potential in applying machine learning (ML) for PLA prediction. However, most of them omit the 3D structures of complexes and physical interactions between proteins and ligands, which are considered essential to understanding the binding mechanism. This paper proposes a geometric interaction graph neural network (GIGN) that incorporates 3D structures and physical interactions for predicting protein-ligand binding affinities. Specifically, we design a heterogeneous interaction layer that unifies covalent and noncovalent interactions into the message passing phase to learn node representations more effectively. The heterogeneous interaction layer also follows fundamental biological laws, including invariance to translations and rotations of the complexes, thus avoiding expensive data augmentation strategies. GIGN achieves state-of-the-art performance on three external test sets. Moreover, by visualizing learned representations of protein-ligand complexes, we show that the predictions of GIGN are biologically meaningful.
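The invariance to translations and rotations mentioned in this abstract can be illustrated with a generic geometric fact: interatomic distances do not change when a whole complex is rigidly moved, so any feature built only from distances is invariant by construction. The sketch below demonstrates that property on toy coordinates. It is not GIGN's interaction layer; the function names and coordinates are hypothetical.

```python
import math

def pairwise_distances(coords):
    """Matrix of Euclidean distances between all pairs of 3D points."""
    return [[math.dist(a, b) for b in coords] for a in coords]

def rigid_transform(coords, angle, shift):
    """Rotate points about the z axis by `angle` (radians), then translate by `shift`."""
    c, s = math.cos(angle), math.sin(angle)
    return [
        (c * x - s * y + shift[0], s * x + c * y + shift[1], z + shift[2])
        for x, y, z in coords
    ]

# Toy "complex" of three atoms (hypothetical coordinates).
coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.5)]
moved = rigid_transform(coords, angle=0.7, shift=(3.0, -1.0, 2.0))

before = pairwise_distances(coords)
after = pairwise_distances(moved)

# Distances agree up to floating-point error even though coordinates differ.
invariant = all(
    math.isclose(d0, d1, abs_tol=1e-9)
    for row0, row1 in zip(before, after)
    for d0, d1 in zip(row0, row1)
)
print(invariant)
```

A model that consumes raw coordinates instead would see `coords` and `moved` as different inputs, which is why such models typically need rotation and translation data augmentation.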
Affiliation(s)
- Ziduo Yang
  - Intelligent Medical Research Center, School of Intelligent Systems Engineering, Sun Yat-sen University, Shenzhen, Guangdong 510275, China
- Weihe Zhong
  - Intelligent Medical Research Center, School of Intelligent Systems Engineering, Sun Yat-sen University, Shenzhen, Guangdong 510275, China
- Qiujie Lv
  - Intelligent Medical Research Center, School of Intelligent Systems Engineering, Sun Yat-sen University, Shenzhen, Guangdong 510275, China
- Tiejun Dong
  - Intelligent Medical Research Center, School of Intelligent Systems Engineering, Sun Yat-sen University, Shenzhen, Guangdong 510275, China
- Calvin Yu-Chian Chen
  - Intelligent Medical Research Center, School of Intelligent Systems Engineering, Sun Yat-sen University, Shenzhen, Guangdong 510275, China
  - Department of Medical Research, China Medical University Hospital, Taichung 40447, Taiwan
  - Department of Bioinformatics and Medical Engineering, Asia University, Taichung 41354, Taiwan
12. Rehman AU, Khurshid B, Ali Y, Rasheed S, Wadood A, Ng HL, Chen HF, Wei Z, Luo R, Zhang J. Computational approaches for the design of modulators targeting protein-protein interactions. Expert Opin Drug Discov 2023; 18:315-333. [PMID: 36715303 PMCID: PMC10149343 DOI: 10.1080/17460441.2023.2171396] [Citation(s) in RCA: 11] [Impact Index Per Article: 11.0] [Received: 06/06/2022] [Accepted: 01/18/2023] [Indexed: 01/31/2023]
Abstract
BACKGROUND Protein-protein interactions (PPIs) are intriguing targets for designing novel small-molecule inhibitors. The role of PPIs in various infectious and neurodegenerative disorders makes them potential therapeutic targets. Although PPIs have been portrayed as undruggable targets because of their flat surfaces, disorder, and lack of grooves, recent progress in computational biology has led researchers to reconsider them in drug discovery. AREAS COVERED In this review, we introduce in silico methods used to identify PPI interfaces and present an in-depth overview of various computational methodologies that have been successfully applied to annotate PPIs. We also discuss several successful case studies that use computational tools to understand PPI modulation and the key roles of PPIs in various physiological processes. EXPERT OPINION Computational methods face challenges due to the inherent flexibility of proteins, which makes them expensive and results in the use of rigid models. This problem becomes more significant in PPIs due to their flexible and flat interfaces. Computational methods like molecular dynamics (MD) simulation and machine learning can integrate the chemical structure data into biochemical and can be used for target identification and modulation. These computational methodologies have been crucial in understanding the structure of PPIs, designing PPI modulators, discovering new drug targets, and predicting treatment outcomes.
Affiliation(s)
- Ashfaq Ur Rehman
- Departments of Molecular Biology and Biochemistry, Chemical and Biomolecular Engineering, Materials Science and Engineering, and Biomedical Engineering, Graduate Program in Chemical and Materials Physics, University of California Irvine, Irvine, California, USA
- Key Laboratory of Cell Differentiation and Apoptosis of Chinese Ministry of Education, Medicinal Bioinformatics Center, Shanghai Jiao-Tong University School of Medicine, Shanghai, Zhejiang, China
- Beenish Khurshid
- Department of Biochemistry, Abdul Wali Khan University Mardan, Pakistan
- Yasir Ali
- National Center for Bioinformatics, Quaid-e-Azam University, Islamabad, Pakistan
- Salman Rasheed
- National Center for Bioinformatics, Quaid-e-Azam University, Islamabad, Pakistan
- Abdul Wadood
- Department of Biochemistry, Abdul Wali Khan University Mardan, Pakistan
- Ho-Leung Ng
- Department of Biochemistry and Molecular Biophysics, Kansas State University, Manhattan, Kansas, USA
- Hai-Feng Chen
- State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic & Developmental Sciences, Department of Bioinformatics and Biostatistics, National Experimental Teaching Center for Life Sciences and Biotechnology, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, Zhejiang, China
- Zhiqiang Wei
- Medicinal Chemistry and Bioinformatics Center, Ocean University of China, Qingdao, Shandong, China
- Ray Luo
- Departments of Molecular Biology and Biochemistry, Chemical and Biomolecular Engineering, Materials Science and Engineering, and Biomedical Engineering, Graduate Program in Chemical and Materials Physics, University of California Irvine, Irvine, California, USA
- Jian Zhang
- Key Laboratory of Cell Differentiation and Apoptosis of Chinese Ministry of Education, Medicinal Bioinformatics Center, Shanghai Jiao-Tong University School of Medicine, Shanghai, Zhejiang, China
- School of Pharmaceutical Sciences, Zhengzhou University, Zhengzhou, Henan, China
|
13
|
Durairaj J, de Ridder D, van Dijk AD. Beyond sequence: Structure-based machine learning. Comput Struct Biotechnol J 2022; 21:630-643. [PMID: 36659927 PMCID: PMC9826903 DOI: 10.1016/j.csbj.2022.12.039] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/26/2022] [Revised: 12/21/2022] [Accepted: 12/21/2022] [Indexed: 12/31/2022] Open
Abstract
Recent breakthroughs in protein structure prediction demarcate the start of a new era in structural bioinformatics. Combined with various advances in experimental structure determination and the uninterrupted pace at which new structures are published, this promises an age in which protein structure information is as prevalent and ubiquitous as sequence. Machine learning in protein bioinformatics has been dominated by sequence-based methods, but this is now changing to make use of the deluge of rich structural information as input. Machine learning methods making use of structures are scattered across literature and cover a number of different applications and scopes; while some try to address questions and tasks within a single protein family, others aim to capture characteristics across all available proteins. In this review, we look at the variety of structure-based machine learning approaches, how structures can be used as input, and typical applications of these approaches in protein biology. We also discuss current challenges and opportunities in this all-important and increasingly popular field.
Affiliation(s)
- Janani Durairaj
- Biozentrum, University of Basel, Basel, Switzerland
- Bioinformatics Group, Department of Plant Sciences, Wageningen University and Research, Wageningen, the Netherlands
- Dick de Ridder
- Bioinformatics Group, Department of Plant Sciences, Wageningen University and Research, Wageningen, the Netherlands
- Aalt D.J. van Dijk
- Bioinformatics Group, Department of Plant Sciences, Wageningen University and Research, Wageningen, the Netherlands
|
14
|
Developing a Naïve Bayesian Classification Model with PI3Kγ structural features for virtual screening against PI3Kγ: Combining molecular docking and pharmacophore based on multiple PI3Kγ conformations. Eur J Med Chem 2022; 244:114824. [DOI: 10.1016/j.ejmech.2022.114824] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/04/2022] [Revised: 09/28/2022] [Accepted: 10/01/2022] [Indexed: 11/21/2022]
|
15
|
Boyles F, Deane CM, Morris GM. Learning from Docked Ligands: Ligand-Based Features Rescue Structure-Based Scoring Functions When Trained on Docked Poses. J Chem Inf Model 2022; 62:5329-5341. [PMID: 34469150 DOI: 10.1021/acs.jcim.1c00096] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 12/13/2022]
Abstract
Machine learning scoring functions for protein-ligand binding affinity have been found to consistently outperform classical scoring functions when trained and tested on crystal structures of bound protein-ligand complexes. However, it is less clear how these methods perform when applied to docked poses of complexes. We explore how the use of docked rather than crystallographic poses for both training and testing affects the performance of machine learning scoring functions. Using the PDBbind Core Sets as benchmarks, we show that the performance of a structure-based machine learning scoring function trained and tested on docked poses is lower than that of the same scoring function trained and tested on crystallographic poses. We construct a hybrid scoring function by combining both structure-based and ligand-based features, and show that its ability to predict binding affinity using docked poses is comparable to that of purely structure-based scoring functions trained and tested on crystal poses. We also present a new, freely available validation set, the Updated DUD-E Diverse Subset, for binding affinity prediction using data from DUD-E and ChEMBL. Despite strong performance on docked poses of the PDBbind Core Sets, we find that our hybrid scoring function sometimes generalizes poorly to a protein target not represented in the training set, demonstrating the need for improved scoring functions and additional validation benchmarks.
Affiliation(s)
- Fergus Boyles
- Department of Statistics, University of Oxford, 24-29 St Giles', Oxford, OX1 3LB, United Kingdom
- Charlotte M Deane
- Department of Statistics, University of Oxford, 24-29 St Giles', Oxford, OX1 3LB, United Kingdom
- Garrett M Morris
- Department of Statistics, University of Oxford, 24-29 St Giles', Oxford, OX1 3LB, United Kingdom
|
16
|
Réau M, Renaud N, Xue LC, Bonvin AMJJ. DeepRank-GNN: a graph neural network framework to learn patterns in protein-protein interfaces. Bioinformatics 2022; 39:6845451. [PMID: 36420989 PMCID: PMC9805592 DOI: 10.1093/bioinformatics/btac759] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Received: 11/25/2021] [Revised: 10/19/2022] [Accepted: 11/23/2022] [Indexed: 11/25/2022] Open
Abstract
MOTIVATION Gaining structural insights into the protein-protein interactome is essential to understand biological phenomena and extract knowledge for rational drug design or protein engineering. We have previously developed DeepRank, a deep-learning framework to facilitate pattern learning from protein-protein interfaces using convolutional neural network (CNN) approaches. However, CNN is not rotation invariant and data augmentation is required to desensitize the network to the input data orientation which dramatically impairs the computation performance. Representing protein-protein complexes as atomic- or residue-scale rotation invariant graphs instead enables using graph neural networks (GNN) approaches, bypassing those limitations. RESULTS We have developed DeepRank-GNN, a framework that converts protein-protein interfaces from PDB 3D coordinates files into graphs that are further provided to a pre-defined or user-defined GNN architecture to learn problem-specific interaction patterns. DeepRank-GNN is designed to be highly modularizable, easily customized and is wrapped into a user-friendly python3 package. Here, we showcase DeepRank-GNN's performance on two applications using a dedicated graph interaction neural network: (i) the scoring of docking poses and (ii) the discriminating of biological and crystal interfaces. In addition to the highly competitive performance obtained in those tasks as compared to state-of-the-art methods, we show a significant improvement in speed and storage requirement using DeepRank-GNN as compared to DeepRank. AVAILABILITY AND IMPLEMENTATION DeepRank-GNN is freely available from https://github.com/DeepRank/DeepRank-GNN. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Li C Xue
- Center for Molecular and Biomolecular Informatics, Radboudumc, Nijmegen 6525 GA, The Netherlands
|
17
|
Ahmadi S, Abdolmaleki A, Jebeli Javan M. In silico study of natural antioxidants. Vitamins and Hormones 2022; 121:1-43. [PMID: 36707131 DOI: 10.1016/bs.vh.2022.09.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2022]
Abstract
Antioxidants are the body's defense system against the damage of reactive oxygen species, which are usually produced in the body through various physiological processes. There are various sources of these antioxidants, such as endogenous antioxidants in the body and exogenous food sources. This chapter provides important information on methods used to investigate antioxidant activity and sources of plant antioxidants. Over the past two decades, numerous studies have demonstrated the importance of in silico research in the development of novel natural and synthesized antioxidants. In silico methods such as quantitative structure-activity relationships (QSAR), pharmacophore modeling, docking, and virtual screening play critical roles in designing effective antioxidants that may be synthesized and tested later. This chapter introduces the available in silico approaches for different classes of antioxidants. Many successful applications of in silico methods in the development and design of novel antioxidants are thoroughly discussed. The QSAR, pharmacophore, molecular docking, and virtual screening techniques summarized here should help readers to work out the mechanism of interaction between free radicals and antioxidant compounds. Furthermore, this chapter focuses on introducing new QSAR models in combination with other in silico methods to predict antioxidant activity and design more active antioxidants. In silico studies are essential for exploring largely unknown plant tissues and food sources for antioxidant synthesis, as well as for saving time and money in such studies.
Affiliation(s)
- Shahin Ahmadi
- Department of Chemistry, Faculty of Pharmaceutical Chemistry, Tehran Medical Sciences, Islamic Azad University, Tehran, Iran
- Azizeh Abdolmaleki
- Department of Chemistry, Tuyserkan Branch, Islamic Azad University, Tuyserkan, Iran
- Marjan Jebeli Javan
- Department of Chemistry, Faculty of Pharmaceutical Chemistry, Tehran Medical Sciences, Islamic Azad University, Tehran, Iran
|
18
|
Krasoulis A, Antonopoulos N, Pitsikalis V, Theodorakis S. DENVIS: Scalable and High-Throughput Virtual Screening Using Graph Neural Networks with Atomic and Surface Protein Pocket Features. J Chem Inf Model 2022; 62:4642-4659. [PMID: 36154119 DOI: 10.1021/acs.jcim.2c01057] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 11/30/2022]
Abstract
Computational methods for virtual screening can dramatically accelerate early-stage drug discovery by identifying potential hits for a specified target. Docking algorithms traditionally use physics-based simulations to address this challenge by estimating the binding orientation of a query protein-ligand pair and a corresponding binding affinity score. Over the recent years, classical and modern machine learning architectures have shown potential for outperforming traditional docking algorithms. Nevertheless, most learning-based algorithms still rely on the availability of the protein-ligand complex binding pose, typically estimated via docking simulations, which leads to a severe slowdown of the overall virtual screening process. A family of algorithms processing target information at the amino acid sequence level avoid this requirement, however, at the cost of processing protein data at a higher representation level. We introduce deep neural virtual screening (DENVIS), an end-to-end pipeline for virtual screening using graph neural networks (GNNs). By performing experiments on two benchmark databases, we show that our method performs competitively to several docking-based, machine learning-based, and hybrid docking/machine learning-based algorithms. By avoiding the intermediate docking step, DENVIS exhibits several orders of magnitude faster screening times (i.e., higher throughput) than both docking-based and hybrid models. When compared to an amino acid sequence-based machine learning model with comparable screening times, DENVIS achieves dramatically better performance. Some key elements of our approach include protein pocket modeling using a combination of atomic and surface features, the use of model ensembles, and data augmentation via artificial negative sampling during model training. 
In summary, DENVIS achieves virtual screening performance competitive with the state of the art, while offering the potential to scale to billions of molecules using minimal computational resources.
|
19
|
Kalasariya HS, Patel NB, Gacem A, Alsufyani T, Reece LM, Yadav VK, Awwad NS, Ibrahium HA, Ahn Y, Yadav KK, Jeon BH. Marine Alga Ulva fasciata-Derived Molecules for the Potential Treatment of SARS-CoV-2: An In Silico Approach. Mar Drugs 2022; 20:md20090586. [PMID: 36135775 PMCID: PMC9506351 DOI: 10.3390/md20090586] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 07/12/2022] [Revised: 09/08/2022] [Accepted: 09/09/2022] [Indexed: 12/13/2022] Open
Abstract
SARS-CoV-2 is the causative agent of the COVID-19 pandemic. This in silico study aimed to elucidate therapeutic efficacies against SARS-CoV-2 of phyco-compounds from the seaweed, Ulva fasciata. Twelve phyco-compounds were isolated and toxicity was analyzed by VEGA QSAR. Five compounds were found to be nonmutagenic, noncarcinogenic and nontoxic. Moreover, antiviral activity was evaluated by PASS. Binding affinities of five of these therapeutic compounds were predicted to possess probable biological activity. Fifteen SARS-CoV-2 target proteins were analyzed by the AutoDock Vina program for molecular docking binding energy analysis and the 6Y84 protein was determined to possess optimal binding affinities. The Desmond program from Schrödinger’s suite was used to study high performance molecular dynamic simulation properties for 3,7,11,15-Tetramethyl-2-hexadecen-1-ol—6Y84 for better drug evaluation. The ligand with 6Y84 had stronger binding affinities (−5.9 kcal/mol) over two standard drugs, Chloroquine (−5.6 kcal/mol) and Interferon α-2b (−3.8 kcal/mol). Swiss ADME calculated physicochemical/lipophilicity/water solubility/pharmacokinetic properties for 3,7,11,15-Tetramethyl-2-hexadecen-1-ol, showing that this therapeutic agent may be effective against SARS-CoV-2.
Affiliation(s)
- Haresh S. Kalasariya
- Centre for Natural Products Discovery, School of Pharmacy and Biomolecular Sciences, Liverpool John Moores University, Byrom Street, Liverpool L3 3AF, UK
- Nikunj B. Patel
- Microbiology Department, Sankalchand Patel University, Visnagar 384315, India
- Amel Gacem
- Department of Physics, Faculty of Sciences, University 20 Août 1955, Skikda 21000, Algeria
- Taghreed Alsufyani
- Department of Chemistry, College of Science, Taif University, P.O. Box 11099, Taif 21944, Saudi Arabia
- Lisa M. Reece
- Reece Life Science Consulting Agency, 819 N Amburn Rd, Texas City, TX 77591, USA
- Virendra Kumar Yadav
- Department of Biosciences, School of Liberal Arts & Sciences, Mody University of Science and Technology, Lakshmangarh, Sikar 332311, India
- Nasser S. Awwad
- Department of Chemistry, King Khalid University, P.O. Box 9004, Abha 61413, Saudi Arabia
- Hala A. Ibrahium
- Biology Department, Faculty of Science, King Khalid University, P.O. Box 9004, Abha 61413, Saudi Arabia
- Department of Semi Pilot Plant, Nuclear Materials Authority, El Maadi, P.O. Box 530, Cairo 11381, Egypt
- Yongtae Ahn
- Department of Earth Resources & Environmental Engineering, Hanyang University, 222-Wangsimni-ro, Seongdong-gu, Seoul 04763, Korea
- Krishna Kumar Yadav
- Faculty of Science and Technology, Madhyanchal Professional University, Ratibad, Bhopal 462044, India
- Correspondence: (K.K.Y.); (B.-H.J.)
- Byong-Hun Jeon
- Department of Earth Resources & Environmental Engineering, Hanyang University, 222-Wangsimni-ro, Seongdong-gu, Seoul 04763, Korea
- Correspondence: (K.K.Y.); (B.-H.J.)
|
20
|
Jiang H, Wang J, Cong W, Huang Y, Ramezani M, Sarma A, Dokholyan NV, Mahdavi M, Kandemir MT. Predicting Protein-Ligand Docking Structure with Graph Neural Network. J Chem Inf Model 2022; 62:2923-2932. [PMID: 35699430 PMCID: PMC10279412 DOI: 10.1021/acs.jcim.2c00127] [Citation(s) in RCA: 16] [Impact Index Per Article: 8.0] [Indexed: 01/02/2023]
Abstract
Modern day drug discovery is extremely expensive and time consuming. Although computational approaches help accelerate and decrease the cost of drug discovery, existing computational software packages for docking-based drug discovery suffer from both low accuracy and high latency. A few recent machine learning-based approaches have been proposed for virtual screening by improving the ability to evaluate protein-ligand binding affinity, but such methods rely heavily on conventional docking software to sample docking poses, which results in excessive execution latencies. Here, we propose and evaluate a novel graph neural network (GNN)-based framework, MedusaGraph, which includes both pose-prediction (sampling) and pose-selection (scoring) models. Unlike the previous machine learning-centric studies, MedusaGraph generates the docking poses directly and achieves from 10 to 100 times speedup compared to state-of-the-art approaches, while having a slightly better docking accuracy.
Affiliation(s)
- Huaipan Jiang
- Department of Computer Science and Engineering, Pennsylvania State University, State College, Pennsylvania 16802, United States
- Jian Wang
- Departments of Pharmacology and Biochemistry and Molecular Biology, Pennsylvania State College of Medicine, Hershey, Pennsylvania 17033, United States
- Weilin Cong
- Department of Computer Science and Engineering, Pennsylvania State University, State College, Pennsylvania 16802, United States
- Yihe Huang
- Department of Computer Science and Engineering, Pennsylvania State University, State College, Pennsylvania 16802, United States
- Morteza Ramezani
- Department of Computer Science and Engineering, Pennsylvania State University, State College, Pennsylvania 16802, United States
- Anup Sarma
- Department of Computer Science and Engineering, Pennsylvania State University, State College, Pennsylvania 16802, United States
- Nikolay V Dokholyan
- Departments of Pharmacology and Biochemistry and Molecular Biology, Pennsylvania State College of Medicine, Hershey, Pennsylvania 17033, United States
- Departments of Chemistry and Biomedical Engineering, Pennsylvania State University, State College, Pennsylvania 16802, United States
- Mehrdad Mahdavi
- Department of Computer Science and Engineering, Pennsylvania State University, State College, Pennsylvania 16802, United States
- Mahmut T Kandemir
- Department of Computer Science and Engineering, Pennsylvania State University, State College, Pennsylvania 16802, United States
|
21
|
Meli R, Morris GM, Biggin PC. Scoring Functions for Protein-Ligand Binding Affinity Prediction using Structure-Based Deep Learning: A Review. Frontiers in Bioinformatics 2022; 2:885983. [PMID: 36187180 PMCID: PMC7613667 DOI: 10.3389/fbinf.2022.885983] [Citation(s) in RCA: 16] [Impact Index Per Article: 8.0] [Received: 02/28/2022] [Accepted: 05/11/2022] [Indexed: 01/01/2023] Open
Abstract
The rapid and accurate in silico prediction of protein-ligand binding free energies or binding affinities has the potential to transform drug discovery. In recent years, there has been a rapid growth of interest in deep learning methods for the prediction of protein-ligand binding affinities based on the structural information of protein-ligand complexes. These structure-based scoring functions often obtain better results than classical scoring functions when applied within their applicability domain. Here we review structure-based scoring functions for binding affinity prediction based on deep learning, focussing on different types of architectures, featurization strategies, data sets, methods for training and evaluation, and the role of explainable artificial intelligence in building useful models for real drug-discovery applications.
Affiliation(s)
- Rocco Meli
- Department of Biochemistry, University of Oxford, Oxford, United Kingdom
- Garrett M. Morris
- Department of Statistics, University of Oxford, Oxford, United Kingdom
- Philip C. Biggin
- Department of Biochemistry, University of Oxford, Oxford, United Kingdom
|
22
|
Moon S, Zhung W, Yang S, Lim J, Kim WY. PIGNet: a physics-informed deep learning model toward generalized drug-target interaction predictions. Chem Sci 2022; 13:3661-3673. [PMID: 35432900 PMCID: PMC8966633 DOI: 10.1039/d1sc06946b] [Citation(s) in RCA: 48] [Impact Index Per Article: 24.0] [Received: 12/13/2021] [Accepted: 02/06/2022] [Indexed: 12/21/2022] Open
Abstract
Recently, deep neural network (DNN)-based drug–target interaction (DTI) models were highlighted for their high accuracy with affordable computational costs. Yet, the models' insufficient generalization remains a challenging problem in the practice of in silico drug discovery. We propose two key strategies to enhance generalization in the DTI model. The first is to predict the atom–atom pairwise interactions via physics-informed equations parameterized with neural networks and to provide the total binding affinity of a protein–ligand complex as their sum. The second is to further improve model generalization by augmenting the training data with a broader range of binding poses and ligands. We validated our model, PIGNet, in the comparative assessment of scoring functions (CASF) 2016, demonstrating better docking and screening powers than previous methods. Our physics-informing strategy also enables the interpretation of predicted affinities by visualizing the contribution of ligand substructures, providing insights for further ligand optimization. PIGNet, a deep neural network-based drug–target interaction model guided by physics and extensive data augmentation, shows significantly improved generalization ability and model performance.
Affiliation(s)
- Seokhyun Moon
- Department of Chemistry, KAIST 291 Daehak-ro, Yuseong-gu Daejeon 34141 Republic of Korea
- Wonho Zhung
- Department of Chemistry, KAIST 291 Daehak-ro, Yuseong-gu Daejeon 34141 Republic of Korea
- Soojung Yang
- Department of Chemistry, KAIST 291 Daehak-ro, Yuseong-gu Daejeon 34141 Republic of Korea
- Jaechang Lim
- HITS Incorporation 124 Teheran-ro, Gangnam-gu Seoul 06234 Republic of Korea
- Woo Youn Kim
- Department of Chemistry, KAIST 291 Daehak-ro, Yuseong-gu Daejeon 34141 Republic of Korea
- HITS Incorporation 124 Teheran-ro, Gangnam-gu Seoul 06234 Republic of Korea
- KI for Artificial Intelligence, KAIST 291 Daehak-ro, Yuseong-gu Daejeon 34141 Republic of Korea
|
23
|
Affinity prediction using deep learning based on SMILES input for D3R grand challenge 4. J Comput Aided Mol Des 2022; 36:225-235. [PMID: 35314897 DOI: 10.1007/s10822-022-00448-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/14/2021] [Accepted: 03/08/2022] [Indexed: 10/18/2022]
Abstract
Modern molecular docking comprises the prediction of pose and affinity. Prediction of docking poses is required for affinity prediction when three-dimensional coordinates of the ligand have not been provided. However, existing methods require a large amount of feature engineering. In addition, there is a need for a robust model for the sequential combination of pose and affinity prediction due to the probabilistic deviation of the ligand position. We propose a pipeline using a bipartite graph neural network and transfer learning trained on a re-docking dataset. We evaluated our model on the released data from the Drug Design Data Resource Grand Challenge 4 (D3R GC4). The data for the two target proteins provided by the challenge have different patterns. The model outperformed the best participant by 9% on the BACE target protein from stage 2. Further, our model showed competitive performance on the CatS target protein.
|
24
|
Choudhury C, Arul Murugan N, Deva Priyakumar U. Structure-based drug repurposing: traditional and advanced AI/ML-aided methods. Drug Discov Today 2022; 27:1847-1861. [PMID: 35301148 PMCID: PMC8920090 DOI: 10.1016/j.drudis.2022.03.006] [Citation(s) in RCA: 27] [Impact Index Per Article: 13.5] [Received: 10/04/2021] [Revised: 02/16/2022] [Accepted: 03/10/2022] [Indexed: 02/08/2023]
Abstract
The current global health emergency in the form of the Coronavirus 2019 (COVID-19) pandemic has highlighted the need for fast, accurate, and efficient drug discovery pipelines. Traditional drug discovery projects relying on in vitro high-throughput screening (HTS) involve large investments and sophisticated experimental set-ups, affordable only to big biopharmaceutical companies. In this scenario, application of efficient state-of-the-art computational methods and modern artificial intelligence (AI)-based algorithms for rapid screening of repurposable chemical space [approved drugs and natural products (NPs) with proven pharmacokinetic profiles] to identify the initial leads is a powerful option to save resources and time. Structure-based drug repurposing is a popular in silico repurposing approach. In this review, we discuss traditional and modern AI-based computational methods and tools applied at various stages for structure-based drug discovery (SBDD) pipelines. Additionally, we highlight the role of generative models in generating molecules with scaffolds from repurposable chemical space. Teaser: This review highlights the importance of repurposable chemical space, and the contributions of conventional in silico approaches and modern machine-learning algorithms for rapid structure-based drug repurposing.
Affiliation(s)
- Chinmayee Choudhury
- Department of Experimental Medicine and Biotechnology, Postgraduate Institute of Medical Education and Research, Sector-12, Chandigarh 160012, India
- N Arul Murugan
- Department of Computer Science, School of Electrical Engineering and Computer Sciences, KTH Royal Institute of Technology, S-100 44, Stockholm, Sweden
- Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi 110020, India
- U Deva Priyakumar
- Center for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad 500 032, India
|
25
|
Stafford KA, Anderson BM, Sorenson J, van den Bedem H. AtomNet PoseRanker: Enriching Ligand Pose Quality for Dynamic Proteins in Virtual High-Throughput Screens. J Chem Inf Model 2022; 62:1178-1189. [PMID: 35235748 PMCID: PMC8924924 DOI: 10.1021/acs.jcim.1c01250] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 12/17/2022]
Abstract
Structure-based, virtual High-Throughput Screening (vHTS) methods for predicting ligand activity in drug discovery are important when there are no or relatively few known compounds that interact with a therapeutic target of interest. State-of-the-art computational vHTS necessarily relies on effective methods for pose sampling and docking and generating an accurate affinity score from the docked poses. However, proteins are dynamic; in vivo ligands bind to a conformational ensemble. In silico docking to the single conformation represented by a crystal structure can adversely affect the pose quality. Here, we introduce AtomNet PoseRanker (ANPR), a graph convolutional network trained to identify and rerank crystal-like ligand poses from a sampled ensemble of protein conformations and ligand poses. In contrast to conventional vHTS methods that incorporate receptor flexibility, a deep learning approach can internalize valid cognate and noncognate binding modes corresponding to distinct receptor conformations, thereby learning to infer and account for receptor flexibility even on single conformations. ANPR significantly enriched pose quality in docking to cognate and noncognate receptors of the PDBbind v2019 data set. Improved pose rankings that better represent experimentally observed ligand binding modes improve hit rates in vHTS campaigns and thereby advance computational drug discovery, especially for novel therapeutic targets or novel binding sites.
Affiliation(s)
- Kate A Stafford
  - Atomwise, Inc., 717 Market Street, Suite 800, San Francisco, California 94103, United States
- Brandon M Anderson
  - Atomwise, Inc., 717 Market Street, Suite 800, San Francisco, California 94103, United States
- Jon Sorenson
  - Atomwise, Inc., 717 Market Street, Suite 800, San Francisco, California 94103, United States
- Henry van den Bedem
  - Atomwise, Inc., 717 Market Street, Suite 800, San Francisco, California 94103, United States; Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, California 94158, United States
26
Khan MF, Rashid RB, Rashid MA. Identification of Natural Compounds with Analgesic and Antiinflammatory Properties Using Machine Learning and Molecular Docking Studies. Lett Drug Des Discov 2022. [DOI: 10.2174/1570180818666210728162055] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/22/2022]
Abstract
Background:
Natural products have been a rich source of compounds for drug discovery. Usually, compounds obtained from natural sources have little or no side effects, thus searching for new lead compounds from traditionally used plant species is still a rational strategy.
Introduction:
Natural products serve as a useful repository of compounds for new drugs; however, their use has been decreasing, in part because of technical barriers to screening natural products in high-throughput assays against molecular targets. To address this unmet demand, we have developed and validated a high-throughput in silico machine learning screening method to identify potential compounds from natural sources.
Methods:
In the current study, three machine learning approaches, including Support Vector Machine (SVM), Random Forest (RF) and Gradient Boosting Machine (GBM), have been applied to develop the classification model. The model was generated using the cyclooxygenase-2 (COX-2) inhibitors reported in the ChEMBL database. The developed model was validated by evaluating the accuracy, sensitivity, specificity, Matthews correlation coefficient and Cohen's kappa statistic of the test set. The molecular docking study was conducted with AutoDock Vina and the results were analyzed in PyMOL.
Results:
The accuracy of the model for SVM, RF and GBM was found to be 75.40%, 74.97% and 74.60%, respectively, which indicates the good performance of the developed model. Further, the model has demonstrated good sensitivity (61.25%-68.60%) and excellent specificity (77.72%-81.41%). Application of the model on the NuBBE database, a repository of natural compounds, led us to identify a natural compound, enhydrin, possessing analgesic and anti-inflammatory activities. The ML methods and the molecular docking study suggest that enhydrin likely demonstrates its analgesic and anti-inflammatory actions by inhibiting COX-2.
Conclusion:
Our developed and validated in silico high-throughput ML screening methods may assist in identifying drug-like compounds from natural sources.
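The validation metrics named in the Methods (accuracy, sensitivity, specificity, Matthews correlation coefficient and Cohen's kappa) can all be derived from the four counts of a binary confusion matrix. The following is a minimal Python sketch of those standard formulas, not the authors' code; the function name `binary_metrics` and the example counts are illustrative assumptions.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion-matrix counts.

    Hypothetical helper for illustration; returns 0.0 where a ratio is undefined.
    """
    n = tp + fp + tn + fn
    accuracy = (tp + tn) / n if n else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0  # true-positive rate
    specificity = tn / (tn + fp) if (tn + fp) else 0.0  # true-negative rate

    # Matthews correlation coefficient
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    # Cohen's kappa: observed agreement corrected for chance agreement
    p_chance = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n) if n else 0.0
    kappa = (accuracy - p_chance) / (1 - p_chance) if p_chance != 1 else 0.0

    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "mcc": mcc,
        "kappa": kappa,
    }


if __name__ == "__main__":
    # Made-up counts, not taken from the study.
    print(binary_metrics(tp=40, fp=10, tn=40, fn=10))
```

On a balanced test set such as the made-up one above, kappa and MCC coincide; they diverge when the classes or the predictions are imbalanced, which is one reason to report both alongside accuracy.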
Affiliation(s)
- Mohammad Firoz Khan
  - Computational Chemistry and Bioinformatics Laboratory, Department of Pharmacy, State University of Bangladesh, Dhaka, 1205, Bangladesh
- Ridwan Bin Rashid
  - Computational Chemistry and Bioinformatics Laboratory, Department of Pharmacy, State University of Bangladesh, Dhaka, 1205, Bangladesh
- Mohammad A. Rashid
  - Department of Pharmaceutical Chemistry, Faculty of Pharmacy, University of Dhaka, Dhaka, 1000, Bangladesh
27
Kang SG, Morrone JA, Weber JK, Cornell WD. Analysis of Training and Seed Bias in Small Molecules Generated with a Conditional Graph-Based Variational Autoencoder─Insights for Practical AI-Driven Molecule Generation. J Chem Inf Model 2022; 62:801-816. [PMID: 35130440 DOI: 10.1021/acs.jcim.1c01545] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 11/30/2022]
Abstract
The application of deep learning to generative molecule design has shown early promise for accelerating lead series development. However, questions remain concerning how factors like training, data set, and seed bias impact the technology's utility to medicinal and computational chemists. In this work, we analyze the impact of seed and training bias on the output of an activity-conditioned graph-based variational autoencoder (VAE). Leveraging a massive, labeled data set corresponding to the dopamine D2 receptor, our graph-based generative model is shown to excel in producing desired conditioned activities and favorable unconditioned physical properties in generated molecules. We implement an activity-swapping method that allows for the activation, deactivation, or retention of activity of molecular seeds, and we apply independent deep learning classifiers to verify the generative results. Overall, we uncover relationships between noise, molecular seeds, and training set selection across a range of latent-space sampling procedures, providing important insights for practical AI-driven molecule generation.
Affiliation(s)
- Seung-Gu Kang
  - Computational Biology Center, IBM Thomas J. Watson Research Center, 1101 Kitchawan Road, Yorktown Heights, New York 10594, United States
- Joseph A Morrone
  - Computational Biology Center, IBM Thomas J. Watson Research Center, 1101 Kitchawan Road, Yorktown Heights, New York 10594, United States
- Jeffrey K Weber
  - Computational Biology Center, IBM Thomas J. Watson Research Center, 1101 Kitchawan Road, Yorktown Heights, New York 10594, United States
- Wendy D Cornell
  - Computational Biology Center, IBM Thomas J. Watson Research Center, 1101 Kitchawan Road, Yorktown Heights, New York 10594, United States
28
Big data and artificial intelligence (AI) methodologies for computer-aided drug design (CADD). Biochem Soc Trans 2022; 50:241-252. [PMID: 35076690 PMCID: PMC9022974 DOI: 10.1042/bst20211240] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Received: 11/01/2021] [Revised: 12/23/2021] [Accepted: 12/23/2021] [Indexed: 12/18/2022]
Abstract
There have been numerous advances in the development of computational and statistical methods and applications of big data and artificial intelligence (AI) techniques for computer-aided drug design (CADD). Drug design is a costly and laborious process considering the biological complexity of diseases. To effectively and efficiently design and develop a new drug, CADD can be used to apply cutting-edge techniques to various limitations in the drug design field. Data pre-processing approaches, which clean the raw data for consistent and reproducible applications of big data and AI methods are introduced. We include the current status of the applicability of big data and AI methods to drug design areas such as the identification of binding sites in target proteins, structure-based virtual screening (SBVS), and absorption, distribution, metabolism, excretion and toxicity (ADMET) property prediction. Data pre-processing and applications of big data and AI methods enable the accurate and comprehensive analysis of massive biomedical data and the development of predictive models in the field of drug design. Understanding and analyzing biological, chemical, or pharmaceutical architectures of biomedical entities related to drug design will provide beneficial information in the biomedical big data era.
29
Jiang D, Hsieh CY, Wu Z, Kang Y, Wang J, Wang E, Liao B, Shen C, Xu L, Wu J, Cao D, Hou T. InteractionGraphNet: A Novel and Efficient Deep Graph Representation Learning Framework for Accurate Protein-Ligand Interaction Predictions. J Med Chem 2021; 64:18209-18232. [PMID: 34878785 DOI: 10.1021/acs.jmedchem.1c01830] [Citation(s) in RCA: 62] [Impact Index Per Article: 20.7] [Indexed: 12/21/2022]
Abstract
Accurate quantification of protein-ligand interactions remains a key challenge to structure-based drug design. However, traditional machine learning (ML)-based methods based on handcrafted descriptors, one-dimensional protein sequences, and/or two-dimensional graph representations limit their capability to learn the generalized molecular interactions in 3D space. Here, we proposed a novel deep graph representation learning framework named InteractionGraphNet (IGN) to learn the protein-ligand interactions from the 3D structures of protein-ligand complexes. In IGN, two independent graph convolution modules were stacked to sequentially learn the intramolecular and intermolecular interactions, and the learned intermolecular interactions can be efficiently used for subsequent tasks. Extensive binding affinity prediction, large-scale structure-based virtual screening, and pose prediction experiments demonstrated that IGN achieved better or competitive performance against other state-of-the-art ML-based baselines and docking programs. More importantly, such state-of-the-art performance was proven from the successful learning of the key features in protein-ligand interactions instead of just memorizing certain biased patterns from data.
Affiliation(s)
- Dejun Jiang
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China; College of Computer Science and Technology, Zhejiang University, Hangzhou 310058, China; State Key Laboratory of CAD&CG, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Chang-Yu Hsieh
  - Tencent Quantum Laboratory, Tencent, Shenzhen 518057, Guangdong, China
- Zhenxing Wu
  - College of Computer Science and Technology, Zhejiang University, Hangzhou 310058, China
- Yu Kang
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Jike Wang
  - School of Computer Science, Wuhan University, Wuhan 430072, Hubei, China
- Ercheng Wang
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Ben Liao
  - Tencent Quantum Laboratory, Tencent, Shenzhen 518057, Guangdong, China
- Chao Shen
  - College of Computer Science and Technology, Zhejiang University, Hangzhou 310058, China
- Lei Xu
  - Institute of Bioinformatics and Medical Engineering, School of Electrical and Information Engineering, Jiangsu University of Technology, Changzhou 213001, China
- Jian Wu
  - College of Computer Science and Technology, Zhejiang University, Hangzhou 310058, China
- Dongsheng Cao
  - Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410004, Hunan, China
- Tingjun Hou
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China; State Key Laboratory of CAD&CG, Zhejiang University, Hangzhou 310058, Zhejiang, China
30
Weber JK, Morrone JA, Bagchi S, Pabon JDE, Kang SG, Zhang L, Cornell WD. Simplified, interpretable graph convolutional neural networks for small molecule activity prediction. J Comput Aided Mol Des 2021; 36:391-404. [PMID: 34817762 PMCID: PMC9325818 DOI: 10.1007/s10822-021-00421-6] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 07/09/2021] [Accepted: 09/24/2021] [Indexed: 12/11/2022]
Abstract
We here present a streamlined, explainable graph convolutional neural network (gCNN) architecture for small molecule activity prediction. We first conduct a hyperparameter optimization across nearly 800 protein targets that produces a simplified gCNN QSAR architecture, and we observe that such a model can yield performance improvements over both standard gCNN and RF methods on difficult-to-classify test sets. Additionally, we discuss how reductions in convolutional layer dimensions potentially speak to the “anatomical” needs of gCNNs with respect to radial coarse graining of molecular substructure. We augment this simplified architecture with saliency map technology that highlights molecular substructures relevant to activity, and we perform saliency analysis on nearly 100 data-rich protein targets. We show that resultant substructural clusters are useful visualization tools for understanding substructure-activity relationships. We go on to highlight connections between our models’ saliency predictions and observations made in the medicinal chemistry literature, focusing on four case studies of past lead finding and lead optimization campaigns.
Affiliation(s)
- Jeffrey K Weber
  - IBM Thomas J Watson Research Center, Yorktown Heights, NY, USA
- Sugato Bagchi
  - IBM Thomas J Watson Research Center, Yorktown Heights, NY, USA
- Seung-Gu Kang
  - IBM Thomas J Watson Research Center, Yorktown Heights, NY, USA
- Leili Zhang
  - IBM Thomas J Watson Research Center, Yorktown Heights, NY, USA
- Wendy D Cornell
  - IBM Thomas J Watson Research Center, Yorktown Heights, NY, USA
31
Wang Z, Zheng L, Liu Y, Qu Y, Li YQ, Zhao M, Mu Y, Li W. OnionNet-2: A Convolutional Neural Network Model for Predicting Protein-Ligand Binding Affinity Based on Residue-Atom Contacting Shells. Front Chem 2021; 9:753002. [PMID: 34778208 PMCID: PMC8579074 DOI: 10.3389/fchem.2021.753002] [Citation(s) in RCA: 24] [Impact Index Per Article: 8.0] [Received: 08/04/2021] [Accepted: 10/06/2021] [Indexed: 01/31/2023] Open
Abstract
One key task in virtual screening is to accurately predict the binding affinity (ΔG) of protein-ligand complexes. Recently, deep learning (DL) has significantly increased the predicting accuracy of scoring functions due to the extraordinary ability of DL to extract useful features from raw data. Nevertheless, more efforts still need to be paid in many aspects, for the aim of increasing prediction accuracy and decreasing computational cost. In this study, we proposed a simple scoring function (called OnionNet-2) based on convolutional neural network to predict ΔG. The protein-ligand interactions are characterized by the number of contacts between protein residues and ligand atoms in multiple distance shells. Compared to published models, the efficacy of OnionNet-2 is demonstrated to be the best for two widely used datasets CASF-2016 and CASF-2013 benchmarks. The OnionNet-2 model was further verified by non-experimental decoy structures from docking program and the CSAR NRC-HiQ data set (a high-quality data set provided by CSAR), which showed great success. Thus, our study provides a simple but efficient scoring function for predicting protein-ligand binding free energy.
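The idea of counting residue-ligand contacts in multiple distance shells can be illustrated with a short Python sketch. This is a simplified illustration under stated assumptions, not the OnionNet-2 featurization itself: each residue is reduced to one representative coordinate, residue and atom types are ignored, and the function name `shell_contact_counts` and the shell edges are hypothetical.

```python
import math


def shell_contact_counts(residue_coords, ligand_coords, edges):
    """Count residue-ligand atom pairs per distance shell.

    Shell i covers distances d with edges[i] <= d < edges[i + 1].
    Coordinates are (x, y, z) tuples in the same length unit as `edges`.
    """
    counts = [0] * (len(edges) - 1)
    for res in residue_coords:
        for atom in ligand_coords:
            d = math.dist(res, atom)  # Euclidean distance (Python 3.8+)
            for i in range(len(edges) - 1):
                if edges[i] <= d < edges[i + 1]:
                    counts[i] += 1
                    break  # a pair belongs to at most one shell
    return counts


if __name__ == "__main__":
    # Made-up coordinates and shell boundaries, for illustration only.
    residues = [(0.0, 0.0, 0.0)]
    ligand = [(1.0, 0.0, 0.0), (3.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
    print(shell_contact_counts(residues, ligand, edges=[0.0, 2.0, 4.0]))
```

In a real featurization the counts would typically be kept separately for each residue-type and atom-type combination, giving one feature vector per shell that a convolutional network can consume.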
Affiliation(s)
- Zechen Wang
  - School of Physics, Shandong University, Jinan, China
- Yang Liu
  - School of Physics, Shandong University, Jinan, China
- Yuanyuan Qu
  - School of Physics, Shandong University, Jinan, China
- Yong-Qiang Li
  - School of Physics, Shandong University, Jinan, China
- Mingwen Zhao
  - School of Physics, Shandong University, Jinan, China
- Yuguang Mu
  - School of Biological Sciences, Nanyang Technological University, Singapore
- Weifeng Li
  - School of Physics, Shandong University, Jinan, China
32
Shen C, Hu X, Gao J, Zhang X, Zhong H, Wang Z, Xu L, Kang Y, Cao D, Hou T. The impact of cross-docked poses on performance of machine learning classifier for protein-ligand binding pose prediction. J Cheminform 2021; 13:81. [PMID: 34656169 PMCID: PMC8520186 DOI: 10.1186/s13321-021-00560-w] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 07/29/2021] [Accepted: 10/05/2021] [Indexed: 02/06/2023] Open
Abstract
Structure-based drug design depends on the detailed knowledge of the three-dimensional (3D) structures of protein-ligand binding complexes, but accurate prediction of ligand-binding poses is still a major challenge for molecular docking due to deficiency of scoring functions (SFs) and ignorance of protein flexibility upon ligand binding. In this study, based on a cross-docking dataset dedicatedly constructed from the PDBbind database, we developed several XGBoost-trained classifiers to discriminate the near-native binding poses from decoys, and systematically assessed their performance with/without the involvement of the cross-docked poses in the training/test sets. The calculation results illustrate that using Extended Connectivity Interaction Features (ECIF), Vina energy terms and docking pose ranks as the features can achieve the best performance, according to the validation through the random splitting or refined-core splitting and the testing on the re-docked or cross-docked poses. Besides, it is found that, despite the significant decrease of the performance for the threefold clustered cross-validation, the inclusion of the Vina energy terms can effectively ensure the lower limit of the performance of the models and thus improve their generalization capability. Furthermore, our calculation results also highlight the importance of the incorporation of the cross-docked poses into the training of the SFs with wide application domain and high robustness for binding pose prediction. The source code and the newly-developed cross-docking datasets can be freely available at https://github.com/sc8668/ml_pose_prediction and https://zenodo.org/record/5525936 , respectively, under an open-source license. We believe that our study may provide valuable guidance for the development and assessment of new machine learning-based SFs (MLSFs) for the predictions of protein-ligand binding poses.
Affiliation(s)
- Chao Shen
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang, 310058, People's Republic of China; State Key Lab of CAD&CG, Zhejiang University, Hangzhou, Zhejiang, 310058, People's Republic of China
- Xueping Hu
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang, 310058, People's Republic of China
- Junbo Gao
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang, 310058, People's Republic of China
- Xujun Zhang
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang, 310058, People's Republic of China
- Haiyang Zhong
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang, 310058, People's Republic of China
- Zhe Wang
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang, 310058, People's Republic of China
- Lei Xu
  - Institute of Bioinformatics and Medical Engineering, School of Electrical and Information Engineering, Jiangsu University of Technology, Changzhou, 213001, China
- Yu Kang
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang, 310058, People's Republic of China
- Dongsheng Cao
  - Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, Hunan, 410013, People's Republic of China
- Tingjun Hou
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang, 310058, People's Republic of China; State Key Lab of CAD&CG, Zhejiang University, Hangzhou, Zhejiang, 310058, People's Republic of China
33
Crampon K, Giorkallos A, Deldossi M, Baud S, Steffenel LA. Machine-learning methods for ligand-protein molecular docking. Drug Discov Today 2021; 27:151-164. [PMID: 34560276 DOI: 10.1016/j.drudis.2021.09.007] [Citation(s) in RCA: 83] [Impact Index Per Article: 27.7] [Received: 04/01/2021] [Revised: 07/14/2021] [Accepted: 09/15/2021] [Indexed: 12/22/2022]
Abstract
Artificial intelligence (AI) is often presented as a new Industrial Revolution. Many domains use AI, including molecular simulation for drug discovery. In this review, we provide an overview of ligand-protein molecular docking and how machine learning (ML), especially deep learning (DL), a subset of ML, is transforming the field by tackling the associated challenges.
Affiliation(s)
- Kevin Crampon
  - Université de Reims Champagne Ardenne, CNRS, MEDyC UMR 7369, 51097 Reims, France; Université de Reims Champagne Ardenne, LICIIS - LRC CEA DIGIT, 51100 Reims, France; Atos SE, Center of Excellence in Advanced Computing, 38130 Echirolles, France
- Alexis Giorkallos
  - Atos SE, Center of Excellence in Advanced Computing, 38130 Echirolles, France
- Myrtille Deldossi
  - Atos SE, Center of Excellence in Advanced Computing, 38130 Echirolles, France
- Stéphanie Baud
  - Université de Reims Champagne Ardenne, CNRS, MEDyC UMR 7369, 51097 Reims, France
34
Di Filippo JI, Cavasotto CN. Guided structure-based ligand identification and design via artificial intelligence modeling. Expert Opin Drug Discov 2021; 17:71-78. [PMID: 34544293 DOI: 10.1080/17460441.2021.1979514] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 10/20/2022]
Abstract
INTRODUCTION: The implementation of Artificial Intelligence (AI) methodologies in drug discovery (DD) is on the rise. Several applications have been developed for structure-based DD, where AI methods provide an alternative framework for the identification of ligands for validated therapeutic targets, as well as the de novo design of ligands through generative models.
AREAS COVERED: Herein, the authors review the contributions from 2019 to the present regarding the application of AI methods to structure-based virtual screening (SBVS), which encompasses mainly molecular docking applications (binding pose prediction and binary classification for ligand or hit identification), as well as de novo drug design driven by machine learning (ML) generative models, and the validation of AI models in structure-based screening. Studies are reviewed in terms of their main objective, used databases, implemented methodology, input and output, and key results.
EXPERT OPINION: More profound analyses regarding the validity and applicability of AI methods in DD have begun to appear. In the near future, we expect to see more structure-based generative models (which are scarce in comparison to ligand-based generative models), the implementation of standard guidelines for validating the generated structures, and more analyses regarding the validation of AI methods in structure-based DD.
Affiliation(s)
- Juan I Di Filippo
  - Computational Drug Design and Biomedical Informatics Laboratory, Instituto de Investigaciones en Medicina Traslacional (IIMT), CONICET-Universidad Austral, Pilar, Buenos Aires, Argentina; Facultad de Ciencias Biomédicas, and Facultad de Ingeniería, Universidad Austral, Pilar, Buenos Aires, Argentina; Austral Institute for Applied Artificial Intelligence, Universidad Austral, Pilar, Buenos Aires, Argentina
- Claudio N Cavasotto
  - Computational Drug Design and Biomedical Informatics Laboratory, Instituto de Investigaciones en Medicina Traslacional (IIMT), CONICET-Universidad Austral, Pilar, Buenos Aires, Argentina; Facultad de Ciencias Biomédicas, and Facultad de Ingeniería, Universidad Austral, Pilar, Buenos Aires, Argentina; Austral Institute for Applied Artificial Intelligence, Universidad Austral, Pilar, Buenos Aires, Argentina
35
Wang S, Jiang M, Zhang S, Wang X, Yuan Q, Wei Z, Li Z. MCN-CPI: Multiscale Convolutional Network for Compound-Protein Interaction Prediction. Biomolecules 2021; 11:1119. [PMID: 34439785 PMCID: PMC8392217 DOI: 10.3390/biom11081119] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 06/05/2021] [Revised: 07/19/2021] [Accepted: 07/26/2021] [Indexed: 01/09/2023] Open
Abstract
In the process of drug discovery, identifying the interaction between the protein and the novel compound plays an important role. With the development of technology, deep learning methods have shown excellent performance in various situations. However, the compound-protein interaction is complicated and the features extracted by most deep models are not comprehensive, which limits the performance to a certain extent. In this paper, we proposed a multiscale convolutional network that extracted the local and global features of the protein and the topological feature of the compound using different types of convolutional networks. The results showed that our model obtained the best performance compared with the existing deep learning methods.
Affiliation(s)
- Shuang Wang
  - College of Computer Science and Technology, China University of Petroleum, Qingdao 266580, China
- Mingjian Jiang
  - School of Information and Control Engineering, Qingdao University of Technology, Qingdao 266520, China
- Shugang Zhang
  - College of Computer Science and Technology, Ocean University of China, Qingdao 266100, China
- Xiaofeng Wang
  - College of Computer Science and Technology, Ocean University of China, Qingdao 266100, China
- Qing Yuan
  - College of Computer Science and Technology, Ocean University of China, Qingdao 266100, China
- Zhiqiang Wei
  - College of Computer Science and Technology, Ocean University of China, Qingdao 266100, China
- Zhen Li
  - College of Computer Science and Technology, Qingdao University, Qingdao 266071, China
36
Pinto GP, Hendrikse NM, Stourac J, Damborsky J, Bednar D. Virtual screening of potential anticancer drugs based on microbial products. Semin Cancer Biol 2021; 86:1207-1217. [PMID: 34298109 DOI: 10.1016/j.semcancer.2021.07.012] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 02/19/2021] [Revised: 07/14/2021] [Accepted: 07/18/2021] [Indexed: 01/20/2023]
Abstract
The development of microbial products for cancer treatment has been in the spotlight in recent years. In order to accelerate the lengthy and expensive drug development process, in silico screening tools are systematically employed, especially during the initial discovery phase. Moreover, considering the steadily increasing number of molecules approved by authorities for commercial use, there is a demand for faster methods to repurpose such drugs. Here we present a review on virtual screening web tools, such as publicly available databases of molecular targets and libraries of ligands, with the aim to facilitate the discovery of potential anticancer drugs based on microbial products. We provide an entry-level step-by-step description of the workflow for virtual screening of microbial metabolites with known protein targets, as well as two practical examples using freely available web tools. The first case presents a virtual screening study of drugs developed from microbial products using Caver Web, a web tool that performs docking along a tunnel. The second case comprises a comparative analysis between a wild type isocitrate dehydrogenase 1 and a mutant that results in cancer, using the recently developed web tool PredictSNPOnco. In summary, this review provides the basic and essential background information necessary for virtual screening experiments, which may accelerate the discovery of novel anticancer drugs.
Affiliation(s)
- Gaspar P Pinto
  - Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Kamenice 5/A13, Brno, 625 00, Czech Republic; International Clinical Research Center, St. Anne's University Hospital Brno, Pekarska 53, Brno, 656 91, Czech Republic
- Natalie M Hendrikse
  - Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Kamenice 5/A13, Brno, 625 00, Czech Republic
- Jan Stourac
  - Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Kamenice 5/A13, Brno, 625 00, Czech Republic; International Clinical Research Center, St. Anne's University Hospital Brno, Pekarska 53, Brno, 656 91, Czech Republic
- Jiri Damborsky
  - Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Kamenice 5/A13, Brno, 625 00, Czech Republic; International Clinical Research Center, St. Anne's University Hospital Brno, Pekarska 53, Brno, 656 91, Czech Republic
- David Bednar
  - Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Kamenice 5/A13, Brno, 625 00, Czech Republic; International Clinical Research Center, St. Anne's University Hospital Brno, Pekarska 53, Brno, 656 91, Czech Republic
37
Kashyap K, Siddiqi MI. Recent trends in artificial intelligence-driven identification and development of anti-neurodegenerative therapeutic agents. Mol Divers 2021; 25:1517-1539. [PMID: 34282519 DOI: 10.1007/s11030-021-10274-8] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 03/28/2021] [Accepted: 07/05/2021] [Indexed: 12/12/2022]
Abstract
Neurological disorders affect various aspects of life. Finding drugs for the central nervous system is a very challenging and complex task due to the involvement of the blood-brain barrier, P-glycoprotein, and the drug's high attrition rates. The availability of big data present in online databases and resources has enabled the emergence of artificial intelligence techniques including machine learning to analyze, process the data, and predict the unknown data with high efficiency. The use of these modern techniques has revolutionized the whole drug development paradigm, with an unprecedented acceleration in the central nervous system drug discovery programs. Also, the new deep learning architectures proposed in many recent works have given a better understanding of how artificial intelligence can tackle big complex problems that arose due to central nervous system disorders. Therefore, the present review provides comprehensive and up-to-date information on machine learning/artificial intelligence-triggered effort in the brain care domain. In addition, a brief overview is presented on machine learning algorithms and their uses in structure-based drug design, ligand-based drug design, ADMET prediction, de novo drug design, and drug repurposing. Lastly, we conclude by discussing the major challenges and limitations posed and how they can be tackled in the future by using these modern machine learning/artificial intelligence approaches.
Affiliation(s)
- Kushagra Kashyap
- Academy of Scientific and Innovative Research (AcSIR), CSIR-Central Drug Research Institute (CSIR-CDRI) Campus, Lucknow, India; Molecular and Structural Biology Division, CSIR-Central Drug Research Institute (CSIR-CDRI), Sector 10, Jankipuram Extension, Sitapur Road, Lucknow, 226031, India
| | - Mohammad Imran Siddiqi
- Academy of Scientific and Innovative Research (AcSIR), CSIR-Central Drug Research Institute (CSIR-CDRI) Campus, Lucknow, India; Molecular and Structural Biology Division, CSIR-Central Drug Research Institute (CSIR-CDRI), Sector 10, Jankipuram Extension, Sitapur Road, Lucknow, 226031, India.
|
38
|
Sanner MF, Dieguez L, Forli S, Lis E. Improving Docking Power for Short Peptides Using Random Forest. J Chem Inf Model 2021; 61:3074-3090. [PMID: 34124893 PMCID: PMC8543977 DOI: 10.1021/acs.jcim.1c00573] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Indexed: 12/19/2022]
Abstract
In recent years, therapeutic peptides have gained a lot of interest, as demonstrated by the 60 peptides approved as drugs in major markets and the 150+ peptides currently in clinical trials. However, while small molecule docking is routinely used in rational drug design efforts, docking peptides has proven challenging, partly because docking scoring functions, developed and calibrated for small molecules, perform poorly for peptides. Here, we present random forest classifiers trained to discriminate correctly docked peptides. We show that, for a testing set of 47 protein-peptide complexes, structurally dissimilar from the training set and previously used to benchmark AutoDock Vina's ability to dock short peptides, these random forest classifiers improve docking power from ∼25% for AutoDock scoring functions to an average of ∼70%. These results pave the way for peptide-docking success rates comparable to those of small molecule docking. To develop these classifiers, we compiled the ProptPep37_2021 data set, a curated, high-quality set of 322 crystallographic protein-peptide complexes annotated with structural similarity information. The data set also provides a collection of high-quality putative poses with a range of deviations from the crystallographic pose, providing correct and incorrect poses (i.e., decoys) of the peptide for each entry. The ProptPep37_2021 data set as well as the classifiers presented here are freely available.
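As a reading aid for the docking-power figures quoted in this abstract, the metric can be sketched as the fraction of complexes whose top-scored pose lies close to the crystallographic pose. This is a minimal illustration, not the authors' code: the function name `docking_power`, the toy data, and the 2.0 Å RMSD cutoff are assumptions (the abstract does not state the cutoff used).

```python
from typing import Dict, List, Tuple

# Each complex maps to a list of (score, rmsd_to_crystal_pose_in_angstrom) pairs.
# Assumption: a higher score means the scorer considers the pose more likely correct.
Poses = List[Tuple[float, float]]


def docking_power(complexes: Dict[str, Poses], rmsd_cutoff: float = 2.0) -> float:
    """Fraction of complexes whose top-scored pose is within rmsd_cutoff of the native pose."""
    if not complexes:
        raise ValueError("no complexes given")
    hits = 0
    for poses in complexes.values():
        # Pick the pose the scorer ranks first, then check how far it is from the native pose.
        _top_score, top_rmsd = max(poses, key=lambda pose: pose[0])
        if top_rmsd <= rmsd_cutoff:
            hits += 1
    return hits / len(complexes)


# Hypothetical toy data: four complexes with two candidate poses each.
toy = {
    "complex_a": [(0.9, 1.2), (0.4, 6.0)],
    "complex_b": [(0.8, 5.5), (0.7, 1.0)],
    "complex_c": [(0.6, 1.9), (0.2, 8.3)],
    "complex_d": [(0.5, 3.1), (0.3, 0.8)],
}
print(docking_power(toy))
```

In this toy set the scorer ranks a near-native pose first for two of the four complexes, so a better pose classifier would raise the value by promoting the near-native poses of `complex_b` and `complex_d`.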
Affiliation(s)
- Michel F. Sanner
- Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA 93037, USA
| | - Leonard Dieguez
- Koliber Biosciences Inc., 12265 World Trade Drive, Suite G, San Diego, CA 92128, USA
| | - Stefano Forli
- Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA 93037, USA
| | - Ewa Lis
- Koliber Biosciences Inc., 12265 World Trade Drive, Suite G, San Diego, CA 92128, USA
|
39
|
Affiliation(s)
- W Patrick Walters
- Relay Therapeutics, 399 Binney Street, Cambridge, Massachusetts 02139, United States
| | - Renxiao Wang
- Department of Medicinal Chemistry, School of Pharmacy, Fudan University, 826 Zhangheng Road, Shanghai 201203, People's Republic of China
|
40
|
Nigam A, Pollice R, Hurley MFD, Hickman RJ, Aldeghi M, Yoshikawa N, Chithrananda S, Voelz VA, Aspuru-Guzik A. Assigning confidence to molecular property prediction. Expert Opin Drug Discov 2021; 16:1009-1023. [DOI: 10.1080/17460441.2021.1925247] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Indexed: 12/11/2022]
Affiliation(s)
- AkshatKumar Nigam
- Chemical Physics Theory Group, Department of Chemistry, University of Toronto, Toronto, Canada
- Department of Computer Science, University of Toronto, Toronto, Canada
| | - Robert Pollice
- Chemical Physics Theory Group, Department of Chemistry, University of Toronto, Toronto, Canada
- Department of Computer Science, University of Toronto, Toronto, Canada
| | | | - Riley J. Hickman
- Chemical Physics Theory Group, Department of Chemistry, University of Toronto, Toronto, Canada
- Department of Computer Science, University of Toronto, Toronto, Canada
| | - Matteo Aldeghi
- Chemical Physics Theory Group, Department of Chemistry, University of Toronto, Toronto, Canada
- Department of Computer Science, University of Toronto, Toronto, Canada
- Vector Institute for Artificial Intelligence, University Ave Suite 710, Toronto, Canada
| | - Naruki Yoshikawa
- Chemical Physics Theory Group, Department of Chemistry, University of Toronto, Toronto, Canada
- Department of Computer Science, University of Toronto, Toronto, Canada
| | | | | | - Alán Aspuru-Guzik
- Chemical Physics Theory Group, Department of Chemistry, University of Toronto, Toronto, Canada
- Department of Computer Science, University of Toronto, Toronto, Canada
- Vector Institute for Artificial Intelligence, University Ave Suite 710, Toronto, Canada
- Canadian Institute for Advanced Research (CIFAR), University Ave, Toronto, Canada
|
41
|
Qin T, Zhu Z, Wang XS, Xia J, Wu S. Computational representations of protein-ligand interfaces for structure-based virtual screening. Expert Opin Drug Discov 2021; 16:1175-1192. [PMID: 34011222 DOI: 10.1080/17460441.2021.1929921] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 12/30/2022]
Abstract
Introduction: Structure-based virtual screening (SBVS) is an essential strategy for hit identification. SBVS primarily uses molecular docking, which exploits the protein-ligand binding mode and associated affinity score for compound ranking. Previous studies have shown that computational representation of protein-ligand interfaces and the subsequent establishment of machine learning models are efficacious in improving the accuracy of SBVS. Areas covered: The authors review the computational methods for representing protein-ligand interfaces, which include the traditional ones that use deliberately designed fingerprints and descriptors and the more recent methods that automatically extract features with deep learning. The effects of these methods on the performance of machine learning models are briefly discussed. Additionally, case studies that applied various computational representations to machine learning are cited with remarks. Expert opinion: It has become a trend to extract binding features automatically by deep learning, which uses a completely end-to-end representation. However, there is still plenty of scope for improvement. The interpretability of deep-learning models, the organization of data management, the quantity and quality of available data, and the optimization of hyperparameters could impact the accuracy of feature extraction. In addition, other important structural factors such as water molecules and protein flexibility should be considered.
Affiliation(s)
- Tong Qin
- State Key Laboratory of Bioactive Substance and Function of Natural Medicines, Department of New Drug Research and Development, Institute of Materia Medica, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
| | - Zihao Zhu
- State Key Laboratory of Bioactive Substance and Function of Natural Medicines, Department of New Drug Research and Development, Institute of Materia Medica, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
| | - Xiang Simon Wang
- Artificial Intelligence and Drug Discovery Core Laboratory for District of Columbia Center for AIDS Research (DC CFAR), Department of Pharmaceutical Sciences, College of Pharmacy, Howard University, U.S.A
| | - Jie Xia
- State Key Laboratory of Bioactive Substance and Function of Natural Medicines, Department of New Drug Research and Development, Institute of Materia Medica, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
| | - Song Wu
- State Key Laboratory of Bioactive Substance and Function of Natural Medicines, Department of New Drug Research and Development, Institute of Materia Medica, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
|
42
|
Suh D, Lee JW, Choi S, Lee Y. Recent Applications of Deep Learning Methods on Evolution- and Contact-Based Protein Structure Prediction. Int J Mol Sci 2021; 22:6032. [PMID: 34199677 PMCID: PMC8199773 DOI: 10.3390/ijms22116032] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 05/16/2021] [Revised: 05/29/2021] [Accepted: 05/29/2021] [Indexed: 01/23/2023] Open
Abstract
Recent advances in deep learning methods have influenced many aspects of scientific research, including the study of protein systems. The prediction of proteins' 3D structural components now depends heavily on machine learning techniques that interpret how protein sequences and their homology govern inter-residue contacts and structural organization. In particular, methods employing deep neural networks had a significant impact on the recent CASP13 and CASP14 competitions. Here, we explore the recent applications of deep learning methods in the protein structure prediction area. We also look at the potential opportunities for deep learning methods to identify as-yet-undiscovered protein structures and functions and to help guide drug-target interaction studies. Although significant problems still need to be addressed, we expect these techniques to play crucial roles in protein structural bioinformatics as well as in drug discovery in the near future.
Affiliation(s)
- Donghyuk Suh
- Global AI Drug Discovery Center, School of Pharmaceutical Sciences, College of Pharmacy and Graduate, Ewha Womans University, Seoul 03760, Korea; (D.S.); (J.W.L.); (S.C.)
| | - Jai Woo Lee
- Global AI Drug Discovery Center, School of Pharmaceutical Sciences, College of Pharmacy and Graduate, Ewha Womans University, Seoul 03760, Korea; (D.S.); (J.W.L.); (S.C.)
| | - Sun Choi
- Global AI Drug Discovery Center, School of Pharmaceutical Sciences, College of Pharmacy and Graduate, Ewha Womans University, Seoul 03760, Korea; (D.S.); (J.W.L.); (S.C.)
| | - Yoonji Lee
- College of Pharmacy, Chung-Ang University, Seoul 06974, Korea
|
43
|
Perez JJ, Perez RA, Perez A. Computational Modeling as a Tool to Investigate PPI: From Drug Design to Tissue Engineering. Front Mol Biosci 2021; 8:681617. [PMID: 34095231 PMCID: PMC8173110 DOI: 10.3389/fmolb.2021.681617] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Received: 03/17/2021] [Accepted: 05/05/2021] [Indexed: 12/13/2022] Open
Abstract
Protein-protein interactions (PPIs) mediate a large number of important regulatory pathways. Their modulation represents an important strategy for discovering novel therapeutic agents. However, the features of PPI binding surfaces make the use of structure-based drug discovery methods very challenging. Among the diverse approaches used in the literature to tackle the problem, linear peptides have proven to be a suitable means of discovering PPI disruptors. Unfortunately, the poor pharmacokinetic properties of linear peptides prevent their direct use as drugs. However, they can be used as models to design enzyme-resistant analogs, including cyclic peptides, peptide surrogates and peptidomimetics. Small molecules have a narrower set of targets they can bind to, but the screening technology based on virtual docking is robust and well tested, adding to the computational tools used to disrupt PPI. We review computational approaches used to understand and modulate PPI and highlight applications in a few case studies involved in physiological processes such as cell growth, apoptosis and intercellular communication.
Affiliation(s)
- Juan J Perez
- Department of Chemical Engineering, Universitat Politecnica de Catalunya, Barcelona, Spain
| | - Roman A Perez
- Bioengineering Institute of Technology, Universitat Internacional de Catalunya, Sant Cugat, Spain
| | - Alberto Perez
- The Quantum Theory Project, Department of Chemistry, University of Florida, Gainesville, FL, United States
|
44
|
Kim QH, Ko JH, Kim S, Park N, Jhe W. Bayesian neural network with pretrained protein embedding enhances prediction accuracy of drug-protein interaction. Bioinformatics 2021; 37:3428-3435. [PMID: 33978713 PMCID: PMC8545317 DOI: 10.1093/bioinformatics/btab346] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 12/16/2020] [Revised: 04/26/2021] [Accepted: 05/05/2021] [Indexed: 11/25/2022] Open
Abstract
Motivation: Characterizing drug–protein interactions (DPIs) is crucial to high-throughput screening for drug discovery. Deep learning-based approaches have attracted attention because they can predict DPIs without human trial and error. However, because data labeling requires significant resources, the available protein data size is relatively small, which consequently decreases model performance. Here, we propose two methods to construct a deep learning framework that exhibits superior performance with a small labeled dataset. Results: First, we use transfer learning to encode protein sequences with a pretrained model, which learns general sequence representations in an unsupervised manner. Second, we use a Bayesian neural network to make a robust model by estimating the data uncertainty. Our resulting model performs better than the previous baselines at predicting interactions between molecules and proteins. We also show that the quantified uncertainty from the Bayesian inference is related to confidence and can be used for screening DPI data points. Availability and implementation: The code is available at https://github.com/QHwan/PretrainDPI. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- QHwan Kim
- Department of Physics and Astronomy, Institute of Applied Physics, Seoul National University, Gwanak-gu, Seoul 08826, Republic of Korea
| | - Joon-Hyuk Ko
- Department of Physics and Astronomy, Institute of Applied Physics, Seoul National University, Gwanak-gu, Seoul 08826, Republic of Korea
| | - Sunghoon Kim
- Department of Physics and Astronomy, Institute of Applied Physics, Seoul National University, Gwanak-gu, Seoul 08826, Republic of Korea
| | - Nojun Park
- Department of Physics and Astronomy, Institute of Applied Physics, Seoul National University, Gwanak-gu, Seoul 08826, Republic of Korea
| | - Wonho Jhe
- Department of Physics and Astronomy, Institute of Applied Physics, Seoul National University, Gwanak-gu, Seoul 08826, Republic of Korea
|
45
|
Mandal SK, Munshi P. Predicting Accurate Lead Structures for Screening Molecular Libraries: A Quantum Crystallographic Approach. Molecules 2021; 26:molecules26092605. [PMID: 33946965 PMCID: PMC8124947 DOI: 10.3390/molecules26092605] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/07/2021] [Revised: 04/22/2021] [Accepted: 04/24/2021] [Indexed: 11/17/2022] Open
Abstract
Optimization of lead structures is crucial for drug discovery. However, the accuracy of lead-structure prediction using the traditional molecular docking approach remains a major concern. Our study demonstrates that employing a quantum crystallographic approach, the counterpoise-corrected kernel energy method (KEM-CP), can, on the whole, improve the accuracy. We selected structures of human aldose reductase (0.66 Å resolution), cyclin-dependent kinase 2 (2.0 Å) and estrogen receptor β (2.7 Å), whose active-site environments range from highly hydrophilic through moderate to highly hydrophobic, together with several of their known ligands. Overall, the use of KEM-CP alongside GoldScore resulted in superior prediction compared with GoldScore alone. Unlike GoldScore, the KEM-CP approach is neither environment-specific nor dependent on structural resolution, which highlights its versatility. Further, the ranking of the ligands based on the KEM-CP results correlated well with that of the experimental IC50 values. This simple and computationally inexpensive approach is expected to ease the virtual screening of potent ligands and to advance drug discovery research.
|
46
|
Marchetti F, Moroni E, Pandini A, Colombo G. Machine Learning Prediction of Allosteric Drug Activity from Molecular Dynamics. J Phys Chem Lett 2021; 12:3724-3732. [PMID: 33843228 PMCID: PMC8154828 DOI: 10.1021/acs.jpclett.1c00045] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Received: 01/06/2021] [Accepted: 04/05/2021] [Indexed: 05/13/2023]
Abstract
Allosteric drugs have been attracting increasing interest over the past few years. In this context, it is common practice to use high-throughput screening for the discovery of non-natural allosteric drugs. While the discovery stage is supported by a growing amount of biological information and increasing computing power, major challenges still remain in selecting allosteric ligands and predicting their effect on the target protein's function. Indeed, allosteric compounds can act both as inhibitors and activators of biological responses. Computational approaches to the problem have focused on variations on the theme of molecular docking coupled to molecular dynamics with the aim of recovering information on the (long-range) modulation typical of allosteric proteins.
Affiliation(s)
- Filippo Marchetti
- Department of Chemistry, Università Degli Studi di Pavia, Viale Taramelli 12, 27100 Pavia, Italy
- Università Degli Studi di Milano, Via C. Golgi, 19, I-20133 Milan, Italy
| | - Elisabetta Moroni
- Istituto di Scienze e Tecnologie Chimiche, Via Mario Bianco 9, 20131 Milano, Italy
| | | | - Giorgio Colombo
- Department of Chemistry, Università Degli Studi di Pavia, Viale Taramelli 12, 27100 Pavia, Italy
- Istituto di Scienze e Tecnologie Chimiche, Via Mario Bianco 9, 20131 Milano, Italy
|
47
|
Gupta P, Mohanty D. SMMPPI: a machine learning-based approach for prediction of modulators of protein-protein interactions and its application for identification of novel inhibitors for RBD:hACE2 interactions in SARS-CoV-2. Brief Bioinform 2021; 22:6220172. [PMID: 33839740 PMCID: PMC8083326 DOI: 10.1093/bib/bbab111] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Received: 11/04/2020] [Revised: 02/18/2021] [Accepted: 03/12/2021] [Indexed: 11/30/2022] Open
Abstract
Small molecule modulators of protein–protein interactions (PPIs) are being pursued as novel anticancer, antiviral and antimicrobial drug candidates. We have utilized a large data set of experimentally validated PPI modulators and developed machine learning classifiers for prediction of new small molecule modulators of PPIs. Our analysis reveals that, using a random forest (RF) classifier, general PPI modulators independent of PPI family can be predicted with ROC-AUC higher than 0.9 when training and test sets are generated by random split. The performance of the classifier on data sets very different from those used in training has also been estimated by using different state-of-the-art protocols for removing various types of bias in the division of data into training and test sets. The family-specific PPIM predictors developed in this work for 11 clinically important PPI families also have prediction accuracies above 90% in the majority of cases. All these ML-based predictors have been implemented in a freely available software named SMMPPI for prediction of small molecule modulators for clinically relevant PPIs like RBD:hACE2, Bromodomain_Histone, BCL2-Like_BAX/BAK, LEDGF_IN, LFA_ICAM, MDM2-Like_P53, RAS_SOS1, XIAP_Smac, WDR5_MLL1, KEAP1_NRF2 and CD4_gp120. We have identified novel chemical scaffolds as inhibitors for the RBD_hACE2 PPI involved in host cell entry of SARS-CoV-2. Docking studies for some of the compounds reveal that they can inhibit the RBD_hACE2 interaction by high-affinity binding to interaction hotspots on RBD. Some of these new scaffolds have also been found in SARS-CoV-2 viral growth inhibitors reported recently; however, it is not known whether these molecules inhibit the entry phase.
Affiliation(s)
| | - Debasisa Mohanty
- Bioinformatics & Computational Biology research group at NII, New Delhi 110067, India
|
48
|
Xie L, Xu L, Kong R, Chang S, Xu X. Improvement of Prediction Performance With Conjoint Molecular Fingerprint in Deep Learning. Front Pharmacol 2021; 11:606668. [PMID: 33488387 PMCID: PMC7819282 DOI: 10.3389/fphar.2020.606668] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Received: 09/21/2020] [Accepted: 11/23/2020] [Indexed: 12/27/2022] Open
Abstract
The accurate prediction of physical properties and bioactivity of drug molecules in deep learning depends on how molecules are represented. Many types of molecular descriptors have been developed for quantitative structure-activity/property relationships (QSAR/QSPR). However, each molecular descriptor is optimized for a specific application with its own encoding preference. Considering that standalone featurization methods may cover only part of the information on a chemical molecule, we proposed to build the conjoint fingerprint by combining two supplementary fingerprints. The impact of the conjoint fingerprint and each standalone fingerprint on predictive performance was systematically evaluated in predicting the logarithm of the partition coefficient (logP) and protein-ligand binding affinity by using machine learning/deep learning (ML/DL) methods, including random forest (RF), support vector regression (SVR), extreme gradient boosting (XGBoost), long short-term memory network (LSTM), and deep neural network (DNN). The results demonstrated that the conjoint fingerprint yielded improved predictive performance, even outperforming the consensus model using two standalone fingerprints in four out of five examined methods. Given that the conjoint fingerprint scheme shows easy extensibility and high applicability, we expect that the proposed conjoint scheme would create new opportunities for continuously improving the predictive performance of deep learning by harnessing the complementarity of various types of fingerprints.
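The idea of combining two fingerprints into one conjoint representation can be sketched as follows. This is an illustrative sketch only: the abstract does not say how the two fingerprints are combined, so plain concatenation of bit vectors is an assumption here, and the function name `conjoint_fingerprint` and the example bit vectors are hypothetical.

```python
from typing import List, Sequence


def conjoint_fingerprint(fp_a: Sequence[int], fp_b: Sequence[int]) -> List[int]:
    """Combine two binary fingerprints into one feature vector.

    Assumption: the combination is a simple concatenation, so the result
    carries the bits of fp_a followed by the bits of fp_b.
    """
    for name, fp in (("fp_a", fp_a), ("fp_b", fp_b)):
        if any(bit not in (0, 1) for bit in fp):
            raise ValueError(f"{name} must contain only 0/1 bits")
    return list(fp_a) + list(fp_b)


# Hypothetical example: a short substructure-key fingerprint and a short
# circular fingerprint for the same molecule.
key_fp = [1, 0, 1, 0]
circular_fp = [0, 1, 1, 0, 0, 1]

features = conjoint_fingerprint(key_fp, circular_fp)
print(len(features), features)
```

The resulting vector can be passed to any of the models named in the abstract in place of either standalone fingerprint, which differs from a consensus model that trains one model per fingerprint and averages their predictions.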
Affiliation(s)
- Liangxu Xie
- Institute of Bioinformatics and Medical Engineering, School of Electrical and Information Engineering, Jiangsu University of Technology, Changzhou, China; Jiangsu Sino-Israel Industrial Technology Research Institute, Changzhou, China
| | - Lei Xu
- Institute of Bioinformatics and Medical Engineering, School of Electrical and Information Engineering, Jiangsu University of Technology, Changzhou, China
| | - Ren Kong
- Institute of Bioinformatics and Medical Engineering, School of Electrical and Information Engineering, Jiangsu University of Technology, Changzhou, China
| | - Shan Chang
- Institute of Bioinformatics and Medical Engineering, School of Electrical and Information Engineering, Jiangsu University of Technology, Changzhou, China
| | - Xiaojun Xu
- Institute of Bioinformatics and Medical Engineering, School of Electrical and Information Engineering, Jiangsu University of Technology, Changzhou, China
|
49
|
Gao P, Zhang J, Sun Y, Yu J. Accurate predictions of aqueous solubility of drug molecules via the multilevel graph convolutional network (MGCN) and SchNet architectures. Phys Chem Chem Phys 2020; 22:23766-23772. [PMID: 33063077 DOI: 10.1039/d0cp03596c] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Indexed: 12/25/2022]
Abstract
Deep learning based methods have been widely applied to predict various kinds of molecular properties in the pharmaceutical industry with increasingly more success. In this study, we propose two novel models for aqueous solubility predictions, based on the Multilevel Graph Convolutional Network (MGCN) and SchNet architectures, respectively. The advantage of the MGCN lies in the fact that it could extract the graph features of the target molecules directly from the (3D) structural information; therefore, it doesn't need to rely on a lot of intra-molecular descriptors to learn the features, which are of significance for accurate predictions of the molecular properties. The SchNet performs well in modelling the interatomic interactions inside a molecule, and such a deep learning architecture is also capable of extracting structural information and further predicting the related properties. The actual accuracy of these two novel approaches was systematically benchmarked with four different independent datasets. We found that both the MGCN and SchNet models performed well for aqueous solubility predictions. In the future, we believe such promising predictive models will be applicable to enhancing the efficiency of the screening, crystallization and delivery of drug molecules, essentially as a useful tool to promote the development of molecular pharmaceutics.
Affiliation(s)
- Peng Gao
- School of Chemistry and Molecular Bioscience, University of Wollongong, NSW 2500, Australia
|
50
|
Francoeur PG, Masuda T, Sunseri J, Jia A, Iovanisci RB, Snyder I, Koes DR. Three-Dimensional Convolutional Neural Networks and a Cross-Docked Data Set for Structure-Based Drug Design. J Chem Inf Model 2020; 60:4200-4215. [DOI: 10.1021/acs.jcim.0c00411] [Citation(s) in RCA: 38] [Impact Index Per Article: 9.5] [Indexed: 12/14/2022]
Affiliation(s)
- Paul G. Francoeur
- Department of Computational and Systems Biology, University of Pittsburgh, Pittsburgh, Pennsylvania 15260, United States
| | - Tomohide Masuda
- Department of Computational and Systems Biology, University of Pittsburgh, Pittsburgh, Pennsylvania 15260, United States
| | - Jocelyn Sunseri
- Department of Computational and Systems Biology, University of Pittsburgh, Pittsburgh, Pennsylvania 15260, United States
| | - Andrew Jia
- Department of Computational and Systems Biology, University of Pittsburgh, Pittsburgh, Pennsylvania 15260, United States
| | - Richard B. Iovanisci
- Department of Computational and Systems Biology, University of Pittsburgh, Pittsburgh, Pennsylvania 15260, United States
| | - Ian Snyder
- Department of Computational and Systems Biology, University of Pittsburgh, Pittsburgh, Pennsylvania 15260, United States
| | - David R. Koes
- Department of Computational and Systems Biology, University of Pittsburgh, Pittsburgh, Pennsylvania 15260, United States
|