1
Kumar N, Acharya V. Advances in machine intelligence-driven virtual screening approaches for big-data. Med Res Rev 2024; 44:939-974. [PMID: 38129992 DOI: 10.1002/med.21995] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/12/2022] [Revised: 07/15/2023] [Accepted: 10/29/2023] [Indexed: 12/23/2023]
Abstract
Virtual screening (VS) is an integral and ever-evolving part of the drug discovery framework. VS is traditionally classified into ligand-based (LB) and structure-based (SB) approaches. Machine intelligence, or artificial intelligence, is widely applied in drug discovery to reduce time and resource consumption. Combined with machine intelligence algorithms, VS has become a progressive technology that learns within robust decision orders for data curation and screens hit molecules from large VS libraries in minutes or hours. The exponential growth of chemical and biological data into public-domain "big-data" demands modern, advanced machine intelligence-driven VS approaches to screen hit molecules from ultra-large VS libraries. VS has evolved from individual approaches (LB and SB) to integrated LB and SB techniques that explore various ligand and target protein aspects to improve the rate of appropriate hit molecule prediction. Current trends demand advanced and intelligent solutions for handling enormous data in the drug discovery domain, so that hits or leads can be screened and optimized with few or no false positives. This review was prompted by the big-data drift and the tremendous growth in computational architecture. The article categorizes and emphasizes individual VS techniques, details the literature on machine learning implementation, covers modern machine intelligence approaches and their limitations, and discusses future prospects.
Affiliation(s)
- Neeraj Kumar
  - Artificial Intelligence for Computational Biology Lab (AICoB), Biotechnology Division, CSIR-Institute of Himalayan Bioresource Technology, Palampur, Himachal Pradesh, India
  - Academy of Scientific and Innovative Research, Ghaziabad, India
- Vishal Acharya
  - Artificial Intelligence for Computational Biology Lab (AICoB), Biotechnology Division, CSIR-Institute of Himalayan Bioresource Technology, Palampur, Himachal Pradesh, India
  - Academy of Scientific and Innovative Research, Ghaziabad, India
2
Mqawass G, Popov P. graphLambda: Fusion Graph Neural Networks for Binding Affinity Prediction. J Chem Inf Model 2024; 64:2323-2330. [PMID: 38366974 DOI: 10.1021/acs.jcim.3c00771] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/19/2024]
Abstract
Predicting the binding affinity of protein-ligand complexes is crucial for computer-aided drug discovery (CADD) and the identification of potential drug candidates. The deep learning-based scoring functions have emerged as promising predictors of binding constants. Building on recent advancements in graph neural networks, we present graphLambda for protein-ligand binding affinity prediction, which utilizes graph convolutional, attention, and isomorphism blocks to enhance the predictive capabilities. The graphLambda model exhibits superior performance across CASF16 and CSAR HiQ NRC benchmarks and demonstrates robustness with respect to different types of train-validation set partitions. The development of graphLambda underscores the potential of graph neural networks in advancing binding affinity prediction models, contributing to more effective CADD methodologies.
Affiliation(s)
- Ghaith Mqawass
  - Faculty of Computer Science, University of Vienna, Vienna A-1090, Austria
  - UniVie Doctoral School Computer Science, University of Vienna, Vienna A-1090, Austria
- Petr Popov
  - Tetra-d, Rheinweg 9, Schaffhausen 8200, Switzerland
  - School of Science, Constructor University Bremen gGmbH, Bremen 28759, Germany
3
Zhang Y, Li S, Meng K, Sun S. Machine Learning for Sequence and Structure-Based Protein-Ligand Interaction Prediction. J Chem Inf Model 2024; 64:1456-1472. [PMID: 38385768 DOI: 10.1021/acs.jcim.3c01841] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/23/2024]
Abstract
Developing new drugs is too expensive and time-consuming. Accurately predicting the interaction between drugs and targets will likely change how drugs are discovered. Machine learning-based protein-ligand interaction prediction has demonstrated significant potential. In this paper, computational methods that focus on sequence and structure to study protein-ligand interactions are examined. The paper starts with an overview of the data sets used in this area, as well as the various approaches for representing proteins and ligands. Then, sequence-based and structure-based classification criteria are used to categorize and summarize both the classical machine learning models and the deep learning models employed in protein-ligand interaction studies. Moreover, the evaluation methods and interpretability of these models are discussed. Furthermore, the diverse applications of protein-ligand interaction models in drug research are presented. Lastly, the current challenges and future directions in this field are addressed.
Affiliation(s)
- Yunjiang Zhang
  - Beijing Key Laboratory for Green Catalysis and Separation, The Faculty of Environment and Life, Beijing University of Technology, Beijing 100124, P. R. China
- Shuyuan Li
  - Beijing Key Laboratory for Green Catalysis and Separation, The Faculty of Environment and Life, Beijing University of Technology, Beijing 100124, P. R. China
- Kong Meng
  - Beijing Key Laboratory for Green Catalysis and Separation, The Faculty of Environment and Life, Beijing University of Technology, Beijing 100124, P. R. China
- Shaorui Sun
  - Beijing Key Laboratory for Green Catalysis and Separation, The Faculty of Environment and Life, Beijing University of Technology, Beijing 100124, P. R. China
4
Smith MD, Darryl Quarles L, Demerdash O, Smith JC. Drugging the entire human proteome: Are we there yet? Drug Discov Today 2024; 29:103891. [PMID: 38246414 DOI: 10.1016/j.drudis.2024.103891] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/18/2023] [Revised: 01/12/2024] [Accepted: 01/16/2024] [Indexed: 01/23/2024]
Abstract
Each of the ∼20,000 proteins in the human proteome is a potential target for compounds that bind to it and modify its function. The 3D structures of most of these proteins are now available. Here, we discuss the prospects for using these structures to perform proteome-wide virtual HTS (VHTS). We compare physics-based (docking) and AI VHTS approaches, some of which are now being applied with large databases of compounds to thousands of targets. Although preliminary proteome-wide screens are now within our grasp, further methodological developments are expected to improve the accuracy of the results.
Affiliation(s)
- Micholas Dean Smith
  - University of Tennessee/Oak Ridge National Laboratory Center for Molecular Biophysics, Oak Ridge, TN 37830, USA; Department of Biochemistry and Cellular and Molecular Biology, University of Tennessee, Knoxville, TN 37996, USA
- L Darryl Quarles
  - Departments of Medicine, University of Tennessee Health Science Center, Memphis, TN 38163, USA; ORRxD LLC, 3404 Olney Drive, Durham, NC 27705, USA
- Omar Demerdash
  - Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN 37830, USA
- Jeremy C Smith
  - University of Tennessee/Oak Ridge National Laboratory Center for Molecular Biophysics, Oak Ridge, TN 37830, USA; Department of Biochemistry and Cellular and Molecular Biology, University of Tennessee, Knoxville, TN 37996, USA
5
Li B, Wang Y, Yin Z, Xu L, Xie L, Xu X. Decision tree-based identification of important molecular fragments for protein-ligand binding. Chem Biol Drug Des 2024; 103:e14427. [PMID: 38230776 DOI: 10.1111/cbdd.14427] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/26/2023] [Revised: 11/16/2023] [Accepted: 12/11/2023] [Indexed: 01/18/2024]
Abstract
Fragment-based drug design is an emerging technology in pharmaceutical research and development. One of the key aspects of this technology is the identification and quantitative characterization of molecular fragments. This study presents a strategy for identifying important molecular fragments based on molecular fingerprints and decision tree algorithms and verifies its feasibility in predicting protein-ligand binding affinity. Specifically, the three-dimensional (3D) structures of protein-ligand complexes are encoded using extended-connectivity fingerprints (ECFP), and three decision tree models, namely Random Forest, XGBoost, and LightGBM, are used to quantitatively characterize the feature importance, thereby extracting important molecular fragments with high reliability. Few-shot learning reveals that the extracted molecular fragments contribute significantly and consistently to the binding affinity even with a small sample size. Despite the absence of location and distance information for molecular fragments in ECFP, 3D visualization, in combination with the reverse ECFP process, shows that the majority of the extracted fragments are located at the binding interface of the protein and the ligand. This alignment with the distance constraints critical for binding affinity further supports the reliability of the strategy for identifying important molecular fragments.
Affiliation(s)
- Baiyi Li
  - Institute of Bioinformatics and Medical Engineering, Jiangsu University of Technology, Changzhou, China
- Yunsong Wang
  - School of Electrical and Information Engineering, Jiangsu University of Technology, Changzhou, China
- Zuode Yin
  - Institute of Bioinformatics and Medical Engineering, Jiangsu University of Technology, Changzhou, China
- Lei Xu
  - Institute of Bioinformatics and Medical Engineering, Jiangsu University of Technology, Changzhou, China
- Liangxu Xie
  - Institute of Bioinformatics and Medical Engineering, Jiangsu University of Technology, Changzhou, China
- Xiaojun Xu
  - Institute of Bioinformatics and Medical Engineering, Jiangsu University of Technology, Changzhou, China
6
Das K, Paltani M, Tripathi PK, Kumar R, Verma S, Kumar S, Jain CK. Current implications and challenges of artificial intelligence technologies in therapeutic intervention of colorectal cancer. Exploration of Targeted Anti-tumor Therapy 2023; 4:1286-1300. [PMID: 38213536 PMCID: PMC10776591 DOI: 10.37349/etat.2023.00197] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/12/2023] [Accepted: 08/28/2023] [Indexed: 01/13/2024] Open
Abstract
Colorectal cancer (CRC) is the third most common cancer in the population, irrespective of sex, with more than 1.85 million cases annually. Fewer than 20% of patients survive beyond five years from diagnosis. CRC is a highly preventable disease if diagnosed at an early stage of malignancy. Several screening methods are available with different levels of sensitivity and specificity: endoscopy (such as colonoscopy, the gold standard), imaging examination [computed tomographic colonography (CTC)], the guaiac-based fecal occult blood test (gFOBT), the fecal immunochemical test, and the stool DNA test. The available screening methods have drawbacks such as invasiveness, cost, or sensitivity. In recent years, computer-aided systems for screening, diagnosis, and treatment have been very promising in the early-stage detection and diagnosis of CRC cases. Artificial intelligence (AI) is an in-demand, cost-effective technology that uses tools such as machine learning (ML) and deep learning (DL) to screen, diagnose, and stage CRC, and has great potential to treat it. Moreover, different ML algorithms and neural networks [artificial neural network (ANN), k-nearest neighbors (KNN), and support vector machines (SVMs)] have been deployed to predict precise and personalized treatment options. This review examines and summarizes different ML and DL models used for therapeutic intervention in CRC, along with the gaps and challenges for AI.
Affiliation(s)
- Kriti Das
  - Department of Artificial Intelligence and Precision Medicine, School of Allied Health Sciences and Management, Delhi Pharmaceutical Sciences and Research University, New Delhi 110017, India
- Maanvi Paltani
  - Department of Artificial Intelligence and Precision Medicine, School of Allied Health Sciences and Management, Delhi Pharmaceutical Sciences and Research University, New Delhi 110017, India
- Pankaj Kumar Tripathi
  - Department of Biotechnology, Jaypee Institute of Information Technology, Noida 201309, Uttar Pradesh, India
- Rajnish Kumar
  - Department of Medical Laboratory Technology, School of Allied Health Sciences, Delhi Pharmaceutical Sciences and Research University, Delhi 110017, India
- Saniya Verma
  - Department of Medical Laboratory Technology, School of Allied Health Sciences, Delhi Pharmaceutical Sciences and Research University, Delhi 110017, India
- Subodh Kumar
  - Department of Medical Laboratory Technology, School of Allied Health Sciences, Delhi Pharmaceutical Sciences and Research University, Delhi 110017, India
- Chakresh Kumar Jain
  - Department of Biotechnology, Jaypee Institute of Information Technology, Noida 201309, Uttar Pradesh, India
7
Meli R, Morris GM, Biggin PC. Scoring Functions for Protein-Ligand Binding Affinity Prediction using Structure-Based Deep Learning: A Review. Frontiers in Bioinformatics 2022; 2:885983. [PMID: 36187180 PMCID: PMC7613667 DOI: 10.3389/fbinf.2022.885983] [Citation(s) in RCA: 16] [Impact Index Per Article: 8.0] [Received: 02/28/2022] [Accepted: 05/11/2022] [Indexed: 01/01/2023] Open
Abstract
The rapid and accurate in silico prediction of protein-ligand binding free energies or binding affinities has the potential to transform drug discovery. In recent years, there has been a rapid growth of interest in deep learning methods for the prediction of protein-ligand binding affinities based on the structural information of protein-ligand complexes. These structure-based scoring functions often obtain better results than classical scoring functions when applied within their applicability domain. Here we review structure-based scoring functions for binding affinity prediction based on deep learning, focussing on different types of architectures, featurization strategies, data sets, methods for training and evaluation, and the role of explainable artificial intelligence in building useful models for real drug-discovery applications.
Affiliation(s)
- Rocco Meli
  - Department of Biochemistry, University of Oxford, Oxford, United Kingdom
- Garrett M. Morris
  - Department of Statistics, University of Oxford, Oxford, United Kingdom
- Philip C. Biggin
  - Department of Biochemistry, University of Oxford, Oxford, United Kingdom
8
Rezaei MA, Li Y, Wu D, Li X, Li C. Deep Learning in Drug Design: Protein-Ligand Binding Affinity Prediction. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2022; 19:407-417. [PMID: 33360998 PMCID: PMC8942327 DOI: 10.1109/tcbb.2020.3046945] [Citation(s) in RCA: 19] [Impact Index Per Article: 9.5] [Indexed: 06/12/2023]
Abstract
Computational drug design relies on the calculation of binding strength between two biological counterparts especially a chemical compound, i.e., a ligand, and a protein. Predicting the affinity of protein-ligand binding with reasonable accuracy is crucial for drug discovery, and enables the optimization of compounds to achieve better interaction with their target protein. In this paper, we propose a data-driven framework named DeepAtom to accurately predict the protein-ligand binding affinity. With 3D Convolutional Neural Network (3D-CNN) architecture, DeepAtom could automatically extract binding related atomic interaction patterns from the voxelized complex structure. Compared with the other CNN based approaches, our light-weight model design effectively improves the model representational capacity, even with the limited available training data. We carried out validation experiments on the PDBbind v.2016 benchmark and the independent Astex Diverse Set. We demonstrate that the less feature engineering dependent DeepAtom approach consistently outperforms the other baseline scoring methods. We also compile and propose a new benchmark dataset to further improve the model performances. With the new dataset as training input, DeepAtom achieves Pearson's R=0.83 and RMSE=1.23 pK units on the PDBbind v.2016 core set. The promising results demonstrate that DeepAtom models can be potentially adopted in computational drug development protocols such as molecular docking and virtual screening.
Affiliation(s)
- Mohammad A. Rezaei
  - Department of Medicinal Chemistry, Center for Natural Products, Drug Discovery and Development (CNPD3), University of Florida
- Yanjun Li
  - Large-scale Intelligent Systems Laboratory, NSF Center for Big Learning, University of Florida Gainesville, FL, USA
- Dapeng Wu
  - Large-scale Intelligent Systems Laboratory, NSF Center for Big Learning, University of Florida Gainesville, FL, USA
- Xiaolin Li
  - Cognization Lab, Palo Alto, California, USA
- Chenglong Li
  - Department of Medicinal Chemistry, Center for Natural Products, Drug Discovery and Development (CNPD3), University of Florida
  - Large-scale Intelligent Systems Laboratory, NSF Center for Big Learning, University of Florida Gainesville, FL, USA
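The DeepAtom abstract above reports scoring power as Pearson's R and RMSE in pK units. As a minimal sketch of how those two numbers are usually obtained from paired experimental and predicted affinities, the following uses only the Python standard library. The function names and the five example values are illustrative assumptions, not data or code from the paper.

```python
import math


def pearson_r(y_true, y_pred):
    """Pearson correlation between experimental and predicted affinities."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)


def rmse(y_true, y_pred):
    """Root mean square error, in the same units as the inputs (here pK)."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


# Hypothetical pK values, invented purely for illustration.
experimental_pk = [4.0, 5.5, 6.2, 7.1, 8.3]
predicted_pk = [4.5, 5.0, 6.8, 6.9, 7.6]

r = pearson_r(experimental_pk, predicted_pk)
err = rmse(experimental_pk, predicted_pk)
print(f"Pearson R = {r:.2f}, RMSE = {err:.2f} pK units")
```

The two metrics answer different questions: Pearson's R measures how well predictions track the experimental trend, while RMSE measures absolute error, so a model can score well on one and poorly on the other.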
9
Can docking scoring functions guarantee success in virtual screening? Virtual Screening and Drug Docking 2022. [DOI: 10.1016/bs.armc.2022.08.008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/23/2022]
10
Using diverse potentials and scoring functions for the development of improved machine-learned models for protein-ligand affinity and docking pose prediction. J Comput Aided Mol Des 2021; 35:1095-1123. [PMID: 34708263 DOI: 10.1007/s10822-021-00423-4] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 05/03/2021] [Accepted: 10/11/2021] [Indexed: 10/20/2022]
Abstract
The advent of computational drug discovery holds the promise of significantly reducing the effort of experimentalists, along with monetary cost. More generally, predicting the binding of small organic molecules to biological macromolecules has far-reaching implications for a range of problems, including metabolomics. However, problems such as predicting the bound structure of a protein-ligand complex along with its affinity have proven to be an enormous challenge. In recent years, machine learning-based methods have proven to be more accurate than older methods, many based on simple linear regression. Nonetheless, there remains room for improvement, as these methods are often trained on a small set of features, with a single functional form for any given physical effect, and often with little mention of the rationale behind choosing one functional form over another. Moreover, it is not entirely clear why one machine learning method is favored over another. In this work, we endeavor to undertake a comprehensive effort towards developing high-accuracy, machine-learned scoring functions, systematically investigating the effects of machine learning method and choice of features, and, when possible, providing insights into the relevant physics using methods that assess feature importance. Here, we show synergism among disparate features, yielding adjusted R² with experimental binding affinities of up to 0.871 on an independent test set and enrichment for native bound structures of up to 0.913. When purely physical terms that model enthalpic and entropic effects are used in the training, we use feature importance assessments to probe the relevant physics and hopefully guide future investigators working on this and other computational chemistry problems.
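The abstract above quotes an adjusted R² against experimental binding affinities. The abstract does not spell out the formula, so the sketch below assumes the standard textbook definition, which penalizes R² for the number of features used. The function names and the sample numbers are illustrative only.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def adjusted_r_squared(r2, n_samples, n_features):
    """Standard adjustment: 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    dof = n_samples - n_features - 1
    if dof <= 0:
        raise ValueError("need more samples than features + 1")
    return 1.0 - (1.0 - r2) * (n_samples - 1) / dof


# Hypothetical case: R^2 of 0.90 from 101 complexes described by 10 features.
print(round(adjusted_r_squared(0.90, n_samples=101, n_features=10), 4))
```

Because the penalty grows with the feature count, adjusted R² is a fairer basis than plain R² for comparing models trained on feature sets of different sizes, which is the kind of comparison the abstract describes.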
11
Veit-Acosta M, de Azevedo Junior WF. Computational Prediction of Binding Affinity for CDK2-ligand Complexes. A Protein Target for Cancer Drug Discovery. Curr Med Chem 2021; 29:2438-2455. [PMID: 34365938 DOI: 10.2174/0929867328666210806105810] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 03/25/2021] [Revised: 06/15/2021] [Accepted: 06/22/2021] [Indexed: 11/22/2022]
Abstract
BACKGROUND: CDK2 participates in the control of eukaryotic cell-cycle progression. Due to the great interest in CDK2 for drug development and the relative ease of crystallizing this enzyme, we have over 400 structural studies focused on this protein target. This structural data is the basis for the development of computational models to estimate CDK2-ligand binding affinity. OBJECTIVE: This work focuses on the recent developments in the application of supervised machine learning modeling to develop scoring functions to predict the binding affinity of CDK2. METHOD: We employed the structures available at the Protein Data Bank and the ligand information accessed from BindingDB, Binding MOAD, and PDBbind to evaluate the predictive performance of machine learning techniques combined with physical modeling used to calculate binding affinity. We compared this hybrid methodology with classical scoring functions available in docking programs. RESULTS: Our comparative analysis of previously published models indicated that a model created using a combination of a mass-spring system and cross-validated Elastic Net to predict the binding affinity of CDK2-inhibitor complexes outperformed classical scoring functions available in AutoDock4 and AutoDock Vina. CONCLUSION: All studies reviewed here suggest that targeted machine learning models are superior to classical scoring functions for calculating binding affinities. Specifically for CDK2, we see that the combination of physical modeling with supervised machine learning techniques exhibits improved predictive performance in calculating the protein-ligand binding affinity. These results find theoretical support in the application of the concept of scoring function space.
Affiliation(s)
- Martina Veit-Acosta
  - Western Michigan University, 1903 Western, Michigan Ave, Kalamazoo, MI 49008, United States
12
Xiong G, Shen C, Yang Z, Jiang D, Liu S, Lu A, Chen X, Hou T, Cao D. Featurization strategies for protein–ligand interactions and their applications in scoring function development. WIREs Computational Molecular Science 2021. [DOI: 10.1002/wcms.1567] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 12/18/2022]
Affiliation(s)
- Guoli Xiong
  - Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, China
- Chao Shen
  - Hangzhou Institute of Innovative Medicine, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China
- Ziyi Yang
  - Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, China
- Dejun Jiang
  - Hangzhou Institute of Innovative Medicine, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China
  - College of Computer Science and Technology, Zhejiang University, Hangzhou, China
- Shao Liu
  - Department of Pharmacy, Xiangya Hospital, Central South University, Changsha, China
- Aiping Lu
  - Institute for Advancing Translational Medicine in Bone & Joint Diseases, School of Chinese Medicine, Hong Kong Baptist University, Hong Kong SAR, China
- Xiang Chen
  - Department of Dermatology, Hunan Engineering Research Center of Skin Health and Disease, Hunan Key Laboratory of Skin Cancer and Psoriasis, Xiangya Hospital, Central South University, Changsha, China
- Tingjun Hou
  - Hangzhou Institute of Innovative Medicine, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China
- Dongsheng Cao
  - Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, China
  - Institute for Advancing Translational Medicine in Bone & Joint Diseases, School of Chinese Medicine, Hong Kong Baptist University, Hong Kong SAR, China
13
Kashyap K, Siddiqi MI. Recent trends in artificial intelligence-driven identification and development of anti-neurodegenerative therapeutic agents. Mol Divers 2021; 25:1517-1539. [PMID: 34282519 DOI: 10.1007/s11030-021-10274-8] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 03/28/2021] [Accepted: 07/05/2021] [Indexed: 12/12/2022]
Abstract
Neurological disorders affect various aspects of life. Finding drugs for the central nervous system is a very challenging and complex task due to the involvement of the blood-brain barrier, P-glycoprotein, and the drug's high attrition rates. The availability of big data present in online databases and resources has enabled the emergence of artificial intelligence techniques including machine learning to analyze, process the data, and predict the unknown data with high efficiency. The use of these modern techniques has revolutionized the whole drug development paradigm, with an unprecedented acceleration in the central nervous system drug discovery programs. Also, the new deep learning architectures proposed in many recent works have given a better understanding of how artificial intelligence can tackle big complex problems that arose due to central nervous system disorders. Therefore, the present review provides comprehensive and up-to-date information on machine learning/artificial intelligence-triggered effort in the brain care domain. In addition, a brief overview is presented on machine learning algorithms and their uses in structure-based drug design, ligand-based drug design, ADMET prediction, de novo drug design, and drug repurposing. Lastly, we conclude by discussing the major challenges and limitations posed and how they can be tackled in the future by using these modern machine learning/artificial intelligence approaches.
Affiliation(s)
- Kushagra Kashyap
  - Academy of Scientific and Innovative Research (AcSIR), CSIR-Central Drug Research Institute (CSIR-CDRI) Campus, Lucknow, India
  - Molecular and Structural Biology Division, CSIR-Central Drug Research Institute (CSIR-CDRI), Sector 10, Jankipuram Extension, Sitapur Road, Lucknow, 226031, India
- Mohammad Imran Siddiqi
  - Academy of Scientific and Innovative Research (AcSIR), CSIR-Central Drug Research Institute (CSIR-CDRI) Campus, Lucknow, India
  - Molecular and Structural Biology Division, CSIR-Central Drug Research Institute (CSIR-CDRI), Sector 10, Jankipuram Extension, Sitapur Road, Lucknow, 226031, India
14
Bitencourt-Ferreira G, Rizzotto C, de Azevedo Junior WF. Machine Learning-Based Scoring Functions, Development and Applications with SAnDReS. Curr Med Chem 2021; 28:1746-1756. [PMID: 32410551 DOI: 10.2174/0929867327666200515101820] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 12/17/2019] [Revised: 04/06/2020] [Accepted: 04/07/2020] [Indexed: 11/22/2022]
Abstract
BACKGROUND: Analysis of atomic coordinates of protein-ligand complexes can provide three-dimensional data to generate computational models to evaluate binding affinity and thermodynamic state functions. Application of machine learning techniques can create models to assess protein-ligand potential energy and binding affinity. These methods show superior predictive performance when compared with classical scoring functions available in docking programs. OBJECTIVE: Our purpose here is to review the development and application of the program SAnDReS. We describe the creation of machine learning models to assess the binding affinity of protein-ligand complexes. METHODS: SAnDReS implements machine learning methods available in the scikit-learn library. This program is available for download at https://github.com/azevedolab/sandres. SAnDReS uses crystallographic structures, binding and thermodynamic data to create targeted scoring functions. RESULTS: Recent applications of the program SAnDReS to drug targets such as Coagulation factor Xa, cyclin-dependent kinases and HIV-1 protease were able to create targeted scoring functions to predict inhibition of these proteins. These targeted models outperform classical scoring functions. CONCLUSION: Here, we reviewed the development of machine learning scoring functions to predict binding affinity through the application of the program SAnDReS. Our studies show the superior predictive performance of the SAnDReS-developed models when compared with classical scoring functions available in programs such as AutoDock4, Molegro Virtual Docker and AutoDock Vina.
Affiliation(s)
- Camila Rizzotto
  - Pontifical Catholic University of Rio Grande do Sul - PUCRS, Porto Alegre-RS, Brazil
15
Ji B, He X, Zhai J, Zhang Y, Man VH, Wang J. Machine learning on ligand-residue interaction profiles to significantly improve binding affinity prediction. Brief Bioinform 2021; 22:6184410. [PMID: 33758923 DOI: 10.1093/bib/bbab054] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 11/08/2020] [Revised: 01/06/2021] [Accepted: 02/02/2021] [Indexed: 01/01/2023] Open
Abstract
Structure-based virtual screenings (SBVSs) play an important role in drug discovery projects. However, it is still a challenge to accurately predict the binding affinity of an arbitrary molecule to a drug target and to prioritize top ligands from an SBVS. In this study, we developed a novel method that uses ligand-residue interaction profiles (IPs) to construct machine learning (ML)-based prediction models and significantly improves the screening performance in SBVSs. This kind of prediction model is called an IP scoring function (IP-SF). We systematically investigated how to improve the performance of IP-SFs from many perspectives, including the sampling methods before interaction energy calculation and different ML algorithms. Using six drug targets, each with hundreds of known ligands, we conducted a critical evaluation of the developed IP-SFs. The IP-SFs employing a gradient boosting decision tree (GBDT) algorithm in conjunction with the MIN + GB simulation protocol achieved the best overall performance. Their scoring power, ranking power and screening power significantly outperformed the Glide SF. First, compared with Glide, the average values of mean absolute error and root mean square error of GBDT/MIN + GB decreased by about 38 and 36%, respectively. Second, the mean values of squared correlation coefficient and predictive index increased by about 225 and 73%, respectively. Third, more encouragingly, the average value of the areas under the curve of receiver operating characteristic for six targets by GBDT, 0.87, is significantly better than that by Glide, which is only 0.71. Thus, we expect IP-SFs to have broad and promising applications in SBVSs.
Affiliation(s)
- Beihong Ji
- Department of Pharmaceutical Sciences and Computational Chemical Genomics Screening Center, School of Pharmacy, University of Pittsburgh, Pittsburgh, PA 15261, USA
| | - Xibing He
- Department of Pharmaceutical Sciences and Computational Chemical Genomics Screening Center, School of Pharmacy, University of Pittsburgh, Pittsburgh, PA 15261, USA
| | - Jingchen Zhai
- Department of Pharmaceutical Sciences and Computational Chemical Genomics Screening Center, School of Pharmacy, University of Pittsburgh, Pittsburgh, PA 15261, USA
| | - Yuzhao Zhang
- Department of Pharmaceutical Sciences and Computational Chemical Genomics Screening Center, School of Pharmacy, University of Pittsburgh, Pittsburgh, PA 15261, USA
| | - Viet Hoang Man
- Department of Pharmaceutical Sciences and Computational Chemical Genomics Screening Center, School of Pharmacy, University of Pittsburgh, Pittsburgh, PA 15261, USA
| | - Junmei Wang
- Department of Pharmaceutical Sciences and Computational Chemical Genomics Screening Center, School of Pharmacy, University of Pittsburgh, Pittsburgh, PA 15261, USA
|
16
|
Baskaran SG, Sharp TP, Sharp KA. Computational Graphics Software for Interactive Docking and Visualization of Ligand-Protein Complementarity. J Chem Inf Model 2021; 61:1427-1443. [PMID: 33656873 DOI: 10.1021/acs.jcim.0c01485] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/28/2022]
Abstract
The Dockeye software is designed to complement automated docking protocols by allowing the user's chemical know-how and experience of what makes for good protein-ligand binding, knowledge that is not easily encoded into automated algorithms, to guide the docking. It allows the interactive manipulation of the ligand placement against a protein target. Real-time intuitively comprehensible feedback about the location, spatial density, and the extent of both favorable and unfavorable atomic interactions between ligand and protein is provided through a carefully designed graphical object. It is also a tool for the graphical analysis of the interactions of known protein-ligand complexes. Comparative docking of 58 protein-ligand complexes with Dockeye and Autodock Vina shows how this software can be used synergistically with automated docking programs to significantly improve the task of discovery of ligand placement.
Affiliation(s)
- Saravana G Baskaran
- Platelet Biogenesis, 65 Grove Street, Suite 303, Watertown, Massachusetts 02472, United States
| | - Thayne P Sharp
- Harriton High School, 600 North Ithan Avenue, Bryn Mawr, Pennsylvania 19010, United States
| | - Kim A Sharp
- Department of Biochemistry and Biophysics, Perelman School of Medicine at the University of Pennsylvania, 3620 Hamilton Walk, Philadelphia, Pennsylvania 19104-6073, United States
|
17
|
Wang DD, Xie H, Yan H. Proteo-chemometrics interaction fingerprints of protein-ligand complexes predict binding affinity. Bioinformatics 2021; 37:2570-2579. [PMID: 33650636 DOI: 10.1093/bioinformatics/btab132] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 10/28/2020] [Revised: 01/10/2021] [Accepted: 02/25/2021] [Indexed: 11/13/2022] Open
Abstract
MOTIVATION: Reliable predictive models of protein-ligand binding affinity are required in many areas of biomedical research. Accurate prediction based on current descriptors or molecular fingerprints remains a challenge. We develop novel interaction fingerprints (IFPs) to encode protein-ligand interactions and use them to improve the prediction. RESULTS: Proteo-chemometrics IFPs (PrtCmm IFPs), formed by combining extended connectivity fingerprints (ECFPs) with the proteo-chemometrics concept, were developed. Combining PrtCmm IFPs with machine-learning models led to efficient scoring models, which were validated on the PDBbind v2019 core set and CSAR-HiQ sets. The PrtCmm IFP Score outperformed several other models in predicting protein-ligand binding affinities. In addition, conventional ECFPs were simplified to generate new IFPs, which provided consistent but faster predictions. The relationship between the base atom properties of ECFPs and the accuracy of predictions was also investigated. AVAILABILITY: PrtCmm IFP has been implemented in the IFP Score Toolkit on GitHub at https://github.com/debbydanwang/IFPscore. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Debby D Wang
- Institute of Medical Information Engineering, School of Medical Instrument and Food Engineering, University of Shanghai for Science and Technology, 516 Jungong Rd, Shanghai 200093, China
| | - Haoran Xie
- Department of Computing and Decision Sciences, Lingnan University, 8 Castle Peak Rd, Tuen Mun, Hong Kong
| | - Hong Yan
- Department of Electrical Engineering, City University of Hong Kong, Tat Chee Avenue, Kowloon, Hong Kong
|
18
|
Wang DD, Zhu M, Yan H. Computationally predicting binding affinity in protein-ligand complexes: free energy-based simulations and machine learning-based scoring functions. Brief Bioinform 2020; 22:5860693. [PMID: 32591817 DOI: 10.1093/bib/bbaa107] [Citation(s) in RCA: 26] [Impact Index Per Article: 6.5] [Received: 03/07/2020] [Revised: 04/20/2020] [Accepted: 05/05/2020] [Indexed: 12/18/2022] Open
Abstract
Accurately predicting protein-ligand binding affinities can substantially facilitate the drug discovery process, but it remains a difficult problem. To tackle the challenge, many computational methods have been proposed. Among these methods, free energy-based simulations and machine learning-based scoring functions can potentially provide accurate predictions. In this paper, we review these two classes of methods, following a number of thermodynamic cycles for the free energy-based simulations and a feature-representation taxonomy for the machine learning-based scoring functions. More recent deep learning-based predictions, where a hierarchy of feature representations is generally extracted, are also reviewed. Strengths and weaknesses of the two classes of methods, coupled with future directions for improvements, are comparatively discussed.
Affiliation(s)
- Debby D Wang
- School of Medical Instrument and Food Engineering, University of Shanghai for Science and Technology
| | - Mengxu Zhu
- Department of Electrical Engineering, City University of Hong Kong
| | - Hong Yan
- College of Science and Engineering, City University of Hong Kong
|
19
|
Advancing Drug Discovery via Artificial Intelligence. Trends Pharmacol Sci 2019; 40:592-604. [DOI: 10.1016/j.tips.2019.06.004] [Citation(s) in RCA: 164] [Impact Index Per Article: 32.8] [Received: 03/07/2019] [Revised: 05/23/2019] [Accepted: 06/11/2019] [Indexed: 01/15/2023]
|
20
|
Shen C, Ding J, Wang Z, Cao D, Ding X, Hou T. From machine learning to deep learning: Advances in scoring functions for protein–ligand docking. Wiley Interdiscip Rev Comput Mol Sci 2019. [DOI: 10.1002/wcms.1429] [Citation(s) in RCA: 76] [Impact Index Per Article: 15.2] [Indexed: 12/18/2022]
Affiliation(s)
- Chao Shen
- Hangzhou Institute of Innovative Medicine, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, P. R. China
| | - Junjie Ding
- Beijing Institute of Pharmaceutical Chemistry, Beijing, P. R. China
| | - Zhe Wang
- Hangzhou Institute of Innovative Medicine, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, P. R. China
| | - Dongsheng Cao
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, P. R. China
| | - Xiaoqin Ding
- Beijing Institute of Pharmaceutical Chemistry, Beijing, P. R. China
| | - Tingjun Hou
- Hangzhou Institute of Innovative Medicine, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, P. R. China
|
21
|
Yasuo N, Sekijima M. Improved Method of Structure-Based Virtual Screening via Interaction-Energy-Based Learning. J Chem Inf Model 2019; 59:1050-1061. [PMID: 30808172 DOI: 10.1021/acs.jcim.8b00673] [Citation(s) in RCA: 72] [Impact Index Per Article: 14.4] [Indexed: 12/12/2022]
Abstract
Virtual screening is a promising method for obtaining novel hit compounds in drug discovery. It aims to enrich potentially active compounds from a large chemical library for further biological experiments. However, the accuracy of current virtual screening methods is insufficient. In this study, we develop a new virtual screening method named Similarity of Interaction Energy VEctor Score (SIEVE-Score), in which protein-ligand interaction energies are extracted to represent docking poses for machine learning. SIEVE-Score offers substantial improvements compared to other state-of-the-art virtual screening methods, namely, other machine-learning-based scoring functions, interaction fingerprints, and docking software, for the enrichment factor 1% results on the Directory of Useful Decoys, Enhanced (DUD-E). The screening results are also human-interpretable in the form of important interactions for distinguishing between active and inactive compounds. The source code is available at https://github.com/sekijima-lab/SIEVE-Score.
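The "enrichment factor 1%" mentioned in this abstract compares the fraction of actives among the top-ranked 1% of a screened library with the fraction of actives in the whole library. The sketch below shows one common way to compute it in plain Python. It is an illustration under stated assumptions, not code from the SIEVE-Score repository: the function name `enrichment_factor` is hypothetical, the toy library is invented, and rounding the size of the top slice to the nearest integer (with a minimum of one compound) is a choice made here.

```python
def enrichment_factor(labels, scores, fraction=0.01):
    """Enrichment factor at the given top fraction of a ranked library.

    labels: 1 for active, 0 for inactive/decoy.
    scores: higher score means predicted more likely to be active.
    EF = (actives in top slice / size of top slice) / (all actives / library size)
    """
    n_total = len(labels)
    n_actives = sum(labels)
    if n_total == 0 or n_actives == 0:
        raise ValueError("need a non-empty library with at least one active")
    # Assumption: round the slice size to the nearest integer, at least 1.
    n_top = max(1, round(fraction * n_total))
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    actives_in_top = sum(label for _, label in ranked[:n_top])
    return (actives_in_top / n_top) / (n_actives / n_total)


# Toy library (assumed): 200 compounds, 10 actives, perfectly ranked on top.
toy_labels = [1] * 10 + [0] * 190
toy_scores = [200 - i for i in range(200)]
print("EF 1%:", enrichment_factor(toy_labels, toy_scores, fraction=0.01))
```

With 10 actives in 200 compounds the baseline active rate is 5%, so the largest achievable enrichment factor in this toy case is 1 / 0.05 = 20, which a perfect ranking attains.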
Affiliation(s)
- Nobuaki Yasuo
- Department of Computer Science, Tokyo Institute of Technology, 4259-J3-23, Nagatsuta-cho, Midori-ku, Yokohama, Japan
| | - Masakazu Sekijima
- Department of Computer Science, Tokyo Institute of Technology, 4259-J3-23, Nagatsuta-cho, Midori-ku, Yokohama, Japan; Advanced Computational Drug Discovery Unit, Tokyo Institute of Technology, 4259-J3-23, Nagatsuta-cho, Midori-ku, Yokohama, Japan
|
22
|
Berishvili VP, Voronkov AE, Radchenko EV, Palyulin VA. Machine Learning Classification Models to Improve the Docking-based Screening: A Case of PI3K-Tankyrase Inhibitors. Mol Inform 2018; 37:e1800030. [DOI: 10.1002/minf.201800030] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Received: 03/13/2018] [Accepted: 05/28/2018] [Indexed: 01/20/2023]
Affiliation(s)
- Vladimir P. Berishvili
- Department of Chemistry; Lomonosov Moscow State University; Leninskie gory 1/3 Moscow 119991 Russia
| | - Andrew E. Voronkov
- Department of Chemistry; Lomonosov Moscow State University; Leninskie gory 1/3 Moscow 119991 Russia
- Digital BioPharm Ltd.; Hovseterveien 42 A, H0301 Oslo 0768 Norway
| | - Eugene V. Radchenko
- Department of Chemistry; Lomonosov Moscow State University; Leninskie gory 1/3 Moscow 119991 Russia
| | - Vladimir A. Palyulin
- Department of Chemistry; Lomonosov Moscow State University; Leninskie gory 1/3 Moscow 119991 Russia
|
23
|
Hadi-Alijanvand H, Rouhani M. Partner-Specific Prediction of Protein-Dimer Stability from Unbound Structure of Monomer. J Chem Inf Model 2018; 58:733-745. [PMID: 29444397 DOI: 10.1021/acs.jcim.7b00606] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Indexed: 01/08/2023]
Abstract
Protein complexes play deterministic roles in live entities in sensing, compiling, controlling, and responding to external and internal stimuli. Thermodynamic stability is an important property of protein complexes; having knowledge about complex stability helps us to understand the basics of protein assembly-related diseases and the mechanism of protein assembly clearly. Enormous protein-protein interactions, detected by high-throughput methods, necessitate finding fast methods for predicting the stability of protein assemblies in a quantitative and qualitative manner. The existing methods of predicting complex stability need knowledge about the three-dimensional (3D) structure of the intended protein complex. Here, we introduce a new method for predicting dissociation free energy of subunits by analyzing the structural and topological properties of a protein binding patch on a single subunit of the desired protein complex. The method needs the 3D structure of just one subunit and the information about the position of the intended binding site on the surface of that subunit to predict dimer stability in a classwise manner. The patterns of structural and topological properties of a protein binding patch are decoded by recurrence quantification analysis. Nonparametric discrimination is then utilized to predict the stability class of the intended dimer with accuracy greater than 85%.
Affiliation(s)
- Hamid Hadi-Alijanvand
- Department of Biological Sciences, Institute for Advanced Studies in Basic Sciences (IASBS), Zanjan, 45137-66731, Iran
| | - Maryam Rouhani
- Department of Biological Sciences, Institute for Advanced Studies in Basic Sciences (IASBS), Zanjan, 45137-66731, Iran
|
24
|
Jiménez J, Škalič M, Martínez-Rosell G, De Fabritiis G. KDEEP: Protein–Ligand Absolute Binding Affinity Prediction via 3D-Convolutional Neural Networks. J Chem Inf Model 2018; 58:287-296. [DOI: 10.1021/acs.jcim.7b00650] [Citation(s) in RCA: 389] [Impact Index Per Article: 64.8] [Indexed: 01/02/2023]
Affiliation(s)
- José Jiménez
- Computational Biophysics Laboratory, Universitat Pompeu Fabra, Parc de Recerca Biomèdica de Barcelona, Carrer del Dr. Aiguader 88, Barcelona 08003, Spain
| | - Miha Škalič
- Computational Biophysics Laboratory, Universitat Pompeu Fabra, Parc de Recerca Biomèdica de Barcelona, Carrer del Dr. Aiguader 88, Barcelona 08003, Spain
| | - Gerard Martínez-Rosell
- Computational Biophysics Laboratory, Universitat Pompeu Fabra, Parc de Recerca Biomèdica de Barcelona, Carrer del Dr. Aiguader 88, Barcelona 08003, Spain
| | - Gianni De Fabritiis
- Computational Biophysics Laboratory, Universitat Pompeu Fabra, Parc de Recerca Biomèdica de Barcelona, Carrer del Dr. Aiguader 88, Barcelona 08003, Spain
- Institució Catalana de Recerca i Estudis Avançats (ICREA), Passeig Lluis Companys 23, 08010 Barcelona, Spain
|
25
|
Pillong M, Marx C, Piechon P, Wicker JGP, Cooper RI, Wagner T. A publicly available crystallisation data set and its application in machine learning. CrystEngComm 2017. [DOI: 10.1039/c7ce00738h] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Indexed: 11/21/2022]
Abstract
A publicly available crystallisation database for clusters of highly similar compounds is used to build machine learning models.
Affiliation(s)
- Max Pillong
- Global Discovery Chemistry Analytics, Novartis Institutes for Biomedical Research, 4002 Basel, Switzerland
| | - Corinne Marx
- Global Discovery Chemistry Analytics, Novartis Institutes for Biomedical Research, 4002 Basel, Switzerland
| | - Philippe Piechon
- Global Discovery Chemistry Analytics, Novartis Institutes for Biomedical Research, 4002 Basel, Switzerland
| | - Trixie Wagner
- Global Discovery Chemistry Analytics, Novartis Institutes for Biomedical Research, 4002 Basel, Switzerland
|
26
|
Affiliation(s)
- Dawei Zhang
- School of Physics and Engineering, Henan University of Science and Technology, Luoyang, P. R. China
| | - Haisheng Li
- School of Physics and Engineering, Henan University of Science and Technology, Luoyang, P. R. China
| | - Huixian Wang
- School of Physics and Engineering, Henan University of Science and Technology, Luoyang, P. R. China
| | - Liben Li
- School of Physics and Engineering, Henan University of Science and Technology, Luoyang, P. R. China
|
27
|
Pason LP, Sotriffer CA. Empirical Scoring Functions for Affinity Prediction of Protein-ligand Complexes. Mol Inform 2016; 35:541-548. [PMID: 27870243 DOI: 10.1002/minf.201600048] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Received: 04/08/2016] [Accepted: 06/01/2016] [Indexed: 12/31/2022]
Abstract
The ability to rapidly assess the quality of a protein-ligand complex in terms of its affinity is of fundamental importance for various methods of computer-aided drug design. While simple filtering or matching criteria may be sufficient in fast docking methods or at early stages of virtual screening, estimates of the actual free energy of binding are needed whenever refined docking solutions, ligand rankings or support for the optimization of hit compounds are required. If rigorous free energy calculations based on molecular simulations are impractical, such affinity estimates are provided by scoring functions. The class of empirical scoring functions aims to provide them via a regression-based approach. Using experimental structures and affinity data of protein-ligand complexes and descriptors suitable to capture the essential features of the interaction, these functions are trained with classical linear regression techniques or machine-learning methods. The latter have led to considerable improvements in terms of prediction accuracy for large generic data sets. Nevertheless, many limitations are not yet resolved and pose significant challenges for future developments.
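As a rough illustration of the regression-based approach described in this abstract, a classical linear empirical scoring function is often written as a weighted sum of interaction descriptors whose weights are fitted to experimental affinities by least squares. The notation below is an assumption introduced here for illustration (descriptors $f_i$, weights $w_i$, $N$ training complexes with measured $\Delta G_k^{\text{exp}}$); it is not taken from the cited article.

```latex
\Delta G_{\text{bind}}(c) \;\approx\; w_0 + \sum_{i=1}^{n} w_i \, f_i(c),
\qquad
(w_0, \dots, w_n) \;=\; \arg\min_{w_0, \dots, w_n}
\sum_{k=1}^{N} \Bigl( \Delta G_k^{\text{exp}} - w_0 - \sum_{i=1}^{n} w_i \, f_i(c_k) \Bigr)^{2}
```

Here $c$ denotes a protein-ligand complex and $c_k$ the $k$-th training complex. Machine-learning scoring functions replace the fixed additive form on the left with a more flexible model fitted to the same kind of data.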
Affiliation(s)
- Lukas P Pason
- Institute of Pharmacy and Food Chemistry, University of Würzburg, Am Hubland, D-97074, Würzburg, Germany
| | - Christoph A Sotriffer
- Institute of Pharmacy and Food Chemistry, University of Würzburg, Am Hubland, D-97074, Würzburg, Germany
|
28
|
Ehrt C, Brinkjost T, Koch O. Impact of Binding Site Comparisons on Medicinal Chemistry and Rational Molecular Design. J Med Chem 2016; 59:4121-51. [PMID: 27046190 DOI: 10.1021/acs.jmedchem.6b00078] [Citation(s) in RCA: 66] [Impact Index Per Article: 8.3] [Indexed: 12/27/2022]
Abstract
Modern rational drug design not only deals with the search for ligands binding to interesting and promising validated targets but also aims to identify the function and ligands of yet uncharacterized proteins having impact on different diseases. Additionally, it contributes to the design of inhibitors with distinct selectivity patterns and the prediction of possible off-target effects. The identification of similarities between binding sites of various proteins is a useful approach to cope with those challenges. The main scope of this perspective is to describe applications of different protein binding site comparison approaches to outline their applicability and impact on molecular design. The article deals with various substantial application domains and provides some outstanding examples to show how various binding site comparison methods can be applied to promote in silico drug design workflows. In addition, we will also briefly introduce the fundamental principles of different protein binding site comparison methods.
Affiliation(s)
- Christiane Ehrt
- Faculty of Chemistry and Chemical Biology, TU Dortmund University, Otto-Hahn-Straße 6, 44227 Dortmund, Germany
| | - Tobias Brinkjost
- Faculty of Chemistry and Chemical Biology, TU Dortmund University, Otto-Hahn-Straße 6, 44227 Dortmund, Germany; Department of Computer Science, TU Dortmund University, Otto-Hahn-Straße 14, 44224 Dortmund, Germany
| | - Oliver Koch
- Faculty of Chemistry and Chemical Biology, TU Dortmund University, Otto-Hahn-Straße 6, 44227 Dortmund, Germany
|
29
|
Menegatti S, Zakrewsky M, Kumar S, De Oliveira JS, Muraski JA, Mitragotri S. De Novo Design of Skin-Penetrating Peptides for Enhanced Transdermal Delivery of Peptide Drugs. Adv Healthc Mater 2016; 5:602-9. [PMID: 26799634 DOI: 10.1002/adhm.201500634] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.8] [Received: 08/12/2015] [Revised: 09/23/2015] [Indexed: 11/10/2022]
Abstract
Skin-penetrating peptides (SPPs) are attracting increasing attention as a non-invasive strategy for transdermal delivery of therapeutics. The identification of SPP sequences, however, currently performed by experimental screening of peptide libraries, is very laborious. Recent studies have shown that, to be effective enhancers, SPPs must possess affinity for both skin keratin and the drug of interest. We therefore developed a computational process for generating and screening virtual libraries of disulfide-cyclic peptides against keratin and cyclosporine A (CsA) to identify SPPs capable of enhancing transdermal CsA delivery. The selected sequences were experimentally tested and found to bind both CsA and keratin, as determined by mass spectrometry and affinity chromatography, and enhance transdermal permeation of CsA. Four heptameric sequences that emerged as leading candidates (ACSATLQHSCG, ACSLTVNWNCG, ACTSTGRNACG, and ACSASTNHNCG) were tested and yielded CsA permeation on par with previously identified SPP SPACE™. An octameric peptide (ACNAHQARSTCG) yielded significantly higher delivery of CsA compared to heptameric SPPs. The safety profile of the selected sequences was also validated by incubation with skin keratinocytes. This method thus represents an effective procedure for the de novo design of skin-penetrating peptides for the delivery of desired therapeutic or cosmetic agents.
Affiliation(s)
- Stefano Menegatti
- Department of Chemical and Biomolecular Engineering; North Carolina State University; Raleigh NC 27695 USA
| | - Michael Zakrewsky
- Center for Bioengineering; Department of Chemical Engineering; University of California; Santa Barbara CA 93106 USA
| | - Joshua Sanchez De Oliveira
- Center for Bioengineering; Department of Chemical Engineering; University of California; Santa Barbara CA 93106 USA
| | - Samir Mitragotri
- Center for Bioengineering; Department of Chemical Engineering; University of California; Santa Barbara CA 93106 USA
|
30
|
Ain QU, Aleksandrova A, Roessler FD, Ballester PJ. Machine-learning scoring functions to improve structure-based binding affinity prediction and virtual screening. Wiley Interdiscip Rev Comput Mol Sci 2015; 5:405-424. [PMID: 27110292 PMCID: PMC4832270 DOI: 10.1002/wcms.1225] [Citation(s) in RCA: 187] [Impact Index Per Article: 20.8] [Received: 06/03/2015] [Revised: 07/17/2015] [Accepted: 07/18/2015] [Indexed: 12/29/2022]
Abstract
Docking tools to predict whether and how a small molecule binds to a target can be applied if a structural model of such target is available. The reliability of docking depends, however, on the accuracy of the adopted scoring function (SF). Despite intense research over the years, improving the accuracy of SFs for structure-based binding affinity prediction or virtual screening has proven to be a challenging task for any class of method. New SFs based on modern machine-learning regression models, which do not impose a predetermined functional form and thus are able to exploit effectively much larger amounts of experimental data, have recently been introduced. These machine-learning SFs have been shown to outperform a wide range of classical SFs at both binding affinity prediction and virtual screening. The emerging picture from these studies is that the classical approach of using linear regression with a small number of expert-selected structural features can be strongly improved by a machine-learning approach based on nonlinear regression allied with comprehensive data-driven feature selection. Furthermore, the performance of classical SFs does not grow with larger training datasets and hence this performance gap is expected to widen as more training data becomes available in the future. Other topics covered in this review include predicting the reliability of a SF on a particular target class, generating synthetic data to improve predictive performance and modeling guidelines for SF development.
Affiliation(s)
- Qurrat Ul Ain
- Department of Chemistry, Centre for Molecular Informatics, University of Cambridge, Cambridge, UK
| | - Florian D Roessler
- Department of Chemistry, Centre for Molecular Informatics, University of Cambridge, Cambridge, UK
| | - Pedro J Ballester
- Cancer Research Center of Marseille (INSERM U1068, Institut Paoli-Calmettes, Aix-Marseille Université, CNRS UMR7258), Marseille, France
|
31
|
Ferreira LG, Dos Santos RN, Oliva G, Andricopulo AD. Molecular docking and structure-based drug design strategies. Molecules 2015; 20:13384-421. [PMID: 26205061 PMCID: PMC6332083 DOI: 10.3390/molecules200713384] [Citation(s) in RCA: 938] [Impact Index Per Article: 104.2] [Received: 05/13/2015] [Revised: 07/14/2015] [Accepted: 07/20/2015] [Indexed: 02/07/2023] Open
Abstract
Pharmaceutical research has successfully incorporated a wealth of molecular modeling methods, within a variety of drug discovery programs, to study complex biological and chemical systems. The integration of computational and experimental strategies has been of great value in the identification and development of novel promising compounds. Broadly used in modern drug design, molecular docking methods explore the ligand conformations adopted within the binding sites of macromolecular targets. This approach also estimates the ligand-receptor binding free energy by evaluating critical phenomena involved in the intermolecular recognition process. Today, as a variety of docking algorithms are available, an understanding of the advantages and limitations of each method is of fundamental importance in the development of effective strategies and the generation of relevant results. The purpose of this review is to examine current molecular docking strategies used in drug discovery and medicinal chemistry, exploring the advances in the field and the role played by the integration of structure- and ligand-based methods.
Affiliation(s)
- Leonardo G Ferreira
- Laboratório de Química Medicinal e Computacional, Centro de Pesquisa e Inovação em Biodiversidade e Fármacos, Instituto de Física de São Carlos, Universidade de São Paulo, Av. João Dagnone 1100, São Carlos-SP 13563-120, Brazil.
| | - Ricardo N Dos Santos
- Laboratório de Química Medicinal e Computacional, Centro de Pesquisa e Inovação em Biodiversidade e Fármacos, Instituto de Física de São Carlos, Universidade de São Paulo, Av. João Dagnone 1100, São Carlos-SP 13563-120, Brazil.
| | - Glaucius Oliva
- Laboratório de Química Medicinal e Computacional, Centro de Pesquisa e Inovação em Biodiversidade e Fármacos, Instituto de Física de São Carlos, Universidade de São Paulo, Av. João Dagnone 1100, São Carlos-SP 13563-120, Brazil.
| | - Adriano D Andricopulo
- Laboratório de Química Medicinal e Computacional, Centro de Pesquisa e Inovação em Biodiversidade e Fármacos, Instituto de Física de São Carlos, Universidade de São Paulo, Av. João Dagnone 1100, São Carlos-SP 13563-120, Brazil.
|
32
|
Sukumar N, Krein MP, Prabhu G, Bhattacharya S, Sen S. Network measures for chemical library design. Drug Dev Res 2015; 75:402-11. [PMID: 25195584 DOI: 10.1002/ddr.21218] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 01/13/2023]
Abstract
In this overview, we examine recent developments in network approaches to drug design. A brief overview of networks is followed by a discussion of how chemical similarity networks and their properties address challenges in drug design. Multiple methods used to assess or enhance chemical diversity for early-stage drug discovery are discussed, as well as methods that can be used for drug repositioning and ligand polypharmacology.
Affiliation(s)
- Nagamani Sukumar
- Department of Chemistry, Shiv Nadar University, Dadri, Gautam Budh Nagar, U.P., 201314, India; Center for Informatics, Shiv Nadar University, Dadri, Gautam Budh Nagar, U.P., 201314, India
|
33
|
Li H, Leung KS, Wong MH, Ballester PJ. Improving AutoDock Vina Using Random Forest: The Growing Accuracy of Binding Affinity Prediction by the Effective Exploitation of Larger Data Sets. Mol Inform 2015; 34:115-26. [PMID: 27490034 DOI: 10.1002/minf.201400132] [Citation(s) in RCA: 151] [Impact Index Per Article: 16.8] [Received: 09/23/2014] [Accepted: 12/06/2014] [Indexed: 12/28/2022]
Abstract
There is a growing body of evidence showing that machine learning regression results in more accurate structure-based prediction of protein-ligand binding affinity. Docking methods that aim at optimizing the affinity of ligands for a target rely on how accurate their predicted ranking is. However, despite their proven advantages, machine-learning scoring functions are still not widely applied. This seems to be due to insufficient understanding of their properties and the lack of user-friendly software implementing them. Here we present a study where the accuracy of AutoDock Vina, arguably the most commonly-used docking software, is strongly improved by following a machine learning approach. We also analyse the factors that are responsible for this improvement and their generality. Most importantly, with the help of a proposed benchmark, we demonstrate that this improvement will be larger as more data becomes available for training Random Forest models, as regression models implying additive functional forms do not improve with more training data. We discuss how the latter opens the door to new opportunities in scoring function development. In order to facilitate the translation of this advance to enhance structure-based molecular design, we provide software to directly re-score Vina-generated poses and thus strongly improve their predicted binding affinity. The software is available at http://istar.cse.cuhk.edu.hk/rf-score-3.tgz and http://crcm.marseille.inserm.fr/fileadmin/rf-score-3.tgz.
Affiliation(s)
- Hongjian Li
- Department of Computer Science and Engineering, Chinese University of Hong Kong, Shatin, New Territories, Hong Kong
| | - Kwong-Sak Leung
- Department of Computer Science and Engineering, Chinese University of Hong Kong, Shatin, New Territories, Hong Kong
| | - Man-Hon Wong
- Department of Computer Science and Engineering, Chinese University of Hong Kong, Shatin, New Territories, Hong Kong
| | - Pedro J Ballester
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK; Cancer Research Center of Marseille, INSERM U1068, F-13009 Marseille, France, Institut Paoli-Calmettes, F-13009 Marseille, France, Aix-Marseille Université, F-13284 Marseille, France, CNRS UMR7258, F-13009 Marseille, France.
| |
Collapse
|
34
|
Cortés-Ciriano I, Ain QU, Subramanian V, Lenselink EB, Méndez-Lucio O, IJzerman AP, Wohlfahrt G, Prusis P, Malliavin TE, van Westen GJP, Bender A. Polypharmacology modelling using proteochemometrics (PCM): recent methodological developments, applications to target families, and future prospects. MedChemComm 2015. [DOI: 10.1039/c4md00216d] [Citation(s) in RCA: 80] [Impact Index Per Article: 8.9] [Indexed: 11/21/2022]
Abstract
Proteochemometric (PCM) modelling is a computational method to model the bioactivity of multiple ligands against multiple related protein targets simultaneously.
Affiliation(s)
- Isidro Cortés-Ciriano: Unité de Bioinformatique Structurale, Institut Pasteur and CNRS UMR 3825, Structural Biology and Chemistry Department, 75 724 Paris, France
- Qurrat Ul Ain: Unilever Centre for Molecular Informatics, Department of Chemistry, CB2 1EW Cambridge, UK
- Eelke B. Lenselink: Division of Medicinal Chemistry, Leiden Academic Centre for Drug Research, Leiden, The Netherlands
- Oscar Méndez-Lucio: Unilever Centre for Molecular Informatics, Department of Chemistry, CB2 1EW Cambridge, UK
- Adriaan P. IJzerman: Division of Medicinal Chemistry, Leiden Academic Centre for Drug Research, Leiden, The Netherlands
- Gerd Wohlfahrt: Computer-Aided Drug Design, Orion Pharma, FIN-02101 Espoo, Finland
- Peteris Prusis: Computer-Aided Drug Design, Orion Pharma, FIN-02101 Espoo, Finland
- Thérèse E. Malliavin: Unité de Bioinformatique Structurale, Institut Pasteur and CNRS UMR 3825, Structural Biology and Chemistry Department, 75 724 Paris, France
- Gerard J. P. van Westen: European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, UK
- Andreas Bender: Unilever Centre for Molecular Informatics, Department of Chemistry, CB2 1EW Cambridge, UK
|
35
|
Li Y, Han L, Liu Z, Wang R. Comparative assessment of scoring functions on an updated benchmark: 2. Evaluation methods and general results. J Chem Inf Model 2014; 54:1717-36. [PMID: 24708446 DOI: 10.1021/ci500081m] [Citation(s) in RCA: 242] [Impact Index Per Article: 24.2] [Indexed: 12/15/2022]
Abstract
Our comparative assessment of scoring functions (CASF) benchmark is created to provide an objective evaluation of current scoring functions. The key idea of CASF is to compare the general performance of scoring functions on a diverse set of protein-ligand complexes. In order to avoid testing scoring functions in the context of molecular docking, the scoring process is separated from the docking (or sampling) process by using ensembles of ligand binding poses that are generated in advance. Here, we describe the technical methods and evaluation results of the latest CASF-2013 study. The PDBbind core set (version 2013) was employed as the primary test set in this study, which consists of 195 protein-ligand complexes with high-quality three-dimensional structures and reliable binding constants. A panel of 20 scoring functions, most of which are implemented in mainstream commercial software, were evaluated in terms of "scoring power" (binding affinity prediction), "ranking power" (relative ranking prediction), "docking power" (binding pose prediction), and "screening power" (discrimination of true binders from random molecules). Our results reveal that the performance of these scoring functions is generally more promising in the docking/screening power tests than in the scoring/ranking power tests. Top-ranked scoring functions in the scoring power test, such as X-Score(HM), ChemScore@SYBYL, ChemPLP@GOLD, and PLP@DS, are also top-ranked in the ranking power test. Top-ranked scoring functions in the docking power test, such as ChemPLP@GOLD, ChemScore@GOLD, GlideScore-SP, LigScore@DS, and PLP@DS, are also top-ranked in the screening power test. Our results obtained on the entire test set and its subsets suggest that the real challenge in protein-ligand binding affinity prediction lies in polar interactions and the associated desolvation effect. Nonadditive features observed among high-affinity protein-ligand complexes also need attention.
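As an illustration of the "docking power" idea described in this abstract, the sketch below computes a success rate over pre-generated pose ensembles: a complex counts as a success when its best-scored pose is close to the native pose. The function name, the toy data, the 2 Å cutoff, and the "higher score is better" convention are all assumptions made for this example; the abstract itself does not state the exact protocol.

```python
def docking_success_rate(complexes, rmsd_cutoff=2.0):
    """Fraction of complexes whose best-scored pose is within rmsd_cutoff of the native pose.

    complexes: list of pose ensembles; each pose is a (score, rmsd_to_native) tuple.
    Assumption: a higher score means a better predicted pose.
    """
    hits = 0
    for poses in complexes:
        # Pick the pose the scoring function ranks first.
        _, best_rmsd = max(poses, key=lambda pose: pose[0])
        if best_rmsd <= rmsd_cutoff:
            hits += 1
    return hits / len(complexes)


# Hypothetical ensembles: (score, RMSD in Angstrom) for each pre-generated pose.
toy_complexes = [
    [(9.1, 0.8), (7.5, 3.2), (6.0, 5.1)],  # top-scored pose is near-native
    [(8.2, 4.0), (8.0, 1.1), (5.5, 6.3)],  # top-scored pose is a decoy
    [(7.7, 1.9), (7.1, 2.4)],              # top-scored pose is near-native
    [(6.9, 2.6), (6.4, 0.5)],              # top-scored pose is a decoy
]

rate = docking_success_rate(toy_complexes)
print(f"docking success rate: {rate:.2f}")
```

Because the poses are supplied up front, the measure depends only on how the scoring function ranks them, which mirrors the separation of scoring from sampling that the abstract describes.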
Affiliation(s)
- Yan Li: State Key Laboratory of Bioorganic and Natural Products Chemistry, Shanghai Institute of Organic Chemistry, Chinese Academy of Sciences, 345 Lingling Road, Shanghai 200032, People's Republic of China
|
36
|
Martell RE, Brooks DG, Wang Y, Wilcoxen K. Discovery of novel drugs for promising targets. Clin Ther 2014; 35:1271-81. [PMID: 24054704 DOI: 10.1016/j.clinthera.2013.08.005] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.2] [Received: 06/02/2013] [Revised: 06/27/2013] [Accepted: 08/13/2013] [Indexed: 11/18/2022]
Abstract
BACKGROUND: Once a promising drug target is identified, the steps to actually discover and optimize a drug are diverse and challenging. OBJECTIVE: The goal of this study was to provide a road map to navigate drug discovery. METHODS: Review general steps for drug discovery and provide illustrating references. RESULTS: A number of approaches are available to enhance and accelerate target identification and validation. Consideration of a variety of potential mechanisms of action of potential drugs can guide discovery efforts. The hit to lead stage may involve techniques such as high-throughput screening, fragment-based screening, and structure-based design, with informatics playing an ever-increasing role. Biologically relevant screening models are discussed, including cell lines, 3-dimensional culture, and in vivo screening. The process of enabling human studies for an investigational drug is also discussed. CONCLUSIONS: Drug discovery is a complex process that has significantly evolved in recent years.
Affiliation(s)
- Robert E Martell: TESARO Inc, Waltham, Massachusetts; Tufts Medical Center, Boston, Massachusetts
|
37
|
Zhan W, Li D, Che J, Zhang L, Yang B, Hu Y, Liu T, Dong X. Integrating docking scores, interaction profiles and molecular descriptors to improve the accuracy of molecular docking: Toward the discovery of novel Akt1 inhibitors. Eur J Med Chem 2014; 75:11-20. [DOI: 10.1016/j.ejmech.2014.01.019] [Citation(s) in RCA: 38] [Impact Index Per Article: 3.8] [Received: 09/04/2013] [Revised: 01/08/2014] [Accepted: 01/13/2014] [Indexed: 11/30/2022]
|
38
|
Ballester PJ, Schreyer A, Blundell TL. Does a more precise chemical description of protein-ligand complexes lead to more accurate prediction of binding affinity? J Chem Inf Model 2014; 54:944-55. [PMID: 24528282 PMCID: PMC3966527 DOI: 10.1021/ci500091r] [Citation(s) in RCA: 129] [Impact Index Per Article: 12.9] [Indexed: 01/15/2023]
Abstract
Predicting the binding affinities of large sets of diverse molecules against a range of macromolecular targets is an extremely challenging task. The scoring functions that attempt such computational prediction are essential for exploiting and analyzing the outputs of docking, which is in turn an important tool in problems such as structure-based drug design. Classical scoring functions assume a predetermined theory-inspired functional form for the relationship between the variables that describe an experimentally determined or modeled structure of a protein–ligand complex and its binding affinity. The inherent problem of this approach is in the difficulty of explicitly modeling the various contributions of intermolecular interactions to binding affinity. New scoring functions based on machine-learning regression models, which are able to exploit effectively much larger amounts of experimental data and circumvent the need for a predetermined functional form, have already been shown to outperform a broad range of state-of-the-art scoring functions in a widely used benchmark. Here, we investigate the impact of the chemical description of the complex on the predictive power of the resulting scoring function using a systematic battery of numerical experiments. The latter resulted in the most accurate scoring function to date on the benchmark. Strikingly, we also found that a more precise chemical description of the protein–ligand complex does not generally lead to a more accurate prediction of binding affinity. We discuss four factors that may contribute to this result: modeling assumptions, codependence of representation and regression, data restricted to the bound state, and conformational heterogeneity in data.
Affiliation(s)
- Pedro J Ballester: European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, CB10 1SD, United Kingdom
|
39
|
Wang W, He W, Zhou X, Chen X. Optimization of molecular docking scores with support vector rank regression. Proteins 2013; 81:1386-98. [PMID: 23504920 DOI: 10.1002/prot.24282] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.0] [Received: 09/28/2012] [Revised: 01/29/2013] [Accepted: 02/26/2013] [Indexed: 01/16/2023]
Abstract
This work introduces the support vector rank regression (SVRR) algorithm for the optimization of molecular docking scores. Seven original docking scores reported by two docking software were integrated by the SVRR algorithm. The resulting SVRR scores showed an average of 12.1% improvement (59.5-66.7%) in binding conformation prediction tests to rank the correctly computed conformation in the first place, along with 16.7% RMSD improvement (2.5414 vs. 2.1162 Å) for the top ranked conformations. In compound library screening (LS) tests, an average of 46.3% improvement (18.2-26.6%) was also observed to rank the correct ligand in the first place. Furthermore, it was shown that SVRR scores trained with different example datasets, using different training strategies, all exhibited exceedingly consistent accuracies, suggesting that the SVRR algorithm is highly robust and generalizable. In contrast, using the same training datasets, traditional support vector classification and regression algorithms failed to improve comparably the accuracy of LS and conformation prediction. These results suggested that, with additional features to indicate the comparative fitness between computed binding conformations, the SVRR algorithm holds the potential to create a new category of more accurate integrative docking scores.
Affiliation(s)
- Wei Wang: State Key Laboratory of Plant Physiology and Biochemistry, Zhejiang University, Hangzhou 310058, People's Republic of China
|
40
|
Graphs and networks in chemical and biological informatics: past, present and future. Future Med Chem 2012; 4:2039-47. [DOI: 10.4155/fmc.12.128] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 12/13/2022]
Abstract
Chemical and biological network analysis has recently garnered intense interest from the perspective of drug design and discovery. While graph theoretic concepts have a long history in chemistry – predating quantum mechanics – and graphical measures of chemical structures date back to the 1970s, it is only recently with the advent of public repositories of information and availability of high-throughput assays and computational resources that network analysis of large-scale chemical networks, such as protein–protein interaction networks, has become possible. Drug design and discovery are undergoing a paradigm shift, from the notion of ‘one target, one drug’ to a much more nuanced view that relies on multiple sources of information: genomic, proteomic, metabolomic and so on. This holistic view of drug design is an incredibly daunting undertaking still very much in its infancy. Here, we focus on current developments in graph- and network-centric approaches in chemical and biological informatics, with particular reference to applications in the fields of SAR modeling and drug design. Key insights from the past suggest a path forward via visualization and fusion of multiple sources of chemical network data.
|
41
|
Ballester PJ, Mangold M, Howard NI, Robinson RLM, Abell C, Blumberger J, Mitchell JBO. Hierarchical virtual screening for the discovery of new molecular scaffolds in antibacterial hit identification. J R Soc Interface 2012; 9:3196-207. [PMID: 22933186 PMCID: PMC3481598 DOI: 10.1098/rsif.2012.0569] [Citation(s) in RCA: 55] [Impact Index Per Article: 4.6] [Indexed: 01/03/2023]
Abstract
One of the initial steps of modern drug discovery is the identification of small organic molecules able to inhibit a target macromolecule of therapeutic interest. A small proportion of these hits are further developed into lead compounds, which in turn may ultimately lead to a marketed drug. A screening protocol commonly used for this task is high-throughput screening (HTS). However, the performance of HTS against antibacterial targets has generally been unsatisfactory, with high costs and low rates of hit identification. Here, we present a novel computational methodology that is able to identify a high proportion of structurally diverse inhibitors by searching unusually large molecular databases in a time-, cost- and resource-efficient manner. This virtual screening methodology was tested prospectively on two versions of an antibacterial target (type II dehydroquinase from Mycobacterium tuberculosis and Streptomyces coelicolor), for which HTS has not provided satisfactory results and consequently practically all known inhibitors are derivatives of the same core scaffold. Overall, our protocols identified 100 new inhibitors, with calculated Ki ranging from 4 to 250 μM (confirmed hit rates are 60% and 62% against each version of the target). Most importantly, over 50 new active molecular scaffolds were discovered, which underscores the benefits that a wide application of prospectively validated in silico screening tools is likely to bring to antibacterial hit identification.
Affiliation(s)
- Pedro J Ballester: European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
|
42
|
Sukumar N, Krein MP, Embrechts MJ. Predictive cheminformatics in drug discovery: statistical modeling for analysis of micro-array and gene expression data. Methods Mol Biol 2012; 910:165-94. [PMID: 22821597 DOI: 10.1007/978-1-61779-965-5_9] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 01/23/2023]
Abstract
The vast amounts of chemical and biological data available through robotic high-throughput assays and micro-array technologies require computational techniques for visualization, analysis, and predictive modeling. Predictive cheminformatics and bioinformatics employ statistical methods to mine this data for hidden correlations and to retrieve molecules or genes with desirable biological activity from large databases, for the purpose of drug development. While many statistical methods are commonly employed and widely accessible, their proper use involves due consideration to data representation and preprocessing, model validation and domain of applicability estimation, similarity assessment, the nature of the structure-activity landscape, and model interpretation. This chapter seeks to review these considerations in light of the current state of the art in statistical modeling and to summarize the best practices in predictive cheminformatics.
Affiliation(s)
- N Sukumar: Rensselaer Exploratory Center for Cheminformatics Research and Department of Chemistry and Chemical Biology, Rensselaer Polytechnic Institute, Troy, NY, USA
|
43
|
|
44
|
Wang JC, Lin JH, Chen CM, Perryman AL, Olson AJ. Robust scoring functions for protein-ligand interactions with quantum chemical charge models. J Chem Inf Model 2011; 51:2528-37. [PMID: 21932857 DOI: 10.1021/ci200220v] [Citation(s) in RCA: 41] [Impact Index Per Article: 3.2] [Indexed: 01/13/2023]
Abstract
Ordinary least-squares (OLS) regression has been used widely for constructing the scoring functions for protein-ligand interactions. However, OLS is very sensitive to the existence of outliers, and models constructed using it are easily affected by the outliers or even the choice of the data set. On the other hand, determination of atomic charges is regarded as of central importance, because the electrostatic interaction is known to be a key contributing factor for biomolecular association. In the development of the AutoDock4 scoring function, only OLS was conducted, and the simple Gasteiger method was adopted. It is therefore of considerable interest to see whether more rigorous charge models could improve the statistical performance of the AutoDock4 scoring function. In this study, we have employed two well-established quantum chemical approaches, namely the restrained electrostatic potential (RESP) and the Austin-model 1-bond charge correction (AM1-BCC) methods, to obtain atomic partial charges, and we have compared how different charge models affect the performance of AutoDock4 scoring functions. In combination with robust regression analysis and outlier exclusion, our new protein-ligand free energy regression model with AM1-BCC charges for ligands and Amber99SB charges for proteins achieves the lowest root-mean-squared error of 1.637 kcal/mol for the training set of 147 complexes and 2.176 kcal/mol for the external test set of 1427 complexes. The assessment for binding pose prediction with the 100 external decoy sets indicates a very high success rate of 87% with the criterion of a predicted root-mean-squared deviation of less than 2 Å. The success rates and statistical performance of our robust scoring functions are only weakly class-dependent (hydrophobic, hydrophilic, or mixed).
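The outlier sensitivity of OLS mentioned in this abstract can be seen in a small one-variable example. The sketch below fits a line with OLS and with the Theil-Sen estimator (median of pairwise slopes). Theil-Sen is used here only as a simple stand-in for a robust fit; it is not claimed to be the robust regression procedure the authors applied, and the data are invented for illustration.

```python
from statistics import mean, median


def ols_fit(x, y):
    """Ordinary least-squares slope and intercept for one predictor."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def theil_sen_fit(x, y):
    """Robust fit: median of all pairwise slopes, then median residual as intercept."""
    n = len(x)
    slopes = [
        (y[j] - y[i]) / (x[j] - x[i])
        for i in range(n)
        for j in range(i + 1, n)
        if x[j] != x[i]
    ]
    slope = median(slopes)
    intercept = median([yi - slope * xi for xi, yi in zip(x, y)])
    return slope, intercept


# Hypothetical data on the line y = 2x + 1, with the last point corrupted.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.0 * xi + 1.0 for xi in x]
y_outlier = y[:-1] + [40.0]  # the uncorrupted value would be 13.0

ols_slope, ols_intercept = ols_fit(x, y_outlier)
ts_slope, ts_intercept = theil_sen_fit(x, y_outlier)
print(f"OLS:       slope={ols_slope:.3f}, intercept={ols_intercept:.3f}")
print(f"Theil-Sen: slope={ts_slope:.3f}, intercept={ts_intercept:.3f}")
```

With a single corrupted point out of six, the OLS slope moves far from 2 while the median-based fit stays on the underlying line, which is the kind of behaviour that motivates robust regression and outlier exclusion when training scoring functions.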
Affiliation(s)
- Jui-Chih Wang: Institute of Biomedical Engineering, National Taiwan University, Taipei, Taiwan
|
45
|
Li L, Wang B, Meroueh SO. Support vector regression scoring of receptor-ligand complexes for rank-ordering and virtual screening of chemical libraries. J Chem Inf Model 2011; 51:2132-8. [PMID: 21728360 PMCID: PMC3209528 DOI: 10.1021/ci200078f] [Citation(s) in RCA: 69] [Impact Index Per Article: 5.3] [Indexed: 01/18/2023]
Abstract
The community structure-activity resource (CSAR) data sets are used to develop and test a support vector machine-based scoring function in regression mode (SVR). Two scoring functions (SVR-KB and SVR-EP) are derived with the objective of reproducing the trend of the experimental binding affinities provided within the two CSAR data sets. The features used to train SVR-KB are knowledge-based pairwise potentials, while SVR-EP is based on physicochemical properties. SVR-KB and SVR-EP were compared to seven other widely used scoring functions, including Glide, X-score, GoldScore, ChemScore, Vina, Dock, and PMF. Results showed that SVR-KB trained with features obtained from three-dimensional complexes of the PDBbind data set outperformed all other scoring functions, including best performing X-score, by nearly 0.1 using three correlation coefficients, namely Pearson, Spearman, and Kendall. It was interesting that higher performance in rank ordering did not translate into greater enrichment in virtual screening assessed using the 40 targets of the Directory of Useful Decoys (DUD). To remedy this situation, a variant of SVR-KB (SVR-KBD) was developed by following a target-specific tailoring strategy that we had previously employed to derive SVM-SP. SVR-KBD showed a much higher enrichment, outperforming all other scoring functions tested, and was comparable in performance to our previously derived scoring function SVM-SP.
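The three correlation coefficients named in this abstract (Pearson, Spearman, and Kendall) can be computed as in the following sketch. The affinity values are invented, the function names are chosen for this example, and the Kendall variant shown is tau-a, which makes no correction for ties; the abstract does not say which variant was used.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation: linear agreement between two value lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def kendall_tau_a(x, y):
    """Kendall tau-a: (concordant - discordant pairs) / total pairs."""
    n = len(x)
    total = 0
    for i in range(n):
        for j in range(i + 1, n):
            product = (x[i] - x[j]) * (y[i] - y[j])
            total += (product > 0) - (product < 0)
    return total / (n * (n - 1) / 2)


# Hypothetical measured vs. predicted binding affinities (e.g. pKd units).
measured = [4.2, 5.1, 6.3, 7.0, 8.4]
predicted = [4.8, 5.0, 5.9, 7.5, 7.1]

r_pearson = pearson(measured, predicted)
r_spearman = spearman(measured, predicted)
r_kendall = kendall_tau_a(measured, predicted)
print(f"Pearson={r_pearson:.3f} Spearman={r_spearman:.3f} Kendall={r_kendall:.3f}")
```

Pearson rewards linear agreement in the values, whereas Spearman and Kendall depend only on the ordering. A scoring function can therefore rank compounds well by these measures and still, as the abstract notes, fail to give better enrichment in virtual screening.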
Affiliation(s)
- Liwei Li: Department of Biochemistry and Molecular Biology, Indiana University School of Medicine, Indiana University, Indianapolis, Indiana, United States
|
46
|
Krein MP, Sukumar N. Exploration of the Topology of Chemical Spaces with Network Measures. J Phys Chem A 2011; 115:12905-18. [DOI: 10.1021/jp204022u] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.4] [Indexed: 01/17/2023]
Affiliation(s)
- Michael P. Krein: Rensselaer Exploratory Center for Cheminformatics Research, and Department of Chemistry & Chemical Biology, Rensselaer Polytechnic Institute, 110 Eighth Street, Troy, New York 12180, United States
- N. Sukumar: Rensselaer Exploratory Center for Cheminformatics Research, and Department of Chemistry & Chemical Biology, Rensselaer Polytechnic Institute, 110 Eighth Street, Troy, New York 12180, United States
|
47
|
Bergeron C, Krein M, Moore G, Breneman CM, Bennett KP. Modeling Choices for Virtual Screening Hit Identification. Mol Inform 2011; 30:765-77. [PMID: 27467409 DOI: 10.1002/minf.201100092] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 06/04/2011] [Accepted: 06/25/2011] [Indexed: 11/09/2022]
Abstract
Making suitable modeling choices is crucial for successful in silico drug design. Among the most important of these are the proper extraction and curation of data from quantitative high-throughput screens (qHTS) and the use of optimized statistical learning methods to obtain valid models. More specifically, we aim to learn the top 1% most potent compounds against a variety of targets in a procedure we call virtual screening hit identification (VISHID). To do so, we exploit qHTS data obtained from PubChem, descriptors derived from molecular structures, and support vector machines (SVM) for model generation. Our results illustrate how an appreciation of subtle issues underlying qHTS data extraction and the resulting SVM models created using these data can enhance the effectiveness of solutions and, in doing so, accelerate drug discovery.
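One way to turn "the top 1% most potent compounds" into training labels for a classifier is sketched below. The function name, the integer-percent parameter, the rounding rule (round up, at least one compound), and the potency scale where larger means more potent are assumptions made for this example, not details taken from the VISHID procedure.

```python
def label_top_percent(potencies, top_percent=1):
    """Label the most potent compounds 1 and all others 0.

    potencies: one value per compound, larger meaning more potent
    (for example a negative-log AC50). Assumption for this sketch only.
    top_percent: integer percentage of compounds to label as hits.
    """
    n = len(potencies)
    # Ceiling division in integers; always keep at least one hit.
    n_top = max(1, (n * top_percent + 99) // 100)
    order = sorted(range(n), key=lambda i: potencies[i], reverse=True)
    labels = [0] * n
    for i in order[:n_top]:
        labels[i] = 1
    return labels


# Hypothetical screen of 300 compounds with increasing potency.
potencies = [4.0 + 0.01 * i for i in range(300)]
labels = label_top_percent(potencies)
print(f"{sum(labels)} of {len(labels)} compounds labelled as hits")
```

Labels built this way are heavily imbalanced (about 1 positive per 99 negatives), which is one reason the choice of data extraction and learning method has a large effect on the resulting models.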
Affiliation(s)
- Charles Bergeron: Department of Mathematical Sciences, Rensselaer Polytechnic Institute, 110 Eighth Street, Troy, New York, 12180, phone/fax: (518) 276-6414, (518) 276-4824; Department of Chemistry and Chemical Biology, Rensselaer Polytechnic Institute, 110 Eighth Street, Troy, New York, 12180; Department of Pharmaceutical Sciences, Albany College of Pharmacy and Health Sciences, 261 Mountain View Drive, Colchester, Vermont, 05446
- Michael Krein: Department of Chemistry and Chemical Biology, Rensselaer Polytechnic Institute, 110 Eighth Street, Troy, New York, 12180
- Gregory Moore: Department of Mathematical Sciences, Rensselaer Polytechnic Institute, 110 Eighth Street, Troy, New York, 12180, phone/fax: (518) 276-6414, (518) 276-4824; Syracuse North Campus, Bryant and Stratton College, 8687 Carling Road, Syracuse, New York, 13090
- Curt M Breneman: Department of Chemistry and Chemical Biology, Rensselaer Polytechnic Institute, 110 Eighth Street, Troy, New York, 12180
- Kristin P Bennett: Department of Mathematical Sciences, Rensselaer Polytechnic Institute, 110 Eighth Street, Troy, New York, 12180, phone/fax: (518) 276-6414, (518) 276-4824
|
48
|
Ballester PJ, Mitchell JBO. Comments on “Leave-Cluster-Out Cross-Validation Is Appropriate for Scoring Functions Derived from Diverse Protein Data Sets”: Significance for the Validation of Scoring Functions. J Chem Inf Model 2011; 51:1739-41. [DOI: 10.1021/ci200057e] [Citation(s) in RCA: 40] [Impact Index Per Article: 3.1] [Indexed: 01/10/2023]
Affiliation(s)
- Pedro J. Ballester: European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
- John B. O. Mitchell: Biomedical Sciences Research Complex and EaStCHEM School of Chemistry, University of St. Andrews, North Haugh, St. Andrews, Fife KY16 9ST, United Kingdom
|
49
|
Kramer C, Gedeck P. Global Free Energy Scoring Functions Based on Distance-Dependent Atom-Type Pair Descriptors. J Chem Inf Model 2011; 51:707-20. [DOI: 10.1021/ci100473d] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.8] [Indexed: 01/10/2023]
Affiliation(s)
- Christian Kramer: Novartis Institutes for BioMedical Research, Novartis Pharma AG, Forum 1, Novartis Campus, CH-4056 Basel, Switzerland
- Peter Gedeck: Novartis Institutes for BioMedical Research, Novartis Pharma AG, Forum 1, Novartis Campus, CH-4056 Basel, Switzerland
|