1
Zhang Y, Li S, Meng K, Sun S. Machine Learning for Sequence and Structure-Based Protein-Ligand Interaction Prediction. J Chem Inf Model 2024; 64:1456-1472. [PMID: 38385768 DOI: 10.1021/acs.jcim.3c01841] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/23/2024]
Abstract
Developing new drugs is too expensive and time-consuming. Accurately predicting the interaction between drugs and targets will likely change how the drug is discovered. Machine learning-based protein-ligand interaction prediction has demonstrated significant potential. In this paper, computational methods, focusing on sequence and structure to study protein-ligand interactions, are examined. Therefore, this paper starts by presenting an overview of the data sets applied in this area, as well as the various approaches applied for representing proteins and ligands. Then, sequence-based and structure-based classification criteria are subsequently utilized to categorize and summarize both the classical machine learning models and deep learning models employed in protein-ligand interaction studies. Moreover, the evaluation methods and interpretability of these models are proposed. Furthermore, delving into the diverse applications of protein-ligand interaction models in drug research is presented. Lastly, the current challenges and future directions in this field are addressed.
Affiliation(s)
- Yunjiang Zhang
- Beijing Key Laboratory for Green Catalysis and Separation, The Faculty of Environment and Life, Beijing University of Technology, Beijing 100124, P. R. China
- Shuyuan Li
- Beijing Key Laboratory for Green Catalysis and Separation, The Faculty of Environment and Life, Beijing University of Technology, Beijing 100124, P. R. China
- Kong Meng
- Beijing Key Laboratory for Green Catalysis and Separation, The Faculty of Environment and Life, Beijing University of Technology, Beijing 100124, P. R. China
- Shaorui Sun
- Beijing Key Laboratory for Green Catalysis and Separation, The Faculty of Environment and Life, Beijing University of Technology, Beijing 100124, P. R. China
2
Gómez-Sacristán P, Simeon S, Tran-Nguyen VK, Patil S, Ballester PJ. Inactive-enriched machine-learning models exploiting patent data improve structure-based virtual screening for PDL1 dimerizers. J Adv Res 2024:S2090-1232(24)00037-7. [PMID: 38280715 DOI: 10.1016/j.jare.2024.01.024] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/30/2023] [Revised: 12/01/2023] [Accepted: 01/21/2024] [Indexed: 01/29/2024]
Abstract
INTRODUCTION: Small-molecule Programmed Cell Death Protein 1/Programmed Death-Ligand 1 (PD1/PDL1) inhibition via PDL1 dimerization has the potential to lead to inexpensive drugs with better cancer patient outcomes and milder side effects. However, this therapeutic approach has proven challenging, with only one PDL1 dimerizer reaching early clinical trials so far. There is hence a need for fast and accurate methods to develop alternative PDL1 dimerizers. OBJECTIVES: We aim to show that structure-based virtual screening (SBVS) based on PDL1-specific machine-learning (ML) scoring functions (SFs) is a powerful drug design tool for detecting PD1/PDL1 inhibitors via PDL1 dimerization. METHODS: By incorporating the latest MLSF advances, we generated and evaluated PDL1-specific MLSFs (classifiers and inactive-enriched regressors) on two demanding test sets. RESULTS: 60 PDL1-specific MLSFs (30 classifiers and 30 regressors) were generated. Our large-scale analysis provides highly predictive PDL1-specific MLSFs that benefitted from training with large volumes of docked inactives and enabling inactive-enriched regression. CONCLUSION: PDL1-specific MLSFs strongly outperformed generic SFs of various types on this target and are released here without restrictions.
Affiliation(s)
- Saw Simeon
- Centre de Recherche en Cancérologie de Marseille, Marseille 13009, France
- Sachin Patil
- NanoBio Laboratory, Widener University, Chester, PA 19013, USA
- Pedro J Ballester
- Department of Bioengineering, Imperial College London, London SW7 2AZ, UK.
3
Tran-Nguyen VK, Junaid M, Simeon S, Ballester PJ. A practical guide to machine-learning scoring for structure-based virtual screening. Nat Protoc 2023; 18:3460-3511. [PMID: 37845361 DOI: 10.1038/s41596-023-00885-w] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Received: 02/08/2022] [Accepted: 07/03/2023] [Indexed: 10/18/2023]
Abstract
Structure-based virtual screening (SBVS) via docking has been used to discover active molecules for a range of therapeutic targets. Chemical and protein data sets that contain integrated bioactivity information have increased both in number and in size. Artificial intelligence and, more concretely, its machine-learning (ML) branch, including deep learning, have effectively exploited these data sets to build scoring functions (SFs) for SBVS against targets with an atomic-resolution 3D model (e.g., generated by X-ray crystallography or predicted by AlphaFold2). Often outperforming their generic and non-ML counterparts, target-specific ML-based SFs represent the state of the art for SBVS. Here, we present a comprehensive and user-friendly protocol to build and rigorously evaluate these new SFs for SBVS. This protocol is organized into four sections: (i) using a public benchmark of a given target to evaluate an existing generic SF; (ii) preparing experimental data for a target from public repositories; (iii) partitioning data into a training set and a test set for subsequent target-specific ML modeling; and (iv) generating and evaluating target-specific ML SFs by using the prepared training-test partitions. All necessary code and input/output data related to three example targets (acetylcholinesterase, HMG-CoA reductase, and peroxisome proliferator-activated receptor-α) are available at https://github.com/vktrannguyen/MLSF-protocol, can be run by using a single computer within 1 week and make use of easily accessible software/programs (e.g., Smina, CNN-Score, RF-Score-VS and DeepCoy) and web resources. Our aim is to provide practical guidance on how to augment training data to enhance SBVS performance, how to identify the most suitable supervised learning algorithm for a data set, and how to build an SF with the highest likelihood of discovering target-active molecules within a given compound library.
Affiliation(s)
- Muhammad Junaid
- Centre de Recherche en Cancérologie de Marseille, Marseille, France
- Saw Simeon
- Centre de Recherche en Cancérologie de Marseille, Marseille, France
4
Tran-Nguyen VK, Ballester PJ. Beware of Simple Methods for Structure-Based Virtual Screening: The Critical Importance of Broader Comparisons. J Chem Inf Model 2023; 63:1401-1405. [PMID: 36848585 PMCID: PMC10015451 DOI: 10.1021/acs.jcim.3c00218] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Indexed: 03/01/2023]
Abstract
We discuss how data unbiasing and simple methods such as protein-ligand Interaction FingerPrint (IFP) can overestimate virtual screening performance. We also show that IFP is strongly outperformed by target-specific machine-learning scoring functions, which were not considered in a recent report concluding that simple methods were better than machine-learning scoring functions at virtual screening.
Affiliation(s)
- Pedro J Ballester
- Department of Bioengineering, Imperial College London, London SW7 2AZ, U.K
5
Spiegel J, Senderowitz H. Towards an Enrichment Optimization Algorithm (EOA)-based Target Specific Docking Functions for Virtual Screening. Mol Inform 2022; 41:e2200034. [PMID: 35790469 PMCID: PMC9786651 DOI: 10.1002/minf.202200034] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/10/2022] [Accepted: 07/05/2022] [Indexed: 12/30/2022]
Abstract
Docking-based virtual screening (VS) is a common starting point in many drug discovery projects. While ligand-based approaches may sometimes provide better results, the advantage of docking lies in its ability to provide reliable ligand binding modes and approximated binding free energies, two factors that are important for hit selection and optimization. Most docking programs were developed to be as general as possible and consequently their performances on specific targets may be sub-optimal. With this in mind, in this work we present a method for the development of target-specific scoring functions using our recently reported Enrichment Optimization Algorithm (EOA). EOA derives QSAR models in the form of multiple linear regression (MLR) equations by optimizing an enrichment-like metric. Since EOA requires target-specific active and inactive (or decoy) compounds, we retrieved such data for six targets from the DUD-E database, and used them to re-derive the weights associated with the components that make up GOLD's ChemPLP scoring function, yielding target-specific, modified functions. We then used the original ChemPLP function in small-scale VS experiments on the six targets and subsequently rescored the resulting poses with the modified functions. In addition, we used the modified functions for compound re-docking. We found that in many, although not all, cases, either rescoring the original ChemPLP poses or repeating the entire docking process with the modified functions yielded better results in terms of AUC and EF1%, two metrics commonly used to evaluate VS performance. While work on additional datasets and docking tools is clearly required, we propose that the results obtained thus far hint at the potential benefits of using EOA-based optimization for the derivation of target-specific functions in the context of virtual screening. To this end, we discuss the downsides of the method and how it could be improved.
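As a rough illustration of the two metrics named in this abstract, the sketch below computes the enrichment factor in the top 1% of a ranked library (EF1%) and the ROC AUC. It is not the authors' EOA code: the function names, the toy library, and the convention that lower docking scores are better are all assumptions made for this example.

```python
from typing import List, Sequence


def _ranked_labels(scores: Sequence[float], labels: Sequence[int],
                   lower_is_better: bool) -> List[int]:
    """Return the labels ordered from best-scored to worst-scored compound."""
    order = sorted(range(len(scores)), key=lambda i: scores[i],
                   reverse=not lower_is_better)
    return [labels[i] for i in order]


def enrichment_factor(scores: Sequence[float], labels: Sequence[int],
                      fraction: float = 0.01,
                      lower_is_better: bool = True) -> float:
    """Hit rate in the top `fraction` of the ranking divided by the overall hit rate."""
    n = len(scores)
    n_actives = sum(labels)
    if n == 0 or n != len(labels) or n_actives == 0:
        raise ValueError("need equally long, non-empty inputs with at least one active")
    n_top = max(1, round(n * fraction))  # size of the top slice (assumed rounding rule)
    ranked = _ranked_labels(scores, labels, lower_is_better)
    hit_rate_top = sum(ranked[:n_top]) / n_top
    hit_rate_all = n_actives / n
    return hit_rate_top / hit_rate_all


def roc_auc(scores: Sequence[float], labels: Sequence[int],
            lower_is_better: bool = True) -> float:
    """Probability that a random active is ranked above a random inactive (ties count 0.5)."""
    sign = -1.0 if lower_is_better else 1.0
    actives = [sign * s for s, y in zip(scores, labels) if y == 1]
    inactives = [sign * s for s, y in zip(scores, labels) if y == 0]
    if not actives or not inactives:
        raise ValueError("need at least one active and one inactive")
    wins = 0.0
    for a in actives:
        for d in inactives:
            if a > d:
                wins += 1.0
            elif a == d:
                wins += 0.5
    return wins / (len(actives) * len(inactives))


# Hypothetical toy library: 200 compounds, 4 of them active, lower score is better.
toy_scores = [-10.0 + 0.05 * i for i in range(200)]
toy_labels = [0] * 200
for idx in (0, 1, 50, 120):
    toy_labels[idx] = 1

ef1 = enrichment_factor(toy_scores, toy_labels, fraction=0.01)
auc = roc_auc(toy_scores, toy_labels)
print(f"EF1% = {ef1:.1f}, ROC AUC = {auc:.3f}")
```

The pairwise AUC loop is quadratic in library size, which is fine for a toy example; a rank-based formula would be preferable for large screening decks.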
Affiliation(s)
- Jacob Spiegel
- Department of Chemistry, Bar-Ilan University, Ramat-Gan 5290002, Israel
6
Artificial intelligence and machine-learning approaches in structure and ligand-based discovery of drugs affecting central nervous system. Mol Divers 2022; 27:959-985. [PMID: 35819579 DOI: 10.1007/s11030-022-10489-3] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 04/27/2022] [Accepted: 06/21/2022] [Indexed: 12/11/2022]
Abstract
CNS disorders are indications with very high unmet medical need, a relatively small number of available drugs, and a subpar satisfaction level among patients and caregivers. Discovery of CNS drugs is an extremely expensive affair with its own unique challenges, leading to extremely high attrition rates and low efficiency. With the explosion of data in the information age, there is hardly any aspect of life that has not been touched by data-driven technologies such as artificial intelligence (AI) and machine learning (ML). Drug discovery is no exception: the emergence of big data via genomic, proteomic, biological, and chemical technologies has driven pharmaceutical giants to collaborate with AI-oriented companies to revolutionise drug discovery, with the goal of increasing the efficiency of the process. In recent years many examples of innovative applications of AI and ML techniques in CNS drug discovery have been reported. Research on therapeutics for diseases such as schizophrenia, Alzheimer's and Parkinsonism has been given a new direction and thrust by these developments. AI and ML have been applied to both ligand-based and structure-based drug discovery and design of CNS therapeutics. In this review, we have summarised the general aspects of AI and ML from the perspective of drug discovery followed by a comprehensive coverage of the recent developments in the applications of AI/ML techniques in CNS drug discovery.
7
Gorostiola González M, Janssen APA, IJzerman AP, Heitman LH, van Westen GJP. Oncological drug discovery: AI meets structure-based computational research. Drug Discov Today 2022; 27:1661-1670. [PMID: 35301149 DOI: 10.1016/j.drudis.2022.03.005] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 06/25/2021] [Revised: 01/22/2022] [Accepted: 03/09/2022] [Indexed: 02/08/2023]
Abstract
The integration of machine learning and structure-based methods has proven valuable in the past as a way to prioritize targets and compounds in early drug discovery. In oncological research, these methods can be highly beneficial in addressing the diversity of neoplastic diseases portrayed by the different hallmarks of cancer. Here, we review six use case scenarios for integrated computational methods, namely driver prediction, computational mutagenesis, (off)-target prediction, binding site prediction, virtual screening, and allosteric modulation analysis. We address the heterogeneity of integration approaches and individual methods, while acknowledging their current limitations and highlighting their potential to bring drugs for personalized oncological therapies to the market faster.
Affiliation(s)
- Marina Gorostiola González
- Division of Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Leiden University, The Netherlands; Oncode Institute, Utrecht, The Netherlands
- Antonius P A Janssen
- Oncode Institute, Utrecht, The Netherlands; Molecular Physiology, Leiden Institute of Chemistry, Leiden University, The Netherlands
- Adriaan P IJzerman
- Division of Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Leiden University, The Netherlands
- Laura H Heitman
- Division of Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Leiden University, The Netherlands; Oncode Institute, Utrecht, The Netherlands
- Gerard J P van Westen
- Division of Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Leiden University, The Netherlands.
8
Can docking scoring functions guarantee success in virtual screening? VIRTUAL SCREENING AND DRUG DOCKING 2022. [DOI: 10.1016/bs.armc.2022.08.008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/23/2022]
9
Ricci-Lopez J, Aguila SA, Gilson MK, Brizuela CA. Improving Structure-Based Virtual Screening with Ensemble Docking and Machine Learning. J Chem Inf Model 2021; 61:5362-5376. [PMID: 34652141 DOI: 10.1021/acs.jcim.1c00511] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Indexed: 11/29/2022]
Abstract
One of the main challenges of structure-based virtual screening (SBVS) is the incorporation of the receptor's flexibility, as its explicit representation in every docking run implies a high computational cost. Therefore, a common alternative to include the receptor's flexibility is the approach known as ensemble docking. Ensemble docking consists of using a set of receptor conformations and performing the docking assays over each of them. However, there is still no agreement on how to combine the ensemble docking results to obtain the final ligand ranking. A common choice is to use consensus strategies to aggregate the ensemble docking scores, but these strategies exhibit slight improvement regarding the single-structure approach. Here, we claim that using machine learning (ML) methodologies over the ensemble docking results could improve the predictive power of SBVS. To test this hypothesis, four proteins were selected as study cases: CDK2, FXa, EGFR, and HSP90. Protein conformational ensembles were built from crystallographic structures, whereas the evaluated compound library comprised up to three benchmarking data sets (DUD, DEKOIS 2.0, and CSAR-2012) and cocrystallized molecules. Ensemble docking results were processed through 30 repetitions of 4-fold cross-validation to train and validate two ML classifiers: logistic regression and gradient boosting trees. Our results indicate that the ML classifiers significantly outperform traditional consensus strategies and even the best performance case achieved with single-structure docking. We provide statistical evidence that supports the effectiveness of ML to improve the ensemble docking performance.
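The workflow described in this abstract can be sketched in two pieces: aggregating per-conformation docking scores with a consensus strategy, and generating the 30 repetitions of 4-fold cross-validation on which a classifier would be trained. This is a minimal standard-library sketch, not the authors' pipeline; the function names, the toy score matrix, the stratified fold assignment, and the lower-is-better score convention are assumptions. The classifier itself (logistic regression or gradient boosting in the paper) is deliberately left out, and each row of the score matrix would serve as its feature vector.

```python
import random
from statistics import mean
from typing import Iterator, List, Sequence, Tuple


def consensus_scores(score_matrix: Sequence[Sequence[float]],
                     strategy: str = "min") -> List[float]:
    """Collapse per-conformation docking scores into one score per ligand.

    score_matrix[i][j] is the score of ligand i docked to receptor conformation j
    (assumed convention: lower is better).
    """
    if strategy == "min":   # best score over the ensemble
        return [min(row) for row in score_matrix]
    if strategy == "mean":  # average score over the ensemble
        return [mean(row) for row in score_matrix]
    raise ValueError(f"unknown strategy: {strategy}")


def repeated_stratified_kfold(labels: Sequence[int], n_splits: int = 4,
                              n_repeats: int = 30, seed: int = 0
                              ) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) for repeated, class-stratified k-fold CV."""
    rng = random.Random(seed)
    for _ in range(n_repeats):
        folds: List[List[int]] = [[] for _ in range(n_splits)]
        for cls in sorted(set(labels)):
            members = [i for i, y in enumerate(labels) if y == cls]
            rng.shuffle(members)
            # Deal the shuffled members of each class round-robin into the folds.
            for pos, i in enumerate(members):
                folds[pos % n_splits].append(i)
        for k in range(n_splits):
            test = sorted(folds[k])
            train = sorted(i for j, fold in enumerate(folds) if j != k for i in fold)
            yield train, test


# Hypothetical toy data: 2 ligands docked to 2 receptor conformations.
toy_matrix = [[-7.0, -9.0], [-5.0, -6.0]]
best_per_ligand = consensus_scores(toy_matrix, "min")
mean_per_ligand = consensus_scores(toy_matrix, "mean")

# Hypothetical labels: 8 actives and 24 inactives.
toy_labels = [1] * 8 + [0] * 24
splits = list(repeated_stratified_kfold(toy_labels))
print(len(splits), "train/test partitions")
```

In the ML setting, the full row of per-conformation scores is kept as input rather than being collapsed to a single consensus value, which is the contrast the abstract draws.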
Affiliation(s)
- Joel Ricci-Lopez
- Centro de Investigación Científica y de Educación Superior de Ensenada (CICESE), Ensenada, Baja California C.P. 22860, Mexico; Centro de Nanociencias y Nanotecnología, Universidad Nacional Autónoma de México (UNAM), Ensenada, Baja California C.P. 22860, Mexico
- Sergio A Aguila
- Centro de Nanociencias y Nanotecnología, Universidad Nacional Autónoma de México (UNAM), Ensenada, Baja California C.P. 22860, Mexico
- Michael K Gilson
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, La Jolla, San Diego, California 92093, United States
- Carlos A Brizuela
- Centro de Investigación Científica y de Educación Superior de Ensenada (CICESE), Ensenada, Baja California C.P. 22860, Mexico
10
Xiong G, Shen C, Yang Z, Jiang D, Liu S, Lu A, Chen X, Hou T, Cao D. Featurization strategies for protein–ligand interactions and their applications in scoring function development. WIRES COMPUTATIONAL MOLECULAR SCIENCE 2021. [DOI: 10.1002/wcms.1567] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 12/18/2022]
Affiliation(s)
- Guoli Xiong
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, China
- Chao Shen
- Hangzhou Institute of Innovative Medicine, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China
- Ziyi Yang
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, China
- Dejun Jiang
- Hangzhou Institute of Innovative Medicine, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China
- College of Computer Science and Technology, Zhejiang University, Hangzhou, China
- Shao Liu
- Department of Pharmacy, Xiangya Hospital, Central South University, Changsha, China
- Aiping Lu
- Institute for Advancing Translational Medicine in Bone & Joint Diseases, School of Chinese Medicine, Hong Kong Baptist University, Hong Kong SAR, China
- Xiang Chen
- Department of Dermatology, Hunan Engineering Research Center of Skin Health and Disease, Hunan Key Laboratory of Skin Cancer and Psoriasis, Xiangya Hospital, Central South University, Changsha, China
- Tingjun Hou
- Hangzhou Institute of Innovative Medicine, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China
- Dongsheng Cao
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, China
- Institute for Advancing Translational Medicine in Bone & Joint Diseases, School of Chinese Medicine, Hong Kong Baptist University, Hong Kong SAR, China
11
Kashyap K, Siddiqi MI. Recent trends in artificial intelligence-driven identification and development of anti-neurodegenerative therapeutic agents. Mol Divers 2021; 25:1517-1539. [PMID: 34282519 DOI: 10.1007/s11030-021-10274-8] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 03/28/2021] [Accepted: 07/05/2021] [Indexed: 12/12/2022]
Abstract
Neurological disorders affect various aspects of life. Finding drugs for the central nervous system is a very challenging and complex task due to the involvement of the blood-brain barrier, P-glycoprotein, and the drug's high attrition rates. The availability of big data present in online databases and resources has enabled the emergence of artificial intelligence techniques including machine learning to analyze, process the data, and predict the unknown data with high efficiency. The use of these modern techniques has revolutionized the whole drug development paradigm, with an unprecedented acceleration in the central nervous system drug discovery programs. Also, the new deep learning architectures proposed in many recent works have given a better understanding of how artificial intelligence can tackle big complex problems that arose due to central nervous system disorders. Therefore, the present review provides comprehensive and up-to-date information on machine learning/artificial intelligence-triggered effort in the brain care domain. In addition, a brief overview is presented on machine learning algorithms and their uses in structure-based drug design, ligand-based drug design, ADMET prediction, de novo drug design, and drug repurposing. Lastly, we conclude by discussing the major challenges and limitations posed and how they can be tackled in the future by using these modern machine learning/artificial intelligence approaches.
Affiliation(s)
- Kushagra Kashyap
- Academy of Scientific and Innovative Research (AcSIR), CSIR-Central Drug Research Institute (CSIR-CDRI) Campus, Lucknow, India; Molecular and Structural Biology Division, CSIR-Central Drug Research Institute (CSIR-CDRI), Sector 10, Jankipuram Extension, Sitapur Road, Lucknow, 226031, India
- Mohammad Imran Siddiqi
- Academy of Scientific and Innovative Research (AcSIR), CSIR-Central Drug Research Institute (CSIR-CDRI) Campus, Lucknow, India; Molecular and Structural Biology Division, CSIR-Central Drug Research Institute (CSIR-CDRI), Sector 10, Jankipuram Extension, Sitapur Road, Lucknow, 226031, India
12
Qin T, Zhu Z, Wang XS, Xia J, Wu S. Computational representations of protein-ligand interfaces for structure-based virtual screening. Expert Opin Drug Discov 2021; 16:1175-1192. [PMID: 34011222 DOI: 10.1080/17460441.2021.1929921] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 12/30/2022]
Abstract
Introduction: Structure-based virtual screening (SBVS) is an essential strategy for hit identification. SBVS primarily uses molecular docking, which exploits the protein-ligand binding mode and associated affinity score for compound ranking. Previous studies have shown that computational representation of protein-ligand interfaces and the later establishment of machine learning models are efficacious in improving the accuracy of SBVS. Areas covered: The authors review the computational methods for representing protein-ligand interfaces, which include the traditional ones that use deliberately designed fingerprints and descriptors and the more recent methods that automatically extract features with deep learning. The effects of these methods on the performance of machine learning models are briefly discussed. Additionally, case studies that applied various computational representations to machine learning are cited with remarks. Expert opinion: It has become a trend to extract binding features automatically by deep learning, which uses a completely end-to-end representation. However, there is still plenty of scope for improvement. The interpretability of deep-learning models, the organization of data management, the quantity and quality of available data, and the optimization of hyperparameters could impact the accuracy of feature extraction. In addition, other important structural factors such as water molecules and protein flexibility should be considered.
Affiliation(s)
- Tong Qin
- State Key Laboratory of Bioactive Substance and Function of Natural Medicines, Department of New Drug Research and Development, Institute of Materia Medica, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- Zihao Zhu
- State Key Laboratory of Bioactive Substance and Function of Natural Medicines, Department of New Drug Research and Development, Institute of Materia Medica, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- Xiang Simon Wang
- Artificial Intelligence and Drug Discovery Core Laboratory for District of Columbia Center for AIDS Research (DC CFAR), Department of Pharmaceutical Sciences, College of Pharmacy, Howard University, U.S.A
- Jie Xia
- State Key Laboratory of Bioactive Substance and Function of Natural Medicines, Department of New Drug Research and Development, Institute of Materia Medica, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- Song Wu
- State Key Laboratory of Bioactive Substance and Function of Natural Medicines, Department of New Drug Research and Development, Institute of Materia Medica, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
13
Ghislat G, Rahman T, Ballester PJ. Recent progress on the prospective application of machine learning to structure-based virtual screening. Curr Opin Chem Biol 2021; 65:28-34. [PMID: 34052776 DOI: 10.1016/j.cbpa.2021.04.009] [Citation(s) in RCA: 34] [Impact Index Per Article: 11.3] [Received: 03/03/2021] [Revised: 04/13/2021] [Accepted: 04/23/2021] [Indexed: 12/30/2022]
Abstract
As more bioactivity and protein structure data become available, scoring functions (SFs) using machine learning (ML) to leverage these data sets continue to gain further accuracy and broader applicability. Advances in our understanding of the optimal ways to train and evaluate these ML-based SFs have introduced further improvements. One of these advances is how to select the most suitable decoys (molecules assumed inactive) to train or test an ML-based SF on a given target. We also review the latest applications of ML-based SFs for prospective structure-based virtual screening (SBVS), with a focus on the observed improvement over those using classical SFs. Finally, we provide recommendations for future prospective SBVS studies based on the findings of recent methodological studies.
Affiliation(s)
- Ghita Ghislat
- U1104, CNRS UMR7280, Centre D'Immunologie de Marseille-Luminy, Inserm, Marseille, France
- Taufiq Rahman
- Department of Pharmacology, University of Cambridge, Cambridge, CB2 1PD, UK
- Pedro J Ballester
- Centre de Recherche en Cancérologie de Marseille (CRCM), Inserm, U1068, Marseille, F-13009, France; CNRS, UMR7258, Marseille, F-13009, France; Institut Paoli-Calmettes, Marseille, F-13009, France; Aix-Marseille University, UM 105, F-13284, Marseille, France.
14
Vatansever S, Schlessinger A, Wacker D, Kaniskan HÜ, Jin J, Zhou M, Zhang B. Artificial intelligence and machine learning-aided drug discovery in central nervous system diseases: State-of-the-arts and future directions. Med Res Rev 2021; 41:1427-1473. [PMID: 33295676 PMCID: PMC8043990 DOI: 10.1002/med.21764] [Citation(s) in RCA: 102] [Impact Index Per Article: 34.0] [Received: 07/23/2020] [Revised: 10/30/2020] [Accepted: 11/20/2020] [Indexed: 01/11/2023]
Abstract
Neurological disorders significantly outnumber diseases in other therapeutic areas. However, developing drugs for central nervous system (CNS) disorders remains the most challenging area in drug discovery, accompanied with the long timelines and high attrition rates. With the rapid growth of biomedical data enabled by advanced experimental technologies, artificial intelligence (AI) and machine learning (ML) have emerged as an indispensable tool to draw meaningful insights and improve decision making in drug discovery. Thanks to the advancements in AI and ML algorithms, now the AI/ML-driven solutions have an unprecedented potential to accelerate the process of CNS drug discovery with better success rate. In this review, we comprehensively summarize AI/ML-powered pharmaceutical discovery efforts and their implementations in the CNS area. After introducing the AI/ML models as well as the conceptualization and data preparation, we outline the applications of AI/ML technologies to several key procedures in drug discovery, including target identification, compound screening, hit/lead generation and optimization, drug response and synergy prediction, de novo drug design, and drug repurposing. We review the current state-of-the-art of AI/ML-guided CNS drug discovery, focusing on blood-brain barrier permeability prediction and implementation into therapeutic discovery for neurological diseases. Finally, we discuss the major challenges and limitations of current approaches and possible future directions that may provide resolutions to these difficulties.
Affiliation(s)
- Sezen Vatansever
- Department of Genetics and Genomic SciencesIcahn School of Medicine at Mount SinaiNew YorkNew YorkUSA
- Mount Sinai Center for Transformative Disease ModelingIcahn School of Medicine at Mount SinaiNew YorkNew YorkUSA
- Icahn Institute for Data Science and Genomic TechnologyIcahn School of Medicine at Mount SinaiNew YorkNew YorkUSA
| | - Avner Schlessinger
- Department of Pharmacological SciencesIcahn School of Medicine at Mount SinaiNew YorkNew YorkUSA
- Mount Sinai Center for Therapeutics DiscoveryIcahn School of Medicine at Mount SinaiNew YorkNew YorkUSA
| | - Daniel Wacker
- Department of Pharmacological SciencesIcahn School of Medicine at Mount SinaiNew YorkNew YorkUSA
- Mount Sinai Center for Therapeutics DiscoveryIcahn School of Medicine at Mount SinaiNew YorkNew YorkUSA
- Department of NeuroscienceIcahn School of Medicine at Mount SinaiNew YorkNew YorkUSA
| | - H. Ümit Kaniskan
- Department of Pharmacological SciencesIcahn School of Medicine at Mount SinaiNew YorkNew YorkUSA
- Mount Sinai Center for Therapeutics DiscoveryIcahn School of Medicine at Mount SinaiNew YorkNew YorkUSA
- Department of Oncological Sciences, Tisch Cancer InstituteIcahn School of Medicine at Mount SinaiNew YorkNew YorkUSA
| | - Jian Jin
- Department of Pharmacological SciencesIcahn School of Medicine at Mount SinaiNew YorkNew YorkUSA
- Mount Sinai Center for Therapeutics DiscoveryIcahn School of Medicine at Mount SinaiNew YorkNew YorkUSA
- Department of Oncological Sciences, Tisch Cancer InstituteIcahn School of Medicine at Mount SinaiNew YorkNew YorkUSA
| | - Ming‐Ming Zhou
- Department of Pharmacological SciencesIcahn School of Medicine at Mount SinaiNew YorkNew YorkUSA
- Department of Oncological Sciences, Tisch Cancer InstituteIcahn School of Medicine at Mount SinaiNew YorkNew YorkUSA
| | - Bin Zhang
- Department of Genetics and Genomic SciencesIcahn School of Medicine at Mount SinaiNew YorkNew YorkUSA
- Mount Sinai Center for Transformative Disease ModelingIcahn School of Medicine at Mount SinaiNew YorkNew YorkUSA
- Icahn Institute for Data Science and Genomic TechnologyIcahn School of Medicine at Mount SinaiNew YorkNew YorkUSA
- Department of Pharmacological SciencesIcahn School of Medicine at Mount SinaiNew YorkNew YorkUSA
|
15
|
Ji B, He X, Zhai J, Zhang Y, Man VH, Wang J. Machine learning on ligand-residue interaction profiles to significantly improve binding affinity prediction. Brief Bioinform 2021; 22:6184410. [PMID: 33758923 DOI: 10.1093/bib/bbab054] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 11/08/2020] [Revised: 01/06/2021] [Accepted: 02/02/2021] [Indexed: 01/01/2023]
Abstract
Structure-based virtual screenings (SBVSs) play an important role in drug discovery projects. However, it remains a challenge to accurately predict the binding affinity of an arbitrary molecule to a drug target and to prioritize the top ligands from an SBVS. In this study, we developed a novel method that uses ligand-residue interaction profiles (IPs) to construct machine learning (ML)-based prediction models and significantly improves the screening performance in SBVSs. Such a prediction model is called an IP scoring function (IP-SF). We systematically investigated how to improve the performance of IP-SFs from many perspectives, including the sampling methods used before interaction energy calculation and different ML algorithms. Using six drug targets, each with hundreds of known ligands, we conducted a critical evaluation of the developed IP-SFs. The IP-SFs employing a gradient boosting decision tree (GBDT) algorithm in conjunction with the MIN + GB simulation protocol achieved the best overall performance. Their scoring power, ranking power and screening power significantly outperformed those of the Glide SF. First, compared with Glide, the average values of mean absolute error and root mean square error of GBDT/MIN + GB decreased by about 38% and 36%, respectively. Second, the mean values of squared correlation coefficient and predictive index increased by about 225% and 73%, respectively. Third, and more encouragingly, the average area under the receiver operating characteristic curve for the six targets was 0.87 with GBDT, significantly better than the 0.71 obtained with Glide. Thus, we expect IP-SFs to have broad and promising applications in SBVSs.
Affiliation(s)
- Beihong Ji
- Department of Pharmaceutical Sciences and Computational Chemical Genomics Screening Center, School of Pharmacy, University of Pittsburgh, Pittsburgh, PA 15261, USA
| | - Xibing He
- Department of Pharmaceutical Sciences and Computational Chemical Genomics Screening Center, School of Pharmacy, University of Pittsburgh, Pittsburgh, PA 15261, USA
| | - Jingchen Zhai
- Department of Pharmaceutical Sciences and Computational Chemical Genomics Screening Center, School of Pharmacy, University of Pittsburgh, Pittsburgh, PA 15261, USA
| | - Yuzhao Zhang
- Department of Pharmaceutical Sciences and Computational Chemical Genomics Screening Center, School of Pharmacy, University of Pittsburgh, Pittsburgh, PA 15261, USA
| | - Viet Hoang Man
- Department of Pharmaceutical Sciences and Computational Chemical Genomics Screening Center, School of Pharmacy, University of Pittsburgh, Pittsburgh, PA 15261, USA
| | - Junmei Wang
- Department of Pharmaceutical Sciences and Computational Chemical Genomics Screening Center, School of Pharmacy, University of Pittsburgh, Pittsburgh, PA 15261, USA
|
16
|
Selecting machine-learning scoring functions for structure-based virtual screening. Drug Discov Today Technol 2020; 32-33:81-87. [PMID: 33386098 DOI: 10.1016/j.ddtec.2020.09.001] [Citation(s) in RCA: 27] [Impact Index Per Article: 6.8] [Received: 04/01/2020] [Revised: 09/02/2020] [Accepted: 09/07/2020] [Indexed: 12/27/2022]
Abstract
Interest in docking technologies has grown in parallel with the ever-increasing number and diversity of 3D models for macromolecular therapeutic targets. Structure-Based Virtual Screening (SBVS) aims at leveraging these experimental structures to discover the necessary starting points for the drug discovery process. It is now established that Machine Learning (ML) can strongly enhance the predictive accuracy of scoring functions for SBVS by exploiting large datasets from targets, molecules and their associations. However, with greater choice, the question of which ML-based scoring function is the most suitable for prospective use on a given target has gained importance. Here we analyse two approaches to select an existing scoring function for the target, along with a third approach that consists of generating a scoring function tailored to the target. These analyses required discussing the limitations of popular SBVS benchmarks, the alternatives for benchmarking scoring functions for SBVS, and how to generate or use them with freely available software.
|
17
|
Cournia Z, Allen BK, Beuming T, Pearlman DA, Radak BK, Sherman W. Rigorous Free Energy Simulations in Virtual Screening. J Chem Inf Model 2020; 60:4153-4169. [PMID: 32539386 DOI: 10.1021/acs.jcim.0c00116] [Citation(s) in RCA: 99] [Impact Index Per Article: 24.8] [Indexed: 12/17/2022]
Abstract
Virtual high throughput screening (vHTS) in drug discovery is a powerful approach to identify hits: when applied successfully, it can be much faster and cheaper than experimental high-throughput screening approaches. However, mainstream vHTS tools have significant limitations: ligand-based methods depend on knowledge of existing chemical matter, while structure-based tools such as docking involve significant approximations that limit their accuracy. Recent advances in scientific methods coupled with dramatic speedups in computational processing with GPUs make this an opportune time to consider the role of more rigorous methods that could improve the predictive power of vHTS workflows. In this Perspective, we assert that alchemical binding free energy methods using all-atom molecular dynamics simulations have matured to the point where they can be applied in virtual screening campaigns as a final scoring stage to prioritize the top molecules for experimental testing. Specifically, we propose that alchemical absolute binding free energy (ABFE) calculations offer the most direct and computationally efficient approach within a rigorous statistical thermodynamic framework for computing binding energies of diverse molecules, as is required for virtual screening. ABFE calculations are particularly attractive for drug discovery at this point in time, where the confluence of large-scale genomics data and insights from chemical biology have unveiled a large number of promising disease targets for which no small molecule binders are known, precluding ligand-based approaches, and where traditional docking approaches have foundered to find progressible chemical matter.
Affiliation(s)
- Zoe Cournia
- Biomedical Research Foundation, Academy of Athens, 4 Soranou Ephessiou, 11527 Athens, Greece
| | - Bryce K Allen
- Silicon Therapeutics, 300 A Street, Boston, Massachusetts 02210, United States
| | - Thijs Beuming
- Latham BioPharm Group, Cambridge, Massachusetts 02142, United States
| | - David A Pearlman
- QSimulate Incorporated, 625 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States
| | - Brian K Radak
- Silicon Therapeutics, 300 A Street, Boston, Massachusetts 02210, United States
| | - Woody Sherman
- Silicon Therapeutics, 300 A Street, Boston, Massachusetts 02210, United States
|
18
|
Xiong GL, Ye WL, Shen C, Lu AP, Hou TJ, Cao DS. Improving structure-based virtual screening performance via learning from scoring function components. Brief Bioinform 2020; 22:5851268. [PMID: 32496540 DOI: 10.1093/bib/bbaa094] [Citation(s) in RCA: 26] [Impact Index Per Article: 6.5] [Received: 01/02/2020] [Revised: 03/30/2020] [Accepted: 04/28/2020] [Indexed: 11/12/2022]
Abstract
Scoring functions (SFs) based on complex machine learning (ML) algorithms have gradually emerged as a promising alternative to overcome the weaknesses of classical SFs. However, extensive efforts have been devoted to the development of SFs based on new protein-ligand interaction representations and advanced alternative ML algorithms instead of the energy components obtained by the decomposition of existing SFs. Here, we propose a new method named energy auxiliary terms learning (EATL), in which the scoring components are extracted and used as the input for the development of three levels of ML SFs including EATL SFs, docking-EATL SFs and comprehensive SFs with ascending VS performance. The EATL approach not only outperforms classical SFs for the absolute performance (ROC) and initial enrichment (BEDROC) but also yields comparable performance compared with other advanced ML-based methods on the diverse subset of Directory of Useful Decoys: Enhanced (DUD-E). The test on the relatively unbiased actives as decoys (AD) dataset also proved the effectiveness of EATL. Furthermore, the idea of learning from SF components to yield improved screening power can also be extended to other docking programs and SFs available.
|
19
|
Li H, Sze K, Lu G, Ballester PJ. Machine‐learning scoring functions for structure‐based virtual screening. Wiley Interdiscip Rev Comput Mol Sci 2020. [DOI: 10.1002/wcms.1478] [Citation(s) in RCA: 37] [Impact Index Per Article: 9.3] [Indexed: 12/14/2022]
Affiliation(s)
- Hongjian Li
- Cancer Research Center of Marseille (INSERM U1068, Institut Paoli‐Calmettes, Aix‐Marseille Université UM105, CNRS UMR7258) Marseille France
- CUHK‐SDU Joint Laboratory on Reproductive Genetics, School of Biomedical Sciences Chinese University of Hong Kong Shatin Hong Kong
| | - Kam‐Heung Sze
- CUHK‐SDU Joint Laboratory on Reproductive Genetics, School of Biomedical Sciences Chinese University of Hong Kong Shatin Hong Kong
| | - Gang Lu
- CUHK‐SDU Joint Laboratory on Reproductive Genetics, School of Biomedical Sciences Chinese University of Hong Kong Shatin Hong Kong
| | - Pedro J. Ballester
- Cancer Research Center of Marseille (INSERM U1068, Institut Paoli‐Calmettes, Aix‐Marseille Université UM105, CNRS UMR7258) Marseille France
|
20
|
Shen C, Hu Y, Wang Z, Zhang X, Zhong H, Wang G, Yao X, Xu L, Cao D, Hou T. Can machine learning consistently improve the scoring power of classical scoring functions? Insights into the role of machine learning in scoring functions. Brief Bioinform 2020; 22:497-514. [PMID: 31982914 DOI: 10.1093/bib/bbz173] [Citation(s) in RCA: 38] [Impact Index Per Article: 9.5] [Received: 10/01/2019] [Revised: 12/10/2019] [Accepted: 11/21/2019] [Indexed: 01/12/2023]
Abstract
How to accurately estimate protein-ligand binding affinity remains a key challenge in computer-aided drug design (CADD). In many cases, it has been shown that the binding affinities predicted by classical scoring functions (SFs) cannot correlate well with experimentally measured biological activities. In the past few years, machine learning (ML)-based SFs have gradually emerged as potential alternatives and outperformed classical SFs in a series of studies. In this study, to better recognize the potential of classical SFs, we have conducted a comparative assessment of 25 commonly used SFs. Accordingly, the scoring power was systematically estimated by using the state-of-the-art ML methods that replaced the original multiple linear regression method to refit individual energy terms. The results show that the newly-developed ML-based SFs consistently performed better than classical ones. In particular, gradient boosting decision tree (GBDT) and random forest (RF) achieved the best predictions in most cases. The newly-developed ML-based SFs were also tested on another benchmark modified from PDBbind v2007, and the impacts of structural and sequence similarities were evaluated. The results indicated that the superiority of the ML-based SFs could be fully guaranteed when sufficient similar targets were contained in the training set. Moreover, the effect of the combinations of features from multiple SFs was explored, and the results indicated that combining NNscore2.0 with one to four other classical SFs could yield the best scoring power. However, it was not applicable to derive a generic target-specific SF or SF combination.
|
21
|
Exploring fragment-based target-specific ranking protocol with machine learning on cathepsin S. J Comput Aided Mol Des 2019; 33:1095-1105. [PMID: 31729618 DOI: 10.1007/s10822-019-00247-3] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 06/12/2019] [Accepted: 11/02/2019] [Indexed: 12/12/2022]
Abstract
Cathepsin S (CatS), a member of cysteine cathepsin proteases, has been well studied due to its significant role in many pathological processes, including arthritis, cancer and cardiovascular diseases. CatS inhibitors have been included in D3R-GC3 for both docking pose prediction and affinity ranking, and in D3R-GC4 for binding affinity ranking. The difficulties posed by CatS inhibitors in D3R mainly come from three aspects: large size, high flexibility and similar chemical structures. We have participated in GC4; our best submitted model, which employs a similarity-based alignment docking and Vina scoring protocol, yielded Kendall's τ of 0.23 for 459 binders in GC4. In our further explorations with machine learning, by curating a CatS specific training set, adopting a similarity-based constrained docking method as well as an arm-based fragmentation strategy which can describe large inhibitors in a locality-sensitive fashion, our best structure-based ranking protocol can achieve Kendall's τ of 0.52 for all binders in GC4. In this exploration process, we have demonstrated the importance of training data, docking approaches and fragmentation strategies in inhibitor-ranking protocol development with machine learning.
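The ranking metric quoted in this abstract, Kendall's τ, can be illustrated with a minimal sketch. The version below is the simple tau-a variant (tied pairs count as neither concordant nor discordant), which is an assumption: the abstract does not say which variant was used. The affinity values are hypothetical and are not data from the study.

```python
from itertools import combinations


def kendall_tau_a(predicted, measured):
    """Kendall's tau-a: (concordant - discordant) / number of pairs."""
    if len(predicted) != len(measured):
        raise ValueError("inputs must have the same length")
    n = len(predicted)
    if n < 2:
        raise ValueError("need at least two items to rank")
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        # The sign of the product tells whether the pair is ordered the same way
        # by the predictions and by the measurements.
        sign = (predicted[i] - predicted[j]) * (measured[i] - measured[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


# Hypothetical predicted scores and measured affinities for five ligands.
predicted = [-9.1, -8.4, -7.9, -7.2, -6.5]
measured = [-9.5, -7.8, -8.2, -7.0, -6.1]
tau = kendall_tau_a(predicted, measured)
print(f"Kendall's tau-a = {tau:.2f}")
```

Here nine of the ten ligand pairs are ordered consistently and one is swapped, so τ = (9 − 1) / 10 = 0.8. A value of 0 would indicate no rank agreement and 1 a perfect ranking.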
|
22
|
Wang D, Cui C, Ding X, Xiong Z, Zheng M, Luo X, Jiang H, Chen K. Improving the Virtual Screening Ability of Target-Specific Scoring Functions Using Deep Learning Methods. Front Pharmacol 2019; 10:924. [PMID: 31507420 PMCID: PMC6713720 DOI: 10.3389/fphar.2019.00924] [Citation(s) in RCA: 26] [Impact Index Per Article: 5.2] [Received: 05/20/2019] [Accepted: 07/22/2019] [Indexed: 01/29/2023]
Abstract
Scoring functions play an important role in structure-based virtual screening. It has been widely accepted that target-specific scoring functions (TSSFs) may achieve better performance compared with universal scoring functions in actual drug research and development processes. A method that can effectively construct TSSFs will be of great value to drug design and discovery. In this work, we proposed a deep learning–based model named DeepScore to achieve this goal. DeepScore adopted the form of the PMF scoring function to calculate protein–ligand binding affinity. However, unlike the PMF scoring function, DeepScore calculated the score for each protein–ligand atom pair using a feedforward neural network. Our model significantly outperformed Glide Gscore on the DUD-E validation data set. The average ROC-AUC on 102 targets was 0.98. We also combined Gscore and DeepScore using a consensus method and put forward a consensus model named DeepScoreCS. The comparison results showed that DeepScore outperformed other machine learning–based methods for building TSSFs. Furthermore, we presented a strategy to visualize the prediction of DeepScore. All of these results clearly demonstrated that DeepScore would be a useful model for constructing TSSFs and represented a novel way of incorporating deep learning into drug design.
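As a reminder of what the ROC-AUC figure above measures, here is a minimal sketch that computes it as the probability that a randomly chosen active is scored above a randomly chosen decoy, with ties counted as one half. This is a generic pairwise formulation, not the evaluation code used by the authors, and the scores and labels are hypothetical.

```python
def roc_auc(scores, labels):
    """ROC AUC as P(active score > decoy score), counting ties as 0.5."""
    actives = [s for s, y in zip(scores, labels) if y]
    decoys = [s for s, y in zip(scores, labels) if not y]
    if not actives or not decoys:
        raise ValueError("need at least one active and one decoy")
    wins = sum((a > d) + 0.5 * (a == d) for a in actives for d in decoys)
    return wins / (len(actives) * len(decoys))


# Hypothetical screening scores (higher = predicted more likely to bind).
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]
labels = [1, 1, 0, 1, 0, 0]  # 1 = active, 0 = decoy
auc = roc_auc(scores, labels)
print(f"ROC AUC = {auc:.3f}")
```

In this toy set, eight of the nine active/decoy pairs are ordered correctly, giving an AUC of 8/9. The pairwise loop is quadratic in the number of compounds, so a rank-based implementation would be preferable for libraries of realistic size.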
Affiliation(s)
- Dingyan Wang
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China.,College of Pharmacy, University of Chinese Academy of Sciences, Beijing, China
| | - Chen Cui
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China.,College of Pharmacy, University of Chinese Academy of Sciences, Beijing, China
| | - Xiaoyu Ding
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China.,College of Pharmacy, University of Chinese Academy of Sciences, Beijing, China
| | - Zhaoping Xiong
- School of Life Science and Technology, ShanghaiTech University, Shanghai, China
| | - Mingyue Zheng
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
| | - Xiaomin Luo
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
| | - Hualiang Jiang
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China.,School of Life Science and Technology, ShanghaiTech University, Shanghai, China
| | - Kaixian Chen
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China.,School of Life Science and Technology, ShanghaiTech University, Shanghai, China
|
23
|
Sato A, Tanimura N, Honma T, Konagaya A. Significance of Data Selection in Deep Learning for Reliable Binding Mode Prediction of Ligands in the Active Site of CYP3A4. Chem Pharm Bull (Tokyo) 2019; 67:1183-1190. [PMID: 31423003 DOI: 10.1248/cpb.c19-00443] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 11/22/2022]
Abstract
For rational drug design, it is essential to predict the binding mode of protein-ligand complexes. Although various machine learning-based models have been reported that use convolutional neural networks (deep learning) to predict binding modes from three-dimensional structures, there are few detailed reports on how best to construct and use datasets. Here, we examined how different datasets affected the prediction of the binding mode of CYP3A4 by a three-dimensional neural network when the number of crystal structures for the target protein was limited. We used four different training datasets: one large, general dataset containing various protein complexes and three smaller, more specific datasets containing complexes with CYP3A4-like pockets, complexes with CYP3A4-binding ligands, and complexes with CYP protein family members. We then trained models with different combinations of datasets with or without subsequent fine-tuning and evaluated the binding mode prediction performance of each model. The best model with respect to the area under the receiver operating characteristic curve (ROC AUC) was obtained by training with a combination of the general protein and CYP family datasets. However, the model with the best balance of ROC AUC and recall was obtained by training with this combination of datasets followed by fine-tuning with the CYP3A4-binding ligands dataset. Our results suggest that datasets that balance protein functionality and data size are important for optimizing binding mode prediction performance. In addition, datasets with large median binding pocket sizes may be important for the binding mode prediction specifically of CYP3A4.
Affiliation(s)
- Atsuko Sato
- School of Computing, Department of Computer Science, Tokyo Institute of Technology
| | - Naoki Tanimura
- Science Solutions Division, Mizuho Information & Research Institute, Inc
| | - Teruki Honma
- School of Computing, Department of Computer Science, Tokyo Institute of Technology.,Center for Biosystems Dynamics Research, RIKEN.,Medical Sciences Innovation Hub Program, RIKEN
| | - Akihiko Konagaya
- School of Computing, Department of Computer Science, Tokyo Institute of Technology
|
24
|
DLIGAND2: an improved knowledge-based energy function for protein-ligand interactions using the distance-scaled, finite, ideal-gas reference state. J Cheminform 2019; 11:52. [PMID: 31392430 PMCID: PMC6686496 DOI: 10.1186/s13321-019-0373-4] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Received: 04/02/2019] [Accepted: 07/27/2019] [Indexed: 12/14/2022]
Abstract
The performance of structure-based molecular docking largely depends on the accuracy of scoring functions. One important type of scoring function is the knowledge-based potential derived from known three-dimensional structures of proteins and/or protein–ligand complexes. This study seeks to improve a knowledge-based protein–ligand potential based on a distance-scaled, finite, ideal-gas reference (DFIRE) state (DLIGAND) by expanding the representation of protein atoms from 13 mol2 atom types to 167 residue-specific atom types, and by employing a recently updated dataset containing 12,450 monomer protein chains for training. We found that the updated version, DLIGAND2, consistently improves over DLIGAND in predicting binding affinities for either native complex structures or docking-generated poses. More importantly, DLIGAND2 shows a 52% increase over DLIGAND in enrichment factors for the top 1% of predictions on the DUD-E decoy set, and consistently improves over Autodock Vina and other statistical energy functions in all three benchmark tests. We further found that DLIGAND2 outperforms the empirical and machine-learning methods compared for virtual screening on new targets that are not homologous to the DUD-E training set. Given that it performs best as a parameter-free statistical potential and is among the best in all performance measures, DLIGAND2 should be useful for re-assessing the poses generated by docking software, or for acting as one term in other scoring functions. The program is available at https://github.com/sysu-yanglab/DLIGAND2.
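The enrichment factor in the top 1% mentioned above compares the hit rate among the best-scored compounds with the hit rate in the whole library. The sketch below uses the common definition EF = (actives in top x% / compounds in top x%) / (total actives / total compounds). The function name, the rounding of the cutoff, and the toy library are assumptions made for illustration and are not taken from the DLIGAND2 paper.

```python
def enrichment_factor(scores, is_active, fraction=0.01, higher_is_better=True):
    """Enrichment factor: hit rate in the top fraction divided by the overall hit rate."""
    n_total = len(scores)
    n_actives = sum(is_active)
    if n_total == 0 or n_actives == 0:
        raise ValueError("need a non-empty library with at least one active")
    # Assumption: round the cutoff to the nearest compound, keeping at least one.
    n_top = max(1, round(n_total * fraction))
    order = sorted(range(n_total), key=lambda i: scores[i], reverse=higher_is_better)
    hits_in_top = sum(is_active[i] for i in order[:n_top])
    return (hits_in_top / n_top) / (n_actives / n_total)


# Hypothetical library: 1000 compounds, 20 actives, compound 0 scored best.
n = 1000
scores = [n - i for i in range(n)]
active_indices = set(range(5)) | set(range(100, 1000, 60))  # 5 + 15 actives
is_active = [1 if i in active_indices else 0 for i in range(n)]
ef1 = enrichment_factor(scores, is_active, fraction=0.01)
print(f"EF(1%) = {ef1:.1f}")
```

In this toy library, 5 of the 10 top-ranked compounds are active against a background rate of 2%, so EF(1%) is 25. For energy-like scores where lower is better, pass `higher_is_better=False`.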
|
25
|
Yang X, Wang Y, Byrne R, Schneider G, Yang S. Concepts of Artificial Intelligence for Computer-Assisted Drug Discovery. Chem Rev 2019; 119:10520-10594. [PMID: 31294972 DOI: 10.1021/acs.chemrev.8b00728] [Citation(s) in RCA: 351] [Impact Index Per Article: 70.2] [Indexed: 02/05/2023]
Abstract
Artificial intelligence (AI), and, in particular, deep learning as a subcategory of AI, provides opportunities for the discovery and development of innovative drugs. Various machine learning approaches have recently (re)emerged, some of which may be considered instances of domain-specific AI which have been successfully employed for drug discovery and design. This review provides a comprehensive portrayal of these machine learning techniques and of their applications in medicinal chemistry. After introducing the basic principles, alongside some application notes, of the various machine learning algorithms, the current state-of-the art of AI-assisted pharmaceutical discovery is discussed, including applications in structure- and ligand-based virtual screening, de novo drug design, physicochemical and pharmacokinetic property prediction, drug repurposing, and related aspects. Finally, several challenges and limitations of the current methods are summarized, with a view to potential future directions for AI-assisted drug discovery and design.
Affiliation(s)
- Xin Yang
- State Key Laboratory of Biotherapy and Cancer Center, West China Hospital , Sichuan University , Chengdu , Sichuan 610041 , China
| | - Yifei Wang
- State Key Laboratory of Biotherapy and Cancer Center, West China Hospital , Sichuan University , Chengdu , Sichuan 610041 , China
| | - Ryan Byrne
- ETH Zurich , Department of Chemistry and Applied Biosciences , Vladimir-Prelog-Weg 4 , CH-8093 Zurich , Switzerland
| | - Gisbert Schneider
- ETH Zurich , Department of Chemistry and Applied Biosciences , Vladimir-Prelog-Weg 4 , CH-8093 Zurich , Switzerland
| | - Shengyong Yang
- State Key Laboratory of Biotherapy and Cancer Center, West China Hospital , Sichuan University , Chengdu , Sichuan 610041 , China
|
26
|
Shen C, Ding J, Wang Z, Cao D, Ding X, Hou T. From machine learning to deep learning: Advances in scoring functions for protein–ligand docking. Wiley Interdiscip Rev Comput Mol Sci 2019. [DOI: 10.1002/wcms.1429] [Citation(s) in RCA: 76] [Impact Index Per Article: 15.2] [Indexed: 12/18/2022]
Affiliation(s)
- Chao Shen
- Hangzhou Institute of Innovative Medicine, College of Pharmaceutical Sciences, Zhejiang University Hangzhou P. R. China
| | - Junjie Ding
- Beijing Institute of Pharmaceutical Chemistry Beijing P. R. China
| | - Zhe Wang
- Hangzhou Institute of Innovative Medicine, College of Pharmaceutical Sciences, Zhejiang University Hangzhou P. R. China
| | - Dongsheng Cao
- Xiangya School of Pharmaceutical Sciences, Central South University Changsha P. R. China
| | - Xiaoqin Ding
- Beijing Institute of Pharmaceutical Chemistry Beijing P. R. China
| | - Tingjun Hou
- Hangzhou Institute of Innovative Medicine, College of Pharmaceutical Sciences, Zhejiang University Hangzhou P. R. China
|
27
|
Li H, Peng J, Sidorov P, Leung Y, Leung KS, Wong MH, Lu G, Ballester PJ. Classical scoring functions for docking are unable to exploit large volumes of structural and interaction data. Bioinformatics 2019; 35:3989-3995. [DOI: 10.1093/bioinformatics/btz183] [Citation(s) in RCA: 41] [Impact Index Per Article: 8.2] [Received: 11/07/2018] [Revised: 02/04/2019] [Accepted: 03/13/2019] [Indexed: 12/15/2022]
Abstract
Motivation
Studies have shown that the accuracy of random forest (RF)-based scoring functions (SFs), such as RF-Score-v3, increases with more training samples, whereas that of classical SFs, such as X-Score, does not. Nevertheless, the impact of the similarity between training and test samples on this matter has not been studied in a systematic manner. It is therefore unclear how these SFs would perform when only trained on protein-ligand complexes that are highly dissimilar or highly similar to the test set. It is also unclear whether SFs based on machine learning algorithms other than RF can also improve accuracy with increasing training set size and to what extent they learn from dissimilar or similar training complexes.
Results
We present a systematic study to investigate how the accuracy of classical and machine-learning SFs varies with protein-ligand complex similarities between training and test sets. We considered three types of similarity metrics, based on the comparison of either protein structures, protein sequences or ligand structures. Regardless of the similarity metric, we found that incorporating a larger proportion of similar complexes to the training set did not make classical SFs more accurate. In contrast, RF-Score-v3 was able to outperform X-Score even when trained on just 32% of the most dissimilar complexes, showing that its superior performance owes considerably to learning from dissimilar training complexes to those in the test set. In addition, we generated the first SF employing Extreme Gradient Boosting (XGBoost), XGB-Score, and observed that it also improves with training set size while outperforming the rest of SFs. Given the continuous growth of training datasets, the development of machine-learning SFs has become very appealing.
Availability and implementation
https://github.com/HongjianLi/MLSF
Supplementary information
Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Hongjian Li
- SDIVF R&D Centre, Hong Kong Science Park, Sha Tin, New Territories, Hong Kong
- CUHK-SDU Joint Laboratory on Reproductive Genetics School of Biomedical Sciences, The Chinese University of Hong Kong, Sha Tin, New Territories, Hong Kong
| | - Jiangjun Peng
- School of Mathematics and Statistics, Xi'an Jiaotong University, Xi’an, China
| | - Pavel Sidorov
- Cancer Research Center of Marseille CRCM, INSERM, Institut Paoli-Calmettes, Aix-Marseille University, CNRS, F-13009 Marseille, France
| | | | - Kwong-Sak Leung
- Institute of Future Cities
- Department of Computer Science and Engineering, The Chinese University of Hong Kong, Sha Tin, New Territories, Hong Kong
| | - Man-Hon Wong
- Department of Computer Science and Engineering, The Chinese University of Hong Kong, Sha Tin, New Territories, Hong Kong
| | - Gang Lu
- CUHK-SDU Joint Laboratory on Reproductive Genetics School of Biomedical Sciences, The Chinese University of Hong Kong, Sha Tin, New Territories, Hong Kong
| | - Pedro J Ballester
- Cancer Research Center of Marseille CRCM, INSERM, Institut Paoli-Calmettes, Aix-Marseille University, CNRS, F-13009 Marseille, France
|
28
|
Xing J, Lu W, Liu R, Wang Y, Xie Y, Zhang H, Shi Z, Jiang H, Liu YC, Chen K, Jiang H, Luo C, Zheng M. Machine-Learning-Assisted Approach for Discovering Novel Inhibitors Targeting Bromodomain-Containing Protein 4. J Chem Inf Model 2017. [DOI: 10.1021/acs.jcim.7b00098] [Citation(s) in RCA: 32] [Impact Index Per Article: 4.6] [Indexed: 12/31/2022]
Affiliation(s)
- Jing Xing
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China
- State Key Laboratory of Natural and Biomimetic Drugs, Peking University, Xue Yuan Road 38, Beijing 100191, China
- Department of Pharmacy, University of Chinese Academy of Sciences, 19A Yuquan Road, Beijing 100049, China
| | - Wenchao Lu
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China
- Department of Pharmacy, University of Chinese Academy of Sciences, 19A Yuquan Road, Beijing 100049, China
| | - Rongfeng Liu
- Shanghai ChemPartner Co., LTD., #5 Building, 998 Halei Road, Shanghai 201203, China
| | - Yulan Wang
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China
- Department of Pharmacy, University of Chinese Academy of Sciences, 19A Yuquan Road, Beijing 100049, China
| | - Yiqian Xie
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China
- Department of Pharmacy, University of Chinese Academy of Sciences, 19A Yuquan Road, Beijing 100049, China
| | - Hao Zhang
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China
- Department of Pharmacy, University of Chinese Academy of Sciences, 19A Yuquan Road, Beijing 100049, China
| | - Zhe Shi
- Shanghai ChemPartner Co., LTD., #5 Building, 998 Halei Road, Shanghai 201203, China
| | - Hao Jiang
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China
- Department of Pharmacy, University of Chinese Academy of Sciences, 19A Yuquan Road, Beijing 100049, China
| | - Yu-Chih Liu
- Shanghai ChemPartner Co., LTD., #5 Building, 998 Halei Road, Shanghai 201203, China
| | - Kaixian Chen
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China
| | - Hualiang Jiang
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China
| | - Cheng Luo
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China
| | - Mingyue Zheng
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China
|
29
|
Xu D, Li L, Zhou D, Liu D, Hudmon A, Meroueh SO. Structure-Based Target-Specific Screening Leads to Small-Molecule CaMKII Inhibitors. ChemMedChem 2017; 12:660-677. [PMID: 28371191 PMCID: PMC5554713 DOI: 10.1002/cmdc.201600636] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Received: 12/20/2016] [Revised: 03/23/2017] [Indexed: 02/06/2023]
Abstract
Target-specific scoring methods are more commonly used to identify small-molecule inhibitors among compounds docked to a target of interest. Top candidates that emerge from these methods have rarely been tested for activity and specificity across a family of proteins. In this study we docked a chemical library into CaMKIIδ, a member of the Ca2+/calmodulin (CaM)-dependent protein kinase (CaMK) family, and re-scored the resulting protein-compound structures using Support Vector Machine SPecific (SVMSP), a target-specific method that we developed previously. Among the 35 selected candidates, three hits were identified, such as quinazoline compound 1 (KIN-1; N4-[7-chloro-2-[(E)-styryl]quinazolin-4-yl]-N1,N1-diethylpentane-1,4-diamine), which was found to inhibit CaMKIIδ kinase activity at single-digit micromolar IC50. Activity across the kinome was assessed by profiling analogues of 1, namely 6 (KIN-236; N4-[7-chloro-2-[(E)-2-(2-chloro-4,5-dimethoxyphenyl)vinyl]quinazolin-4-yl]-N1,N1-diethylpentane-1,4-diamine), and an analogue of hit compound 2 (KIN-15; 2-[4-[(E)-[(5-bromobenzofuran-2-carbonyl)hydrazono]methyl]-2-chloro-6-methoxyphenoxy]acetic acid), namely 14 (KIN-332; N-[(E)-[4-(2-anilino-2-oxoethoxy)-3-chlorophenyl]methyleneamino]benzofuran-2-carboxamide), against 337 kinases. Interestingly, for compound 6, CaMKIIδ and homologue CaMKIIγ were among the top ten targets. Among the top 25 targets of 6, IC50 values ranged from 5 to 22 μM. Compound 14 was found to be not specific toward CaMKII kinases, but it does inhibit two kinases with sub-micromolar IC50 values among the top 25. Derivatives of 1 were tested against several kinases including several members of the CaMK family. These data afforded a limited structure-activity relationship study.
Molecular dynamics simulations with explicit solvent followed by end-point MM-GBSA free-energy calculations revealed strong engagement of specific residues within the ATP binding pocket, and also changes in the dynamics as a result of binding. This work suggests that target-specific scoring approaches such as SVMSP may hold promise for the identification of small-molecule kinase inhibitors that exhibit some level of specificity toward the target of interest across a large number of proteins.
Affiliation(s)
- David Xu
- Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, IN, 46202, USA
- Department of BioHealth Informatics, Indiana University School of Informatics and Computing, Indianapolis, IN, 46202, USA
| | - Liwei Li
- Department of Biochemistry and Molecular Biology, Indiana University School of Medicine, Van Nuys Medical Science Building, MS 4023, 635 Barnhill Drive, Indianapolis, IN, 46202-5122, USA
| | - Donghui Zhou
- Department of Biochemistry and Molecular Biology, Indiana University School of Medicine, Van Nuys Medical Science Building, MS 4023, 635 Barnhill Drive, Indianapolis, IN, 46202-5122, USA
| | - Degang Liu
- Department of Biochemistry and Molecular Biology, Indiana University School of Medicine, Van Nuys Medical Science Building, MS 4023, 635 Barnhill Drive, Indianapolis, IN, 46202-5122, USA
| | - Andy Hudmon
- Department of Biochemistry and Molecular Biology, Indiana University School of Medicine, Van Nuys Medical Science Building, MS 4023, 635 Barnhill Drive, Indianapolis, IN, 46202-5122, USA
- Stark Neurosciences Research Institute, Indiana University School of Medicine, Indianapolis, IN, 46202, USA
| | - Samy O Meroueh
- Department of Biochemistry and Molecular Biology, Indiana University School of Medicine, Van Nuys Medical Science Building, MS 4023, 635 Barnhill Drive, Indianapolis, IN, 46202-5122, USA
- Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, IN, 46202, USA
- Stark Neurosciences Research Institute, Indiana University School of Medicine, Indianapolis, IN, 46202, USA
- Indiana University Simon Cancer Center, Indiana University School of Medicine, Indianapolis, IN, 46202, USA
|
30
|
Prediction of N-Methyl-D-Aspartate Receptor GluN1-Ligand Binding Affinity by a Novel SVM-Pose/SVM-Score Combinatorial Ensemble Docking Scheme. Sci Rep 2017; 7:40053. [PMID: 28059133 PMCID: PMC5216401 DOI: 10.1038/srep40053] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Received: 08/19/2016] [Accepted: 11/30/2016] [Indexed: 01/24/2023] Open
Abstract
The glycine-binding site of the N-methyl-D-aspartate receptor (NMDAR) subunit GluN1 is a potential pharmacological target for neurodegenerative disorders. A novel combinatorial ensemble docking scheme using ligand and protein conformation ensembles and customized support vector machine (SVM)-based models to select the docked pose and to predict the docking score was generated for predicting the NMDAR GluN1-ligand binding affinity. The predicted root mean square deviation (RMSD) values in pose by SVM-Pose models were found to be in good agreement with the observed values (n = 30, r2 = 0.928–0.988, = 0.894–0.954, RMSE = 0.002–0.412, s = 0.001–0.214), and the predicted pKi values by SVM-Score were found to be in good agreement with the observed values for the training samples (n = 24, r2 = 0.967, = 0.899, RMSE = 0.295, s = 0.170) and test samples (n = 13, q2 = 0.894, RMSE = 0.437, s = 0.202). When subjected to various statistical validations, the developed SVM-Pose and SVM-Score models consistently met the most stringent criteria. A mock test asserted the predictivity of this novel docking scheme. Collectively, this accurate novel combinatorial ensemble docking scheme can be used to predict the NMDAR GluN1-ligand binding affinity for facilitating drug discovery.
|
31
|
Hu B, Kuang ZK, Feng SY, Wang D, He SB, Kong DX. Three-Dimensional Biologically Relevant Spectrum (BRS-3D): Shape Similarity Profile Based on PDB Ligands as Molecular Descriptors. Molecules 2016; 21:E1554. [PMID: 27869685 PMCID: PMC6273508 DOI: 10.3390/molecules21111554] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Received: 10/12/2016] [Revised: 11/10/2016] [Accepted: 11/11/2016] [Indexed: 01/11/2023] Open
Abstract
The crystallized ligands in the Protein Data Bank (PDB) can be treated as the inverse shapes of the active sites of corresponding proteins. Therefore, the shape similarity between a molecule and PDB ligands indicated the possibility of the molecule to bind with the targets. In this paper, we proposed a shape similarity profile that can be used as a molecular descriptor for ligand-based virtual screening. First, through three-dimensional (3D) structural clustering, 300 diverse ligands were extracted from the druggable protein-ligand database, sc-PDB. Then, each of the molecules under scrutiny was flexibly superimposed onto the 300 ligands. Superimpositions were scored by shape overlap and property similarity, producing a 300 dimensional similarity array termed the "Three-Dimensional Biologically Relevant Spectrum (BRS-3D)". Finally, quantitative or discriminant models were developed with the 300 dimensional descriptor using machine learning methods (support vector machine). The effectiveness of this approach was evaluated using 42 benchmark data sets from the G protein-coupled receptor (GPCR) ligand library and the GPCR decoy database (GLL/GDD). We compared the performance of BRS-3D with other 2D and 3D state-of-the-art molecular descriptors. The results showed that models built with BRS-3D performed best for most GLL/GDD data sets. We also applied BRS-3D in histone deacetylase 1 inhibitors screening and GPCR subtype selectivity prediction. The advantages and disadvantages of this approach are discussed.
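As a rough illustration of the similarity-profile idea in this abstract, the sketch below builds a descriptor with one component per reference ligand. It is not the BRS-3D implementation: the paper scores flexible 3D superimpositions by shape overlap and property similarity, whereas this sketch swaps in a plain Tanimoto coefficient on made-up feature sets so that it runs with the standard library alone. The names `tanimoto` and `similarity_profile`, and all feature labels, are hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto coefficient of two feature sets (stand-in for a 3D shape/property score)."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def similarity_profile(query, references):
    """Return one similarity value per reference ligand.

    With 300 reference ligands this would give a 300-dimensional descriptor;
    the vector can then be passed to any learner (the abstract uses an SVM).
    """
    return [tanimoto(query, ref) for ref in references]


# Hypothetical feature sets, for illustration only.
references = [{"a", "b"}, {"c", "d"}, {"x"}]
query = {"a", "b", "c"}
profile = similarity_profile(query, references)
print(profile)
```

The point of the design is that the descriptor length is fixed by the reference set rather than by the molecule, so molecules of any size map to vectors of the same dimension.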
Affiliation(s)
- Ben Hu
- State Key Laboratory of Agricultural Microbiology, Huazhong Agricultural University, Wuhan 430070, China.
- Agricultural Bioinformatics Key Laboratory of Hubei Province, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China.
| | - Zheng-Kun Kuang
- State Key Laboratory of Agricultural Microbiology, Huazhong Agricultural University, Wuhan 430070, China.
- Agricultural Bioinformatics Key Laboratory of Hubei Province, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China.
| | - Shi-Yu Feng
- State Key Laboratory of Agricultural Microbiology, Huazhong Agricultural University, Wuhan 430070, China.
| | - Dong Wang
- State Key Laboratory of Agricultural Microbiology, Huazhong Agricultural University, Wuhan 430070, China.
| | - Song-Bing He
- State Key Laboratory of Agricultural Microbiology, Huazhong Agricultural University, Wuhan 430070, China.
| | - De-Xin Kong
- State Key Laboratory of Agricultural Microbiology, Huazhong Agricultural University, Wuhan 430070, China.
- Agricultural Bioinformatics Key Laboratory of Hubei Province, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China.
|
32
|
Xu D, Meroueh SO. Effect of Binding Pose and Modeled Structures on SVMGen and GlideScore Enrichment of Chemical Libraries. J Chem Inf Model 2016; 56:1139-51. [PMID: 27154487 DOI: 10.1021/acs.jcim.5b00709] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Indexed: 12/14/2022]
Abstract
Virtual screening consists of docking libraries of small molecules to a target protein followed by rank-ordering of the resulting structures using scoring functions. The ability of scoring methods to distinguish between actives and inactives depends on several factors that include the accuracy of the binding pose during the docking step and the quality of the three-dimensional structure of the target. Here, we build on our previous work to introduce a new scoring approach (SVMGen) that uses machine learning trained with features from statistical pair potentials obtained from three-dimensional crystal structures. We use SVMGen and GlideScore to explore how enrichment or rank-ordering is affected by binding pose accuracy. To that end, we create a validation set that consists strictly of proteins whose crystal structure was solved in complex with their inhibitors. For the rank-ordering studies, we use crystal structures from PDBbind along with corresponding binding affinity data provided in the database. In addition to binding pose, we investigate the effect of using modeled structures for the target on the enrichment performance of SVMGen and GlideScore. To accomplish this, we generated homology models for protein kinases in DUD-E for which crystal structures are available to enable comparison of enrichment between modeled and crystal structure. We also generate homology models for kinases in SARfari for which there are many known small-molecule inhibitors but no known crystal structure. These models are used to assess the ability of SVMGen and GlideScore to distinguish between actives and decoys. We focus our work on protein kinases considering the wealth of structural and binding affinity data that exists for this family of proteins.
Affiliation(s)
- David Xu
- Department of BioHealth Informatics, Indiana University School of Informatics and Computing, Indianapolis, Indiana 46202, United States
|
33
|
Ain QU, Aleksandrova A, Roessler FD, Ballester PJ. Machine-learning scoring functions to improve structure-based binding affinity prediction and virtual screening. Wiley Interdiscip Rev Comput Mol Sci 2015; 5:405-424. [PMID: 27110292 PMCID: PMC4832270 DOI: 10.1002/wcms.1225] [Citation(s) in RCA: 190] [Impact Index Per Article: 21.1] [Received: 06/03/2015] [Revised: 07/17/2015] [Accepted: 07/18/2015] [Indexed: 12/29/2022]
Abstract
Docking tools to predict whether and how a small molecule binds to a target can be applied if a structural model of such target is available. The reliability of docking depends, however, on the accuracy of the adopted scoring function (SF). Despite intense research over the years, improving the accuracy of SFs for structure-based binding affinity prediction or virtual screening has proven to be a challenging task for any class of method. New SFs based on modern machine-learning regression models, which do not impose a predetermined functional form and thus are able to exploit effectively much larger amounts of experimental data, have recently been introduced. These machine-learning SFs have been shown to outperform a wide range of classical SFs at both binding affinity prediction and virtual screening. The emerging picture from these studies is that the classical approach of using linear regression with a small number of expert-selected structural features can be strongly improved by a machine-learning approach based on nonlinear regression allied with comprehensive data-driven feature selection. Furthermore, the performance of classical SFs does not grow with larger training datasets and hence this performance gap is expected to widen as more training data becomes available in the future. Other topics covered in this review include predicting the reliability of a SF on a particular target class, generating synthetic data to improve predictive performance and modeling guidelines for SF development.
Affiliation(s)
- Qurrat Ul Ain
- Department of Chemistry, Centre for Molecular Informatics, University of Cambridge, Cambridge, UK
| | - Florian D Roessler
- Department of Chemistry, Centre for Molecular Informatics, University of Cambridge, Cambridge, UK
| | - Pedro J Ballester
- Cancer Research Center of Marseille (INSERM U1068, Institut Paoli-Calmettes, Aix-Marseille Université, CNRS UMR7258), Marseille, France
|
34
|
Li H, Leung KS, Wong MH, Ballester PJ. Low-Quality Structural and Interaction Data Improves Binding Affinity Prediction via Random Forest. Molecules 2015; 20:10947-62. [PMID: 26076113 PMCID: PMC6272292 DOI: 10.3390/molecules200610947] [Citation(s) in RCA: 59] [Impact Index Per Article: 6.6] [Received: 03/13/2015] [Revised: 06/04/2015] [Accepted: 06/09/2015] [Indexed: 12/17/2022] Open
Abstract
Docking scoring functions can be used to predict the strength of protein-ligand binding. It is widely believed that training a scoring function with low-quality data is detrimental for its predictive performance. Nevertheless, there is a surprising lack of systematic validation experiments in support of this hypothesis. In this study, we investigated to which extent training a scoring function with data containing low-quality structural and binding data is detrimental for predictive performance. We actually found that low-quality data is not only non-detrimental, but beneficial for the predictive performance of machine-learning scoring functions, though the improvement is less important than that coming from high-quality data. Furthermore, we observed that classical scoring functions are not able to effectively exploit data beyond an early threshold, regardless of its quality. This demonstrates that exploiting a larger data volume is more important for the performance of machine-learning scoring functions than restricting to a smaller set of higher data quality.
Affiliation(s)
- Hongjian Li
- Department of Computer Science and Engineering, Chinese University of Hong Kong, Sha Tin, New Territories 999077, Hong Kong.
| | - Kwong-Sak Leung
- Department of Computer Science and Engineering, Chinese University of Hong Kong, Sha Tin, New Territories 999077, Hong Kong.
| | - Man-Hon Wong
- Department of Computer Science and Engineering, Chinese University of Hong Kong, Sha Tin, New Territories 999077, Hong Kong.
| | - Pedro J Ballester
- Cancer Research Center of Marseille, INSERM U1068, F-13009 Marseille, France.
|
35
|
Jasial S, Balfer J, Vogt M, Bajorath J. Determination of Meta-Parameters for Support Vector Machine Linear Combinations. Mol Inform 2015; 34:127-33. [DOI: 10.1002/minf.201400163] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/03/2014] [Accepted: 12/16/2014] [Indexed: 11/05/2022]
|
36
|
Xu D, Wang B, Meroueh SO. Structure-based computational approaches for small-molecule modulation of protein-protein interactions. Methods Mol Biol 2015; 1278:77-92. [PMID: 25859944 DOI: 10.1007/978-1-4939-2425-7_5] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Indexed: 06/04/2023]
Abstract
Three-dimensional structures of proteins offer an opportunity for the rational design of small molecules to modulate protein-protein interactions. The presence of a well-defined binding pocket on the surface of protein complexes, particularly at their interface, can be used for docking-based virtual screening of chemical libraries. Several approaches have been developed to identify binding pockets that are implemented in programs such as SiteMap, fpocket, and FTSite. These programs enable the scoring of these pockets to determine whether they are suitable to accommodate high-affinity small molecules. Virtual screening of commercial or combinatorial libraries can be carried out to enrich these libraries and select compounds for further experimental validation. In virtual screening, a compound library is docked to the target protein. The resulting structures are scored and ranked for the selection and experimental validation of top candidates. Molecular docking has been implemented in a number of computer programs such as AutoDock Vina. We select a set of protein-protein interactions that have been successfully inhibited with small molecules in the past. Several computer programs are applied to identify pockets on the surface, and molecular docking is conducted in an attempt to reproduce the binding pose of the inhibitors. The results highlight the strengths and limitations of computational methods for the design of PPI inhibitors.
Affiliation(s)
- David Xu
- Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, 410 W. 10th Street, Indianapolis, IN, 46202, USA
|
37
|
Wang B, Buchman CD, Li L, Hurley TD, Meroueh SO. Enrichment of chemical libraries docked to protein conformational ensembles and application to aldehyde dehydrogenase 2. J Chem Inf Model 2014; 54:2105-16. [PMID: 24856086 PMCID: PMC4114474 DOI: 10.1021/ci5002026] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Indexed: 12/30/2022]
Abstract
Molecular recognition is a complex process that involves a large ensemble of structures of the receptor and ligand. Yet, most structure-based virtual screening is carried out on a single structure typically from X-ray crystallography. Explicit-solvent molecular dynamics (MD) simulations offer an opportunity to sample multiple conformational states of a protein. Here we evaluate our recently developed scoring method SVMSP in its ability to enrich chemical libraries docked to MD structures of seven proteins from the Directory of Useful Decoys (DUD). SVMSP is a target-specific rescoring method that combines machine learning with statistical potentials. We find that enrichment power as measured by the area under the ROC curve (ROC-AUC) is not affected by increasing the number of MD structures. Among individual MD snapshots, many exhibited enrichment that was significantly better than the crystal structure, but no correlation between enrichment and structural deviation from crystal structure was found. We followed an innovative approach by training SVMSP scoring models using MD structures (SVMSPMD). The resulting models were applied to two difficult cases (p38 and CDK2) for which enrichment was not better than random. We found remarkable increase in enrichment power, particularly for p38, where the ROC-AUC increased by 0.30 to 0.85. Finally, we explored approaches for a priori identification of MD snapshots with high enrichment power from an MD simulation in the absence of active compounds. We found that the use of randomly selected compounds docked to the target of interest using SVMSP led to notable enrichment for EGFR and Src MD snapshots. SVMSP rescoring of protein-compound MD structures was applied for the search of small-molecule inhibitors of the mitochondrial enzyme aldehyde dehydrogenase 2 (ALDH2). Rank-ordering of a commercial library of 50 000 compounds docked to MD structures of ALDH2 led to five small-molecule inhibitors. 
Four compounds had IC50s below 5 μM. These compounds serve as leads for the design and synthesis of more potent and selective ALDH2 inhibitors.
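The enrichment measure quoted in this abstract, the area under the ROC curve (ROC-AUC), can be sketched as follows. This is a minimal standard-library illustration, not the authors' code: it uses the rank-based definition (the probability that a randomly chosen active is scored above a randomly chosen decoy, with ties counted as one half). The function name `roc_auc`, the convention that a higher score means "more likely active", and the example scores are all assumptions; docking scores where lower is better would need to be negated first.

```python
def roc_auc(scores, labels):
    """ROC-AUC from pairwise comparisons of actives (label 1) against decoys (label 0)."""
    actives = [s for s, y in zip(scores, labels) if y == 1]
    decoys = [s for s, y in zip(scores, labels) if y == 0]
    if not actives or not decoys:
        raise ValueError("need at least one active and one decoy")
    wins = 0.0
    for a in actives:
        for d in decoys:
            if a > d:
                wins += 1.0   # active ranked above decoy
            elif a == d:
                wins += 0.5   # tie counts as half
    return wins / (len(actives) * len(decoys))


# Made-up scores for six docked compounds; 1 = active, 0 = decoy.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
labels = [1, 0, 1, 0, 0, 0]
print(roc_auc(scores, labels))  # 7 of 8 active/decoy pairs are ordered correctly: 0.875
```

A value of 0.5 corresponds to random ranking, which is the baseline the abstract refers to for the p38 and CDK2 cases.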
Affiliation(s)
- Bo Wang
- Department of Biochemistry and Molecular Biology, Melvin and Bren Simon Cancer Center, Center for Computational Biology and Bioinformatics, and Stark Neurosciences Institute, Indiana University School of Medicine, 535 Barnhill Drive, Indianapolis, Indiana 46202, United States
|
38
|
Abstract
Docking methodology aims to predict the experimental binding modes and affinities of small molecules within the binding site of particular receptor targets and is currently used as a standard computational tool in drug design for lead compound optimisation and in virtual screening studies to find novel biologically active molecules. The basic tools of a docking methodology include a search algorithm and an energy scoring function for generating and evaluating ligand poses. In this review, we present the search algorithms and scoring functions most commonly used in current molecular docking methods that focus on protein-ligand applications. We summarise the main topics and recent computational and methodological advances in protein-ligand docking. Protein flexibility, multiple ligand binding modes and the free-energy landscape profile for binding affinity prediction are important and interconnected challenges to be overcome by further methodological developments in the docking field.
|
39
|
Wang B, Li L, Hurley TD, Meroueh SO. Molecular recognition in a diverse set of protein-ligand interactions studied with molecular dynamics simulations and end-point free energy calculations. J Chem Inf Model 2013; 53:2659-70. [PMID: 24032517 DOI: 10.1021/ci400312v] [Citation(s) in RCA: 36] [Impact Index Per Article: 3.3] [Indexed: 02/07/2023]
Abstract
End-point free energy calculations using MM-GBSA and MM-PBSA provide a detailed understanding of molecular recognition in protein-ligand interactions. The binding free energy can be used to rank-order protein-ligand structures in virtual screening for compound or target identification. Here, we carry out free energy calculations for a diverse set of 11 proteins bound to 14 small molecules using extensive explicit-solvent MD simulations. The structure of these complexes was previously solved by crystallography and their binding studied with isothermal titration calorimetry (ITC) data enabling direct comparison to the MM-GBSA and MM-PBSA calculations. Four MM-GBSA and three MM-PBSA calculations reproduced the ITC free energy within 1 kcal·mol⁻¹ highlighting the challenges in reproducing the absolute free energy from end-point free energy calculations. MM-GBSA exhibited better rank-ordering with a Spearman ρ of 0.68 compared to 0.40 for MM-PBSA with dielectric constant (ε = 1). An increase in ε resulted in significantly better rank-ordering for MM-PBSA (ρ = 0.91 for ε = 10), but larger ε significantly reduced the contributions of electrostatics, suggesting that the improvement is due to the nonpolar and entropy components, rather than a better representation of the electrostatics. The SVRKB scoring function applied to MD snapshots resulted in excellent rank-ordering (ρ = 0.81). Calculations of the configurational entropy using normal-mode analysis led to free energies that correlated significantly better to the ITC free energy than the MD-based quasi-harmonic approach, but the computed entropies showed no correlation with the ITC entropy. When the adaptation energy is taken into consideration by running separate simulations for complex, apo, and ligand (MM-PBSAADAPT), there is less agreement with the ITC data for the individual free energies, but remarkably good rank-ordering is observed (ρ = 0.89).
Interestingly, filtering MD snapshots by prescoring protein-ligand complexes with a machine learning-based approach (SVMSP) resulted in a significant improvement in the MM-PBSA results (ε = 1) from ρ = 0.40 to ρ = 0.81. Finally, the nonpolar components of MM-GBSA and MM-PBSA, but not the electrostatic components, showed strong correlation to the ITC free energy; the computed entropies did not correlate with the ITC entropy.
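The rank-ordering statistic reported throughout this abstract is the Spearman ρ between computed and experimental free energies. A minimal standard-library sketch is given below, assuming the usual definition (Pearson correlation of the ranks, with tied values given their average rank). The function names and the four example values are made up for illustration and are not data from the paper.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation computed on the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0.0 or sy == 0.0:
        raise ValueError("rank correlation is undefined for constant input")
    return cov / (sx * sy)


# Hypothetical experimental vs. computed binding free energies (kcal/mol).
experimental = [-9.1, -7.4, -8.2, -6.0]
computed = [-40.0, -25.0, -31.0, -28.0]
print(round(spearman_rho(experimental, computed), 3))  # 0.8
```

Because ρ depends only on ranks, a method can order complexes well (high ρ) while its absolute free energies are far from the ITC values, which is the distinction the abstract draws.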
Affiliation(s)
- Bo Wang
- Indiana University Department of Biochemistry and Molecular Biology, Center for Computational Biology and Bioinformatics, Department of Chemistry and Chemical Biology (IUPUI), Stark Neurosciences Research Institute, Indiana University School of Medicine, 535 Barnhill Drive, Indianapolis, Indiana 46202, United States
|
40
|
Koppisetty CAK, Frank M, Kemp GJL, Nyholm PG. Computation of binding energies including their enthalpy and entropy components for protein-ligand complexes using support vector machines. J Chem Inf Model 2013; 53:2559-70. [PMID: 24050538 DOI: 10.1021/ci400321r] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.5] [Indexed: 11/29/2022]
Abstract
Computing binding energies of protein-ligand complexes including their enthalpy and entropy terms by means of computational methods is an appealing approach for selecting initial hits and for further optimization in early stages of drug discovery. Despite the importance, computational predictions of thermodynamic components have evaded attention and reasonable solutions. In this study, support vector machines are used for developing scoring functions to compute binding energies and their enthalpy and entropy components of protein-ligand complexes. The binding energies computed from our newly derived scoring functions have better Pearson's correlation coefficients with experimental data than previously reported scoring functions in benchmarks for protein-ligand complexes from the PDBBind database. The protein-ligand complexes with binding energies dominated by enthalpy or entropy term could be qualitatively classified by the newly derived scoring functions with high accuracy. Furthermore, it is found that the inclusion of comprehensive descriptors based on ligand properties in the scoring functions improved the accuracy of classification as well as the prediction of binding energies including their thermodynamic components. The prediction of binding energies including the enthalpy and entropy components using the support vector machine based scoring functions should be of value in the drug discovery process.
|
41
|
Yuriev E, Ramsland PA. Latest developments in molecular docking: 2010-2011 in review. J Mol Recognit 2013; 26:215-39. [PMID: 23526775 DOI: 10.1002/jmr.2266] [Citation(s) in RCA: 193] [Impact Index Per Article: 17.5] [Received: 10/27/2012] [Revised: 01/16/2013] [Accepted: 01/19/2013] [Indexed: 12/28/2022]
Affiliation(s)
- Elizabeth Yuriev
- Medicinal Chemistry, Monash Institute of Pharmaceutical Sciences, Monash University, Parkville, VIC 3052, Australia
|
42
|
Scharfe M, Pippel M, Sippl W. ParaDockS - an open-source framework for molecular docking: implementation of target-class-specific scoring methods. J Cheminform 2013. [PMCID: PMC3606148 DOI: 10.1186/1758-2946-5-s1-p11] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/20/2022] Open
|
43
|
Abstract
Virtual screening has become a standard tool in drug discovery to identify novel lead compounds that target a biomolecule of interest. I present several concepts in ligand-based and structure-based virtual screening and discuss some of the current shortcomings and new developments. I also highlight approaches that combine concepts from structure- and ligand-based design.
Affiliation(s)
- Markus Lill
- Department of Medicinal Chemistry and Molecular Pharmacology, Purdue University, West Lafayette, IN, USA
|
44
|
Scharfe M, Pippel M, Sippl W. Development of target-biased scoring functions for protein-ligand docking. J Cheminform 2012. [PMCID: PMC3341274 DOI: 10.1186/1758-2946-4-s1-p35] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 11/27/2022] Open
|
45
|
Vogt M, Bajorath J. Chemoinformatics: A view of the field and current trends in method development. Bioorg Med Chem 2012; 20:5317-23. [DOI: 10.1016/j.bmc.2012.03.030] [Citation(s) in RCA: 40] [Impact Index Per Article: 3.3] [Received: 01/23/2012] [Revised: 03/09/2012] [Accepted: 03/12/2012] [Indexed: 12/18/2022]
|
46
|
Cheng T, Li Q, Zhou Z, Wang Y, Bryant SH. Structure-based virtual screening for drug discovery: a problem-centric review. AAPS J 2012; 14:133-41. [PMID: 22281989 PMCID: PMC3282008 DOI: 10.1208/s12248-012-9322-0] [Citation(s) in RCA: 352] [Impact Index Per Article: 29.3] [Received: 08/17/2011] [Accepted: 01/04/2012] [Indexed: 11/30/2022] Open
Abstract
Structure-based virtual screening (SBVS) has been widely applied in early-stage drug discovery. From a problem-centric perspective, we reviewed the recent advances and applications in SBVS with a special focus on docking-based virtual screening. We emphasized the researchers' practical efforts in real projects by understanding the ligand-target binding interactions as a premise. We also highlighted the recent progress in developing target-biased scoring functions by optimizing current generic scoring functions toward certain target classes, as well as in developing novel ones by means of machine learning techniques.
Affiliation(s)
- Tiejun Cheng
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, 8600 Rockville Pike, Bethesda, Maryland 20894 USA
- Qingliang Li
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, 8600 Rockville Pike, Bethesda, Maryland 20894 USA
- Zhigang Zhou
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, 8600 Rockville Pike, Bethesda, Maryland 20894 USA
- Yanli Wang
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, 8600 Rockville Pike, Bethesda, Maryland 20894 USA
- Stephen H. Bryant
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, 8600 Rockville Pike, Bethesda, Maryland 20894 USA
|
47
|
Li L, Wang B, Meroueh SO. Support vector regression scoring of receptor-ligand complexes for rank-ordering and virtual screening of chemical libraries. J Chem Inf Model 2011; 51:2132-8. [PMID: 21728360 PMCID: PMC3209528 DOI: 10.1021/ci200078f] [Citation(s) in RCA: 70] [Impact Index Per Article: 5.4] [Indexed: 01/18/2023]
Abstract
The community structure-activity resource (CSAR) data sets are used to develop and test a support vector machine-based scoring function in regression mode (SVR). Two scoring functions (SVR-KB and SVR-EP) are derived with the objective of reproducing the trend of the experimental binding affinities provided within the two CSAR data sets. The features used to train SVR-KB are knowledge-based pairwise potentials, while SVR-EP is based on physicochemical properties. SVR-KB and SVR-EP were compared to seven other widely used scoring functions, including Glide, X-score, GoldScore, ChemScore, Vina, Dock, and PMF. Results showed that SVR-KB trained with features obtained from three-dimensional complexes of the PDBbind data set outperformed all other scoring functions, including best performing X-score, by nearly 0.1 using three correlation coefficients, namely Pearson, Spearman, and Kendall. It was interesting that higher performance in rank ordering did not translate into greater enrichment in virtual screening assessed using the 40 targets of the Directory of Useful Decoys (DUD). To remedy this situation, a variant of SVR-KB (SVR-KBD) was developed by following a target-specific tailoring strategy that we had previously employed to derive SVM-SP. SVR-KBD showed a much higher enrichment, outperforming all other scoring functions tested, and was comparable in performance to our previously derived scoring function SVM-SP.
Affiliation(s)
- Liwei Li
- Department of Biochemistry and Molecular Biology, Indiana University School of Medicine, Indiana University, Indianapolis, Indiana, United States
|