1. Caba K, Tran-Nguyen VK, Rahman T, Ballester PJ. Comprehensive machine learning boosts structure-based virtual screening for PARP1 inhibitors. J Cheminform 2024; 16:40. [PMID: 38582911 PMCID: PMC10999096 DOI: 10.1186/s13321-024-00832-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/16/2023] [Accepted: 03/23/2024] [Indexed: 04/08/2024] Open
Abstract
Poly ADP-ribose polymerase 1 (PARP1) is an attractive therapeutic target for cancer treatment. Machine-learning scoring functions constitute a promising approach to discovering novel PARP1 inhibitors. Cutting-edge PARP1-specific machine-learning scoring functions were investigated using semi-synthetic training data from docking activity-labelled molecules: known PARP1 inhibitors, hard-to-discriminate decoys property-matched to them with generative graph neural networks and confirmed inactives. We further made test sets harder by including only molecules dissimilar to those in the training set. Comprehensive analysis of these datasets using five supervised learning algorithms, and protein-ligand fingerprints extracted from docking poses and ligand only features revealed one highly predictive scoring function. This is the PARP1-specific support vector machine-based regressor, when employing PLEC fingerprints, which achieved a high Normalized Enrichment Factor at the top 1% on the hardest test set (NEF1% = 0.588, median of 10 repetitions), and was more predictive than any other investigated scoring function, especially the classical scoring function employed as baseline.
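The NEF1% value quoted in the abstract can be illustrated with a minimal sketch. It assumes the commonly used definition of the normalized enrichment factor, i.e. the number of actives found in the top-ranked fraction divided by the largest number of actives that fraction could hold. The function name, its arguments and the toy data are hypothetical and are not taken from the paper.

```python
def normalized_enrichment_factor(scores, labels, fraction=0.01):
    """Sketch of NEF at a given top fraction (assumed definition).

    scores: predicted scores, higher meaning more likely active.
    labels: 1 for active, 0 for inactive, in the same order as scores.
    """
    if len(scores) != len(labels) or not scores:
        raise ValueError("scores and labels must be non-empty and equally long")
    n_actives = sum(labels)
    if n_actives == 0:
        raise ValueError("at least one active is required")
    # Size of the top-ranked slice, never smaller than one molecule.
    n_top = max(1, round(fraction * len(scores)))
    # Rank molecules from best to worst predicted score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = sum(label for _, label in ranked[:n_top])
    # Best achievable number of hits in the slice.
    best_possible = min(n_top, n_actives)
    return hits / best_possible


# Toy example: 10 molecules, 3 actives, top 20% = 2 molecules, 1 active found.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.0]
toy_labels = [1, 0, 1, 0, 0, 0, 0, 0, 0, 1]
print(normalized_enrichment_factor(toy_scores, toy_labels, fraction=0.2))
```

Under this definition the value lies between 0 (no active in the top slice) and 1 (the top slice holds as many actives as it possibly can), which makes results comparable across test sets with different active-to-inactive ratios.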
Affiliation(s)
- Klaudia Caba
  - Department of Bioengineering, Imperial College London, London, SW7 2AZ, UK
- Viet-Khoa Tran-Nguyen
  - Unité de Biologie Fonctionnelle et Adaptative (BFA), UFR Sciences du Vivant, Université Paris Cité, 75013, Paris, France
- Taufiq Rahman
  - Department of Pharmacology, University of Cambridge, Cambridge, CB2 1PD, UK
- Pedro J Ballester
  - Department of Bioengineering, Imperial College London, London, SW7 2AZ, UK
2. Gómez-Sacristán P, Simeon S, Tran-Nguyen VK, Patil S, Ballester PJ. Inactive-enriched machine-learning models exploiting patent data improve structure-based virtual screening for PDL1 dimerizers. J Adv Res 2024:S2090-1232(24)00037-7. [PMID: 38280715 DOI: 10.1016/j.jare.2024.01.024] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/30/2023] [Revised: 12/01/2023] [Accepted: 01/21/2024] [Indexed: 01/29/2024] Open
Abstract
INTRODUCTION: Small-molecule Programmed Cell Death Protein 1/Programmed Death-Ligand 1 (PD1/PDL1) inhibition via PDL1 dimerization has the potential to lead to inexpensive drugs with better cancer patient outcomes and milder side effects. However, this therapeutic approach has proven challenging, with only one PDL1 dimerizer reaching early clinical trials so far. There is hence a need for fast and accurate methods to develop alternative PDL1 dimerizers. OBJECTIVES: We aim to show that structure-based virtual screening (SBVS) based on PDL1-specific machine-learning (ML) scoring functions (SFs) is a powerful drug design tool for detecting PD1/PDL1 inhibitors via PDL1 dimerization. METHODS: By incorporating the latest MLSF advances, we generated and evaluated PDL1-specific MLSFs (classifiers and inactive-enriched regressors) on two demanding test sets. RESULTS: 60 PDL1-specific MLSFs (30 classifiers and 30 regressors) were generated. Our large-scale analysis provides highly predictive PDL1-specific MLSFs that benefitted from training with large volumes of docked inactives and enabling inactive-enriched regression. CONCLUSION: PDL1-specific MLSFs strongly outperformed generic SFs of various types on this target and are released here without restrictions.
Affiliation(s)
- Saw Simeon
  - Centre de Recherche en Cancérologie de Marseille, Marseille 13009, France
- Sachin Patil
  - NanoBio Laboratory, Widener University, Chester, PA 19013, USA
- Pedro J Ballester
  - Department of Bioengineering, Imperial College London, London SW7 2AZ, UK
3. Tran-Nguyen VK, Junaid M, Simeon S, Ballester PJ. A practical guide to machine-learning scoring for structure-based virtual screening. Nat Protoc 2023; 18:3460-3511. [PMID: 37845361 DOI: 10.1038/s41596-023-00885-w] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 02/08/2022] [Accepted: 07/03/2023] [Indexed: 10/18/2023]
Abstract
Structure-based virtual screening (SBVS) via docking has been used to discover active molecules for a range of therapeutic targets. Chemical and protein data sets that contain integrated bioactivity information have increased both in number and in size. Artificial intelligence and, more concretely, its machine-learning (ML) branch, including deep learning, have effectively exploited these data sets to build scoring functions (SFs) for SBVS against targets with an atomic-resolution 3D model (e.g., generated by X-ray crystallography or predicted by AlphaFold2). Often outperforming their generic and non-ML counterparts, target-specific ML-based SFs represent the state of the art for SBVS. Here, we present a comprehensive and user-friendly protocol to build and rigorously evaluate these new SFs for SBVS. This protocol is organized into four sections: (i) using a public benchmark of a given target to evaluate an existing generic SF; (ii) preparing experimental data for a target from public repositories; (iii) partitioning data into a training set and a test set for subsequent target-specific ML modeling; and (iv) generating and evaluating target-specific ML SFs by using the prepared training-test partitions. All necessary code and input/output data related to three example targets (acetylcholinesterase, HMG-CoA reductase, and peroxisome proliferator-activated receptor-α) are available at https://github.com/vktrannguyen/MLSF-protocol, can be run by using a single computer within 1 week and make use of easily accessible software/programs (e.g., Smina, CNN-Score, RF-Score-VS and DeepCoy) and web resources. Our aim is to provide practical guidance on how to augment training data to enhance SBVS performance, how to identify the most suitable supervised learning algorithm for a data set, and how to build an SF with the highest likelihood of discovering target-active molecules within a given compound library.
Affiliation(s)
- Muhammad Junaid
  - Centre de Recherche en Cancérologie de Marseille, Marseille, France
- Saw Simeon
  - Centre de Recherche en Cancérologie de Marseille, Marseille, France
4. Tran-Nguyen VK, Ballester PJ. Beware of Simple Methods for Structure-Based Virtual Screening: The Critical Importance of Broader Comparisons. J Chem Inf Model 2023; 63:1401-1405. [PMID: 36848585 PMCID: PMC10015451 DOI: 10.1021/acs.jcim.3c00218] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Indexed: 03/01/2023]
Abstract
We discuss how data unbiasing and simple methods such as protein-ligand Interaction FingerPrint (IFP) can overestimate virtual screening performance. We also show that IFP is strongly outperformed by target-specific machine-learning scoring functions, which were not considered in a recent report concluding that simple methods were better than machine-learning scoring functions at virtual screening.
Affiliation(s)
- Pedro J Ballester
  - Department of Bioengineering, Imperial College London, London SW7 2AZ, U.K.
5. Jin LP, Zhang C, Xie Q, Xu J, Wang L, Yang LC, Huang EF, Wan DCC, Hu C. Design, synthesis and biological activity against estrogen receptor-dependent breast cancer of furo[1]benzofuran derivatives. Arab J Chem 2022. [DOI: 10.1016/j.arabjc.2022.104227] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/02/2022] Open
6. Tran-Nguyen VK, Simeon S, Junaid M, Ballester PJ. Structure-based virtual screening for PDL1 dimerizers: Evaluating generic scoring functions. Curr Res Struct Biol 2022; 4:206-210. [PMID: 35769111 PMCID: PMC9234010 DOI: 10.1016/j.crstbi.2022.06.002] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 03/14/2022] [Revised: 05/14/2022] [Accepted: 06/02/2022] [Indexed: 10/31/2022] Open
Abstract
The interaction between PD1 and its ligand PDL1 has been shown to render tumor cells resistant to apoptosis and promote tumor progression. An innovative mechanism to inhibit the PD1/PDL1 interaction is PDL1 dimerization induced by small-molecule PDL1 binders. Structure-based virtual screening is a promising approach to discovering such small-molecule PD1/PDL1 inhibitors. Here we investigate which type of generic scoring functions is most suitable to tackle this problem. We consider CNN-Score, an ensemble of convolutional neural networks, as the representative of machine-learning scoring functions. We also evaluate Smina, a commonly used classical scoring function, and IFP, a top structural fingerprint similarity scoring function. These three types of scoring functions were evaluated on two test sets sharing the same set of small-molecule PD1/PDL1 inhibitors, but using different types of inactives: either true inactives (molecules with no in vitro PD1/PDL1 inhibition activity) or assumed inactives (property-matched decoy molecules generated from each active). On both test sets, CNN-Score performed much better than Smina, which in turn strongly outperformed IFP. The fact that the latter was the case, despite precluding any possibility of exploiting decoy bias, demonstrates the predictive value of CNN-Score for PDL1. These results suggest that re-scoring Smina-docked molecules with CNN-Score is a promising structure-based virtual screening method to discover new small-molecule inhibitors of this therapeutic target.
7. Bule M, Jalalimanesh N, Bayrami Z, Baeeri M, Abdollahi M. The rise of deep learning and transformations in bioactivity prediction power of molecular modeling tools. Chem Biol Drug Des 2021; 98:954-967. [PMID: 34532977 DOI: 10.1111/cbdd.13750] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 12/21/2019] [Revised: 04/21/2020] [Accepted: 06/07/2020] [Indexed: 12/18/2022]
Abstract
The search and design for the better use of bioactive compounds are used in many experiments to best mimic compounds' functions in the human body. However, finding a cost-effective and timesaving approach is a top priority in different disciplines. Nowadays, artificial intelligence (AI) and particularly deep learning (DL) methods are widely applied to improve the precision and accuracy of models used in the drug discovery process. DL approaches have been used to provide more opportunities for faster, more efficient, cost-effective, and reliable computer-aided drug discovery. Moreover, the increasing volume of biomedical data in areas such as genome sequences, medical images, and protein structures has made data mining algorithms very important in finding novel compounds that could be drugs, uncovering or repurposing drugs and improving the area of genetic markers-based personalized medicine. Furthermore, deep neural networks (DNNs) have been demonstrated to outperform other techniques such as random forests and SVMs for QSAR studies and ligand-based virtual screening. Despite this, in QSAR studies, the quality of different data sources and potential experimental errors have greatly affected the accuracy of QSAR predictions. Therefore, further research is still needed to improve the accuracy, selectivity, and sensitivity of the DL approach in building the best models of drug discovery.
Affiliation(s)
- Mohammed Bule
  - Department of Pharmacy, College of Medicine and Health Sciences, Ambo University, Ambo, Ethiopia
  - Department of Medicinal Chemistry, School of Pharmacy, Tehran University of Medical Sciences, Tehran, Iran
  - Toxicology and Diseases Group, Pharmaceutical Sciences Research Center (PSRC), The Institute of Pharmaceutical Sciences (TIPS), Tehran University of Medical Sciences, Tehran, Iran
- Nafiseh Jalalimanesh
  - Toxicology and Diseases Group, Pharmaceutical Sciences Research Center (PSRC), The Institute of Pharmaceutical Sciences (TIPS), Tehran University of Medical Sciences, Tehran, Iran
- Zahra Bayrami
  - Toxicology and Diseases Group, Pharmaceutical Sciences Research Center (PSRC), The Institute of Pharmaceutical Sciences (TIPS), Tehran University of Medical Sciences, Tehran, Iran
- Maryam Baeeri
  - Toxicology and Diseases Group, Pharmaceutical Sciences Research Center (PSRC), The Institute of Pharmaceutical Sciences (TIPS), Tehran University of Medical Sciences, Tehran, Iran
- Mohammad Abdollahi
  - Toxicology and Diseases Group, Pharmaceutical Sciences Research Center (PSRC), The Institute of Pharmaceutical Sciences (TIPS), Tehran University of Medical Sciences, Tehran, Iran
  - Department of Toxicology and Pharmacology, School of Pharmacy, Tehran University of Medical Sciences, Tehran, Iran
8. Ghislat G, Rahman T, Ballester PJ. Recent progress on the prospective application of machine learning to structure-based virtual screening. Curr Opin Chem Biol 2021; 65:28-34. [PMID: 34052776 DOI: 10.1016/j.cbpa.2021.04.009] [Citation(s) in RCA: 32] [Impact Index Per Article: 10.7] [Received: 03/03/2021] [Revised: 04/13/2021] [Accepted: 04/23/2021] [Indexed: 12/30/2022]
Abstract
As more bioactivity and protein structure data become available, scoring functions (SFs) using machine learning (ML) to leverage these data sets continue to gain further accuracy and broader applicability. Advances in our understanding of the optimal ways to train and evaluate these ML-based SFs have introduced further improvements. One of these advances is how to select the most suitable decoys (molecules assumed inactive) to train or test an ML-based SF on a given target. We also review the latest applications of ML-based SFs for prospective structure-based virtual screening (SBVS), with a focus on the observed improvement over those using classical SFs. Finally, we provide recommendations for future prospective SBVS studies based on the findings of recent methodological studies.
Affiliation(s)
- Ghita Ghislat
  - U1104, CNRS UMR7280, Centre D'Immunologie de Marseille-Luminy, Inserm, Marseille, France
- Taufiq Rahman
  - Department of Pharmacology, University of Cambridge, Cambridge, CB2 1PD, UK
- Pedro J Ballester
  - Centre de Recherche en Cancérologie de Marseille (CRCM), Inserm, U1068, Marseille, F-13009, France
  - CNRS, UMR7258, Marseille, F-13009, France
  - Institut Paoli-Calmettes, Marseille, F-13009, France
  - Aix-Marseille University, UM 105, F-13284, Marseille, France
9. Jiménez-Luna J, Grisoni F, Weskamp N, Schneider G. Artificial intelligence in drug discovery: recent advances and future perspectives. Expert Opin Drug Discov 2021; 16:949-959. [PMID: 33779453 DOI: 10.1080/17460441.2021.1909567] [Citation(s) in RCA: 83] [Impact Index Per Article: 27.7] [Indexed: 02/06/2023]
Abstract
Introduction: Artificial intelligence (AI) has inspired computer-aided drug discovery. The widespread adoption of machine learning, in particular deep learning, in multiple scientific disciplines, and the advances in computing hardware and software, among other factors, continue to fuel this development. Much of the initial skepticism regarding applications of AI in pharmaceutical discovery has started to vanish, consequently benefitting medicinal chemistry. Areas covered: The current status of AI in chemoinformatics is reviewed. The topics discussed herein include quantitative structure-activity/property relationship and structure-based modeling, de novo molecular design, and chemical synthesis prediction. Advantages and limitations of current deep learning applications are highlighted, together with a perspective on next-generation AI for drug discovery. Expert opinion: Deep learning-based approaches have only begun to address some fundamental problems in drug discovery. Certain methodological advances, such as message-passing models, spatial-symmetry-preserving networks, hybrid de novo design, and other innovative machine learning paradigms, will likely become commonplace and help address some of the most challenging questions. Open data sharing and model development will play a central role in the advancement of drug discovery with AI.
Affiliation(s)
- José Jiménez-Luna
  - Department of Chemistry and Applied Biosciences, ETH Zurich, Zurich, Switzerland
- Francesca Grisoni
  - Department of Chemistry and Applied Biosciences, ETH Zurich, Zurich, Switzerland
- Nils Weskamp
  - Boehringer Ingelheim Pharma GmbH & Co. KG, Biberach an Der Riss, Germany
- Gisbert Schneider
  - Department of Chemistry and Applied Biosciences, ETH Zurich, Zurich, Switzerland
10. Selecting machine-learning scoring functions for structure-based virtual screening. Drug Discov Today Technol 2020; 32-33:81-87. [PMID: 33386098 DOI: 10.1016/j.ddtec.2020.09.001] [Citation(s) in RCA: 27] [Impact Index Per Article: 6.8] [Received: 04/01/2020] [Revised: 09/02/2020] [Accepted: 09/07/2020] [Indexed: 12/27/2022]
Abstract
Interest in docking technologies has grown parallel to the ever increasing number and diversity of 3D models for macromolecular therapeutic targets. Structure-Based Virtual Screening (SBVS) aims at leveraging these experimental structures to discover the necessary starting points for the drug discovery process. It is now established that Machine Learning (ML) can strongly enhance the predictive accuracy of scoring functions for SBVS by exploiting large datasets from targets, molecules and their associations. However, with greater choice, the question of which ML-based scoring function is the most suitable for prospective use on a given target has gained importance. Here we analyse two approaches to select an existing scoring function for the target along with a third approach consisting in generating a scoring function tailored to the target. These analyses required discussing the limitations of popular SBVS benchmarks, the alternatives to benchmark scoring functions for SBVS and how to generate them or use them using freely-available software.
11. Mazurek AH, Szeleszczuk Ł, Simonson T, Pisklak DM. Application of Various Molecular Modelling Methods in the Study of Estrogens and Xenoestrogens. Int J Mol Sci 2020; 21:E6411. [PMID: 32899216 PMCID: PMC7504198 DOI: 10.3390/ijms21176411] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 07/21/2020] [Revised: 08/30/2020] [Accepted: 09/01/2020] [Indexed: 12/14/2022] Open
Abstract
In this review, applications of various molecular modelling methods in the study of estrogens and xenoestrogens are summarized. Selected biomolecules that are the most commonly chosen as molecular modelling objects in this field are presented. In most of the reviewed works, ligand docking using solely force field methods was performed, employing various molecular targets involved in metabolism and action of estrogens. Other molecular modelling methods such as molecular dynamics and combined quantum mechanics with molecular mechanics have also been successfully used to predict the properties of estrogens and xenoestrogens. Among published works, a great number also focused on the application of different types of quantitative structure-activity relationship (QSAR) analyses to examine estrogen's structures and activities. Although the interactions between estrogens and xenoestrogens with various proteins are the most commonly studied, other aspects such as penetration of estrogens through lipid bilayers or their ability to adsorb on different materials are also explored using theoretical calculations. Apart from molecular mechanics and statistical methods, quantum mechanics calculations are also employed in the studies of estrogens and xenoestrogens. Their applications include computation of spectroscopic properties, both vibrational and Nuclear Magnetic Resonance (NMR), and also in quantum molecular dynamics simulations and crystal structure prediction. The main aim of this review is to present the great potential and versatility of various molecular modelling methods in the studies on estrogens and xenoestrogens.
Affiliation(s)
- Anna Helena Mazurek
  - Chair and Department of Physical Pharmacy and Bioanalysis, Department of Physical Chemistry, Medical Faculty of Pharmacy, University of Warsaw, Banacha 1 str., 02-093 Warsaw, Poland
- Łukasz Szeleszczuk
  - Chair and Department of Physical Pharmacy and Bioanalysis, Department of Physical Chemistry, Medical Faculty of Pharmacy, University of Warsaw, Banacha 1 str., 02-093 Warsaw, Poland
- Thomas Simonson
  - Laboratoire de Biochimie (CNRS UMR7654), Ecole Polytechnique, 91-120 Palaiseau, France
- Dariusz Maciej Pisklak
  - Chair and Department of Physical Pharmacy and Bioanalysis, Department of Physical Chemistry, Medical Faculty of Pharmacy, University of Warsaw, Banacha 1 str., 02-093 Warsaw, Poland
12. Adeshina YO, Deeds EJ, Karanicolas J. Machine learning classification can reduce false positives in structure-based virtual screening. Proc Natl Acad Sci U S A 2020; 117:18477-18488. [PMID: 32669436 PMCID: PMC7414157 DOI: 10.1073/pnas.2000585117] [Citation(s) in RCA: 95] [Impact Index Per Article: 23.8] [Indexed: 12/19/2022] Open
Abstract
With the recent explosion in the size of libraries available for screening, virtual screening is positioned to assume a more prominent role in early drug discovery's search for active chemical matter. In typical virtual screens, however, only about 12% of the top-scoring compounds actually show activity when tested in biochemical assays. We argue that most scoring functions used for this task have been developed with insufficient thoughtfulness into the datasets on which they are trained and tested, leading to overly simplistic models and/or overtraining. These problems are compounded in the literature because studies reporting new scoring methods have not validated their models prospectively within the same study. Here, we report a strategy for building a training dataset (D-COID) that aims to generate highly compelling decoy complexes that are individually matched to available active complexes. Using this dataset, we train a general-purpose classifier for virtual screening (vScreenML) that is built on the XGBoost framework. In retrospective benchmarks, our classifier shows outstanding performance relative to other scoring functions. In a prospective context, nearly all candidate inhibitors from a screen against acetylcholinesterase show detectable activity; beyond this, 10 of 23 compounds have IC50 better than 50 μM. Without any medicinal chemistry optimization, the most potent hit has IC50 280 nM, corresponding to Ki of 173 nM. These results support using the D-COID strategy for training classifiers in other computational biology tasks, and for vScreenML in virtual screening campaigns against other protein targets. Both D-COID and vScreenML are freely distributed to facilitate such efforts.
Affiliation(s)
- Yusuf O Adeshina
  - Program in Molecular Therapeutics, Fox Chase Cancer Center, Philadelphia, PA 19111
  - Center for Computational Biology, University of Kansas, Lawrence, KS 66045
- Eric J Deeds
  - Center for Computational Biology, University of Kansas, Lawrence, KS 66045
  - Department of Molecular Biosciences, University of Kansas, Lawrence, KS 66045
- John Karanicolas
  - Program in Molecular Therapeutics, Fox Chase Cancer Center, Philadelphia, PA 19111
13. Fresnais L, Ballester PJ. The impact of compound library size on the performance of scoring functions for structure-based virtual screening. Brief Bioinform 2020; 22:5855396. [PMID: 32568385 DOI: 10.1093/bib/bbaa095] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Received: 03/20/2020] [Revised: 04/20/2020] [Accepted: 04/28/2020] [Indexed: 12/20/2022] Open
Abstract
Larger training datasets have been shown to improve the accuracy of machine learning (ML)-based scoring functions (SFs) for structure-based virtual screening (SBVS). In addition, massive test sets for SBVS, known as ultra-large compound libraries, have been demonstrated to enable the fast discovery of selective drug leads with low-nanomolar potency. This proof-of-concept was carried out on two targets using a single docking tool along with its SF. It is thus unclear whether this high level of performance would generalise to other targets, docking tools and SFs. We found that screening a larger compound library results in more potent actives being identified in all six additional targets using a different docking tool along with its classical SF. Furthermore, we established that a way to improve the potency of the retrieved molecules further is to rank them with more accurate ML-based SFs (we found this to be true in four of the six targets; the difference was not significant in the remaining two targets). A 3-fold increase in average hit rate across targets was also achieved by the ML-based SFs. Lastly, we observed that classical and ML-based SFs often find different actives, which supports using both types of SFs on those targets.
14. Bafna D, Ban F, Rennie PS, Singh K, Cherkasov A. Computer-Aided Ligand Discovery for Estrogen Receptor Alpha. Int J Mol Sci 2020; 21:E4193. [PMID: 32545494 PMCID: PMC7352601 DOI: 10.3390/ijms21124193] [Citation(s) in RCA: 26] [Impact Index Per Article: 6.5] [Received: 04/28/2020] [Revised: 05/30/2020] [Accepted: 06/09/2020] [Indexed: 02/08/2023] Open
Abstract
Breast cancer (BCa) is one of the most predominantly diagnosed cancers in women. Notably, 70% of BCa diagnoses are Estrogen Receptor α positive (ERα+) making it a critical therapeutic target. With that, the two subtypes of ER, ERα and ERβ, have contrasting effects on BCa cells. While ERα promotes cancerous activities, ERβ isoform exhibits inhibitory effects on the same. ER-directed small molecule drug discovery for BCa has provided the FDA approved drugs tamoxifen, toremifene, raloxifene and fulvestrant that all bind to the estrogen binding site of the receptor. These ER-directed inhibitors are non-selective in nature and may eventually induce resistance in BCa cells as well as increase the risk of endometrial cancer development. Thus, there is an urgent need to develop novel drugs with alternative ERα targeting mechanisms that can overcome the limitations of conventional anti-ERα therapies. Several functional sites on ERα, such as Activation Function-2 (AF2), DNA binding domain (DBD), and F-domain, have been recently considered as potential targets in the context of drug research and discovery. In this review, we summarize methods of computer-aided drug design (CADD) that have been employed to analyze and explore potential targetable sites on ERα, discuss recent advancement of ERα inhibitor development, and highlight the potential opportunities and challenges of future ERα-directed drug discovery.
Affiliation(s)
- Artem Cherkasov
  - Vancouver Prostate Centre, University of British Columbia, 2660 Oak Street, Vancouver, BC V6H 3Z6, Canada
15. Shen C, Hu Y, Wang Z, Zhang X, Pang J, Wang G, Zhong H, Xu L, Cao D, Hou T. Beware of the generic machine learning-based scoring functions in structure-based virtual screening. Brief Bioinform 2020; 22:5850047. [PMID: 32484221 DOI: 10.1093/bib/bbaa070] [Citation(s) in RCA: 31] [Impact Index Per Article: 7.8] [Received: 01/28/2020] [Revised: 04/17/2020] [Accepted: 03/30/2020] [Indexed: 12/14/2022] Open
Abstract
Machine learning-based scoring functions (MLSFs) have attracted extensive attention recently and are expected to be potential rescoring tools for structure-based virtual screening (SBVS). However, a major concern nowadays is whether MLSFs trained for generic uses rather than a given target can consistently be applicable for VS. In this study, a systematic assessment was carried out to re-evaluate the effectiveness of 14 reported MLSFs in VS. Overall, most of these MLSFs could hardly achieve satisfactory results for any dataset, and they could not even outperform the baseline of classical SFs such as Glide SP. An exception was observed for RFscore-VS trained on the Directory of Useful Decoys-Enhanced dataset, which showed its superiority for most targets. However, in most cases, it showed rather limited performance on the targets that were dissimilar to the proteins in the corresponding training sets. We also used the top three docking poses rather than the top one for rescoring and retrained the models with the updated versions of the training set, but only minor improvements were observed. Taken together, generic MLSFs may generalize too poorly to be applicable to real VS campaigns. Therefore, this type of method should be used for VS with caution.
Affiliation(s)
- Ye Hu
  - Central South University, China
- Lei Xu
  - Central South University, China
16. Li H, Sze K, Lu G, Ballester PJ. Machine‐learning scoring functions for structure‐based virtual screening. Wiley Interdiscip Rev Comput Mol Sci 2020. [DOI: 10.1002/wcms.1478] [Citation(s) in RCA: 37] [Impact Index Per Article: 9.3] [Indexed: 12/14/2022]
Affiliation(s)
- Hongjian Li
- Cancer Research Center of Marseille (INSERM U1068, Institut Paoli‐Calmettes, Aix‐Marseille Université UM105, CNRS UMR7258) Marseille France
- CUHK‐SDU Joint Laboratory on Reproductive Genetics, School of Biomedical Sciences Chinese University of Hong Kong Shatin Hong Kong
| | - Kam‐Heung Sze
- CUHK‐SDU Joint Laboratory on Reproductive Genetics, School of Biomedical Sciences Chinese University of Hong Kong Shatin Hong Kong
| | - Gang Lu
- CUHK‐SDU Joint Laboratory on Reproductive Genetics, School of Biomedical Sciences Chinese University of Hong Kong Shatin Hong Kong
| | - Pedro J. Ballester
- Cancer Research Center of Marseille (INSERM U1068, Institut Paoli‐Calmettes, Aix‐Marseille Université UM105, CNRS UMR7258) Marseille France
|
17
|
Su M, Feng G, Liu Z, Li Y, Wang R. Tapping on the Black Box: How Is the Scoring Power of a Machine-Learning Scoring Function Dependent on the Training Set? J Chem Inf Model 2020; 60:1122-1136. [DOI: 10.1021/acs.jcim.9b00714] [Citation(s) in RCA: 29] [Impact Index Per Article: 7.3] [Indexed: 11/30/2022]
Affiliation(s)
- Minyi Su
- State Key Laboratory of Bioorganic and Natural Products Chemistry, Center for Excellence in Molecular Synthesis, Shanghai Institute of Organic Chemistry, Chinese Academy of Sciences, 345 Lingling Road, Shanghai 200032, People’s Republic of China
- University of Chinese Academy of Sciences, Beijing 100049, People’s Republic of China
| | - Guoqin Feng
- State Key Laboratory of Bioorganic and Natural Products Chemistry, Center for Excellence in Molecular Synthesis, Shanghai Institute of Organic Chemistry, Chinese Academy of Sciences, 345 Lingling Road, Shanghai 200032, People’s Republic of China
- University of Chinese Academy of Sciences, Beijing 100049, People’s Republic of China
| | - Zhihai Liu
- State Key Laboratory of Bioorganic and Natural Products Chemistry, Center for Excellence in Molecular Synthesis, Shanghai Institute of Organic Chemistry, Chinese Academy of Sciences, 345 Lingling Road, Shanghai 200032, People’s Republic of China
| | - Yan Li
- State Key Laboratory of Bioorganic and Natural Products Chemistry, Center for Excellence in Molecular Synthesis, Shanghai Institute of Organic Chemistry, Chinese Academy of Sciences, 345 Lingling Road, Shanghai 200032, People’s Republic of China
- Department of Medicinal Chemistry, School of Pharmacy, Fudan University, 826 Zhangheng Road, Shanghai 201203, People’s Republic of China
| | - Renxiao Wang
- State Key Laboratory of Bioorganic and Natural Products Chemistry, Center for Excellence in Molecular Synthesis, Shanghai Institute of Organic Chemistry, Chinese Academy of Sciences, 345 Lingling Road, Shanghai 200032, People’s Republic of China
- Department of Medicinal Chemistry, School of Pharmacy, Fudan University, 826 Zhangheng Road, Shanghai 201203, People’s Republic of China
- Shanxi Key Laboratory of Innovative Drugs for the Treatment of Serious Diseases Basing on Chronic Inflammation, College of Traditional Chinese Medicines, Shanxi University of Chinese Medicine, Taiyuan, Shanxi 030619, People’s Republic of China
|
18
|
Li H, Sze K, Lu G, Ballester PJ. Machine‐learning scoring functions for structure‐based drug lead optimization. WILEY INTERDISCIPLINARY REVIEWS-COMPUTATIONAL MOLECULAR SCIENCE 2020. [DOI: 10.1002/wcms.1465] [Citation(s) in RCA: 53] [Impact Index Per Article: 13.3] [Indexed: 12/17/2022]
Affiliation(s)
- Hongjian Li
- CUHK‐SDU Joint Laboratory on Reproductive Genetics, School of Biomedical Sciences Chinese University of Hong Kong Shatin Hong Kong
| | - Kam‐Heung Sze
- CUHK‐SDU Joint Laboratory on Reproductive Genetics, School of Biomedical Sciences Chinese University of Hong Kong Shatin Hong Kong
| | - Gang Lu
- CUHK‐SDU Joint Laboratory on Reproductive Genetics, School of Biomedical Sciences Chinese University of Hong Kong Shatin Hong Kong
| | - Pedro J. Ballester
- Cancer Research Center of Marseille (INSERM U1068, Institut Paoli‐Calmettes, Aix‐Marseille Université UM105, CNRS UMR7258) Marseille France
|
19
|
Torres PHM, Sodero ACR, Jofily P, Silva-Jr FP. Key Topics in Molecular Docking for Drug Design. Int J Mol Sci 2019; 20:E4574. [PMID: 31540192 PMCID: PMC6769580 DOI: 10.3390/ijms20184574] [Citation(s) in RCA: 176] [Impact Index Per Article: 35.2] [Received: 06/03/2019] [Revised: 07/09/2019] [Accepted: 07/10/2019] [Indexed: 12/18/2022] Open
Abstract
Molecular docking has been widely employed as a fast and inexpensive technique in the past decades, both in academic and industrial settings. Although this discipline has now had enough time to consolidate, many aspects remain challenging and there is still not a straightforward and accurate route to readily pinpoint true ligands among a set of molecules, nor to identify with precision the correct ligand conformation within the binding pocket of a given target molecule. Nevertheless, new approaches continue to be developed and the volume of published works grows at a rapid pace. In this review, we present an overview of the method and attempt to summarise recent developments regarding four main aspects of molecular docking approaches: (i) the available benchmarking sets, highlighting their advantages and caveats, (ii) the advances in consensus methods, (iii) recent algorithms and applications using fragment-based approaches, and (iv) the use of machine learning algorithms in molecular docking. These recent developments incrementally contribute to an increase in accuracy and are expected, given time, and together with advances in computing power and hardware capability, to eventually accomplish the full potential of this area.
Affiliation(s)
- Pedro H M Torres
- Department of Biochemistry, University of Cambridge, Cambridge CB2 1GA, UK.
| | - Ana C R Sodero
- Department of Drugs and Medicines; School of Pharmacy; Federal University of Rio de Janeiro, Rio de Janeiro 21949-900, RJ, Brazil.
| | - Paula Jofily
- Laboratório de Modelagem e Dinâmica Molecular, Instituto de Biofísica Carlos Chagas Filho, Universidade Federal do Rio de Janeiro, Rio de Janeiro 21949-900, RJ, Brazil.
| | - Floriano P Silva-Jr
- Laboratório de Bioquímica Experimental e Computacional de Fármacos, Instituto Oswaldo Cruz, FIOCRUZ, Rio de Janeiro 21949-900, RJ, Brazil.
|
20
|
Shen C, Ding J, Wang Z, Cao D, Ding X, Hou T. From machine learning to deep learning: Advances in scoring functions for protein–ligand docking. WILEY INTERDISCIPLINARY REVIEWS-COMPUTATIONAL MOLECULAR SCIENCE 2019. [DOI: 10.1002/wcms.1429] [Citation(s) in RCA: 76] [Impact Index Per Article: 15.2] [Indexed: 12/18/2022]
Affiliation(s)
- Chao Shen
- Hangzhou Institute of Innovative Medicine, College of Pharmaceutical Sciences, Zhejiang University Hangzhou P. R. China
| | - Junjie Ding
- Beijing Institute of Pharmaceutical Chemistry Beijing P. R. China
| | - Zhe Wang
- Hangzhou Institute of Innovative Medicine, College of Pharmaceutical Sciences, Zhejiang University Hangzhou P. R. China
| | - Dongsheng Cao
- Xiangya School of Pharmaceutical Sciences, Central South University Changsha P. R. China
| | - Xiaoqin Ding
- Beijing Institute of Pharmaceutical Chemistry Beijing P. R. China
| | - Tingjun Hou
- Hangzhou Institute of Innovative Medicine, College of Pharmaceutical Sciences, Zhejiang University Hangzhou P. R. China
|
21
|
Chen SJ, Zhu H, Zhang MM, Xu WW, Wang YC, Zhang ZF. Crystal structure of 1-benzyl-3-cyano-6-phenyl-1,2-dihydropyridine, C19H16N2. Z KRIST-NEW CRYST ST 2019. [DOI: 10.1515/ncrs-2018-0516] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 11/15/2022]
Abstract
C19H16N2, orthorhombic, P2₁2₁2₁ (no. 19), a = 7.5177(10) Å, b = 12.4761(16) Å, c = 16.118(2) Å, V = 1511.7(3) Å³, Z = 4, R_gt(F) = 0.0424, wR_ref(F²) = 0.0967, T = 293(2) K.
Affiliation(s)
- Shi-Jun Chen
- School of Pharmacy , North China University of Science and Technology , 063210 Caofeidian District , Tangshan , P.R. China
| | - Hao Zhu
- School of Public Health , North China University of Science and Technology , 063210 Caofeidian District , Tangshan , P.R. China
| | - Meng-Meng Zhang
- School of Pharmacy , North China University of Science and Technology , 063210 Caofeidian District , Tangshan , P.R. China
| | - Wen-Wu Xu
- School of Pharmacy , North China University of Science and Technology , 063210 Caofeidian District , Tangshan , P.R. China
| | - Yu-Cai Wang
- Jia Mu Si University , School of Continuing Education , No.148 Xuefu St, Jiamusi , Heilongjiang , P.R. China
| | - Zhi-Fei Zhang
- School of Pharmacy , North China University of Science and Technology , 063210 Caofeidian District , Tangshan , P.R. China
|
22
|
Li H, Peng J, Sidorov P, Leung Y, Leung KS, Wong MH, Lu G, Ballester PJ. Classical scoring functions for docking are unable to exploit large volumes of structural and interaction data. Bioinformatics 2019; 35:3989-3995. [DOI: 10.1093/bioinformatics/btz183] [Citation(s) in RCA: 41] [Impact Index Per Article: 8.2] [Received: 11/07/2018] [Revised: 02/04/2019] [Accepted: 03/13/2019] [Indexed: 12/15/2022] Open
Abstract
Motivation
Studies have shown that the accuracy of random forest (RF)-based scoring functions (SFs), such as RF-Score-v3, increases with more training samples, whereas that of classical SFs, such as X-Score, does not. Nevertheless, the impact of the similarity between training and test samples on this matter has not been studied in a systematic manner. It is therefore unclear how these SFs would perform when only trained on protein-ligand complexes that are highly dissimilar or highly similar to the test set. It is also unclear whether SFs based on machine learning algorithms other than RF can also improve accuracy with increasing training set size and to what extent they learn from dissimilar or similar training complexes.
Results
We present a systematic study to investigate how the accuracy of classical and machine-learning SFs varies with protein-ligand complex similarities between training and test sets. We considered three types of similarity metrics, based on the comparison of either protein structures, protein sequences or ligand structures. Regardless of the similarity metric, we found that incorporating a larger proportion of similar complexes to the training set did not make classical SFs more accurate. In contrast, RF-Score-v3 was able to outperform X-Score even when trained on just 32% of the most dissimilar complexes, showing that its superior performance owes considerably to learning from dissimilar training complexes to those in the test set. In addition, we generated the first SF employing Extreme Gradient Boosting (XGBoost), XGB-Score, and observed that it also improves with training set size while outperforming the rest of SFs. Given the continuous growth of training datasets, the development of machine-learning SFs has become very appealing.
Availability and implementation
https://github.com/HongjianLi/MLSF
Supplementary information
Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Hongjian Li
- SDIVF R&D Centre, Hong Kong Science Park, Sha Tin, New Territories, Hong Kong
- CUHK-SDU Joint Laboratory on Reproductive Genetics School of Biomedical Sciences, The Chinese University of Hong Kong, Sha Tin, New Territories, Hong Kong
| | - Jiangjun Peng
- School of Mathematics and Statistics, Xi'an Jiaotong University, Xi’an, China
| | - Pavel Sidorov
- Cancer Research Center of Marseille CRCM, INSERM, Institut Paoli-Calmettes, Aix-Marseille University, CNRS, F-13009 Marseille, France
| | - Kwong-Sak Leung
- Institute of Future Cities
- Department of Computer Science and Engineering, The Chinese University of Hong Kong, Sha Tin, New Territories, Hong Kong
| | - Man-Hon Wong
- Department of Computer Science and Engineering, The Chinese University of Hong Kong, Sha Tin, New Territories, Hong Kong
| | - Gang Lu
- CUHK-SDU Joint Laboratory on Reproductive Genetics School of Biomedical Sciences, The Chinese University of Hong Kong, Sha Tin, New Territories, Hong Kong
| | - Pedro J Ballester
- Cancer Research Center of Marseille CRCM, INSERM, Institut Paoli-Calmettes, Aix-Marseille University, CNRS, F-13009 Marseille, France
|
23
|
Zahorulko SP, Varenichenko SA, Farat OK, Mazepa AV, Okovytyy SI, Markov VI. Reactions of 2H(4H)-chromenes with dinucleophiles: one-step synthesis of 2-(1H-(bi)pyrazol-3-yl)- and 2-(1,4(5)-(benzo)diazepin-4-yl)phenols. Chem Heterocycl Compd (N Y) 2018. [DOI: 10.1007/s10593-018-2367-y] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Indexed: 11/30/2022]
|
24
|
Wingert BM, Camacho CJ. Improving small molecule virtual screening strategies for the next generation of therapeutics. Curr Opin Chem Biol 2018; 44:87-92. [PMID: 29920436 DOI: 10.1016/j.cbpa.2018.06.006] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.7] [Received: 03/11/2018] [Revised: 04/27/2018] [Accepted: 06/04/2018] [Indexed: 01/05/2023]
Abstract
The new generation of post-genomic targets, such as protein-protein interactions (PPIs), often require new chemotypes not well represented in current compound libraries. This is one reason why traditional high throughput screening (HTS) approaches are not more successful in delivering medicinal chemistry starting points for PPIs. In silico screening methods of an expanded chemical space are then potential alternatives for developing novel chemical probes to modulate PPIs. In this review, we report on the state-of-the-art pipelines for virtual screening, emphasizing prospectively validated methods capable of addressing the challenge of drugging difficult targets in the human interactome. Collectively, we show that optimal strategies for structure based virtual screening vary depending on receptor structure and degree of flexibility.
Affiliation(s)
- Bentley M Wingert
- Department of Computational and Systems Biology, University of Pittsburgh, Pittsburgh, PA 15261, USA
| | - Carlos J Camacho
- Department of Computational and Systems Biology, University of Pittsburgh, Pittsburgh, PA 15261, USA.
|
25
|
Ragoza M, Hochuli J, Idrobo E, Sunseri J, Koes DR. Protein-Ligand Scoring with Convolutional Neural Networks. J Chem Inf Model 2017; 57:942-957. [PMID: 28368587 PMCID: PMC5479431 DOI: 10.1021/acs.jcim.6b00740] [Citation(s) in RCA: 438] [Impact Index Per Article: 62.6] [Indexed: 11/28/2022]
Abstract
Computational approaches to drug discovery can reduce the time and cost associated with experimental assays and enable the screening of novel chemotypes. Structure-based drug design methods rely on scoring functions to rank and predict binding affinities and poses. The ever-expanding amount of protein-ligand binding and structural data enables the use of deep machine learning techniques for protein-ligand scoring. We describe convolutional neural network (CNN) scoring functions that take as input a comprehensive three-dimensional (3D) representation of a protein-ligand interaction. A CNN scoring function automatically learns the key features of protein-ligand interactions that correlate with binding. We train and optimize our CNN scoring functions to discriminate between correct and incorrect binding poses and known binders and nonbinders. We find that our CNN scoring function outperforms the AutoDock Vina scoring function when ranking poses both for pose prediction and virtual screening.
Affiliation(s)
- Elisa Idrobo
- Department of Computer Science, The College of New Jersey , Ewing, New Jersey 08628, United States
|
26
|
Recyclization of carbonyl-substituted 4H-chromenes and 1H-benzo[f]chromenes by the action of amidines and guanidine: a novel method for the synthesis of ortho-hydroxybenzylpyrimidines. Chem Heterocycl Compd (N Y) 2016. [DOI: 10.1007/s10593-016-1969-5] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 10/20/2022]
|
27
|
Karatay DU, Zhang J, Harrison JS, Ginger DS. Classifying Force Spectroscopy of DNA Pulling Measurements Using Supervised and Unsupervised Machine Learning Methods. J Chem Inf Model 2016; 56:621-9. [DOI: 10.1021/acs.jcim.5b00722] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Indexed: 12/19/2022]
Affiliation(s)
- Durmus U. Karatay
- Department of Chemistry, University of Washington, Seattle, Washington 98195, United States
| | - Jie Zhang
- Department of Chemistry, University of Washington, Seattle, Washington 98195, United States
| | - Jeffrey S. Harrison
- Department of Chemistry, University of Washington, Seattle, Washington 98195, United States
| | - David S. Ginger
- Department of Chemistry, University of Washington, Seattle, Washington 98195, United States
|