1. Daina A, Zoete V. Testing the predictive power of reverse screening to infer drug targets, with the help of machine learning. Commun Chem 2024; 7:105. [PMID: 38724725 PMCID: PMC11082207 DOI: 10.1038/s42004-024-01179-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/06/2023] [Accepted: 04/16/2024] [Indexed: 05/12/2024] [Open access]
Abstract
Estimating protein targets of compounds based on the similarity principle (similar molecules are likely to show comparable bioactivity) is a long-standing strategy in drug research. Having previously quantified this principle, we present here a large-scale evaluation of its predictive power for inferring macromolecular targets by reverse screening an unprecedentedly vast external test set of more than 300,000 active small molecules against another bioactivity set of more than 500,000 compounds. We show that machine learning can predict the correct targets, with the highest probability among 2069 proteins, for more than 51% of the external molecules. The strong enrichment thus obtained demonstrates its usefulness in supporting phenotypic screens, polypharmacology, or repurposing. Moreover, we quantified the impact of the bioactivity knowledge available for proteins in terms of number and diversity of actives. Finally, we advise that developers of such approaches follow an application-oriented benchmarking strategy and use large, high-quality, non-overlapping datasets as provided here.
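The similarity principle behind reverse screening can be illustrated with a minimal sketch. This is not the authors' model (which the abstract says relies on machine learning); it is a toy version that assumes fingerprints are sets of integer feature ids and ranks targets by the best Tanimoto similarity between the query and each target's known actives. The names `tanimoto`, `reverse_screen`, and the example data are hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple

Fingerprint = FrozenSet[int]  # assumption: a fingerprint is a set of "on" feature ids


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto (Jaccard) similarity between two set-based fingerprints."""
    union = len(a | b)
    if union == 0:
        return 0.0  # two empty fingerprints: define similarity as 0
    return len(a & b) / union


def reverse_screen(
    query: Fingerprint,
    actives_by_target: Dict[str, List[Fingerprint]],
) -> List[Tuple[str, float]]:
    """Rank targets by the highest similarity of the query to any known active."""
    scores = {
        target: max((tanimoto(query, fp) for fp in actives), default=0.0)
        for target, actives in actives_by_target.items()
    }
    # Highest score first: the most plausible targets under the similarity principle
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy data, for illustration only
example_query = frozenset({1, 2, 3, 4})
example_actives = {
    "TARGET_A": [frozenset({1, 2, 3, 4, 5})],
    "TARGET_B": [frozenset({7, 8})],
}
ranking = reverse_screen(example_query, example_actives)
```

In practice the raw similarity scores would be converted into probabilities by a trained model; the sketch stops at the ranking step.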
Affiliation(s)
- Antoine Daina
  - Molecular Modeling Group, SIB Swiss Institute of Bioinformatics, CH-1015, Lausanne, Switzerland.
- Vincent Zoete
  - Molecular Modeling Group, SIB Swiss Institute of Bioinformatics, CH-1015, Lausanne, Switzerland.
  - Computer-Aided Molecular Engineering, Department of Oncology UNIL-CHUV, Ludwig Institute for Cancer Research Lausanne Branch, University of Lausanne, Lausanne, Switzerland.

2. Mervin LH, Trapotsi MA, Afzal AM, Barrett IP, Bender A, Engkvist O. Probabilistic Random Forest improves bioactivity predictions close to the classification threshold by taking into account experimental uncertainty. J Cheminform 2021; 13:62. [PMID: 34412708 PMCID: PMC8375213 DOI: 10.1186/s13321-021-00539-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/07/2021] [Accepted: 07/30/2021] [Indexed: 11/24/2022] [Open access]
Abstract
Measurements of protein–ligand interactions have reproducibility limits due to experimental errors. Any model based on such assays will consequently have such unavoidable errors influencing its performance, which should ideally be factored into modelling and output predictions, such as the actual standard deviation of experimental measurements (σ) or the associated comparability of activity values between the aggregated heterogeneous activity units (i.e., Ki versus IC50 values) during dataset assimilation. However, experimental errors are usually a neglected aspect of model generation. In order to improve upon the current state-of-the-art, we herein present a novel approach toward predicting protein–ligand interactions using a Probabilistic Random Forest (PRF) classifier. The PRF algorithm was applied toward in silico protein target prediction across ~550 tasks from ChEMBL and PubChem. Predictions were evaluated by taking into account various scenarios of experimental standard deviations in both training and test sets, and performance was assessed using fivefold stratified shuffled splits for validation. The largest benefit in incorporating the experimental deviation in PRF was observed for data points close to the binary threshold boundary, when such information was not considered in any way in the original RF algorithm. For example, in cases when σ ranged between 0.4–0.6 log units and when ideal probability estimates were between 0.4–0.6, the PRF outperformed RF with a median absolute error margin of ~17%. In comparison, the baseline RF outperformed PRF for cases with high confidence to belong to the active class (far from the binary decision threshold), although the RF models gave errors smaller than the experimental uncertainty, which could indicate that they were overtrained and/or over-confident. Finally, PRF models trained with putative inactives performed worse than PRF models without putative inactives; this could be because putative inactives were not assigned an experimental pXC50 value and were therefore considered inactives with a low uncertainty (which in practice might not be true). In conclusion, PRF can be useful for target prediction models, in particular for data where class boundaries overlap with the measurement uncertainty and where a substantial part of the training data is located close to the classification threshold.
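To see why uncertainty matters most near the threshold, consider one common way of turning a measured potency and its standard deviation into a soft label. The sketch below assumes (this is not stated in the abstract) that the measurement error is Gaussian, so the probability of being truly active is the normal CDF evaluated at the distance from the threshold in units of σ. The function name `prob_active` and the example values are hypothetical.

```python
import math


def prob_active(pxc50: float, threshold: float, sigma: float) -> float:
    """Probability that the true activity exceeds the threshold.

    Assumption: the measured pXC50 carries Gaussian error with standard
    deviation `sigma` (in log units), so the result is the normal CDF of
    (pxc50 - threshold) / sigma.
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    z = (pxc50 - threshold) / sigma
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Hypothetical values: threshold at pXC50 = 5, sigma = 0.5 log units
at_threshold = prob_active(5.0, 5.0, 0.5)    # exactly on the boundary
one_sigma_up = prob_active(5.5, 5.0, 0.5)    # one sigma above the boundary
far_above = prob_active(9.0, 5.0, 0.5)       # far from the boundary
```

A compound measured exactly at the threshold gets a soft label of 0.5 rather than a hard 0 or 1, while a compound far from the threshold is labelled almost identically to the binary case. This matches the abstract's observation that the benefit concentrates near the decision boundary.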
Affiliation(s)
- Lewis H Mervin
  - Molecular AI, Discovery Sciences, R&D, AstraZeneca, Cambridge, UK.
- Maria-Anna Trapotsi
  - Department of Chemistry, Centre for Molecular Informatics, University of Cambridge, Lensfield Road, Cambridge, CB2 1EW, UK
- Avid M Afzal
  - Data Sciences & Quantitative Biology, Discovery Sciences, R&D, AstraZeneca, Cambridge, UK
- Ian P Barrett
  - Data Sciences & Quantitative Biology, Discovery Sciences, R&D, AstraZeneca, Cambridge, UK
- Andreas Bender
  - Department of Chemistry, Centre for Molecular Informatics, University of Cambridge, Lensfield Road, Cambridge, CB2 1EW, UK
- Ola Engkvist
  - Molecular AI, Discovery Sciences, R&D, AstraZeneca, Gothenburg, Sweden
  - Department of Computer Science and Engineering, Chalmers University of Technology, Gothenburg, Sweden

3. Daina A, Michielin O, Zoete V. SwissTargetPrediction: updated data and new features for efficient prediction of protein targets of small molecules. Nucleic Acids Res 2019; 47:W357-W364. [PMID: 31106366 PMCID: PMC6602486 DOI: 10.1093/nar/gkz382] [Citation(s) in RCA: 1513] [Impact Index Per Article: 378.3] [Received: 02/11/2019] [Revised: 04/26/2019] [Accepted: 05/01/2019] [Indexed: 12/14/2022] [Open access]
Abstract
SwissTargetPrediction is a web tool, on-line since 2014, that aims to predict the most probable protein targets of small molecules. Predictions are based on the similarity principle, through reverse screening. Here, we describe the 2019 version, which represents a major update in terms of underlying data, backend and web interface. The bioactivity data were updated, the model retrained and similarity thresholds redefined. In the new version, the predictions are performed by searching for similar molecules, in 2D and 3D, within a larger collection of 376 342 compounds known to be experimentally active on an extended set of 3068 macromolecular targets. An efficient backend implementation speeds up the process, which returns results for a druglike molecule on human proteins in 15-20 s. The refreshed web interface enhances user experience with new features for easy input and improved analysis. Interoperability capacity enables straightforward submission of any input or output molecule to other on-line computer-aided drug design tools developed by the SIB Swiss Institute of Bioinformatics. High levels of predictive performance were maintained despite more extended biological and chemical spaces to be explored, e.g. achieving at least one correct human target in the top 15 predictions for >70% of external compounds. The new SwissTargetPrediction is available free of charge (www.swisstargetprediction.ch).
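The performance figure quoted above ("at least one correct human target in the top 15 predictions") is a top-k hit rate. A minimal sketch of how such a metric could be computed follows; it is an illustration under assumed data structures, not the tool's evaluation code. The function name `top_k_hit_rate`, the dictionary layout, and the toy data are hypothetical.

```python
from typing import Dict, List, Set


def top_k_hit_rate(
    ranked_predictions: Dict[str, List[str]],
    known_targets: Dict[str, Set[str]],
    k: int = 15,
) -> float:
    """Fraction of compounds with at least one known target in their top-k list.

    ranked_predictions maps a compound id to its predicted targets, best first.
    known_targets maps a compound id to its experimentally known targets.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    if not ranked_predictions:
        return 0.0
    hits = 0
    for compound, ranking in ranked_predictions.items():
        truth = known_targets.get(compound, set())
        # A compound counts as a hit if any of its top-k predictions is correct
        if any(target in truth for target in ranking[:k]):
            hits += 1
    return hits / len(ranked_predictions)


# Hypothetical toy data, for illustration only
toy_predictions = {"cpd1": ["A", "B", "C"], "cpd2": ["D", "E", "F"]}
toy_truth = {"cpd1": {"B"}, "cpd2": {"F"}}
rate_top2 = top_k_hit_rate(toy_predictions, toy_truth, k=2)
rate_top3 = top_k_hit_rate(toy_predictions, toy_truth, k=3)
```

Widening k from 2 to 3 in the toy data lets the second compound's correct target enter the list, which is why reported hit rates always need to be read together with the k they were computed at.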
Affiliation(s)
- Antoine Daina
  - Molecular Modeling Group, SIB Swiss Institute of Bioinformatics, University of Lausanne, Quartier UNIL-Sorge, Bâtiment Amphipôle, CH-1015 Lausanne, Switzerland
- Olivier Michielin
  - Molecular Modeling Group, SIB Swiss Institute of Bioinformatics, University of Lausanne, Quartier UNIL-Sorge, Bâtiment Amphipôle, CH-1015 Lausanne, Switzerland
  - Department of Oncology, University Hospital of Lausanne, Ludwig Cancer Research - Lausanne Branch, CH-1011 Lausanne, Switzerland
- Vincent Zoete
  - Molecular Modeling Group, SIB Swiss Institute of Bioinformatics, University of Lausanne, Quartier UNIL-Sorge, Bâtiment Amphipôle, CH-1015 Lausanne, Switzerland
  - Department of Fundamental Oncology, University of Lausanne, Ludwig Cancer Research - Lausanne Branch, Route de la Corniche 9A, CH-1066 Epalinges, Switzerland

4. Daina A, Zoete V. Application of the SwissDrugDesign Online Resources in Virtual Screening. Int J Mol Sci 2019; 20:4612. [PMID: 31540350 PMCID: PMC6770839 DOI: 10.3390/ijms20184612] [Citation(s) in RCA: 39] [Impact Index Per Article: 7.8] [Received: 08/15/2019] [Revised: 09/13/2019] [Accepted: 09/14/2019] [Indexed: 02/06/2023] [Open access]
Abstract
SwissDrugDesign is an important initiative led by the Molecular Modeling Group of the SIB Swiss Institute of Bioinformatics. This project provides a collection of freely available online tools for computer-aided drug design. Some of these web-based methods, i.e., SwissSimilarity and SwissTargetPrediction, were especially developed to perform virtual screening, while others such as SwissADME, SwissDock, SwissParam and SwissBioisostere can find applications in related activities. The present review aims at providing a short description of these methods together with examples of their application in virtual screening, where SwissDrugDesign tools successfully supported the discovery of bioactive small molecules.
Affiliation(s)
- Antoine Daina
  - Molecular Modeling Group, SIB Swiss Institute of Bioinformatics, University of Lausanne, Quartier UNIL-Sorge, Bâtiment Amphipôle, CH-1015 Lausanne, Switzerland.
- Vincent Zoete
  - Molecular Modeling Group, SIB Swiss Institute of Bioinformatics, University of Lausanne, Quartier UNIL-Sorge, Bâtiment Amphipôle, CH-1015 Lausanne, Switzerland.
  - Department of Fundamental Oncology, University of Lausanne, Ludwig Lausanne Branch, Route de la Corniche 9A, CH-1066 Epalinges, Switzerland.

5. Mervin LH, Bulusu KC, Kalash L, Afzal AM, Svensson F, Firth MA, Barrett I, Engkvist O, Bender A. Orthologue chemical space and its influence on target prediction. Bioinformatics 2018; 34:72-79. [PMID: 28961699 PMCID: PMC5870859 DOI: 10.1093/bioinformatics/btx525] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.7] [Received: 04/30/2017] [Accepted: 08/25/2017] [Indexed: 01/05/2023] [Open access]
Abstract
Motivation: In silico approaches often fail to utilize bioactivity data available for orthologous targets due to insufficient evidence highlighting the benefit for such an approach. Deeper investigation into orthologue chemical space and its influence toward expanding compound and target coverage is necessary to improve the confidence in this practice.
Results: Here we present analysis of the orthologue chemical space in ChEMBL and PubChem and its impact on target prediction. We highlight the number of conflicting bioactivities between human and orthologues is low and annotations are overall compatible. Chemical space analysis shows orthologues are chemically dissimilar to human with high intra-group similarity, suggesting they could effectively extend the chemical space modelled. Based on these observations, we show the benefit of orthologue inclusion in terms of novel target coverage. We also benchmarked predictive models using a time-series split and also using bioactivities from Chemistry Connect and HTS data available at AstraZeneca, showing that orthologue bioactivity inclusion statistically improved performance.
Availability and implementation: Orthologue-based bioactivity prediction and the compound training set are available at www.github.com/lhm30/PIDGINv2.
Contact: ab454@cam.ac.uk.
Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Lewis H Mervin
  - Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Cambridge, UK
- Krishna C Bulusu
  - Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Cambridge, UK
  - Oncology Innovative Medicines and Early Development, AstraZeneca, Cambridge, UK
- Leen Kalash
  - Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Cambridge, UK
- Avid M Afzal
  - Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Cambridge, UK
- Fredrik Svensson
  - Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Cambridge, UK
- Mike A Firth
  - Discovery Sciences, AstraZeneca R&D, Cambridge Science Park, Cambridge, UK
- Ian Barrett
  - Discovery Sciences, AstraZeneca R&D, Cambridge Science Park, Cambridge, UK
- Ola Engkvist
  - Discovery Sciences, AstraZeneca R&D Gothenburg, Mölndal, Sweden
- Andreas Bender
  - Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Cambridge, UK
  - To whom correspondence should be addressed.

6. Mervin LH, Cao Q, Barrett IP, Firth MA, Murray D, McWilliams L, Haddrick M, Wigglesworth M, Engkvist O, Bender A. Understanding Cytotoxicity and Cytostaticity in a High-Throughput Screening Collection. ACS Chem Biol 2016; 11:3007-3023. [PMID: 27571164 DOI: 10.1021/acschembio.6b00538] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.3] [Indexed: 02/06/2023]
Abstract
While mechanisms of cytotoxicity and cytostaticity have been studied extensively from the biological side, relatively little is currently understood regarding areas of chemical space leading to cytotoxicity and cytostasis in large compound collections. Predicting and rationalizing potential adverse mechanisms of action (MoAs) of small molecules is however crucial for screening library design, given the link between even low-level cytotoxicity and adverse events observed in man. In this study, we analyzed results from a cell-based cytotoxicity screening cascade, comprising 296 970 nontoxic, 5784 cytotoxic and cytostatic, and 2327 cytostatic-only compounds evaluated on the THP-1 cell-line. We employed an in silico MoA analysis protocol, utilizing 9.5 million active and 602 million inactive bioactivity points to generate target predictions, annotate predicted targets with pathways, and calculate enrichment metrics to highlight targets and pathways. Predictions identify known mechanisms for the top ranking targets and pathways for both phenotypes after review and indicate that while processes involved in cytotoxicity versus cytostaticity seem to overlap, differences between both phenotypes seem to exist to some extent. Cytotoxic predictions highlight many kinases, including the potentially novel cytotoxicity-related target STK32C, while cytostatic predictions outline targets linked with response to DNA damage, metabolism, and cytoskeletal machinery. Fragment analysis was also employed to generate a library of toxicophores to improve general understanding of the chemical features driving toxicity. We highlight substructures with potential kinase-dependent and kinase-independent mechanisms of toxicity. We also trained a cytotoxic classification model on proprietary and public compound readouts, and prospectively validated these on 988 novel compounds comprising difficult and trivial testing instances, to establish the applicability domain of models. The proprietary model performed with precision and recall scores of 77.9% and 83.8%, respectively. The MoA results and top ranking substructures with accompanying MoA predictions are available as a platform to assess screening collections.
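For readers less familiar with the two scores reported above, the sketch below shows how precision and recall are derived from confusion-matrix counts. The counts used are hypothetical and are not taken from the study; the function name `precision_recall` is likewise an illustration.

```python
from typing import Tuple


def precision_recall(tp: int, fp: int, fn: int) -> Tuple[float, float]:
    """Return (precision, recall) from true-positive, false-positive and
    false-negative counts.

    precision = TP / (TP + FP): how many predicted cytotoxic compounds were right
    recall    = TP / (TP + FN): how many truly cytotoxic compounds were found
    """
    if min(tp, fp, fn) < 0:
        raise ValueError("counts must be non-negative")
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    return precision, recall


# Hypothetical counts, for illustration only (not from the paper)
example_precision, example_recall = precision_recall(tp=8, fp=2, fn=2)
```

A recall higher than precision, as in the reported 83.8% versus 77.9%, means the model misses relatively few cytotoxic compounds at the cost of somewhat more false alarms.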
Affiliation(s)
- Lewis H. Mervin
  - Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Cambridge, United Kingdom
- Qing Cao
  - Discovery Sciences, AstraZeneca R&D, Waltham, United States
- Ian P. Barrett
  - Discovery Sciences, AstraZeneca R&D, Cambridge Science Park, Cambridge, United Kingdom
- Mike A. Firth
  - Discovery Sciences, AstraZeneca R&D, Cambridge Science Park, Cambridge, United Kingdom
- David Murray
  - Discovery Sciences, AstraZeneca R&D, Alderley Park, Macclesfield, United Kingdom
- Lisa McWilliams
  - Discovery Sciences, AstraZeneca R&D, Alderley Park, Macclesfield, United Kingdom
- Malcolm Haddrick
  - Discovery Sciences, AstraZeneca R&D, Alderley Park, Macclesfield, United Kingdom
- Mark Wigglesworth
  - Discovery Sciences, AstraZeneca R&D, Alderley Park, Macclesfield, United Kingdom
- Ola Engkvist
  - Discovery Sciences, AstraZeneca R&D, Mölndal, Sweden
- Andreas Bender
  - Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Cambridge, United Kingdom

7. Liu X, Baarsma H, Thiam C, Montrone C, Brauner B, Fobo G, Heier JS, Duscha S, Königshoff M, Angeli V, Ruepp A, Campillos M. Systematic Identification of Pharmacological Targets from Small-Molecule Phenotypic Screens. Cell Chem Biol 2016; 23:1302-1313. [DOI: 10.1016/j.chembiol.2016.08.011] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Received: 01/21/2016] [Revised: 06/10/2016] [Accepted: 08/05/2016] [Indexed: 01/29/2023]

8. The SIB Swiss Institute of Bioinformatics' resources: focus on curated databases. Nucleic Acids Res 2015; 44:D27-37. [PMID: 26615188 PMCID: PMC4702916 DOI: 10.1093/nar/gkv1310] [Citation(s) in RCA: 46] [Impact Index Per Article: 5.1] [Received: 11/03/2015] [Accepted: 11/09/2015] [Indexed: 12/15/2022] [Open access]
Abstract
The SIB Swiss Institute of Bioinformatics (www.isb-sib.ch) provides world-class bioinformatics databases, software tools, services and training to the international life science community in academia and industry. These solutions allow life scientists to turn the exponentially growing amount of data into knowledge. Here, we provide an overview of SIB's resources and competence areas, with a strong focus on curated databases and SIB's most popular and widely used resources. In particular, SIB's Bioinformatics resource portal ExPASy features over 150 resources, including UniProtKB/Swiss-Prot, ENZYME, PROSITE, neXtProt, STRING, UniCarbKB, SugarBindDB, SwissRegulon, EPD, arrayMap, Bgee, SWISS-MODEL Repository, OMA, OrthoDB and other databases, which are briefly described in this article.