1
Abbas MKG, Rassam A, Karamshahi F, Abunora R, Abouseada M. The Role of AI in Drug Discovery. Chembiochem 2024; 25:e202300816. [PMID: 38735845 DOI: 10.1002/cbic.202300816] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/03/2023] [Revised: 05/09/2024] [Accepted: 05/10/2024] [Indexed: 05/14/2024]
Abstract
The emergence of Artificial Intelligence (AI) in drug discovery marks a pivotal shift in pharmaceutical research, blending sophisticated computational techniques with conventional scientific exploration to break through enduring obstacles. This review paper elucidates the multifaceted applications of AI across various stages of drug development, highlighting significant advancements and methodologies. It delves into AI's instrumental role in drug design, polypharmacology, chemical synthesis, drug repurposing, and the prediction of drug properties such as toxicity, bioactivity, and physicochemical characteristics. Despite AI's promising advancements, the paper also addresses the challenges and limitations encountered in the field, including data quality, generalizability, computational demands, and ethical considerations. By offering a comprehensive overview of AI's role in drug discovery, this paper underscores the technology's potential to significantly enhance drug development, while also acknowledging the hurdles that must be overcome to fully realize its benefits.
Affiliation(s)
- M K G Abbas
- Center for Advanced Materials, Qatar University, P.O. Box, 2713, Doha, Qatar
- Abrar Rassam
- Secondary Education, Educational Sciences, Qatar University, P.O. Box, 2713, Doha, Qatar
- Fatima Karamshahi
- Department of Chemistry and Earth Sciences, Qatar University, P.O. Box, 2713, Doha, Qatar
- Rehab Abunora
- Faculty of Medicine, General Medicine and Surgery, Helwan University, Cairo, Egypt
- Maha Abouseada
- Department of Chemistry and Earth Sciences, Qatar University, P.O. Box, 2713, Doha, Qatar
2
Tan L, Hirte S, Palmacci V, Stork C, Kirchmair J. Tackling assay interference associated with small molecules. Nat Rev Chem 2024; 8:319-339. [PMID: 38622244 DOI: 10.1038/s41570-024-00593-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 02/29/2024] [Indexed: 04/17/2024]
Abstract
Biochemical and cell-based assays are essential to discovering and optimizing efficacious and safe drugs, agrochemicals and cosmetics. However, false assay readouts stemming from colloidal aggregation, chemical reactivity, chelation, light signal attenuation and emission, membrane disruption, and other interference mechanisms remain a considerable challenge in screening synthetic compounds and natural products. To address assay interference, a range of powerful experimental approaches are available and in silico methods are now gaining traction. This Review begins with an overview of the scope and limitations of experimental approaches for tackling assay interference. It then focuses on theoretical methods, discusses strategies for their integration with experimental approaches, and provides recommendations for best practices. The Review closes with a summary of the critical facts and an outlook on potential future developments.
Affiliation(s)
- Lu Tan
- Drug Discovery Sciences, Boehringer Ingelheim RCV GmbH & Co KG, Vienna, Austria
- Steffen Hirte
- Department of Pharmaceutical Sciences, Division of Pharmaceutical Chemistry, Faculty of Life Sciences, University of Vienna, Vienna, Austria
- Vienna Doctoral School of Pharmaceutical, Nutritional and Sport Sciences (PhaNuSpo), University of Vienna, Vienna, Austria
- Vincenzo Palmacci
- Department of Pharmaceutical Sciences, Division of Pharmaceutical Chemistry, Faculty of Life Sciences, University of Vienna, Vienna, Austria
- Vienna Doctoral School of Pharmaceutical, Nutritional and Sport Sciences (PhaNuSpo), University of Vienna, Vienna, Austria
- Conrad Stork
- Department of Informatics, Center for Bioinformatics, Faculty of Mathematics, Informatics and Natural Sciences, Universität Hamburg, Hamburg, Germany
- BASF SE, Ludwigshafen am Rhein, Germany
- Johannes Kirchmair
- Department of Pharmaceutical Sciences, Division of Pharmaceutical Chemistry, Faculty of Life Sciences, University of Vienna, Vienna, Austria.
- Christian Doppler Laboratory for Molecular Informatics in the Biosciences, Department for Pharmaceutical Sciences, University of Vienna, Vienna, Austria.
3
Boldini D, Friedrich L, Kuhn D, Sieber SA. Machine Learning Assisted Hit Prioritization for High Throughput Screening in Drug Discovery. ACS Cent Sci 2024; 10:823-832. [PMID: 38680560 PMCID: PMC11046457 DOI: 10.1021/acscentsci.3c01517] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/07/2023] [Revised: 03/01/2024] [Accepted: 03/01/2024] [Indexed: 05/01/2024]
Abstract
Efficient prioritization of bioactive compounds from high throughput screening campaigns is a fundamental challenge for accelerating drug development efforts. In this study, we present the first data-driven approach to simultaneously detect assay interferents and prioritize true bioactive compounds. By analyzing the learning dynamics during training of a gradient boosting model on noisy high throughput screening data using a novel formulation of sample influence, we are able to distinguish between compounds exhibiting the desired biological response and those producing assay artifacts. Therefore, our method enables false positive and true positive detection without relying on prior screens or assay interference mechanisms, making it applicable to any high throughput screening campaign. We demonstrate that our approach consistently excludes assay interferents with different mechanisms and prioritizes biologically relevant compounds more efficiently than all tested baselines, including a retrospective case study simulating its use in a real drug discovery campaign. Finally, our tool is extremely computationally efficient, requiring less than 30 s per assay on low-resource hardware. As such, our findings show that our method is an ideal addition to existing false positive detection tools and can be used to guide further pharmacological optimization after high throughput screening campaigns.
Affiliation(s)
- Davide Boldini
- TUM School of Natural Sciences, Department of Bioscience, Center for Functional Protein Assemblies (CPA), Technical University of Munich, 85748 Garching bei München, Germany
- Lukas Friedrich
- The Healthcare business of Merck KGaA, 64293 Darmstadt, Germany
- Daniel Kuhn
- The Healthcare business of Merck KGaA, 64293 Darmstadt, Germany
- Stephan A. Sieber
- TUM School of Natural Sciences, Department of Bioscience, Center for Functional Protein Assemblies (CPA), Technical University of Munich, 85748 Garching bei München, Germany
4
Nesabi A, Kalayan J, Al-Rawashdeh S, Ghattas MA, Bryce RA. Molecular dynamics simulations as a guide for modulating small molecule aggregation. J Comput Aided Mol Des 2024; 38:11. [PMID: 38470532 PMCID: PMC10933209 DOI: 10.1007/s10822-024-00557-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/02/2024] [Accepted: 02/29/2024] [Indexed: 03/14/2024]
Abstract
Small colloidally aggregating molecules (SCAMs) can be problematic for biological assays in drug discovery campaigns. However, the self-associating properties of SCAMs have potential applications in drug delivery and analytical biochemistry. Consequently, the ability to predict the aggregation propensity of a small organic molecule is of considerable interest. Chemoinformatics-based filters such as ChemAGG and Aggregator Advisor offer rapid assessment but are limited by the assay quality and structural diversity of their training set data. Complementary to these tools, we explore here the ability of molecular dynamics (MD) simulations, a physics-based method, to predict the aggregation propensity of diverse chemical structures. For a set of 32 molecules, using simulations of 100 ns in explicit solvent, we find a success rate of 97% (one molecule misclassified), compared with 75% for Aggregator Advisor and 72% for ChemAGG. These short-timescale MD simulations are representative of longer microsecond trajectories and yield an informative spectrum of aggregation propensities across the set of solutes, capturing the dynamic behaviour of weakly aggregating compounds. Implicit solvent simulations using the generalized Born model were less successful in predicting aggregation propensity. MD simulations were also performed to explore structure-aggregation relationships for selected molecules, identifying chemical modifications that reversed the predicted behaviour of a given aggregator/non-aggregator compound. Although lower in throughput than rapid cheminformatics-based SCAM filters, MD-based prediction of aggregation has the potential to be deployed on focused subsets of moderate size and, depending on the target application, to provide guidance on removing or optimizing a compound's aggregation propensity.
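The success rates quoted in this abstract are simple ratios of correctly classified molecules to the 32-molecule set. A minimal sketch, assuming all three methods were scored on the same 32 molecules; only the MD count (one misclassification) is stated in the abstract, so the counts for Aggregator Advisor and ChemAGG are back-calculated from the rounded percentages and are assumptions:

```python
# Success rate = correctly classified molecules / total molecules.
# Assumption: every method was scored on the same 32 molecules, so the
# Aggregator Advisor and ChemAGG counts are inferred from the rounded percentages.
TOTAL = 32

def success_rate(n_correct: int, n_total: int = TOTAL) -> float:
    """Return the percentage of molecules classified correctly."""
    if not 0 <= n_correct <= n_total:
        raise ValueError("n_correct must lie between 0 and n_total")
    return 100.0 * n_correct / n_total

correct_counts = {
    "MD, 100 ns explicit solvent": TOTAL - 1,  # one molecule misclassified (stated)
    "Aggregator Advisor": 24,                  # inferred from 75%
    "ChemAGG": 23,                             # inferred from 72%
}

for method, n in correct_counts.items():
    print(f"{method}: {success_rate(n):.0f}%")
```

With 32 molecules each misclassification costs about 3.1 percentage points, which is worth keeping in mind when comparing the three rates.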
Affiliation(s)
- Azam Nesabi
- Division of Pharmacy and Optometry, School of Health Sciences, Manchester Academic Health Sciences Centre, University of Manchester, Oxford Road, Manchester, M13 9PL, UK
- Jas Kalayan
- Daresbury Laboratory, Science and Technology Facilities Council (STFC), Keckwick Lane, Daresbury, Warrington, WA4 4AD, UK
- Sara Al-Rawashdeh
- Division of Pharmacy and Optometry, School of Health Sciences, Manchester Academic Health Sciences Centre, University of Manchester, Oxford Road, Manchester, M13 9PL, UK
- Richard A Bryce
- Division of Pharmacy and Optometry, School of Health Sciences, Manchester Academic Health Sciences Centre, University of Manchester, Oxford Road, Manchester, M13 9PL, UK.
5
Aldahish A, Balaji P, Vasudevan R, Kandasamy G, James JP, Prabahar K. Elucidating the Potential Inhibitor against Type 2 Diabetes Mellitus Associated Gene of GLUT4. J Pers Med 2023; 13:jpm13040660. [PMID: 37109046 PMCID: PMC10146764 DOI: 10.3390/jpm13040660] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/01/2023] [Revised: 04/02/2023] [Accepted: 04/10/2023] [Indexed: 04/29/2023]
Abstract
Diabetes is a chronic hyperglycemic disorder that leads to a group of metabolic diseases. This condition of chronic hyperglycemia is caused by abnormal insulin levels. The impact of hyperglycemia on the human vascular tree is the leading cause of disease and death in type 1 and type 2 diabetes. People with type 2 diabetes mellitus (T2DM) have abnormal insulin secretion as well as abnormal insulin action. Type 2 (non-insulin-dependent) diabetes is caused by a combination of genetic factors associated with decreased insulin production, insulin resistance, and environmental conditions. These conditions include overeating, lack of exercise, obesity, and aging. Glucose transport is rate-limiting for the use of dietary glucose by fat and muscle. The glucose transporter GLUT4 is retained intracellularly and sorted dynamically, and GLUT4 translocation, or insulin-regulated vesicular traffic, distributes it to the plasma membrane. Various chemical compounds have antidiabetic properties. The complexity, metabolism, digestion, and interactions of these compounds make it difficult to understand them and apply them to reduce chronic inflammation and thus prevent chronic disease. In this study, we applied a virtual screening approach to identify the most suitable, druggable chemical compounds for use as potential drug candidates against T2DM. Of the 5000 chemical compounds analyzed, only two were found to be more effective, based on molecular docking studies and virtual screening through Lipinski's rule and ADMET properties.
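Lipinski's rule of five, used in this abstract as a screening filter, flags compounds with molecular weight above 500 Da, calculated logP above 5, more than 5 hydrogen-bond donors, or more than 10 hydrogen-bond acceptors; in Lipinski's original formulation, two or more violations signal likely poor oral absorption. A minimal sketch, assuming the descriptors have already been computed by a cheminformatics toolkit; the compound names and descriptor values are hypothetical, not data from the study:

```python
from dataclasses import dataclass

@dataclass
class Descriptors:
    """Precomputed descriptors for one compound (values supplied by the caller)."""
    name: str
    mol_weight: float  # Da
    logp: float        # calculated octanol/water partition coefficient
    h_donors: int
    h_acceptors: int

def lipinski_violations(d: Descriptors) -> int:
    """Count how many of the four rule-of-five criteria a compound violates."""
    return sum([
        d.mol_weight > 500,
        d.logp > 5,
        d.h_donors > 5,
        d.h_acceptors > 10,
    ])

def passes_rule_of_five(d: Descriptors, max_violations: int = 1) -> bool:
    """Keep compounds with at most one violation (two or more are rejected)."""
    return lipinski_violations(d) <= max_violations

# Hypothetical library entries, not data from the study.
library = [
    Descriptors("cmpd_A", 342.4, 2.1, 2, 5),
    Descriptors("cmpd_B", 612.7, 6.3, 4, 9),
    Descriptors("cmpd_C", 480.0, 5.4, 6, 8),
]
passing = [d.name for d in library if passes_rule_of_five(d)]
print(passing)
```

A filter like this only removes compounds with unfavourable physicochemical profiles; ranking the survivors still requires docking scores and ADMET predictions.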
Affiliation(s)
- Afaf Aldahish
- Department of Pharmacology, College of Pharmacy, King Khalid University, Abha 61421, Saudi Arabia
- Rajalakshimi Vasudevan
- Department of Pharmacology, College of Pharmacy, King Khalid University, Abha 61421, Saudi Arabia
- Geetha Kandasamy
- Department of Clinical Pharmacy, College of Pharmacy, King Khalid University, Abha 62529, Saudi Arabia
- Jainey P James
- Department of Pharmaceutical Chemistry, NGSM Institute of Pharmaceutical Sciences (NGSMIPS), Nitte (Deemed to be University), Deralakatte, Mangaluru 575018, Karnataka, India
- Kousalya Prabahar
- Department of Pharmacy Practice, Faculty of Pharmacy, University of Tabuk, Tabuk 71491, Saudi Arabia
6
James JP, Devaraji V, Sasidharan P, T. S. P. Pharmacophore Modeling, 3D QSAR, Molecular Dynamics Studies and Virtual Screening on Pyrazolopyrimidines as anti-Breast Cancer Agents. Polycycl Aromat Compd 2022. [DOI: 10.1080/10406638.2022.2135545] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2022]
Affiliation(s)
- Jainey P. James
- Department of Pharmaceutical Chemistry, Nitte (Deemed to Be University), NGSM Institute of Pharmaceutical Sciences (NGSMIPS), Deralakatte, India
- Vinod Devaraji
- Computational Drug Design Lab, School of Bio Sciences and Technology, Vellore Institute of Technology, Vellore, India
- Pradija Sasidharan
- Department of Pharmaceutical Chemistry, Nitte (Deemed to Be University), NGSM Institute of Pharmaceutical Sciences (NGSMIPS), Deralakatte, India
- Pavan T. S.
- Department of Pharmaceutical Chemistry, Nitte (Deemed to Be University), NGSM Institute of Pharmaceutical Sciences (NGSMIPS), Deralakatte, India
7
Kim S, Chen J, Cheng T, Gindulyte A, He J, He S, Li Q, Shoemaker BA, Thiessen PA, Yu B, Zaslavsky L, Zhang J, Bolton EE. PubChem 2023 update. Nucleic Acids Res 2022; 51:D1373-D1380. [PMID: 36305812 PMCID: PMC9825602 DOI: 10.1093/nar/gkac956] [Citation(s) in RCA: 593] [Impact Index Per Article: 296.5] [Received: 09/15/2022] [Revised: 10/06/2022] [Accepted: 10/13/2022] [Indexed: 01/30/2023]
Abstract
PubChem (https://pubchem.ncbi.nlm.nih.gov) is a popular chemical information resource that serves a wide range of use cases. In the past two years, a number of changes were made to PubChem. Data from more than 120 data sources was added to PubChem. Some major highlights include: the integration of Google Patents data into PubChem, which greatly expanded the coverage of the PubChem Patent data collection; the creation of the Cell Line and Taxonomy data collections, which provide quick and easy access to chemical information for a given cell line and taxon, respectively; and the update of the bioassay data model. In addition, new functionalities were added to the PubChem programmatic access protocols, PUG-REST and PUG-View, including support for target-centric data download for a given protein, gene, pathway, cell line, and taxon and the addition of the 'standardize' option to PUG-REST, which returns the standardized form of an input chemical structure. A significant update was also made to PubChemRDF. The present paper provides an overview of these changes.
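PUG-REST requests are plain URLs built from an input specification, an operation and an output format, following the pattern `https://pubchem.ncbi.nlm.nih.gov/rest/pug/<input>/<operation>/<output>`. The sketch below only assembles a compound-property request URL from that publicly documented pattern; it does not contact the server and does not exercise the 'standardize' option or the target-centric downloads described in this abstract. The helper name `property_url` is hypothetical, and the property names are examples taken from PubChem's property list:

```python
from urllib.parse import quote

PUG_REST_BASE = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"

def property_url(namespace: str, identifier: str, properties: list[str],
                 output: str = "JSON") -> str:
    """Build a PUG-REST compound property request URL (hypothetical helper).

    namespace:  input namespace such as "cid" or "name".
    identifier: compound identifier in that namespace; it is percent-encoded
                so that names containing spaces or slashes stay in one path segment.
    properties: PubChem property names, e.g. "MolecularWeight".
    """
    return "/".join([
        PUG_REST_BASE,
        "compound",
        namespace,
        quote(identifier, safe=""),
        "property",
        ",".join(properties),
        output,
    ])

url = property_url("name", "aspirin", ["MolecularFormula", "MolecularWeight", "XLogP"])
print(url)
# Fetching the result requires network access, e.g. urllib.request.urlopen(url).
```

Keeping URL construction separate from the network call makes the request easy to log and test, and respects PubChem's usage limits when batching many identifiers.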
Affiliation(s)
- Sunghwan Kim
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, Bethesda, MD, 20894, USA
- Jie Chen
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, Bethesda, MD, 20894, USA
- Tiejun Cheng
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, Bethesda, MD, 20894, USA
- Asta Gindulyte
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, Bethesda, MD, 20894, USA
- Jia He
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, Bethesda, MD, 20894, USA
- Siqian He
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, Bethesda, MD, 20894, USA
- Qingliang Li
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, Bethesda, MD, 20894, USA
- Benjamin A Shoemaker
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, Bethesda, MD, 20894, USA
- Paul A Thiessen
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, Bethesda, MD, 20894, USA
- Bo Yu
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, Bethesda, MD, 20894, USA
- Leonid Zaslavsky
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, Bethesda, MD, 20894, USA
- Jian Zhang
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, Bethesda, MD, 20894, USA
- Evan E Bolton
- To whom correspondence should be addressed. Tel: +1 301 451 1811; Fax: +1 301 480 4559;
8
Molina C, Ait-Ouarab L, Minoux H. Isometric Stratified Ensembles: A Partial and Incremental Adaptive Applicability Domain and Consensus-Based Classification Strategy for Highly Imbalanced Data Sets with Application to Colloidal Aggregation. J Chem Inf Model 2022; 62:1849-1862. [PMID: 35357194 DOI: 10.1021/acs.jcim.2c00293] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/30/2022]
Abstract
Partial and incremental stratification analysis of a quantitative structure-interference relationship (QSIR) is a novel strategy for categorizing the classifications produced by machine learning techniques. It is based on a 2D mapping of classification statistics onto two categorical axes: the degree of consensus and the level of applicability domain. An internal cross-validation set makes it possible to determine the statistical performance of the ensemble in every stratum of the 2D map and hence to define isometric local performance regions, with the aim of improving hit ranking and selection. During training, isometric stratified ensembles (ISE) apply a recursive decorrelated variable selection and take the cardinal ratio of classes into account to balance training sets and thus avoid bias due to possible class imbalance. To illustrate the value of this strategy, three different highly imbalanced data sets were employed: PubChem pairs of AmpC β-lactamase and cruzain inhibition assay campaigns for colloidal aggregators, and the complementary aggregator data set available on the AGGREGATOR ADVISOR predictor web page. The statistics obtained show that this new strategy outperforms previously published tools, with and without a classical applicability domain. ISE performance in classifying colloidal aggregators ranges from a global AUC of 0.82, when the whole test data set is considered, up to a maximum AUC of 0.88, when only its highest-confidence isometric stratum is retained.
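The comparison between a global AUC and the AUC of the highest-confidence stratum can be illustrated with a rank-based ROC AUC, i.e. the probability that a randomly chosen aggregator receives a higher score than a randomly chosen non-aggregator, with ties counting one half. A minimal sketch of that evaluation step only, not of the ISE training procedure; the scores, labels and stratum assignments are made-up toy values:

```python
from itertools import product

def roc_auc(scores, labels):
    """Rank-based ROC AUC: P(score of a positive > score of a negative), ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p, n in product(pos, neg))
    return wins / (len(pos) * len(neg))

# Toy predictions: label 1 = aggregator; stratum 1 = highest-confidence stratum.
scores  = [0.95, 0.90, 0.80, 0.70, 0.60, 0.55, 0.40, 0.30, 0.20, 0.10]
labels  = [1,    1,    0,    1,    0,    1,    0,    0,    1,    0]
stratum = [1,    1,    1,    2,    2,    2,    2,    1,    2,    1]

global_auc = roc_auc(scores, labels)
top = [(s, y) for s, y, k in zip(scores, labels, stratum) if k == 1]
top_auc = roc_auc([s for s, _ in top], [y for _, y in top])
print(f"global AUC = {global_auc:.2f}, highest-confidence stratum AUC = {top_auc:.2f}")
```

Restricting evaluation to a confident stratum typically raises AUC but lowers the number of compounds that receive a trusted prediction, so the stratum size should be reported alongside the metric.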
Affiliation(s)
- Christophe Molina
- PIKAÏROS S.A., B03 - 2 Allée de la Clairière, 31650 Saint Orens de Gameville, France
- Lilia Ait-Ouarab
- AMOA Ingénierie, INFOGENE S.A., 19, rue d'Orleans, 92200 Neuilly-sur-Seine, France
- Hervé Minoux
- Data and Data Science, SANOFI R&D, 91380 Chilly-Mazarin, France
9
Sun J, Zhong H, Wang K, Li N, Chen L. Gains from no real PAINS: Where 'Fair Trial Strategy' stands in the development of multi-target ligands. Acta Pharm Sin B 2021; 11:3417-3432. [PMID: 34900527 PMCID: PMC8642439 DOI: 10.1016/j.apsb.2021.02.023] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 12/24/2020] [Revised: 02/15/2021] [Accepted: 02/25/2021] [Indexed: 12/26/2022]
Abstract
Compounds that selectively modulate multiple targets can provide clinical benefits and are an alternative to traditional highly selective agents for unique targets. High-throughput screening (HTS) of approved drugs for multitarget-directed ligands (MTDLs) and fragment-based drug design have become regular strategies for achieving an ideal multitarget combination. However, the unexpected presence of pan-assay interference compound (PAINS) suspects in the development of MTDLs frequently results in nonspecific interactions or other undesirable effects, leading to artefacts or false-positive data in biological assays. Publicly available filters can help to identify PAINS suspects; however, these filters cannot conclusively determine whether these suspects are "bad" or innocent. Additionally, these in silico approaches may inappropriately label a ligand as PAINS. More than 80% of the initial hits can be identified as PAINS by the filters if appropriate biochemical tests are not used, resulting in false-positive data that are unacceptable to medicinal chemists in manuscript peer review and future studies. Therefore, extensive offline experiments should follow online filtering to discriminate "bad" PAINS and avoid incorrect evaluation of good scaffolds. We suggest using the "Fair Trial Strategy" to identify interesting molecules among PAINS suspects and to provide structure–function insight for MTDL development.
Key Words
- AD, Alzheimer disease
- ALARM NMR, a La assay to detect reactive molecules by nuclear magnetic resonance
- Biochemical experiment
- CADD, computer-aided drug design technology
- CoA, coenzyme A
- EGFR, epidermal growth factor receptor
- Fair trial strategy
- GSH, glutathione
- HER2, human epidermal growth factor receptor 2
- HTS, high-throughput screening
- In silico filtering
- LC−MS, liquid chromatography−mass spectrometry
- MTDLs, multitarget-directed ligands
- Multitarget-directed ligands
- PAINS suspects
- PAINS, pan-assay interference compounds
- QSAR, quantitative structure–activity relationship
- ROS, radicals and reactive oxygen species
Affiliation(s)
- Jianbo Sun
- State Key Laboratory of Natural Medicines, Department of Natural Medicinal Chemistry, School of Traditional Chinese Pharmacy, China Pharmaceutical University, Nanjing 210009, China
- Hui Zhong
- Department of Pharmacology of Traditional Chinese Medicine, School of Traditional Chinese Pharmacy, China Pharmaceutical University, Nanjing 210009, China
- Kun Wang
- State Key Laboratory of Natural Medicines, Department of Natural Medicinal Chemistry, School of Traditional Chinese Pharmacy, China Pharmaceutical University, Nanjing 210009, China
- Na Li
- State Key Laboratory of Natural Medicines, Department of Natural Medicinal Chemistry, School of Traditional Chinese Pharmacy, China Pharmaceutical University, Nanjing 210009, China
- Li Chen
- State Key Laboratory of Natural Medicines, Department of Natural Medicinal Chemistry, School of Traditional Chinese Pharmacy, China Pharmaceutical University, Nanjing 210009, China
10
Ghosh D, Koch U, Hadian K, Sattler M, Tetko IV. Highly Accurate Filters to Flag Frequent Hitters in AlphaScreen Assays by Suggesting their Mechanism. Mol Inform 2021; 41:e2100151. [PMID: 34676998 DOI: 10.1002/minf.202100151] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 06/05/2021] [Accepted: 09/29/2021] [Indexed: 11/06/2022]
Abstract
AlphaScreen is one of the most widely used assay technologies in drug discovery owing to its versatility, dynamic range and sensitivity. However, the presence of false positives and frequent hitters makes measured HTS data difficult to interpret. Although filters exist to identify frequent hitters for AlphaScreen, they are frequently based on privileged scaffolds. The development of such filters is time-consuming and requires deep domain knowledge. Machine learning and artificial intelligence methods have recently emerged as important tools to advance drug discovery and chemoinformatics, including the identification of frequent hitters in screening assays. However, the relative performance and complementarity of machine learning and scaffold-based techniques have not yet been comprehensively compared. In this study, we compared filters based on privileged scaffolds with filters built using machine learning. Our results demonstrate that machine learning methods provide more accurate filters for identifying frequent hitters in AlphaScreen assays than scaffold-based methods and can easily be redeveloped once new data are measured. We present highly accurate models to identify frequent hitters in AlphaScreen assays.
Affiliation(s)
- Dipan Ghosh
- Lead Discovery Center GmbH, Otto-Hahn-Straße 15, 44227, Dortmund, Germany
- Uwe Koch
- Lead Discovery Center GmbH, Otto-Hahn-Straße 15, 44227, Dortmund, Germany
- Kamyar Hadian
- Assay Development and Screening Platform, Helmholtz Zentrum München - German Research Center for Environmental Health (GmbH), Ingolstädter Landstraße 1, D-85764, Neuherberg, Germany
- Michael Sattler
- Bavarian NMR Center, Department Chemie, Technische Universität München, Ernst-Otto-Fischerstraße 2, D-85747, Garching, Germany; Institute of Structural Biology, Helmholtz Zentrum München - German Research Center for Environmental Health (GmbH), Ingolstädter Landstraße 1, D-85764, Neuherberg, Germany
- Igor V Tetko
- Institute of Structural Biology, Helmholtz Zentrum München - German Research Center for Environmental Health (GmbH), Ingolstädter Landstraße 1, D-85764, Neuherberg, Germany; G.A. Krestov Institute of Solution Chemistry of the Russian Academy of Sciences, Akademicheskaya Street 1, 153045, Ivanovo, Russia; BIGCHEM GmbH, Valerystr. 49, D-85716, Unterschleißheim, Germany
11
Plonka W, Stork C, Šícho M, Kirchmair J. CYPlebrity: Machine learning models for the prediction of inhibitors of cytochrome P450 enzymes. Bioorg Med Chem 2021; 46:116388. [PMID: 34488021 DOI: 10.1016/j.bmc.2021.116388] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 07/05/2021] [Revised: 08/19/2021] [Accepted: 08/24/2021] [Indexed: 10/20/2022]
Abstract
The vast majority of approved drugs are metabolized by the five major cytochrome P450 (CYP) isozymes, 1A2, 2C9, 2C19, 2D6 and 3A4. Inhibition of CYP isozymes can cause drug-drug interactions with severe pharmacological and toxicological consequences. Computational methods for the fast and reliable prediction of the inhibition of CYP isozymes by small molecules are therefore of high interest and relevance to pharmaceutical companies and a host of other industries, including the cosmetics and agrochemical industries. Today, a large number of machine learning models for predicting the inhibition of the major CYP isozymes by small molecules are available. With this work we aim to go beyond the coverage of existing models by combining data from several major public and proprietary sources. More specifically, we used up to 18815 compounds with measured bioactivities to train random forest classification models for the individual CYP isozymes. A major advantage of the new data collection over existing ones is the better representation of the minority class, the CYP inhibitors. With the new data collection we achieved inhibitor-to-non-inhibitor ratios on the order of 1:1 (CYP1A2) to 1:3 (CYP2D6). We show that our models reach competitive performance on external data, with Matthews correlation coefficients (MCCs) ranging from 0.62 (CYP2C19) to 0.70 (CYP2D6), and areas under the receiver operating characteristic curve (AUCs) between 0.89 (CYP2C19) and 0.92 (CYPs 2D6 and 3A4). Importantly, the models show a high level of robustness, reflected in good predictivity even for compounds that are structurally dissimilar to those represented in the training data. The best models presented in this work are freely accessible for academic research via a web service.
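The Matthews correlation coefficient reported in this abstract is computed from the binary confusion matrix as MCC = (TP·TN - FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)) and ranges from -1 to +1, which makes it less sensitive to class imbalance than plain accuracy. A minimal sketch; the confusion-matrix counts in the usage line are hypothetical and not taken from the study:

```python
import math

def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from binary confusion-matrix counts.

    Returns 0.0 when any marginal sum is zero, a common convention for the
    otherwise undefined denominator.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom

# Hypothetical external-test counts for one isozyme (inhibitors = positives).
print(round(matthews_corrcoef(tp=80, tn=170, fp=20, fn=30), 3))
```

Because all four cells enter the formula, a model that simply predicts the majority class scores an MCC of 0 even when its accuracy looks high.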
Affiliation(s)
- Wojciech Plonka
- Universität Hamburg, Center for Bioinformatics (ZBH), Hamburg, Bundesstr. 43, 20146, Germany; FQS Poland (Fujitsu Group), Parkowa 11, 30-538 Cracow, Poland
- Conrad Stork
- Universität Hamburg, Center for Bioinformatics (ZBH), Hamburg, Bundesstr. 43, 20146, Germany
- Martin Šícho
- CZ-OPENSCREEN: National Infrastructure for Chemical Biology, Department of Informatics and Chemistry, Faculty of Chemical Technology, University of Chemistry and Technology Prague, Technická 5, 166 28, Prague, Czech Republic
- Johannes Kirchmair
- Universität Hamburg, Center for Bioinformatics (ZBH), Hamburg, Bundesstr. 43, 20146, Germany; Department of Pharmaceutical Sciences, Division of Pharmaceutical Chemistry, Faculty of Life Sciences, University of Vienna, Althanstr. 14, 1090 Vienna, Austria.
12
Holmer M, de Bruyn Kops C, Stork C, Kirchmair J. CYPstrate: A Set of Machine Learning Models for the Accurate Classification of Cytochrome P450 Enzyme Substrates and Non-Substrates. Molecules 2021; 26:molecules26154678. [PMID: 34361831 PMCID: PMC8347321 DOI: 10.3390/molecules26154678] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 07/12/2021] [Revised: 07/23/2021] [Accepted: 07/26/2021] [Indexed: 11/16/2022]
Abstract
The interaction of small organic molecules such as drugs, agrochemicals, and cosmetics with cytochrome P450 enzymes (CYPs) can lead to substantial changes in the bioavailability of active substances and hence consequences with respect to pharmacological efficacy and toxicity. Therefore, efficient means of predicting the interactions of small organic molecules with CYPs are of high importance to a host of different industries. In this work, we present a new set of machine learning models for the classification of xenobiotics into substrates and non-substrates of nine human CYP isozymes: CYPs 1A2, 2A6, 2B6, 2C8, 2C9, 2C19, 2D6, 2E1, and 3A4. The models are trained on an extended, high-quality collection of known substrates and non-substrates and have been subjected to thorough validation. Our results show that the models yield competitive performance and are favorable for the detection of CYP substrates. In particular, a new consensus model reached high performance, with Matthews correlation coefficients (MCCs) between 0.45 (CYP2C8) and 0.85 (CYP3A4), although at the cost of coverage. The best models presented in this work are accessible free of charge via the "CYPstrate" module of the New E-Resource for Drug Discovery (NERDD).
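The trade-off between consensus performance and coverage mentioned in this abstract can be illustrated with a unanimity rule: a prediction is issued only when all member models agree, and the ensemble abstains otherwise. A minimal sketch, assuming binary member predictions; the unanimity rule and the toy data are illustrative assumptions, not necessarily how the CYPstrate consensus model combines its members:

```python
from typing import Optional, Sequence

def unanimous_consensus(votes: Sequence[int]) -> Optional[int]:
    """Return the shared label if all member models agree, else None (abstain)."""
    first = votes[0]
    return first if all(v == first for v in votes) else None

def coverage_and_accuracy(member_preds: Sequence[Sequence[int]], labels: Sequence[int]):
    """member_preds[i] holds each member model's 0/1 prediction for compound i."""
    decided = [(unanimous_consensus(p), y) for p, y in zip(member_preds, labels)]
    decided = [(c, y) for c, y in decided if c is not None]
    coverage = len(decided) / len(labels)
    accuracy = sum(c == y for c, y in decided) / len(decided) if decided else float("nan")
    return coverage, accuracy

# Toy example: three member models, six compounds (1 = substrate).
preds = [(1, 1, 1), (0, 0, 0), (1, 0, 1), (0, 0, 0), (1, 1, 0), (1, 1, 1)]
truth = [1, 0, 0, 0, 1, 0]
cov, acc = coverage_and_accuracy(preds, truth)
print(f"coverage = {cov:.2f}, accuracy on covered compounds = {acc:.2f}")
```

Stricter agreement rules usually raise performance on the compounds that do receive a prediction while shrinking coverage, which is the cost the abstract refers to.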
Affiliation(s)
- Malte Holmer
- Center for Bioinformatics (ZBH), Department of Informatics, Universität Hamburg, 20146 Hamburg, Germany
- Christina de Bruyn Kops
- Center for Bioinformatics (ZBH), Department of Informatics, Universität Hamburg, 20146 Hamburg, Germany
- Conrad Stork
- Center for Bioinformatics (ZBH), Department of Informatics, Universität Hamburg, 20146 Hamburg, Germany
- Johannes Kirchmair
- Center for Bioinformatics (ZBH), Department of Informatics, Universität Hamburg, 20146 Hamburg, Germany
- Division of Pharmaceutical Chemistry, Department of Pharmaceutical Sciences, University of Vienna, 1090 Vienna, Austria
13
Borah P, Hazarika S, Deka S, Venugopala KN, Nair AB, Attimarad M, Sreeharsha N, Mailavaram RP. Application of Advanced Technologies in Natural Product Research: A Review with Special Emphasis on ADMET Profiling. Curr Drug Metab 2020; 21:751-767. [PMID: 32664837 DOI: 10.2174/1389200221666200714144911] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 03/31/2020] [Revised: 05/12/2020] [Accepted: 06/17/2020] [Indexed: 12/14/2022]
Abstract
The successful conversion of natural products (NPs) into lead compounds and novel pharmacophores has emboldened the researchers to harness the drug discovery process with a lot more enthusiasm. However, forfeit of bioactive NPs resulting from an overabundance of metabolites and their wide dynamic range have created the bottleneck in NP researches. Similarly, the existence of multidimensional challenges, including the evaluation of pharmacokinetics, pharmacodynamics, and safety parameters, has been a concerning issue. Advancement of technology has brought the evolution of traditional natural product researches into the computer-based assessment exhibiting pretentious remarks about their efficiency in drug discovery. The early attention to the quality of the NPs may reduce the attrition rate of drug candidates by parallel assessment of ADMET profiling. This article reviews the status, challenges, opportunities, and integration of advanced technologies in natural product research. Indeed, emphasis will be laid on the current and futuristic direction towards the application of newer technologies in early-stage ADMET profiling of bioactive moieties from the natural sources. It can be expected that combinatorial approaches in ADMET profiling will fortify the natural product-based drug discovery in the near future.
Affiliation(s)
- Pobitra Borah
- Pratiksha Institute of Pharmaceutical Sciences, Chandrapur Road, Panikhaiti, Guwahati-26, Assam, India
- Sangeeta Hazarika
- Department of Pharmaceutical Engineering & Technology, Indian Institute of Technology (Banaras Hindu University), Varanasi, Uttar Pradesh-221005, India
- Satyendra Deka
- Pratiksha Institute of Pharmaceutical Sciences, Chandrapur Road, Panikhaiti, Guwahati-26, Assam, India
- Katharigatta N Venugopala
- Department of Pharmaceutical Sciences, College of Clinical Pharmacy, King Faisal University, Al-Ahsa-31982, Saudi Arabia
- Anroop B Nair
- Department of Pharmaceutical Sciences, College of Clinical Pharmacy, King Faisal University, Al-Ahsa-31982, Saudi Arabia
- Mahesh Attimarad
- Department of Pharmaceutical Sciences, College of Clinical Pharmacy, King Faisal University, Al-Ahsa-31982, Saudi Arabia
- Nagaraja Sreeharsha
- Department of Pharmaceutical Sciences, College of Clinical Pharmacy, King Faisal University, Al-Ahsa-31982, Saudi Arabia
- Raghu P Mailavaram
- Department of Pharmaceutical Chemistry, Shri Vishnu College of Pharmacy, Vishnupur (Affiliated to Andhra University), Bhimavaram, W.G. Dist., Andhra Pradesh, India
14
Alves VM, Capuzzi SJ, Braga RC, Korn D, Hochuli JE, Bowler KH, Yasgar A, Rai G, Simeonov A, Muratov EN, Zakharov AV, Tropsha A. SCAM Detective: Accurate Predictor of Small, Colloidally Aggregating Molecules. J Chem Inf Model 2020; 60:4056-4063. [PMID: 32678597 DOI: 10.1021/acs.jcim.0c00415] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Indexed: 11/29/2022]
Abstract
Small, colloidally aggregating molecules (SCAMs) are the most common source of false positives in high-throughput screening (HTS) campaigns. Although SCAMs can be experimentally detected and suppressed by the addition of detergent in the assay buffer, detergent sensitivity is not routinely monitored in HTS. Computational methods are thus needed to flag potential SCAMs during HTS triage. In this study, we have developed and rigorously validated quantitative structure-interference relationship (QSIR) models of detergent-sensitive aggregation in several HTS campaigns under various assay conditions and screening concentrations. In particular, we have modeled detergent-sensitive aggregation in an AmpC β-lactamase assay, the preferred HTS counter-screen for aggregation, as well as in another assay that measures cruzain inhibition. Our models increase the accuracy of aggregation prediction by ∼53% in the β-lactamase assay and by ∼46% in the cruzain assay compared to previously published methods. We also discuss the importance of both assay conditions and screening concentrations in the development of QSIR models for various interference mechanisms besides aggregation. The models developed in this study are publicly available for fast prediction within the SCAM detective web application (https://scamdetective.mml.unc.edu/).
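Training labels for QSIR models of this kind rest on the detergent test mentioned above: a compound whose apparent inhibition largely disappears when detergent is added to the assay buffer is treated as a likely aggregator. The Python sketch below shows one plausible labeling rule. The thresholds (`min_activity`, `min_drop`, both in % inhibition) and the function names are hypothetical and are not taken from the SCAM Detective paper.

```python
from typing import Dict, List, Tuple


def is_detergent_sensitive(
    inhibition_without: float,
    inhibition_with: float,
    min_activity: float = 50.0,  # hypothetical activity cutoff (% inhibition)
    min_drop: float = 30.0,      # hypothetical required loss of activity (% points)
) -> bool:
    """Label a compound as a likely colloidal aggregator.

    It must be active without detergent and lose a substantial part of that
    activity once detergent is present in the assay buffer.
    """
    active = inhibition_without >= min_activity
    drop = inhibition_without - inhibition_with
    return active and drop >= min_drop


def label_compounds(data: Dict[str, Tuple[float, float]]) -> List[str]:
    """Return IDs of compounds flagged as detergent-sensitive, sorted for stable output."""
    return sorted(cid for cid, (wo, w) in data.items() if is_detergent_sensitive(wo, w))


# Toy measurements: (inhibition without detergent, inhibition with detergent)
measurements = {
    "cpd1": (85.0, 10.0),  # activity vanishes with detergent
    "cpd2": (80.0, 75.0),  # activity persists: more likely a genuine inhibitor
    "cpd3": (20.0, 0.0),   # inactive in the first place
}
print(label_compounds(measurements))  # ['cpd1']
```

Labels produced this way would then serve as the target variable for a structure-based classifier; the abstract's point that assay conditions and screening concentration matter shows up here as the choice of thresholds and of the assay the measurements come from.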
Affiliation(s)
- Vinicius M Alves
- Laboratory for Molecular Modeling, Division of Chemical Biology and Medicinal Chemistry, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, North Carolina 27599, United States
- Stephen J Capuzzi
- Laboratory for Molecular Modeling, Division of Chemical Biology and Medicinal Chemistry, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, North Carolina 27599, United States
- Daniel Korn
- Laboratory for Molecular Modeling, Division of Chemical Biology and Medicinal Chemistry, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, North Carolina 27599, United States
- Joshua E Hochuli
- Laboratory for Molecular Modeling, Division of Chemical Biology and Medicinal Chemistry, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, North Carolina 27599, United States
- Kyle H Bowler
- Laboratory for Molecular Modeling, Division of Chemical Biology and Medicinal Chemistry, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, North Carolina 27599, United States
- Adam Yasgar
- National Center for Advancing Translational Sciences (NCATS), National Institutes of Health, 9800 Medical Center Drive, Rockville, Maryland 20850, United States
- Ganesha Rai
- National Center for Advancing Translational Sciences (NCATS), National Institutes of Health, 9800 Medical Center Drive, Rockville, Maryland 20850, United States
- Anton Simeonov
- National Center for Advancing Translational Sciences (NCATS), National Institutes of Health, 9800 Medical Center Drive, Rockville, Maryland 20850, United States
- Eugene N Muratov
- Laboratory for Molecular Modeling, Division of Chemical Biology and Medicinal Chemistry, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, North Carolina 27599, United States
- Department of Pharmaceutical Sciences, Federal University of Paraiba, João Pessoa, Paraíba 58059, Brazil
- Alexey V Zakharov
- National Center for Advancing Translational Sciences (NCATS), National Institutes of Health, 9800 Medical Center Drive, Rockville, Maryland 20850, United States
- Alexander Tropsha
- Laboratory for Molecular Modeling, Division of Chemical Biology and Medicinal Chemistry, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, North Carolina 27599, United States
15
Mathai N, Kirchmair J. Similarity-Based Methods and Machine Learning Approaches for Target Prediction in Early Drug Discovery: Performance and Scope. Int J Mol Sci 2020; 21:ijms21103585. [PMID: 32438666 PMCID: PMC7279241 DOI: 10.3390/ijms21103585] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Received: 04/27/2020] [Revised: 05/13/2020] [Accepted: 05/16/2020] [Indexed: 12/20/2022]
Abstract
Computational methods for predicting the macromolecular targets of drugs and drug-like compounds have evolved as a key technology in drug discovery. However, the established validation protocols leave several key questions regarding the performance and scope of methods unaddressed. For example, prediction success rates are commonly reported as averages over all compounds of a test set and do not consider the structural relationship between the individual test compounds and the training instances. In order to obtain a better understanding of the value of ligand-based methods for target prediction, we benchmarked a similarity-based method and a random forest based machine learning approach (both employing 2D molecular fingerprints) under three testing scenarios: a standard testing scenario with external data, a standard time-split scenario, and a scenario that is designed to most closely resemble real-world conditions. In addition, we deconvoluted the results based on the distances of the individual test molecules from the training data. We found that, surprisingly, the similarity-based approach generally outperformed the machine learning approach in all testing scenarios, even in cases where queries were structurally clearly distinct from the instances in the training (or reference) data, and despite a much higher coverage of the known target space.
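A similarity-based target predictor of the general kind benchmarked here can be sketched in a few lines of Python. Fingerprints are represented as sets of on-bit indices (a stand-in for real 2D fingerprints, which would normally be computed with a cheminformatics toolkit). Each target is scored by the maximum Tanimoto similarity between the query and that target's known ligands, and the query's distance from the reference data (1 minus its nearest-neighbour similarity) is returned so that results can be binned by novelty, as the abstract describes. The max-similarity scoring rule and the target names are assumptions; the paper's exact scoring may differ.

```python
from typing import Dict, FrozenSet, List, Tuple

# A 2D fingerprint represented as the set of indices of its "on" bits.
Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto (Jaccard) similarity of two bit sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def predict_targets(query: Fingerprint,
                    reference: Dict[str, List[Fingerprint]],
                    top_k: int = 3) -> List[Tuple[str, float]]:
    """Rank targets by the max similarity between the query and each target's known ligands."""
    scores = {target: max(tanimoto(query, lig) for lig in ligands)
              for target, ligands in reference.items() if ligands}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_k]


def distance_to_reference(query: Fingerprint,
                          reference: Dict[str, List[Fingerprint]]) -> float:
    """1 - nearest-neighbour similarity; larger values mean a more novel query."""
    best = max(tanimoto(query, lig) for ligs in reference.values() for lig in ligs)
    return 1.0 - best


# Toy reference data: known ligands per (hypothetical) target.
reference = {
    "T1": [frozenset({1, 2, 3, 4})],
    "T2": [frozenset({5, 6, 7}), frozenset({1, 5, 6})],
}
query = frozenset({1, 2, 3})
print(predict_targets(query, reference))        # [('T1', 0.75), ('T2', 0.2)]
print(distance_to_reference(query, reference))  # 0.25
```

Because the score is just a nearest-neighbour lookup, every target with at least one reference ligand can be ranked, which is one reason such methods can cover a large part of the known target space.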
Affiliation(s)
- Neann Mathai
- Department of Chemistry and Computational Biology Unit (CBU), University of Bergen, N-5020 Bergen, Norway
- Johannes Kirchmair
- Department of Chemistry and Computational Biology Unit (CBU), University of Bergen, N-5020 Bergen, Norway
- Department of Pharmaceutical Chemistry, Faculty of Life Sciences, University of Vienna, 1090 Vienna, Austria
16
Yang ZY, He JH, Lu AP, Hou TJ, Cao DS. Frequent hitters: nuisance artifacts in high-throughput screening. Drug Discov Today 2020; 25:657-667. [DOI: 10.1016/j.drudis.2020.01.014] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Received: 11/19/2019] [Revised: 12/28/2019] [Accepted: 01/16/2020] [Indexed: 11/27/2022]
17
Yang ZY, He JH, Lu AP, Hou TJ, Cao DS. Application of Negative Design To Design a More Desirable Virtual Screening Library. J Med Chem 2020; 63:4411-4429. [DOI: 10.1021/acs.jmedchem.9b01476] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Indexed: 12/25/2022]
Affiliation(s)
- Zi-Yi Yang
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410013, Hunan, P. R. China
- Jun-Hong He
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410013, Hunan, P. R. China
- Ai-Ping Lu
- Institute for Advancing Translational Medicine in Bone and Joint Diseases, School of Chinese Medicine, Hong Kong Baptist University, Hong Kong, SAR, P. R. China
- Ting-Jun Hou
- Hangzhou Institute of Innovative Medicine, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, P. R. China
- Dong-Sheng Cao
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410013, Hunan, P. R. China
- Institute for Advancing Translational Medicine in Bone and Joint Diseases, School of Chinese Medicine, Hong Kong Baptist University, Hong Kong, SAR, P. R. China
18
Abstract
Aim: The druggability of epigenetic targets has prompted researchers to develop small-molecule therapeutics. However, no systematic assessment has ever been done to investigate the chemical space of epigenetic modulators. Herein, we report a comprehensive chemoinformatic analysis of epigenetic ligands from EpiDBase, HEMD, ChEMBL and PubChem databases. Results: Nearly, 0.45 × 10⁶ ligands were analyzed for assay interference compounds, target profiling, drug-like properties and hit prioritization. After eliminating approximately 96,000 problematic compounds, the remaining 0.36 × 10⁶ compounds were studied for their physicochemical distributions, principal component analysis and hit prioritization. More than 30% of assay interference compounds were determined for many proteins. Conclusion: This systematic assessment of epigenetic ligands will help in the enrichment of screening libraries with high-quality compounds and thus, the generation of efficacious drug candidates.
19
David L, Arús-Pous J, Karlsson J, Engkvist O, Bjerrum EJ, Kogej T, Kriegl JM, Beck B, Chen H. Applications of Deep-Learning in Exploiting Large-Scale and Heterogeneous Compound Data in Industrial Pharmaceutical Research. Front Pharmacol 2019; 10:1303. [PMID: 31749705 PMCID: PMC6848277 DOI: 10.3389/fphar.2019.01303] [Citation(s) in RCA: 26] [Impact Index Per Article: 5.2] [Received: 08/07/2019] [Accepted: 10/14/2019] [Indexed: 12/21/2022]
Abstract
In recent years, the development of high-throughput screening (HTS) technologies and their establishment in an industrialized environment have given scientists the possibility to test millions of molecules and profile them against a multitude of biological targets in a short period of time, generating data in a much faster pace and with a higher quality than before. Besides the structure activity data from traditional bioassays, more complex assays such as transcriptomics profiling or imaging have also been established as routine profiling experiments thanks to the advancement of Next Generation Sequencing or automated microscopy technologies. In industrial pharmaceutical research, these technologies are typically established in conjunction with automated platforms in order to enable efficient handling of screening collections of thousands to millions of compounds. To exploit the ever-growing amount of data that are generated by these approaches, computational techniques are constantly evolving. In this regard, artificial intelligence technologies such as deep learning and machine learning methods play a key role in cheminformatics and bio-image analytics fields to address activity prediction, scaffold hopping, de novo molecule design, reaction/retrosynthesis predictions, or high content screening analysis. Herein we summarize the current state of analyzing large-scale compound data in industrial pharmaceutical research and describe the impact it has had on the drug discovery process over the last two decades, with a specific focus on deep-learning technologies.
Affiliation(s)
- Laurianne David
- Hit Discovery, Discovery Sciences, Biopharmaceutical R&D, AstraZeneca, Gothenburg, Sweden
- Department of Life Science Informatics, B-IT, Rheinische Friedrich-Wilhelms-Universität Bonn, Bonn, Germany
- Josep Arús-Pous
- Hit Discovery, Discovery Sciences, Biopharmaceutical R&D, AstraZeneca, Gothenburg, Sweden
- Department of Chemistry and Biochemistry, University of Bern, Bern, Switzerland
- Johan Karlsson
- Quantitative Biology, Discovery Sciences, Biopharmaceutical R&D, AstraZeneca, Gothenburg, Sweden
- Ola Engkvist
- Hit Discovery, Discovery Sciences, Biopharmaceutical R&D, AstraZeneca, Gothenburg, Sweden
- Esben Jannik Bjerrum
- Hit Discovery, Discovery Sciences, Biopharmaceutical R&D, AstraZeneca, Gothenburg, Sweden
- Thierry Kogej
- Hit Discovery, Discovery Sciences, Biopharmaceutical R&D, AstraZeneca, Gothenburg, Sweden
- Jan M. Kriegl
- Department of Medicinal Chemistry, Boehringer Ingelheim Pharma GmbH & Co. KG, Biberach an der Riss, Germany
- Bernd Beck
- Department of Medicinal Chemistry, Boehringer Ingelheim Pharma GmbH & Co. KG, Biberach an der Riss, Germany
- Hongming Chen
- Hit Discovery, Discovery Sciences, Biopharmaceutical R&D, AstraZeneca, Gothenburg, Sweden
- Chemistry and Chemical Biology Centre, Guangzhou Regenerative Medicine and Health – Guangdong Laboratory, Guangzhou, China
20
Cheminformatics Explorations of Natural Products. Progress in the Chemistry of Organic Natural Products 2019; 110:1-35. [PMID: 31621009 DOI: 10.1007/978-3-030-14632-0_1] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 12/19/2022]
Abstract
The chemistry of natural products is fascinating and has continuously attracted the attention of the scientific community for many reasons including, but not limited to, biosynthesis pathways, chemical diversity, the source of bioactive compounds and their marked impact on drug discovery. There is a broad range of experimental and computational techniques (molecular modeling and cheminformatics) that have evolved over the years and have assisted the investigation of natural products. Herein, we discuss cheminformatics strategies to explore the chemistry and applications of natural products. Since the potential synergisms between cheminformatics and natural products are vast, we will focus on three major aspects: (1) exploration of the chemical space of natural products to identify bioactive compounds, with emphasis on drug discovery; (2) assessment of the toxicity profile of natural products; and (3) diversity analysis of natural product collections and the design of chemical collections inspired by natural sources.
21
David L, Walsh J, Sturm N, Feierberg I, Nissink JWM, Chen H, Bajorath J, Engkvist O. Identification of Compounds That Interfere with High-Throughput Screening Assay Technologies. ChemMedChem 2019; 14:1795-1802. [PMID: 31479198 PMCID: PMC6856845 DOI: 10.1002/cmdc.201900395] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Received: 07/02/2019] [Revised: 08/21/2019] [Indexed: 01/23/2023]
Abstract
A significant challenge in high-throughput screening (HTS) campaigns is the identification of assay technology interference compounds. A Compound Interfering with an Assay Technology (CIAT) gives false readouts in many assays. CIATs are often considered viable hits and investigated in follow-up studies, thus impeding research and wasting resources. In this study, we developed a machine-learning (ML) model to predict CIATs for three assay technologies. The model was trained on known CIATs and non-CIATs (NCIATs) identified in artefact assays and described by their 2D structural descriptors. Usual methods identifying CIATs are based on statistical analysis of historical primary screening data and do not consider experimental assays identifying CIATs. Our results show successful prediction of CIATs for existing and novel compounds and provide a complementary and wider set of predicted CIATs compared to BSF, a published structure-independent model, and to the PAINS substructural filters. Our analysis is an example of how well-curated datasets can provide powerful predictive models despite their relatively small size.
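The abstract contrasts the authors' ML model with the "usual methods" that rely on statistical analysis of historical primary screening data. A minimal Python sketch of such a frequency-based baseline follows: it flags a compound for a given assay technology when its hit rate across assays of that technology exceeds a cutoff. The record layout, the technology label `tech_1`, and both thresholds are hypothetical. This illustrates the baseline idea only, not the CIAT model itself, which was trained on artefact-assay labels and 2D structural descriptors.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# One record per (compound, assay): (compound_id, assay_technology, was_hit)
Record = Tuple[str, str, bool]


def hit_statistics(records: Iterable[Record]) -> Dict[Tuple[str, str], Tuple[int, int]]:
    """Count (hits, assays tested) for every compound/technology pair."""
    counts: Dict[Tuple[str, str], List[int]] = defaultdict(lambda: [0, 0])
    for compound, technology, was_hit in records:
        entry = counts[(compound, technology)]
        entry[0] += int(was_hit)
        entry[1] += 1
    return {key: (hits, tested) for key, (hits, tested) in counts.items()}


def flag_frequent_hitters(records: Iterable[Record],
                          min_tested: int = 5,
                          max_hit_rate: float = 0.25) -> List[Tuple[str, str]]:
    """Flag pairs tested often enough whose hit rate exceeds the (hypothetical) cutoff."""
    stats = hit_statistics(records)
    return sorted(key for key, (hits, tested) in stats.items()
                  if tested >= min_tested and hits / tested > max_hit_rate)


records = (
    [("A", "tech_1", True)] * 4 + [("A", "tech_1", False)]      # 4/5 hits
    + [("B", "tech_1", True)] + [("B", "tech_1", False)] * 5    # 1/6 hits
    + [("C", "tech_1", True)] * 2                               # 2/2, too few assays
)
print(flag_frequent_hitters(records))  # [('A', 'tech_1')]
```

Compound C illustrates a limitation the abstract hints at: a purely statistical rule cannot say anything about compounds with little or no screening history, whereas a structure-based model can also score novel compounds.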
Affiliation(s)
- Laurianne David
- Hit Discovery, Discovery Sciences, R&D BioPharmaceuticals, AstraZeneca Gothenburg, Pepparedsleden 1, 431 83 Mölndal, Sweden
- Department of Life Science Informatics, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität Bonn, Endenicher Allee 19c, 53115 Bonn, Germany
- Jarrod Walsh
- Hit Discovery, Discovery Sciences, R&D BioPharmaceuticals, AstraZeneca Cambridge, Alderley Park, Macclesfield SK10 4TG, UK
- Noé Sturm
- Data Science and AI, Drug Safety & Metabolism, R&D BioPharmaceuticals, AstraZeneca Gothenburg, Pepparedsleden 1, 431 83 Mölndal, Sweden
- Isabella Feierberg
- Hit Discovery, Discovery Sciences, R&D BioPharmaceuticals, AstraZeneca Boston, 35 Gatehouse Drive, Waltham, MA 02451, USA
- J. Willem M. Nissink
- Computational Chemistry, Oncology R&D, AstraZeneca, Cambridge Science Park, Milton Road, Cambridge CB4 0WG, UK
- Hongming Chen
- Hit Discovery, Discovery Sciences, R&D BioPharmaceuticals, AstraZeneca Gothenburg, Pepparedsleden 1, 431 83 Mölndal, Sweden
- Jürgen Bajorath
- Department of Life Science Informatics, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität Bonn, Endenicher Allee 19c, 53115 Bonn, Germany
- Ola Engkvist
- Hit Discovery, Discovery Sciences, R&D BioPharmaceuticals, AstraZeneca Gothenburg, Pepparedsleden 1, 431 83 Mölndal, Sweden
22
Wilm A, Stork C, Bauer C, Schepky A, Kühnl J, Kirchmair J. Skin Doctor: Machine Learning Models for Skin Sensitization Prediction that Provide Estimates and Indicators of Prediction Reliability. Int J Mol Sci 2019; 20:E4833. [PMID: 31569429 PMCID: PMC6801714 DOI: 10.3390/ijms20194833] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 08/27/2019] [Revised: 09/17/2019] [Accepted: 09/18/2019] [Indexed: 12/19/2022]
Abstract
The ability to predict the skin sensitization potential of small organic molecules is of high importance to the development and safe application of cosmetics, drugs and pesticides. One of the most widely accepted methods for predicting this hazard is the local lymph node assay (LLNA). The goal of this work was to develop in silico models for the prediction of the skin sensitization potential of small molecules that go beyond the state of the art, with larger LLNA data sets and, most importantly, a robust and intuitive definition of the applicability domain, paired with additional indicators of the reliability of predictions. We explored a large variety of molecular descriptors and fingerprints in combination with random forest and support vector machine classifiers. The most suitable models were tested on holdout data, on which they yielded competitive performance (Matthews correlation coefficients up to 0.52; accuracies up to 0.76; areas under the receiver operating characteristic curves up to 0.83). The most favorable models are available via a public web service that, in addition to predictions, provides assessments of the applicability domain and indicators of the reliability of the individual predictions.
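An applicability domain of the kind emphasized above is often defined through similarity to the training data. The Python sketch below assumes one common formulation, not necessarily the one used in Skin Doctor: a query lies inside the domain when the mean Tanimoto similarity to its `k` most similar training compounds reaches a threshold, and that mean is also returned as a simple reliability indicator. Fingerprints are sets of on-bit indices, and the defaults for `k` and `threshold` are hypothetical.

```python
from typing import FrozenSet, List, Tuple

Fingerprint = FrozenSet[int]  # indices of "on" bits in a 2D fingerprint


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto (Jaccard) similarity of two bit sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def applicability(query: Fingerprint, training: List[Fingerprint],
                  k: int = 2, threshold: float = 0.5) -> Tuple[bool, float]:
    """Return (inside_domain, mean similarity to the k nearest training compounds).

    k and threshold are hypothetical defaults, not values from the Skin Doctor paper.
    """
    if not training:
        raise ValueError("training set is empty")
    nearest = sorted((tanimoto(query, t) for t in training), reverse=True)[:k]
    mean_sim = sum(nearest) / len(nearest)
    return mean_sim >= threshold, mean_sim


training = [frozenset({1, 2, 3, 4}), frozenset({2, 3, 4, 5}), frozenset({7, 8, 9})]
for query in (frozenset({1, 2, 3, 4, 5}), frozenset({10, 11})):
    inside, sim = applicability(query, training)
    print(f"inside={inside} mean_sim={sim:.2f}")
# inside=True mean_sim=0.80
# inside=False mean_sim=0.00
```

A model wrapped this way would still output a class for the second query, but the low similarity signals that the prediction is an extrapolation and should be treated with caution.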
Affiliation(s)
- Anke Wilm
- Center for Bioinformatics, Universität Hamburg, 20146 Hamburg, Germany.
- HITeC e.V, 22527 Hamburg, Germany.
- Conrad Stork
- Center for Bioinformatics, Universität Hamburg, 20146 Hamburg, Germany.
- Christoph Bauer
- Department of Chemistry, University of Bergen, 5020 Bergen, Norway.
- Computational Biology Unit (CBU), University of Bergen, 5020 Bergen, Norway.
- Andreas Schepky
- Front End Innovation, Beiersdorf AG, 20253 Hamburg, Germany.
- Jochen Kühnl
- Front End Innovation, Beiersdorf AG, 20253 Hamburg, Germany.
- Johannes Kirchmair
- Center for Bioinformatics, Universität Hamburg, 20146 Hamburg, Germany.
- Department of Chemistry, University of Bergen, 5020 Bergen, Norway.
- Computational Biology Unit (CBU), University of Bergen, 5020 Bergen, Norway.
23
Dantas RF, Evangelista TCS, Neves BJ, Senger MR, Andrade CH, Ferreira SB, Silva-Junior FP. Dealing with frequent hitters in drug discovery: a multidisciplinary view on the issue of filtering compounds on biological screenings. Expert Opin Drug Discov 2019; 14:1269-1282. [DOI: 10.1080/17460441.2019.1654453] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Indexed: 12/29/2022]
Affiliation(s)
- Rafael Ferreira Dantas
- LaBECFar – Laboratório de Bioquímica Experimental e Computacional de Fármacos, Instituto Oswaldo Cruz, Fundação Oswaldo Cruz, Rio de Janeiro, Brazil
- Tereza Cristina Santos Evangelista
- LaSOPB – Laboratório de Síntese Orgânica e Prospecção Biológica, Instituto de Química, Universidade Federal do Rio de Janeiro, Rio de Janeiro, Brazil
- Bruno Junior Neves
- LabChem – Laboratory of Cheminformatics, Centro Universitário de Anápolis, UniEVANGÉLICA, Anápolis, Brazil
- Mario Roberto Senger
- LaBECFar – Laboratório de Bioquímica Experimental e Computacional de Fármacos, Instituto Oswaldo Cruz, Fundação Oswaldo Cruz, Rio de Janeiro, Brazil
- Carolina Horta Andrade
- LabMol – Laboratory for Molecular Modeling and Drug Design, Faculdade de Farmácia, Universidade Federal de Goiás, Goiânia, Brazil
- Sabrina Baptista Ferreira
- LaSOPB – Laboratório de Síntese Orgânica e Prospecção Biológica, Instituto de Química, Universidade Federal do Rio de Janeiro, Rio de Janeiro, Brazil
- Floriano Paes Silva-Junior
- LaBECFar – Laboratório de Bioquímica Experimental e Computacional de Fármacos, Instituto Oswaldo Cruz, Fundação Oswaldo Cruz, Rio de Janeiro, Brazil
24
Reker D, Bernardes GJL, Rodrigues T. Computational advances in combating colloidal aggregation in drug discovery. Nat Chem 2019; 11:402-418. [PMID: 30988417 DOI: 10.1038/s41557-019-0234-9] [Citation(s) in RCA: 36] [Impact Index Per Article: 7.2] [Received: 05/15/2018] [Accepted: 02/21/2019] [Indexed: 02/07/2023]
Abstract
Small molecule effectors are essential for drug discovery. Specific molecular recognition, reversible binding and dose-dependency are usually key requirements to ensure utility of a novel chemical entity. However, artefactual frequent-hitter and assay interference compounds may divert lead optimization and screening programmes towards attrition-prone chemical matter. Colloidal aggregates are the prime source of false positive readouts, either through protein sequestration or protein-scaffold mimicry. Nevertheless, assessment of colloidal aggregation remains somewhat overlooked and under-appreciated. In this Review, we discuss the impact of aggregation in drug discovery by analysing select examples from the literature and publicly-available datasets. We also examine and comment on technologies used to experimentally identify these potentially problematic entities. We focus on evidence-based computational filters and machine learning algorithms that may be swiftly deployed to flag chemical matter and mitigate the impact of aggregates in discovery programmes. We highlight the tools that can be used to scrutinize libraries, and identify and eliminate these problematic compounds.
Affiliation(s)
- Daniel Reker
- Koch Institute for Integrative Cancer Research, Massachusetts Institute of Technology, Cambridge, MA, USA.
- Division of Gastroenterology, Hepatology and Endoscopy, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA.
- MIT-IBM Watson AI Lab, Massachusetts Institute of Technology, Cambridge, MA, USA.
- Gonçalo J L Bernardes
- Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge, UK.
- Instituto de Medicina Molecular, Faculdade de Medicina da Universidade de Lisboa, Lisboa, Portugal.
- Tiago Rodrigues
- Instituto de Medicina Molecular, Faculdade de Medicina da Universidade de Lisboa, Lisboa, Portugal.
25
Chen Y, Stork C, Hirte S, Kirchmair J. NP-Scout: Machine Learning Approach for the Quantification and Visualization of the Natural Product-Likeness of Small Molecules. Biomolecules 2019; 9:biom9020043. [PMID: 30682850 PMCID: PMC6406893 DOI: 10.3390/biom9020043] [Citation(s) in RCA: 32] [Impact Index Per Article: 6.4] [Received: 12/04/2018] [Revised: 01/21/2019] [Accepted: 01/21/2019] [Indexed: 01/11/2023]
Abstract
Natural products (NPs) remain the most prolific resource for the development of small-molecule drugs. Here we report a new machine learning approach that allows the identification of natural products with high accuracy. The method also generates similarity maps, which highlight atoms that contribute significantly to the classification of small molecules as a natural product or synthetic molecule. The method can hence be utilized to (i) identify natural products in large molecular libraries, (ii) quantify the natural product-likeness of small molecules, and (iii) visualize atoms in small molecules that are characteristic of natural products or synthetic molecules. The models are based on random forest classifiers trained on data sets consisting of more than 265,000 to 322,000 natural products and synthetic molecules. Two-dimensional molecular descriptors, MACCS keys and Morgan2 fingerprints were explored. On an independent test set the models reached areas under the receiver operating characteristic curve (AUC) of 0.997 and Matthews correlation coefficients (MCCs) of 0.954 and higher. The method was further tested on data from the Dictionary of Natural Products, ChEMBL and other resources. The best-performing models are accessible as a free web service at http://npscout.zbh.uni-hamburg.de/npscout.
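The AUC reported above can be read as the probability that a randomly chosen natural product receives a higher NP-likeness score than a randomly chosen synthetic molecule. The Python sketch below computes it directly from that definition, with ties counting one half, on hypothetical scores. In NP-Scout the scores would come from the random forest classifiers, which are not reproduced here.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC = P(score of a random positive > score of a random negative); ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical NP-likeness scores; label 1 = natural product, 0 = synthetic molecule.
scores = [0.9, 0.8, 0.4, 0.3, 0.8]
labels = [1, 1, 0, 0, 0]
print(round(roc_auc(scores, labels), 3))  # 0.917
```

The pairwise loop is quadratic and meant for illustration only; for test sets with hundreds of thousands of molecules, a rank-based computation or a library routine would be the practical choice.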
Affiliation(s)
- Ya Chen
- Center for Bioinformatics (ZBH), Department of Informatics, Faculty of Mathematics, Informatics and Natural Sciences, Universität Hamburg, 20146 Hamburg, Germany.
- Conrad Stork
- Center for Bioinformatics (ZBH), Department of Informatics, Faculty of Mathematics, Informatics and Natural Sciences, Universität Hamburg, 20146 Hamburg, Germany.
- Steffen Hirte
- Center for Bioinformatics (ZBH), Department of Informatics, Faculty of Mathematics, Informatics and Natural Sciences, Universität Hamburg, 20146 Hamburg, Germany.
- Johannes Kirchmair
- Center for Bioinformatics (ZBH), Department of Informatics, Faculty of Mathematics, Informatics and Natural Sciences, Universität Hamburg, 20146 Hamburg, Germany.
- Department of Chemistry, University of Bergen, 5007 Bergen, Norway.
- Computational Biology Unit (CBU), Department of Informatics, University of Bergen, 5008 Bergen, Norway.
26
Stork C, Chen Y, Šícho M, Kirchmair J. Hit Dexter 2.0: Machine-Learning Models for the Prediction of Frequent Hitters. J Chem Inf Model 2019; 59:1030-1043. [DOI: 10.1021/acs.jcim.8b00677] [Citation(s) in RCA: 54] [Impact Index Per Article: 10.8] [Indexed: 01/26/2023]
Affiliation(s)
- Conrad Stork
- Center for Bioinformatics (ZBH), Department of Computer Science, Faculty of Mathematics, Informatics and Natural Sciences, Universität Hamburg, Hamburg, 20146, Germany
- Ya Chen
- Center for Bioinformatics (ZBH), Department of Computer Science, Faculty of Mathematics, Informatics and Natural Sciences, Universität Hamburg, Hamburg, 20146, Germany
- Martin Šícho
- Center for Bioinformatics (ZBH), Department of Computer Science, Faculty of Mathematics, Informatics and Natural Sciences, Universität Hamburg, Hamburg, 20146, Germany
- CZ-OPENSCREEN: National Infrastructure for Chemical Biology, Laboratory of Informatics and Chemistry, Faculty of Chemical Technology, University of Chemistry and Technology Prague, 166 28 Prague 6, Czech Republic
- Johannes Kirchmair
- Center for Bioinformatics (ZBH), Department of Computer Science, Faculty of Mathematics, Informatics and Natural Sciences, Universität Hamburg, Hamburg, 20146, Germany
- Department of Chemistry, University of Bergen, N-5020 Bergen, Norway
- Computational Biology Unit (CBU), University of Bergen, N-5020 Bergen, Norway
27
PAIN(S) relievers for medicinal chemists: how computational methods can assist in hit evaluation. Future Med Chem 2018; 10:1533-1535. [PMID: 29956552 DOI: 10.4155/fmc-2018-0116] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Indexed: 11/17/2022]