1
Abou Hajal A, Bryce RA, Amor BB, Atatreh N, Ghattas MA. Boosting the Accuracy and Chemical Space Coverage of the Detection of Small Colloidal Aggregating Molecules Using the BAD Molecule Filter. J Chem Inf Model 2024; 64:4991-5005. [PMID: 38920403 DOI: 10.1021/acs.jcim.4c00363] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/27/2024]
Abstract
The ability to conduct effective high throughput screening (HTS) campaigns in drug discovery is often hampered by the detection of false positives in these assays due to small colloidally aggregating molecules (SCAMs). SCAMs can produce artifactual hits in HTS by nonspecific inhibition of the protein target. In this work, we present a new computational prediction tool for detecting SCAMs based on their 2D chemical structure. The tool, called the boosted aggregation detection (BAD) molecule filter, employs decision tree ensemble methods, namely, the CatBoost classifier and the light gradient-boosting machine, to significantly improve the detection of SCAMs. In developing the filter, we explore three approaches, each tailored for specific drug discovery needs: models trained on individual data sets, a consensus approach using these models, and a merged data set approach. The individual data set method emerged as most effective, achieving 93% sensitivity and 90% specificity, outperforming existing state-of-the-art models by 20 and 5%, respectively. The consensus models offer broader chemical space coverage, exceeding 90% for all testing sets. This feature is particularly important for early stage medicinal chemistry projects and provides information on the applicability domain. Meanwhile, the merged data set models demonstrated robust performance, with a notable sensitivity of 79% in the comprehensive 10-fold cross-validation test set. A SHAP analysis of model features indicates the importance of hydrophobicity and molecular complexity as primary factors influencing the aggregation propensity. The BAD molecule filter is readily accessible for public use at https://molmodlab-aau.com/Tools.html. This filter provides a new, more robust tool for aggregate prediction in the early stages of drug discovery to optimize hit rates and reduce associated testing and validation overheads.
Affiliation(s)
- Abdallah Abou Hajal
  - College of Pharmacy, Al Ain University, Abu Dhabi 112612, United Arab Emirates
  - AAU Health and Biomedical Research Center, Al Ain University, Abu Dhabi 112612, United Arab Emirates
- Richard A Bryce
  - Division of Pharmacy and Optometry, School of Health Sciences, University of Manchester, Oxford Road, Manchester M13 9PL, U.K.
- Boulbaba Ben Amor
  - Core42, Inception/G42, Abu Dhabi 2282, United Arab Emirates
  - IMT Nord Europe, Villeneuve D'Ascq 59650, France
- Noor Atatreh
  - College of Pharmacy, Al Ain University, Abu Dhabi 112612, United Arab Emirates
  - AAU Health and Biomedical Research Center, Al Ain University, Abu Dhabi 112612, United Arab Emirates
- Mohammad A Ghattas
  - College of Pharmacy, Al Ain University, Abu Dhabi 112612, United Arab Emirates
  - AAU Health and Biomedical Research Center, Al Ain University, Abu Dhabi 112612, United Arab Emirates
2
Kombo DC, Stepp JD, Lim S, Elshorst B, Li Y, Cato L, Shomali M, Fink D, LaMarche MJ. Predictions of Colloidal Molecular Aggregation Using AI/ML Models. ACS Omega 2024; 9:28691-28706. [PMID: 38973835 PMCID: PMC11223200 DOI: 10.1021/acsomega.4c02886] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/28/2024] [Revised: 06/10/2024] [Accepted: 06/12/2024] [Indexed: 07/09/2024]
Abstract
To facilitate the triage of hits from small molecule screens, we have used various AI/ML techniques and experimentally observed data sets to build models aimed at predicting colloidal aggregation of small organic molecules in aqueous solution. We have found that Naïve Bayesian and deep neural networks outperform logistic regression, recursive partitioning tree, support vector machine, and random forest techniques by having the lowest balanced error rate (BER) for the test set. Derived predictive classification models consistently and successfully discriminated aggregator molecules from nonaggregator hits. An analysis of molecular descriptors in favor of colloidal aggregation confirms previous observations (hydrophobicity, molecular weight, and solubility) in addition to undescribed molecular descriptors such as the fraction of sp3 carbon atoms (Fsp3), and electrotopological state of hydroxyl groups (ES_Sum_sOH). Naïve Bayesian modeling and scaffold tree analysis have revealed chemical features/scaffolds contributing the most to colloidal aggregation and nonaggregation, respectively. These results highlight the importance of scaffolds with high Fsp3 values in promoting nonaggregation. Matched molecular pair analysis (MMPA) has also deciphered context-dependent substitutions, which can be used to design nonaggregator molecules. We found that most matched molecular pairs have a neutral effect on aggregation propensity. We have prospectively applied our predictive models to assist in chemical library triage for optimal plate selection diversity and purchase for high throughput screening (HTS) in drug discovery projects.
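The balanced error rate (BER) used above to rank the models can be illustrated with a minimal sketch. It assumes the common definition BER = 1 - (sensitivity + specificity) / 2 with binary labels (1 = aggregator, 0 = nonaggregator); the abstract does not spell out the formula, and the function name is hypothetical.

```python
def balanced_error_rate(y_true, y_pred):
    """Return BER = 1 - (sensitivity + specificity) / 2 for binary labels.

    Assumed convention (not stated in the abstract): 1 = aggregator,
    0 = nonaggregator.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # aggregators caught
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # aggregators missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # nonaggregators kept
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # nonaggregators flagged
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return 1.0 - (sensitivity + specificity) / 2.0
```

Because each class contributes equally regardless of its size, BER is less flattering than plain accuracy on imbalanced screening data, which is presumably why it suits aggregator triage.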
Affiliation(s)
- David C. Kombo
  - Integrated Drug Discovery, Sanofi, 350 Water St., Cambridge, Massachusetts 02141, United States
- J. David Stepp
  - Integrated Drug Discovery, Sanofi, 350 Water St., Cambridge, Massachusetts 02141, United States
- Sungtaek Lim
  - Integrated Drug Discovery, Sanofi, 350 Water St., Cambridge, Massachusetts 02141, United States
- Bettina Elshorst
  - CMC Synthetics Early Development Analytics, Sanofi, Industriepark Hochst, Frankfurt 65926, Germany
- Yi Li
  - Integrated Drug Discovery, Sanofi, 350 Water St., Cambridge, Massachusetts 02141, United States
- Laura Cato
  - Molecular Oncology, Sanofi, 350 Water St., Cambridge, Massachusetts 02141, United States
- Maysoun Shomali
  - Molecular Oncology, Sanofi, 350 Water St., Cambridge, Massachusetts 02141, United States
- David Fink
  - Integrated Drug Discovery, Sanofi, 350 Water St., Cambridge, Massachusetts 02141, United States
- Matthew J. LaMarche
  - Integrated Drug Discovery, Sanofi, 350 Water St., Cambridge, Massachusetts 02141, United States
3
Molina C, Ait-Ouarab L, Minoux H. Isometric Stratified Ensembles: A Partial and Incremental Adaptive Applicability Domain and Consensus-Based Classification Strategy for Highly Imbalanced Data Sets with Application to Colloidal Aggregation. J Chem Inf Model 2022; 62:1849-1862. [PMID: 35357194 DOI: 10.1021/acs.jcim.2c00293] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/30/2022]
Abstract
Partial and incremental stratification analysis of a quantitative structure-interference relationship (QSIR) is a novel strategy intended to categorize the classifications provided by machine learning techniques. It is based on a 2D mapping of classification statistics onto two categorical axes: the degree of consensus and the level of applicability domain. An internal cross-validation set makes it possible to determine the statistical performance of the ensemble at every 2D map stratum and hence to define isometric local performance regions with the aim of better hit ranking and selection. During training, isometric stratified ensembles (ISE) applies a recursive decorrelated variable selection and considers the cardinal ratio of classes to balance training sets and thus avoid bias due to possible class imbalance. To illustrate the value of this strategy, three different highly imbalanced pairs of PubChem AmpC β-lactamase and cruzain inhibition assay campaigns of colloidal aggregators, together with the complementary aggregator data set available at the AGGREGATOR ADVISOR predictor web page, were employed. Statistics obtained using this new strategy outperform those of previously published tools, with and without a classical applicability domain. ISE performance on classifying colloidal aggregators ranges from a global AUC of 0.82, when the whole test data set is considered, up to a maximum AUC of 0.88, when only its highest confidence isometric stratum is retained.
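The 2D stratification idea can be sketched as follows. This is only an illustration under stated assumptions, not the ISE implementation: each validation compound is assumed to carry a categorical consensus level and a categorical applicability-domain level, and per-stratum accuracy stands in for the AUC statistics reported in the abstract. The function name and the record layout are hypothetical.

```python
from collections import defaultdict


def stratum_accuracy(records):
    """Group validation results by (consensus level, applicability-domain level).

    `records` is an iterable of (consensus, ad_level, y_true, y_pred) tuples.
    Returns {(consensus, ad_level): (n_compounds, accuracy)}.
    Accuracy is a simplification; the abstract reports AUC per stratum.
    """
    counts = defaultdict(lambda: [0, 0])  # stratum -> [n_total, n_correct]
    for consensus, ad_level, y_true, y_pred in records:
        cell = counts[(consensus, ad_level)]
        cell[0] += 1
        cell[1] += int(y_true == y_pred)
    return {key: (n, correct / n) for key, (n, correct) in counts.items()}
```

Strata with similar measured performance could then be merged into one region, so that new predictions are ranked by the reliability of the region they fall into.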
Affiliation(s)
- Christophe Molina
  - PIKAÏROS S.A., B03 - 2 Allée de la Clairière, 31650 Saint Orens de Gameville, France
- Lilia Ait-Ouarab
  - AMOA Ingénierie, INFOGENE S.A., 19, rue d'Orleans, 92200 Neuilly-sur-Seine, France
- Hervé Minoux
  - Data and Data Science, SANOFI R&D, 91380 Chilly-Mazarin, France
4
Sun J, Zhong H, Wang K, Li N, Chen L. Gains from no real PAINS: Where 'Fair Trial Strategy' stands in the development of multi-target ligands. Acta Pharm Sin B 2021; 11:3417-3432. [PMID: 34900527 PMCID: PMC8642439 DOI: 10.1016/j.apsb.2021.02.023] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 12/24/2020] [Revised: 02/15/2021] [Accepted: 02/25/2021] [Indexed: 12/26/2022] [Open Access]
Abstract
Compounds that selectively modulate multiple targets can provide clinical benefits and are an alternative to traditional highly selective agents for unique targets. High-throughput screening (HTS) for multitarget-directed ligands (MTDLs) using approved drugs, together with fragment-based drug design, has become a regular strategy to achieve an ideal multitarget combination. However, the unexpected presence of pan-assay interference compounds (PAINS) suspects in the development of MTDLs frequently results in nonspecific interactions or other undesirable effects, leading to artefacts or false-positive data in biological assays. Publicly available filters can help to identify PAINS suspects; however, these filters cannot comprehensively conclude whether these suspects are "bad" or innocent. Additionally, these in silico approaches may inappropriately label a ligand as PAINS. More than 80% of the initial hits can be identified as PAINS by the filters if appropriate biochemical tests are not used, resulting in false positive data that are unacceptable for medicinal chemists in manuscript peer review and future studies. Therefore, extensive offline experiments should be used after online filtering to discriminate "bad" PAINS and avoid incorrect evaluation of good scaffolds. We suggest using the "Fair Trial Strategy" to identify interesting molecules among PAINS suspects and to provide structure‒function insight in MTDL development.
Key Words
- AD, Alzheimer disease
- ALARM NMR, a La assay to detect reactive molecules by nuclear magnetic resonance
- Biochemical experiment
- CADD, computer-aided drug design technology
- CoA, coenzyme A
- EGFR, epidermal growth factor receptor
- Fair trial strategy
- GSH, glutathione
- HER2, human epidermal growth factor receptor 2
- HTS, high-throughput screening
- In silico filtering
- LC−MS, liquid chromatography−mass spectrometry
- MTDLs, multitarget-directed ligands
- Multitarget-directed ligands
- PAINS suspects
- PAINS, pan-assay interference compounds
- QSAR, quantitative structure–activity relationship
- ROS, radicals and oxygen reactive species
Affiliation(s)
- Jianbo Sun
  - State Key Laboratory of Natural Medicines, Department of Natural Medicinal Chemistry, School of Traditional Chinese Pharmacy, China Pharmaceutical University, Nanjing 210009, China
- Hui Zhong
  - Department of Pharmacology of Traditional Chinese Medicine, School of Traditional Chinese Pharmacy, China Pharmaceutical University, Nanjing 210009, China
- Kun Wang
  - State Key Laboratory of Natural Medicines, Department of Natural Medicinal Chemistry, School of Traditional Chinese Pharmacy, China Pharmaceutical University, Nanjing 210009, China
- Na Li
  - State Key Laboratory of Natural Medicines, Department of Natural Medicinal Chemistry, School of Traditional Chinese Pharmacy, China Pharmaceutical University, Nanjing 210009, China
- Li Chen
  - State Key Laboratory of Natural Medicines, Department of Natural Medicinal Chemistry, School of Traditional Chinese Pharmacy, China Pharmaceutical University, Nanjing 210009, China
5
Kaya I, Colmenarejo G. Analysis of Nuisance Substructures and Aggregators in a Comprehensive Database of Food Chemical Compounds. J Agric Food Chem 2020; 68:8812-8824. [PMID: 32687707 DOI: 10.1021/acs.jafc.0c02521] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 06/11/2023]
Abstract
The mechanistic understanding of the biological effects of foods involves the testing of food compounds in biochemical and biological assays. Positive results in these assays can be artifactual due to some properties of the compound: namely chemical reactivity, membrane disruption, redox cycling, etc., or through the formation of colloidal aggregates. Within the drug discovery field, a wide set of so-called "nuisance" filters have been developed to identify substructures prone to assay artifacts and/or promiscuity, e.g., the pan-assay interference compounds (PAINS) and others. In the subarea of natural products, a similar concept is the so-called invalid metabolic panaceas (IMPs). Finally, tools to identify putative aggregators have also been developed. Here, we analyzed the presence of nuisance substructures, IMPs, and aggregators in a large database of food compounds (the FooDB), which should be useful to the researchers working in the field, in order to be aware of possible artifact/promiscuity issues in their assays.
Affiliation(s)
- Irem Kaya
  - Biostatistics and Bioinformatics Unit, IMDEA Food CEI UAM+CSIC, E28049 Madrid, Spain
- Gonzalo Colmenarejo
  - Biostatistics and Bioinformatics Unit, IMDEA Food CEI UAM+CSIC, E28049 Madrid, Spain
6
Alves VM, Capuzzi SJ, Braga RC, Korn D, Hochuli JE, Bowler KH, Yasgar A, Rai G, Simeonov A, Muratov EN, Zakharov AV, Tropsha A. SCAM Detective: Accurate Predictor of Small, Colloidally Aggregating Molecules. J Chem Inf Model 2020; 60:4056-4063. [PMID: 32678597 DOI: 10.1021/acs.jcim.0c00415] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Indexed: 11/29/2022]
Abstract
Small, colloidally aggregating molecules (SCAMs) are the most common source of false positives in high-throughput screening (HTS) campaigns. Although SCAMs can be experimentally detected and suppressed by the addition of detergent in the assay buffer, detergent sensitivity is not routinely monitored in HTS. Computational methods are thus needed to flag potential SCAMs during HTS triage. In this study, we have developed and rigorously validated quantitative structure-interference relationship (QSIR) models of detergent-sensitive aggregation in several HTS campaigns under various assay conditions and screening concentrations. In particular, we have modeled detergent-sensitive aggregation in an AmpC β-lactamase assay, the preferred HTS counter-screen for aggregation, as well as in another assay that measures cruzain inhibition. Our models increase the accuracy of aggregation prediction by ∼53% in the β-lactamase assay and by ∼46% in the cruzain assay compared to previously published methods. We also discuss the importance of both assay conditions and screening concentrations in the development of QSIR models for various interference mechanisms besides aggregation. The models developed in this study are publicly available for fast prediction within the SCAM detective web application (https://scamdetective.mml.unc.edu/).
Affiliation(s)
- Vinicius M Alves
  - Laboratory for Molecular Modeling, Division of Chemical Biology and Medicinal Chemistry, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, North Carolina 27599, United States
- Stephen J Capuzzi
  - Laboratory for Molecular Modeling, Division of Chemical Biology and Medicinal Chemistry, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, North Carolina 27599, United States
- Daniel Korn
  - Laboratory for Molecular Modeling, Division of Chemical Biology and Medicinal Chemistry, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, North Carolina 27599, United States
- Joshua E Hochuli
  - Laboratory for Molecular Modeling, Division of Chemical Biology and Medicinal Chemistry, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, North Carolina 27599, United States
- Kyle H Bowler
  - Laboratory for Molecular Modeling, Division of Chemical Biology and Medicinal Chemistry, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, North Carolina 27599, United States
- Adam Yasgar
  - National Center for Advancing Translational Sciences (NCATS), National Institutes of Health, 9800 Medical Center Drive, Rockville, Maryland 20850, United States
- Ganesha Rai
  - National Center for Advancing Translational Sciences (NCATS), National Institutes of Health, 9800 Medical Center Drive, Rockville, Maryland 20850, United States
- Anton Simeonov
  - National Center for Advancing Translational Sciences (NCATS), National Institutes of Health, 9800 Medical Center Drive, Rockville, Maryland 20850, United States
- Eugene N Muratov
  - Laboratory for Molecular Modeling, Division of Chemical Biology and Medicinal Chemistry, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, North Carolina 27599, United States
  - Department of Pharmaceutical Sciences, Federal University of Paraiba, João Pessoa, Paraíba 58059, Brazil
- Alexey V Zakharov
  - National Center for Advancing Translational Sciences (NCATS), National Institutes of Health, 9800 Medical Center Drive, Rockville, Maryland 20850, United States
- Alexander Tropsha
  - Laboratory for Molecular Modeling, Division of Chemical Biology and Medicinal Chemistry, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, North Carolina 27599, United States
7
Yang ZY, He JH, Lu AP, Hou TJ, Cao DS. Application of Negative Design To Design a More Desirable Virtual Screening Library. J Med Chem 2020; 63:4411-4429. [DOI: 10.1021/acs.jmedchem.9b01476] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Indexed: 12/25/2022]
Affiliation(s)
- Zi-Yi Yang
  - Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410013, Hunan, P. R. China
- Jun-Hong He
  - Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410013, Hunan, P. R. China
- Ai-Ping Lu
  - Institute for Advancing Translational Medicine in Bone and Joint Diseases, School of Chinese Medicine, Hong Kong Baptist University, Hong Kong, SAR, P. R. China
- Ting-Jun Hou
  - Hangzhou Institute of Innovative Medicine, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, P. R. China
- Dong-Sheng Cao
  - Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410013, Hunan, P. R. China
  - Institute for Advancing Translational Medicine in Bone and Joint Diseases, School of Chinese Medicine, Hong Kong Baptist University, Hong Kong, SAR, P. R. China
8
Ayotte Y, Marando VM, Vaillancourt L, Bouchard P, Heffron G, Coote PW, Larda ST, LaPlante SR. Exposing Small-Molecule Nanoentities by a Nuclear Magnetic Resonance Relaxation Assay. J Med Chem 2019; 62:7885-7896. [PMID: 31422659 DOI: 10.1021/acs.jmedchem.9b00653] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Indexed: 01/10/2023]
Abstract
Small molecules can self-assemble in aqueous solution into a wide range of nanoentity types and sizes (dimers, n-mers, micelles, colloids, etc.), each having their own unique properties. This has important consequences in the context of drug discovery including issues related to nonspecific binding, off-target effects, and false positives and negatives. Here, we demonstrate the use of the spin-spin relaxation Carr-Purcell-Meiboom-Gill NMR experiment, which is sensitive to molecular tumbling rates and can expose larger aggregate species that have slower rotational correlations. The strategy easily distinguishes lone-tumbling molecules versus nanoentities of various sizes. The technique is highly sensitive to chemical exchange between single-molecule and aggregate states and can therefore be used as a reporter when direct measurement of aggregates is not possible by NMR. Interestingly, we found differences in solution behavior for compounds within structurally related series, demonstrating structure-nanoentity relationships. This practical experiment is a valuable tool to support drug discovery efforts.
Affiliation(s)
- Yann Ayotte
  - INRS-Centre Armand-Frappier Santé Biotechnologie, 531 Boulevard des Prairies, Laval, Québec H7V 1B7, Canada
- Victoria M Marando
  - NMX Research and Solutions, Inc., 500 Boulevard Cartier Ouest, Laval, Québec, H7V 5B7, Canada
- Louis Vaillancourt
  - NMX Research and Solutions, Inc., 500 Boulevard Cartier Ouest, Laval, Québec, H7V 5B7, Canada
- Patricia Bouchard
  - NMX Research and Solutions, Inc., 500 Boulevard Cartier Ouest, Laval, Québec, H7V 5B7, Canada
- Gregory Heffron
  - Harvard Medical School, 240 Longwood Avenue, Boston, Massachusetts 02115, United States
- Paul W Coote
  - NMX Research and Solutions, Inc., 500 Boulevard Cartier Ouest, Laval, Québec, H7V 5B7, Canada
  - Harvard Medical School, 240 Longwood Avenue, Boston, Massachusetts 02115, United States
- Sacha T Larda
  - NMX Research and Solutions, Inc., 500 Boulevard Cartier Ouest, Laval, Québec, H7V 5B7, Canada
- Steven R LaPlante
  - INRS-Centre Armand-Frappier Santé Biotechnologie, 531 Boulevard des Prairies, Laval, Québec H7V 1B7, Canada
  - NMX Research and Solutions, Inc., 500 Boulevard Cartier Ouest, Laval, Québec, H7V 5B7, Canada
  - Harvard Medical School, 240 Longwood Avenue, Boston, Massachusetts 02115, United States
9
Yang ZY, Yang ZJ, Dong J, Wang LL, Zhang LX, Ding JJ, Ding XQ, Lu AP, Hou TJ, Cao DS. Structural Analysis and Identification of Colloidal Aggregators in Drug Discovery. J Chem Inf Model 2019; 59:3714-3726. [DOI: 10.1021/acs.jcim.9b00541] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Indexed: 12/21/2022]
Affiliation(s)
- Zi-Yi Yang
  - Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410003, People’s Republic of China
- Zhi-Jiang Yang
  - Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410003, People’s Republic of China
- Jie Dong
  - Central South University of Forestry and Technology, Changsha 410004, People’s Republic of China
- Liang-Liang Wang
  - Beijing Institute of Pharmaceutical Chemistry, Beijing 102205, People’s Republic of China
- Liu-Xia Zhang
  - Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410003, People’s Republic of China
- Jun-Jie Ding
  - Beijing Institute of Pharmaceutical Chemistry, Beijing 102205, People’s Republic of China
- Xiao-Qin Ding
  - Beijing Institute of Pharmaceutical Chemistry, Beijing 102205, People’s Republic of China
- Ai-Ping Lu
  - Institute for Advancing Translational Medicine in Bone and Joint Diseases, School of Chinese Medicine, Hong Kong Baptist University, Hong Kong Special Administrative Region, People’s Republic of China
- Ting-Jun Hou
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, People’s Republic of China
- Dong-Sheng Cao
  - Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410003, People’s Republic of China
  - Institute for Advancing Translational Medicine in Bone and Joint Diseases, School of Chinese Medicine, Hong Kong Baptist University, Hong Kong Special Administrative Region, People’s Republic of China
10
Abstract
INTRODUCTION: With the emergence of the 'big data' era, the biomedical research community has great interest in exploiting publicly available chemical information for drug discovery. PubChem is an example of public databases that provide a large amount of chemical information free of charge. AREAS COVERED: This article provides an overview of how PubChem's data, tools, and services can be used for virtual screening and reviews recent publications that discuss important aspects of exploiting PubChem for drug discovery. EXPERT OPINION: PubChem offers comprehensive chemical information useful for drug discovery. It also provides multiple programmatic access routes, which are essential to build automated virtual screening pipelines that exploit PubChem data. In addition, PubChemRDF allows users to download PubChem data and load them into a local computing facility, facilitating data integration between PubChem and other resources. PubChem resources have been used in many studies for developing bioactivity and toxicity prediction models, discovering polypharmacologic (multi-target) ligands, and identifying new macromolecule targets of compounds (for drug-repurposing or off-target side effect prediction). These studies demonstrate the usefulness of PubChem as a key resource for computer-aided drug discovery and related areas.
Affiliation(s)
- Sunghwan Kim
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, Bethesda, MD, USA
11
Mining Chemical Activity Status from High-Throughput Screening Assays. PLoS One 2015; 10:e0144426. [PMID: 26658480 PMCID: PMC4682830 DOI: 10.1371/journal.pone.0144426] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Received: 09/07/2015] [Accepted: 11/18/2015] [Indexed: 01/20/2023] [Open Access]
Abstract
High-throughput screening (HTS) experiments provide a valuable resource that reports biological activity of numerous chemical compounds relative to their molecular targets. Building computational models that accurately predict such activity status (active vs. inactive) in specific assays is a challenging task given the large volume of data and frequently small proportion of active compounds relative to the inactive ones. We developed a method, DRAMOTE, to predict activity status of chemical compounds in HTP activity assays. For a class of HTP assays, our method achieves considerably better results than the current state-of-the-art solutions. We achieved this by modification of a minority oversampling technique. To demonstrate that DRAMOTE performs better than the other methods, we performed a comprehensive comparison analysis with several other methods and evaluated them on data from 11 PubChem assays through 1,350 experiments that involved approximately 500,000 interactions between chemicals and their target proteins. As an example of potential use, we applied DRAMOTE to develop robust models for predicting FDA approved drugs that have high probability to interact with the thyroid stimulating hormone receptor (TSHR) in humans. Our findings are further partially and indirectly supported by 3D docking results and literature information. The results based on approximately 500,000 interactions suggest that DRAMOTE has performed the best and that it can be used for developing robust virtual screening models. The datasets and implementation of all solutions are available as a MATLAB toolbox online at www.cbrc.kaust.edu.sa/dramote and can be found on Figshare.
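To make the idea of minority oversampling concrete, here is a minimal sketch of the simplest variant: duplicating randomly chosen minority-class rows until the classes are balanced. This is not DRAMOTE, whose modification is not described in the abstract; the function name and the choice of plain random duplication are assumptions made for illustration only.

```python
import random


def oversample_minority(X, y, seed=0):
    """Balance classes by duplicating randomly chosen rows of the smaller classes.

    Plain random oversampling, shown only to illustrate the general technique;
    it is not the DRAMOTE method.
    """
    rng = random.Random(seed)
    by_class = {}
    for features, label in zip(X, y):
        by_class.setdefault(label, []).append(features)
    target = max(len(rows) for rows in by_class.values())  # majority class size
    X_out, y_out = [], []
    for label, rows in by_class.items():
        # Draw (with replacement) enough extra rows to reach the majority size.
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        for features in rows + extra:
            X_out.append(features)
            y_out.append(label)
    return X_out, y_out
```

Oversampling should be applied to the training split only, so that duplicated actives never leak into the data used for evaluation.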
12
Irwin JJ, Duan D, Torosyan H, Doak AK, Ziebart KT, Sterling T, Tumanian G, Shoichet BK. An Aggregation Advisor for Ligand Discovery. J Med Chem 2015; 58:7076-87. [PMID: 26295373 DOI: 10.1021/acs.jmedchem.5b01105] [Citation(s) in RCA: 309] [Impact Index Per Article: 34.3] [Indexed: 01/08/2023]
Abstract
Colloidal aggregation of organic molecules is the dominant mechanism for artifactual inhibition of proteins, and controls against it are widely deployed. Notwithstanding an increasingly detailed understanding of this phenomenon, a method to reliably predict aggregation has remained elusive. Correspondingly, active molecules that act via aggregation continue to be found in early discovery campaigns and remain common in the literature. Over the past decade, over 12 thousand aggregating organic molecules have been identified, potentially enabling a precedent-based approach to match known aggregators with new molecules that may be expected to aggregate and lead to artifacts. We investigate an approach that uses lipophilicity, affinity, and similarity to known aggregators to advise on the likelihood that a candidate compound is an aggregator. In prospective experimental testing, five of seven new molecules with Tanimoto coefficients (Tc's) between 0.95 and 0.99 to known aggregators aggregated at relevant concentrations. Ten of 19 with Tc's between 0.94 and 0.90 and three of seven with Tc's between 0.89 and 0.85 also aggregated. Another three of the predicted compounds aggregated at higher concentrations. This method finds that 61 827 or 5.1% of the ligands acting in the 0.1 to 10 μM range in the medicinal chemistry literature are at least 85% similar to a known aggregator with these physical properties and may aggregate at relevant concentrations. Intriguingly, only 0.73% of all drug-like commercially available compounds resemble the known aggregators, suggesting that colloidal aggregators are enriched in the literature. As a percentage of the literature, aggregator-like compounds have increased 9-fold since 1995, partly reflecting the advent of high-throughput and virtual screens against molecular targets. 
Emerging from this study is an aggregator advisor database and tool ( http://advisor.bkslab.org ), free to the community, that may help distinguish between fruitful and artifactual screening hits acting by this mechanism.
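The similarity part of this precedent-based approach can be sketched as below. The sketch assumes each molecule is represented by the set of "on" bit indices of a structural fingerprint and uses the 0.85 Tanimoto cutoff mentioned in the abstract; the lipophilicity and affinity criteria are omitted, and all function names are hypothetical rather than taken from the Aggregator Advisor tool.

```python
def tanimoto(a, b):
    """Tanimoto coefficient |A ∩ B| / |A ∪ B| of two fingerprint bit sets."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # two empty fingerprints: treat as dissimilar
    return len(a & b) / len(union)


def max_similarity(candidate, known_aggregators):
    """Highest Tanimoto coefficient between the candidate and any known aggregator."""
    return max((tanimoto(candidate, k) for k in known_aggregators), default=0.0)


def resembles_known_aggregator(candidate, known_aggregators, threshold=0.85):
    """Flag a candidate whose nearest known aggregator is at least `threshold` similar."""
    return max_similarity(candidate, known_aggregators) >= threshold
```

A flag from such a check is advisory only: as the prospective results above show, a sizeable fraction of highly similar molecules did not aggregate, so experimental controls remain necessary.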
Affiliation(s)
- John J Irwin
  - Department of Pharmaceutical Chemistry, University of California, San Francisco, Byers Hall, 1700 4th St, San Francisco, California 94158-2550, United States
- Da Duan
  - Department of Pharmaceutical Chemistry, University of California, San Francisco, Byers Hall, 1700 4th St, San Francisco, California 94158-2550, United States
- Hayarpi Torosyan
  - Department of Pharmaceutical Chemistry, University of California, San Francisco, Byers Hall, 1700 4th St, San Francisco, California 94158-2550, United States
- Allison K Doak
  - Department of Pharmaceutical Chemistry, University of California, San Francisco, Byers Hall, 1700 4th St, San Francisco, California 94158-2550, United States
- Kristin T Ziebart
  - Department of Pharmaceutical Chemistry, University of California, San Francisco, Byers Hall, 1700 4th St, San Francisco, California 94158-2550, United States
- Teague Sterling
  - Department of Pharmaceutical Chemistry, University of California, San Francisco, Byers Hall, 1700 4th St, San Francisco, California 94158-2550, United States
- Gurgen Tumanian
  - Department of Pharmaceutical Chemistry, University of California, San Francisco, Byers Hall, 1700 4th St, San Francisco, California 94158-2550, United States
- Brian K Shoichet
  - Department of Pharmaceutical Chemistry, University of California, San Francisco, Byers Hall, 1700 4th St, San Francisco, California 94158-2550, United States
13
Abstract
IMPORTANCE OF THE FIELD: PubChem is a public molecular information repository, a scientific showcase of the NIH Roadmap Initiative. The PubChem database holds over 27 million records of unique chemical structures of compounds (CID) derived from nearly 70 million substance depositions (SID), and contains more than 449,000 bioassay records covering thousands of in vitro biochemical and cell-based screening bioassays, targeting more than 7000 proteins and genes and linked to over 1.8 million substances. AREAS COVERED IN THIS REVIEW: This review builds on recent PubChem-related computational chemistry research reported by other authors while providing readers with an overview of the PubChem database, focusing on its increasing role in cheminformatics, virtual screening and toxicity prediction modeling. WHAT THE READER WILL GAIN: These publicly available datasets in PubChem provide great opportunities for scientists to perform cheminformatics and virtual screening research for computer-aided drug design. However, the high volume and complexity of the datasets, in particular the bioassay-associated false positives/negatives and highly imbalanced datasets in PubChem, also create major challenges. Several approaches regarding the modeling of PubChem datasets and development of virtual screening models for bioactivity and toxicity predictions are also reviewed. TAKE HOME MESSAGE: Novel data-mining cheminformatics tools and virtual screening algorithms are being developed and used to retrieve, annotate and analyze the large-scale and highly complex PubChem biological screening data for drug design.
Affiliation(s)
- Xiang-Qun Xie
  - Department of Pharmaceutical Sciences, School of Pharmacy; Drug Discovery Institute/Pittsburgh Molecular Library Screening Center (PMLSC); Pittsburgh Chemical Methodologies & Library Development (PCMLD) Center; Departments of Computational Biology and Structural Biology; University of Pittsburgh, Pittsburgh, PA 15260, USA