Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Williams AJ, Ekins S. A quality alert and call for improved curation of public chemistry databases. Drug Discov Today 2011;16:747-50. [PMID: 21871970 DOI: 10.1016/j.drudis.2011.07.007] [Citation(s) in RCA: 78] [Impact Index Per Article: 6.0] [Received: 06/21/2011] [Accepted: 07/18/2011] [Indexed: 12/13/2022]


Cited by Other Article(s)

Kim S, Yu B, Li Q, Bolton EE. PubChem synonym filtering process using crowdsourcing. J Cheminform 2024;16:69. [PMID: 38880887 PMCID: PMC11181558 DOI: 10.1186/s13321-024-00868-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/01/2024] [Accepted: 06/09/2024] [Indexed: 06/18/2024] Open

Abstract

PubChem ( https://pubchem.ncbi.nlm.nih.gov ) is a public chemical information resource containing more than 100 million unique chemical structures. One of the most requested tasks in PubChem and other chemical databases is to search chemicals by name (also commonly called a "chemical synonym"). PubChem performs this task by looking up chemical synonym-structure associations provided by individual depositors to PubChem. In addition, these synonyms are used for many purposes, including creating links between chemicals and PubMed articles (using Medical Subject Headings (MeSH) terms). However, these depositor-provided name-structure associations are subject to substantial discrepancies within and between depositors, making it difficult to unambiguously map a chemical name to a specific chemical structure. The present paper describes PubChem's crowdsourcing-based synonym filtering strategy, which resolves inter- and intra-depositor discrepancies in synonym-structure associations as well as in the chemical-MeSH associations. The PubChem synonym filtering process was developed based on the analysis of four crowd-voting strategies, which differ in the consistency threshold value employed (60% vs 70%) and how to resolve intra-depositor discrepancies (a single vote vs. multiple votes per depositor) prior to inter-depositor crowd-voting. The agreement of voting was determined at six levels of chemical equivalency, which considers varying isotopic composition, stereochemistry, and connectivity of chemical structures and their primary components. While all four strategies showed comparable results, Strategy I (one vote per depositor with a 60% consistency threshold) resulted in the most synonyms assigned to a single chemical structure as well as the most synonym-structure associations disambiguated at the six chemical equivalency contexts. 
Based on the results of this study, Strategy I was implemented in PubChem's filtering process that cleans up synonym-structure associations as well as chemical-MeSH associations. This consistency-based filtering process is designed to look for a consensus in name-structure associations but cannot attest to their correctness. As a result, it can fail to recognize correct name-structure associations (or incorrect ones), for example, when a synonym is provided by only one depositor or when many contributors are incorrect. However, this filtering process is an important starting point for quality control in name-structure associations in large chemical databases like PubChem.
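The two-stage voting described in this abstract can be illustrated with a minimal sketch. It is not PubChem's implementation: the record layout, the function names `consensus` and `filter_synonyms`, the use of the same threshold for both stages, and the "at or above" comparison are all assumptions made for illustration, and the sketch ignores the six chemical-equivalency levels.

```python
from collections import Counter, defaultdict


def consensus(counts, threshold):
    """Return the top-voted item if its vote share meets the threshold, else None."""
    if not counts:
        return None
    item, n = counts.most_common(1)[0]
    return item if n / sum(counts.values()) >= threshold else None


def filter_synonyms(associations, threshold=0.6):
    """Map each synonym to one structure by depositor crowd-voting.

    associations: iterable of (depositor, synonym, structure_id) tuples.
    All names and the record layout are hypothetical.
    """
    # Stage 1 (intra-depositor): tally each depositor's own records per synonym.
    within = defaultdict(Counter)
    for depositor, synonym, structure in associations:
        within[(depositor, synonym)][structure] += 1

    # Each depositor casts at most one vote per synonym (assumed rule:
    # the depositor's own records must agree at or above the threshold).
    across = defaultdict(Counter)
    for (depositor, synonym), counts in within.items():
        vote = consensus(counts, threshold)
        if vote is not None:
            across[synonym][vote] += 1

    # Stage 2 (inter-depositor): keep synonyms whose depositors agree.
    result = {}
    for synonym, counts in across.items():
        winner = consensus(counts, threshold)
        if winner is not None:
            result[synonym] = winner
    return result


# Hypothetical records: depositor, synonym, structure identifier.
records = [
    ("A", "name-1", "S1"), ("A", "name-1", "S1"), ("A", "name-1", "S2"),
    ("B", "name-1", "S1"),
    ("C", "name-1", "S2"),
    ("A", "name-2", "S3"), ("B", "name-2", "S4"),
]
print(filter_synonyms(records))  # {'name-1': 'S1'}
```

Here "name-2" is dropped because its two depositors disagree, which mirrors the limitation noted in the abstract: a consensus filter measures agreement, not correctness.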

Eriksen CA, Andersen JL, Fagerberg R, Merkle D. Toward the Reconciliation of Inconsistent Molecular Structures from Biochemical Databases. J Comput Biol 2024;31:498-512. [PMID: 38758924 DOI: 10.1089/cmb.2024.0520] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/19/2024] Open

Du Y. Binding Curve Viewer: Visualizing the Equilibrium and Kinetics of Protein-Ligand Binding and Competitive Binding. J Chem Inf Model 2024;64:4180-4192. [PMID: 38720179 PMCID: PMC11134506 DOI: 10.1021/acs.jcim.4c00130] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/24/2024] [Revised: 04/21/2024] [Accepted: 04/25/2024] [Indexed: 05/28/2024]

Mansouri K, Moreira-Filho JT, Lowe CN, Charest N, Martin T, Tkachenko V, Judson R, Conway M, Kleinstreuer NC, Williams AJ. Free and open-source QSAR-ready workflow for automated standardization of chemical structures in support of QSAR modeling. J Cheminform 2024;16:19. [PMID: 38378618 PMCID: PMC10880251 DOI: 10.1186/s13321-024-00814-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/29/2023] [Accepted: 02/10/2024] [Indexed: 02/22/2024] Open

Abstract

The rapid increase of publicly available chemical structures and associated experimental data presents a valuable opportunity to build robust QSAR models for applications in different fields. However, the common concern is the quality of both the chemical structure information and associated experimental data. This is especially true when those data are collected from multiple sources as chemical substance mappings can contain many duplicate structures and molecular inconsistencies. Such issues can impact the resulting molecular descriptors and their mappings to experimental data and, subsequently, the quality of the derived models in terms of accuracy, repeatability, and reliability. Herein we describe the development of an automated workflow to standardize chemical structures according to a set of standard rules and generate two and/or three-dimensional "QSAR-ready" forms prior to the calculation of molecular descriptors. The workflow was designed in the KNIME workflow environment and consists of three high-level steps. First, a structure encoding is read, and then the resulting in-memory representation is cross-referenced with any existing identifiers for consistency. Finally, the structure is standardized using a series of operations including desalting, stripping of stereochemistry (for two-dimensional structures), standardization of tautomers and nitro groups, valence correction, neutralization when possible, and then removal of duplicates. This workflow was initially developed to support collaborative modeling QSAR projects to ensure consistency of the results from the different participants. It was then updated and generalized for other modeling applications. This included modification of the "QSAR-ready" workflow to generate "MS-ready structures" to support the generation of substance mappings and searches for software applications related to non-targeted analysis mass spectrometry. 
Both QSAR and MS-ready workflows are freely available in KNIME, via standalone versions on GitHub, and as docker container resources for the scientific community. Scientific contribution: This work pioneers an automated workflow in KNIME, systematically standardizing chemical structures to ensure their readiness for QSAR modeling and broader scientific applications. By addressing data quality concerns through desalting, stereochemistry stripping, and normalization, it optimizes molecular descriptors' accuracy and reliability. The freely available resources in KNIME, GitHub, and docker containers democratize access, benefiting collaborative research and advancing diverse modeling endeavors in chemistry and mass spectrometry.

Gupta MK, Gouda G, Sultana S, Punekar SM, Vadde R, Ravikiran T. Structure-related relationship: Plant-derived antidiabetic compounds. Studies in Natural Products Chemistry 2023:241-295. [DOI: 10.1016/b978-0-323-91294-5.00008-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/06/2023]

Li L, Zhang Z, Men Y, Baskaran S, Sangion A, Wang S, Arnot JA, Wania F. Retrieval, Selection, and Evaluation of Chemical Property Data for Assessments of Chemical Emissions, Fate, Hazard, Exposure, and Risks. ACS Environmental Au 2022;2:376-395. [PMID: 37101455 PMCID: PMC10125307 DOI: 10.1021/acsenvironau.2c00010] [Citation(s) in RCA: 15] [Impact Index Per Article: 7.5] [Received: 03/02/2022] [Revised: 07/01/2022] [Accepted: 07/05/2022] [Indexed: 04/28/2023]

Dolciami D, Villasclaras-Fernandez E, Kannas C, Meniconi M, Al-Lazikani B, Antolin AA. canSAR chemistry registration and standardization pipeline. J Cheminform 2022;14:28. [PMID: 35643512 PMCID: PMC9148294 DOI: 10.1186/s13321-022-00606-7] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 02/04/2022] [Accepted: 04/04/2022] [Indexed: 11/10/2022] Open

Abstract

BACKGROUND

Integration of medicinal chemistry data from numerous public resources is an increasingly important part of academic drug discovery and translational research because it can bring a wealth of important knowledge related to compounds in one place. However, different data sources can report the same or related compounds in various forms (e.g., tautomers, racemates, etc.), thus highlighting the need of organising related compounds in hierarchies that alert the user on important bioactivity data that may be relevant. To generate these compound hierarchies, we have developed and implemented canSARchem, a new compound registration and standardization pipeline as part of the canSAR public knowledgebase. canSARchem builds on previously developed ChEMBL and PubChem pipelines and is developed using KNIME. We describe the pipeline which we make publicly available, and we provide examples on the strengths and limitations of the use of hierarchies for bioactivity data exploration. Finally, we identify canonicalization enrichment in FDA-approved drugs, illustrating the benefits of our approach.

RESULTS

We created a chemical registration and standardization pipeline in KNIME and made it freely available to the research community. The pipeline consists of five steps to register the compounds and create the compounds’ hierarchy: 1. Structure checker, 2. Standardization, 3. Generation of canonical tautomers and representative structures, 4. Salt strip, and 5. Generation of abstract structure to generate the compound hierarchy. Unlike ChEMBL’s RDKit pipeline, we carry out compound canonicalization ahead of getting the parent structure, similar to PubChem’s OpenEye pipeline. canSARchem has a lower rejection rate compared to both PubChem and ChEMBL. We use our pipeline to assess the impact of grouping the compounds in hierarchies for bioactivity data exploration. We find that FDA-approved drugs show statistically significant sensitivity to canonicalization compared to the majority of bioactive compounds which demonstrates the importance of this step.

CONCLUSIONS

We use canSARchem to standardize all the compounds uploaded in canSAR (> 3 million) enabling efficient data integration and the rapid identification of alternative compound forms with useful bioactivity data. Comparison with PubChem and ChEMBL pipelines evidenced comparable performances in compound standardization, but only PubChem and canSAR canonicalize tautomers and canSAR has a slightly lower rejection rate. Our results highlight the importance of compound hierarchies for bioactivity data exploration. We make canSARchem available under a Creative Commons Attribution-ShareAlike 4.0 International License (CC BY-SA 4.0) at https://gitlab.icr.ac.uk/cansar-public/compound-registration-pipeline.

Jacobs A, Williams D, Hickey K, Patrick N, Williams AJ, Chalk S, McEwen L, Willighagen E, Walker M, Bolton E, Sinclair G, Sanford A. CAS Common Chemistry in 2021: Expanding Access to Trusted Chemical Information for the Scientific Community. J Chem Inf Model 2022;62:2737-2743. [PMID: 35559614 PMCID: PMC9199008 DOI: 10.1021/acs.jcim.2c00268] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Indexed: 11/29/2022]

Chang X, Tan YM, Allen DG, Bell S, Brown PC, Browning L, Ceger P, Gearhart J, Hakkinen PJ, Kabadi SV, Kleinstreuer NC, Lumen A, Matheson J, Paini A, Pangburn HA, Petersen EJ, Reinke EN, Ribeiro AJS, Sipes N, Sweeney LM, Wambaugh JF, Wange R, Wetmore BA, Mumtaz M. IVIVE: Facilitating the Use of In Vitro Toxicity Data in Risk Assessment and Decision Making. Toxics 2022;10:232. [PMID: 35622645 PMCID: PMC9143724 DOI: 10.3390/toxics10050232] [Citation(s) in RCA: 35] [Impact Index Per Article: 17.5] [Received: 03/25/2022] [Accepted: 04/24/2022] [Indexed: 02/04/2023]

Affiliation(s)

Xiaoqing Chang Inotiv-RTP, 601 Keystone Park Drive, Suite 200, Morrisville, NC 27560, USA
Yu-Mei Tan U.S. Environmental Protection Agency, Office of Pesticide Programs, 109 T.W. Alexander Drive, Durham, NC 27709, USA
David G. Allen Inotiv-RTP, 601 Keystone Park Drive, Suite 200, Morrisville, NC 27560, USA
Shannon Bell Inotiv-RTP, 601 Keystone Park Drive, Suite 200, Morrisville, NC 27560, USA
Paul C. Brown U.S. Food and Drug Administration, Center for Drug Evaluation and Research, 10903 New Hampshire Avenue, Silver Spring, MD 20903, USA
Lauren Browning Inotiv-RTP, 601 Keystone Park Drive, Suite 200, Morrisville, NC 27560, USA
Patricia Ceger Inotiv-RTP, 601 Keystone Park Drive, Suite 200, Morrisville, NC 27560, USA
Jeffery Gearhart The Henry M. Jackson Foundation, Air Force Research Laboratory, 711 Human Performance Wing, Wright-Patterson Air Force Base, OH 45433, USA
Pertti J. Hakkinen National Library of Medicine, National Center for Biotechnology Information, 8600 Rockville Pike, Bethesda, MD 20894, USA
Shruti V. Kabadi U.S. Food and Drug Administration, Center for Food Safety and Applied Nutrition, Office of Food Additive Safety, 5001 Campus Drive, HFS-275, College Park, MD 20740, USA
Nicole C. Kleinstreuer National Institute of Environmental Health Sciences, National Toxicology Program Interagency Center for the Evaluation of Alternative Toxicological Methods, P.O. Box 12233, Research Triangle Park, NC 27709, USA
Annie Lumen U.S. Food and Drug Administration, National Center for Toxicological Research, 3900 NCTR Road, Jefferson, AR 72079, USA
Joanna Matheson U.S. Consumer Product Safety Commission, Division of Toxicology and Risk Assessment, 5 Research Place, Rockville, MD 20850, USA
Alicia Paini European Commission, Joint Research Centre (JRC), 21027 Ispra, Italy
Heather A. Pangburn Air Force Research Laboratory, 711 Human Performance Wing, 2729 R Street, Area B, Building 837, Wright-Patterson Air Force Base, OH 45433, USA
Elijah J. Petersen U.S. Department of Commerce, National Institute of Standards and Technology, 100 Bureau Drive, Gaithersburg, MD 20899, USA
Emily N. Reinke U.S. Army Public Health Center, 8252 Blackhawk Rd., Aberdeen Proving Ground, MD 21010, USA
Alexandre J. S. Ribeiro U.S. Food and Drug Administration, Center for Drug Evaluation and Research, 10903 New Hampshire Avenue, Silver Spring, MD 20903, USA
Nisha Sipes U.S. Environmental Protection Agency, Center for Computational Toxicology and Exposure, 109 TW Alexander Dr., Research Triangle Park, NC 27711, USA
Lisa M. Sweeney UES, Inc., 4401 Dayton-Xenia Road, Beavercreek, OH 45432, Assigned to Air Force Research Laboratory, 711 Human Performance Wing, Wright-Patterson Air Force Base, OH 45433, USA
John F. Wambaugh U.S. Environmental Protection Agency, Center for Computational Toxicology and Exposure, 109 TW Alexander Dr., Research Triangle Park, NC 27711, USA
Ronald Wange U.S. Food and Drug Administration, Center for Drug Evaluation and Research, 10903 New Hampshire Avenue, Silver Spring, MD 20903, USA
Barbara A. Wetmore U.S. Environmental Protection Agency, Center for Computational Toxicology and Exposure, 109 TW Alexander Dr., Research Triangle Park, NC 27711, USA
Moiz Mumtaz Agency for Toxic Substances and Disease Registry, Office of the Associate Director for Science, 1600 Clifton Road, S102-2, Atlanta, GA 30333, USA

Kirstgen M, Müller SF, Lowjaga KAAT, Goldmann N, Lehmann F, Alakurtti S, Yli-Kauhaluoma J, Baringhaus KH, Krieg R, Glebe D, Geyer J. Identification of Novel HBV/HDV Entry Inhibitors by Pharmacophore- and QSAR-Guided Virtual Screening. Viruses 2021;13:v13081489. [PMID: 34452354 PMCID: PMC8402622 DOI: 10.3390/v13081489] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 06/24/2021] [Revised: 07/19/2021] [Accepted: 07/24/2021] [Indexed: 12/17/2022] Open

Abstract

The hepatic bile acid transporter Na⁺/taurocholate co-transporting polypeptide (NTCP) was identified in 2012 as the high-affinity hepatic receptor for the hepatitis B and D viruses (HBV/HDV). Since then, this carrier has emerged as a promising drug target for HBV/HDV entry inhibitors, but the high-molecular-weight synthetic peptide Hepcludex® is the only approved HDV entry inhibitor so far. The present study aimed to identify small molecules as novel NTCP inhibitors with anti-viral activity. A ligand-based bioinformatic approach was used to generate and validate appropriate pharmacophore and QSAR (quantitative structure–activity relationship) models. Half-maximal inhibitory concentrations (IC₅₀) for binding inhibition of the HBV/HDV-derived preS1 peptide (as a surrogate parameter for virus binding to NTCP) were determined in NTCP-expressing HEK293 cells for 150 compounds of different chemical classes. IC₅₀ values ranged from 2 µM up to >1000 µM. The generated pharmacophore and QSAR models were used for virtual screening of drug-like chemicals from the ZINC15 database (~11 million compounds). The 20 best-performing compounds were then experimentally tested for preS1-peptide binding inhibition in NTCP-HEK293 cells. Among them, four compounds were active and revealed experimental IC₅₀ values for preS1-peptide binding inhibition of 9, 19, 20, and 35 µM, which were comparable to the QSAR-based predictions. All these compounds also significantly inhibited in vitro HDV infection of NTCP-HepG2 cells, without showing any cytotoxicity. The best-performing compound in all assays was ZINC000253533654. In conclusion, the present study demonstrates that virtual compound screening based on NTCP-specific pharmacophore and QSAR models can predict novel active hit compounds for the development of HBV/HDV entry inhibitors.

Affiliation(s)

Michael Kirstgen Institute of Pharmacology and Toxicology, Faculty of Veterinary Medicine, Justus Liebig University Giessen, 35392 Giessen, Germany
Simon Franz Müller Institute of Pharmacology and Toxicology, Faculty of Veterinary Medicine, Justus Liebig University Giessen, 35392 Giessen, Germany
Kira Alessandra Alicia Theresa Lowjaga Institute of Pharmacology and Toxicology, Faculty of Veterinary Medicine, Justus Liebig University Giessen, 35392 Giessen, Germany
Nora Goldmann Institute of Medical Virology, National Reference Center for Hepatitis B Viruses and Hepatitis D Viruses, Justus Liebig University Giessen, 35392 Giessen, Germany
Felix Lehmann Institute of Medical Virology, National Reference Center for Hepatitis B Viruses and Hepatitis D Viruses, Justus Liebig University Giessen, 35392 Giessen, Germany
Sami Alakurtti Drug Research Program, Division of Pharmaceutical Chemistry and Technology, Faculty of Pharmacy, University of Helsinki, Viikinkaari 5 E, FI-00014 Helsinki, Finland; VTT Technical Research Centre of Finland, Biologinkuja 7, FI-02044 Espoo, Finland
Jari Yli-Kauhaluoma Drug Research Program, Division of Pharmaceutical Chemistry and Technology, Faculty of Pharmacy, University of Helsinki, Viikinkaari 5 E, FI-00014 Helsinki, Finland
Karl-Heinz Baringhaus Sanofi-Aventis Deutschland GmbH, 65926 Frankfurt, Germany
Reimar Krieg Institute of Anatomy II, University Hospital Jena, Teichgraben 7, 07743 Jena, Germany
Dieter Glebe Institute of Medical Virology, National Reference Center for Hepatitis B Viruses and Hepatitis D Viruses, Justus Liebig University Giessen, 35392 Giessen, Germany; German Center for Infection Research (DZIF), Partner Site Giessen-Marburg-Langen, 35392 Giessen, Germany
Joachim Geyer Institute of Pharmacology and Toxicology, Faculty of Veterinary Medicine, Justus Liebig University Giessen, 35392 Giessen, Germany. Correspondence: Tel.: +49-641-99-38404; Fax: +49-641-99-38409

Santana K, do Nascimento LD, Lima e Lima A, Damasceno V, Nahum C, Braga RC, Lameira J. Applications of Virtual Screening in Bioprospecting: Facts, Shifts, and Perspectives to Explore the Chemo-Structural Diversity of Natural Products. Front Chem 2021;9:662688. [PMID: 33996755 PMCID: PMC8117418 DOI: 10.3389/fchem.2021.662688] [Citation(s) in RCA: 31] [Impact Index Per Article: 10.3] [Received: 02/01/2021] [Accepted: 02/25/2021] [Indexed: 12/22/2022] Open

Vaitkus A, Merkys A, Gražulis S. Validation of the Crystallography Open Database using the Crystallographic Information Framework. J Appl Crystallogr 2021;54:661-672. [PMID: 33953659 PMCID: PMC8056762 DOI: 10.1107/s1600576720016532] [Citation(s) in RCA: 42] [Impact Index Per Article: 14.0] [Received: 07/29/2020] [Accepted: 12/21/2020] [Indexed: 12/25/2022] Open

Chemoinformatics and QSAR. Adv Bioinformatics 2021. [DOI: 10.1007/978-981-33-6191-1_10] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/25/2022] Open

Achary PGR. Applications of Quantitative Structure-Activity Relationships (QSAR) based Virtual Screening in Drug Design: A Review. Mini Rev Med Chem 2020;20:1375-1388. [DOI: 10.2174/1389557520666200429102334] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 09/10/2019] [Revised: 11/07/2019] [Accepted: 11/08/2019] [Indexed: 12/18/2022]

Abstract

Scientists and researchers around the globe generate a tremendous amount of information every day; for instance, more than 74 million molecules are so far registered in the Chemical Abstracts Service. According to a recent study, there are at present around 10⁶⁰ molecules that are classified as new drug-like molecules. The library of such molecules is now considered ‘dark chemical space’ or ‘dark chemistry’. To explore such hidden molecules scientifically, a good number of live and updated databases (protein, cell, tissue, structure, drug, etc.) are available today. The synchronization of three different sciences, ‘genomics’, ‘proteomics’ and ‘in-silico simulation’, will revolutionize the process of drug discovery. Screening a sizable number of drug-like molecules is a challenge and must be handled efficiently. Virtual screening (VS) is an important computational tool in the drug discovery process; however, experimental verification of the drugs is equally important for the drug development process. Quantitative structure-activity relationship (QSAR) analysis is a machine learning technique that is extensively used in VS. QSAR is well known for its high and fast throughput screening with a satisfactory hit rate. QSAR model building involves (i) chemo-genomics data collection from a database or the literature; (ii) calculation of the right descriptors from the molecular representation; (iii) establishing a relationship (model) between biological activity and the selected descriptors; and (iv) application of the QSAR model to predict the biological property of the molecules. All hits obtained by VS need to be experimentally verified. The present mini-review highlights web-based machine learning tools, the role of QSAR in VS techniques, successful applications of QSAR-based VS leading to drug discovery, and the advantages and challenges of QSAR.
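The four model-building steps (i) to (iv) named in this abstract can be sketched with a deliberately tiny example. Everything here is an illustrative assumption rather than anything taken from the review: the single carbon-count descriptor, the one-variable least-squares model, the function names, and the training values, which are placeholders and not measured activities.

```python
import re


def carbon_count(smiles):
    """Toy descriptor: number of carbon atoms in a simple SMILES string.

    Counts aliphatic 'C' (unless it begins 'Cl') and aromatic 'c'.
    Bracket atoms such as [Cu] are not handled.
    """
    return len(re.findall(r"C(?!l)|c", smiles))


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept.

    Requires at least two distinct x values.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def build_model(training):
    """training: list of (smiles, activity) pairs; returns a predict function."""
    xs = [carbon_count(s) for s, _ in training]   # step (ii): descriptors
    ys = [activity for _, activity in training]
    slope, intercept = fit_line(xs, ys)           # step (iii): fit the model
    # Step (iv): apply the model to a new molecule.
    return lambda smiles: slope * carbon_count(smiles) + intercept


# Step (i): data collection. Placeholder values, NOT experimental data.
training = [("CO", 1.0), ("CCO", 2.0), ("CCCO", 3.0)]
predict = build_model(training)
print(predict("CCCCO"))  # 4.0
```

A real QSAR study would use many validated descriptors, curated activity data, and an external test set; the sketch only shows how the four steps connect.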

Zhao L, Ciallella HL, Aleksunes LM, Zhu H. Advancing computer-aided drug discovery (CADD) by big data and data-driven machine learning modeling. Drug Discov Today 2020;25:1624-1638. [PMID: 32663517 PMCID: PMC7572559 DOI: 10.1016/j.drudis.2020.07.005] [Citation(s) in RCA: 66] [Impact Index Per Article: 16.5] [Received: 02/19/2020] [Revised: 06/26/2020] [Accepted: 07/06/2020] [Indexed: 02/06/2023]

Mansouri K, Kleinstreuer N, Abdelaziz AM, Alberga D, Alves VM, Andersson PL, Andrade CH, Bai F, Balabin I, Ballabio D, Benfenati E, Bhhatarai B, Boyer S, Chen J, Consonni V, Farag S, Fourches D, García-Sosa AT, Gramatica P, Grisoni F, Grulke CM, Hong H, Horvath D, Hu X, Huang R, Jeliazkova N, Li J, Li X, Liu H, Manganelli S, Mangiatordi GF, Maran U, Marcou G, Martin T, Muratov E, Nguyen DT, Nicolotti O, Nikolov NG, Norinder U, Papa E, Petitjean M, Piir G, Pogodin P, Poroikov V, Qiao X, Richard AM, Roncaglioni A, Ruiz P, Rupakheti C, Sakkiah S, Sangion A, Schramm KW, Selvaraj C, Shah I, Sild S, Sun L, Taboureau O, Tang Y, Tetko IV, Todeschini R, Tong W, Trisciuzzi D, Tropsha A, Van Den Driessche G, Varnek A, Wang Z, Wedebye EB, Williams AJ, Xie H, Zakharov AV, Zheng Z, Judson RS. CoMPARA: Collaborative Modeling Project for Androgen Receptor Activity. Environmental Health Perspectives 2020;128:27002. [PMID: 32074470 DOI: 10.23645/epacomptox.5176876] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 05/24/2023]

Abstract

BACKGROUND

Endocrine disrupting chemicals (EDCs) are xenobiotics that mimic the interaction of natural hormones and alter synthesis, transport, or metabolic pathways. The prospect of EDCs causing adverse health effects in humans and wildlife has led to the development of scientific and regulatory approaches for evaluating bioactivity. This need is being addressed using high-throughput screening (HTS) in vitro approaches and computational modeling.

OBJECTIVES

In support of the Endocrine Disruptor Screening Program, the U.S. Environmental Protection Agency (EPA) led two worldwide consortiums to virtually screen chemicals for their potential estrogenic and androgenic activities. Here, we describe the Collaborative Modeling Project for Androgen Receptor Activity (CoMPARA) efforts, which follow the steps of the Collaborative Estrogen Receptor Activity Prediction Project (CERAPP).

METHODS

The CoMPARA list of screened chemicals built on CERAPP's list of 32,464 chemicals to include additional chemicals of interest, as well as simulated ToxCast™ metabolites, totaling 55,450 chemical structures. Computational toxicology scientists from 25 international groups contributed 91 predictive models for binding, agonist, and antagonist activity predictions. Models were underpinned by a common training set of 1,746 chemicals compiled from a combined data set of 11 ToxCast™/Tox21 HTS in vitro assays.

RESULTS

The resulting models were evaluated using curated literature data extracted from different sources. To overcome the limitations of single-model approaches, CoMPARA predictions were combined into consensus models that provided averaged predictive accuracy of approximately 80% for the evaluation set.

DISCUSSION

The strengths and limitations of the consensus predictions were discussed with example chemicals; then, the models were implemented into the free and open-source OPERA application to enable screening of new chemicals with a defined applicability domain and accuracy assessment. This implementation was used to screen the entire EPA DSSTox database of ∼875,000 chemicals, and their predicted AR activities have been made available on the EPA CompTox Chemicals dashboard and National Toxicology Program's Integrated Chemical Environment. https://doi.org/10.1289/EHP5580.

Affiliation(s)

Kamel Mansouri National Center for Computational Toxicology, Office of Research and Development, U.S. Environmental Protection Agency (U.S. EPA), Research Triangle Park, North Carolina, USA; ScitoVation LLC, Research Triangle Park, North Carolina, USA; Integrated Laboratory Systems, Inc., Morrisville, North Carolina, USA
Nicole Kleinstreuer National Toxicology Program Interagency Center for the Evaluation of Alternative Toxicological Methods (NICEATM), National Institute of Environmental Health Sciences, Research Triangle Park, North Carolina, USA
Ahmed M Abdelaziz Technische Universität München, Wissenschaftszentrum Weihenstephan für Ernährung, Landnutzung und Umwelt, Department für Biowissenschaftliche Grundlagen, Weihenstephaner Steig 23, 85350 Freising, Germany
Domenico Alberga Department of Pharmacy-Drug Sciences, University of Bari, Bari, Italy
Vinicius M Alves Laboratory for Molecular Modeling and Drug Design, Faculty of Pharmacy, Federal University of Goiás, Goiânia, Brazil; Laboratory for Molecular Modeling, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
Patrik L Andersson Chemistry Department, Umeå University, Umeå, Sweden
Carolina H Andrade Laboratory for Molecular Modeling and Drug Design, Faculty of Pharmacy, Federal University of Goiás, Goiânia, Brazil
Fang Bai School of Pharmacy, Lanzhou University, China
Ilya Balabin Information Systems & Global Solutions (IS&GS), Lockheed Martin, USA
Davide Ballabio Milano Chemometrics and QSAR Research Group, Department of Earth and Environmental Sciences, University of Milano-Bicocca, Milan, Italy
Emilio Benfenati Istituto di Ricerche Farmacologiche "Mario Negri", IRCCS, Milan, Italy
Barun Bhhatarai QSAR Research Unit in Environmental Chemistry and Ecotoxicology, Department of Theoretical and Applied Sciences, University of Insubria, Varese, Italy
Scott Boyer Swedish Toxicology Sciences Research Center, Karolinska Institutet, Södertälje, Sweden
Jingwen Chen School of Environmental Science and Technology, Dalian University of Technology, Dalian, China
Viviana Consonni Milano Chemometrics and QSAR Research Group, Department of Earth and Environmental Sciences, University of Milano-Bicocca, Milan, Italy
Sherif Farag Laboratory for Molecular Modeling, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
Denis Fourches Department of Chemistry, Bioinformatics Research Center, North Carolina State University, Raleigh, North Carolina, USA
Alfonso T García-Sosa Institute of Chemistry, University of Tartu, Tartu, Estonia
Paola Gramatica QSAR Research Unit in Environmental Chemistry and Ecotoxicology, Department of Theoretical and Applied Sciences, University of Insubria, Varese, Italy
Francesca Grisoni Milano Chemometrics and QSAR Research Group, Department of Earth and Environmental Sciences, University of Milano-Bicocca, Milan, Italy
Chris M Grulke National Center for Computational Toxicology, Office of Research and Development, U.S. Environmental Protection Agency (U.S. EPA), Research Triangle Park, North Carolina, USA
Huixiao Hong Division of Bioinformatics and Biostatistics, National Center for Toxicology Research, U.S. Food and Drug Administration, Jefferson, Arkansas, USA
Dragos Horvath Laboratoire de Chémoinformatique-UMR7140, University of Strasbourg/CNRS, Strasbourg, France
Xin Hu National Center for Advancing Translational Sciences, National Institutes of Health, Rockville, Maryland, USA
Ruili Huang National Center for Advancing Translational Sciences, National Institutes of Health, Rockville, Maryland, USA
Nina Jeliazkova IdeaConsult, Ltd., Sofia, Bulgaria
Jiazhong Li School of Pharmacy, Lanzhou University, China
Xuehua Li School of Environmental Science and Technology, Dalian University of Technology, Dalian, China
Huanxiang Liu School of Pharmacy, Lanzhou University, China
Serena Manganelli Istituto di Ricerche Farmacologiche "Mario Negri", IRCCS, Milan, Italy
Giuseppe F Mangiatordi Department of Pharmacy-Drug Sciences, University of Bari, Bari, Italy
Uko Maran Institute of Chemistry, University of Tartu, Tartu, Estonia
Gilles Marcou Laboratoire de Chémoinformatique-UMR7140, University of Strasbourg/CNRS, Strasbourg, France
Todd Martin National Risk Management Research Laboratory, U.S. EPA, Cincinnati, Ohio, USA
Eugene Muratov Laboratory for Molecular Modeling, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
Dac-Trung Nguyen National Center for Advancing Translational Sciences, National Institutes of Health, Rockville, Maryland, USA
Orazio Nicolotti Department of Pharmacy-Drug Sciences, University of Bari, Bari, Italy
Nikolai G Nikolov Division of Risk Assessment and Nutrition, National Food Institute, Technical University of Denmark, Copenhagen, Denmark
Ulf Norinder Swedish Toxicology Sciences Research Center, Karolinska Institutet, Södertälje, Sweden
Ester Papa QSAR Research Unit in Environmental Chemistry and Ecotoxicology, Department of Theoretical and Applied Sciences, University of Insubria, Varese, Italy
Michel Petitjean Computational Modeling of Protein-Ligand Interactions (CMPLI)-INSERM UMR 8251, INSERM ERL U1133, Functional and Adaptative Biology (BFA), Universite de Paris, Paris, France
Geven Piir Institute of Chemistry, University of Tartu, Tartu, Estonia
Pavel Pogodin Institute of Biomedical Chemistry IBMC, 10 Building 8, Pogodinskaya st., Moscow 119121, Russia
Vladimir Poroikov Institute of Biomedical Chemistry IBMC, 10 Building 8, Pogodinskaya st., Moscow 119121, Russia
Xianliang Qiao School of Environmental Science and Technology, Dalian University of Technology, Dalian, China
Ann M Richard National Center for Computational Toxicology, Office of Research and Development, U.S. Environmental Protection Agency (U.S. EPA), Research Triangle Park, North Carolina, USA
Alessandra Roncaglioni Istituto di Ricerche Farmacologiche "Mario Negri", IRCCS, Milan, Italy
Patricia Ruiz Computational Toxicology and Methods Development Laboratory, Division of Toxicology and Human Health Sciences, Agency for Toxic Substances and Disease Registry, Centers for Disease Control and Prevention, Atlanta, Georgia, USA
Chetan Rupakheti National Risk Management Research Laboratory, U.S. EPA, Cincinnati, Ohio, USA; Department of Biochemistry and Molecular Biophysics, University of Chicago, Chicago, Illinois, USA
Sugunadevi Sakkiah Division of Bioinformatics and Biostatistics, National Center for Toxicology Research, U.S. Food and Drug Administration, Jefferson, Arkansas, USA
Alessandro Sangion QSAR Research Unit in Environmental Chemistry and Ecotoxicology, Department of Theoretical and Applied Sciences, University of Insubria, Varese, Italy
Karl-Werner Schramm Technische Universität München, Wissenschaftszentrum Weihenstephan für Ernährung, Landnutzung und Umwelt, Department für Biowissenschaftliche Grundlagen, Weihenstephaner Steig 23, 85350 Freising, Germany
Chandrabose Selvaraj Division of Bioinformatics and Biostatistics, National Center for Toxicology Research, U.S. Food and Drug Administration, Jefferson, Arkansas, USA
Imran Shah National Center for Computational Toxicology, Office of Research and Development, U.S. Environmental Protection Agency (U.S. EPA), Research Triangle Park, North Carolina, USA
Sulev Sild Institute of Chemistry, University of Tartu, Tartu, Estonia
Lixia Sun Department of Pharmaceutical Sciences, School of Pharmacy, East China University of Science and Technology, Shanghai, China
Olivier Taboureau Computational Modeling of Protein-Ligand Interactions (CMPLI)-INSERM UMR 8251, INSERM ERL U1133, Functional and Adaptative Biology (BFA), Universite de Paris, Paris, France
Yun Tang Department of Pharmaceutical Sciences, School of Pharmacy, East China University of Science and Technology, Shanghai, China
Igor V Tetko BIGCHEM GmbH, Neuherberg, Germany; Helmholtz Zentrum Muenchen - German Research Center for Environmental Health (GmbH), Neuherberg, Germany
Roberto Todeschini Milano Chemometrics and QSAR Research Group, Department of Earth and Environmental Sciences, University of Milano-Bicocca, Milan, Italy
Weida Tong Division of Bioinformatics and Biostatistics, National Center for Toxicology Research, U.S. Food and Drug Administration, Jefferson, Arkansas, USA
Daniela Trisciuzzi Department of Pharmacy-Drug Sciences, University of Bari, Bari, Italy
Alexander Tropsha Laboratory for Molecular Modeling, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
George Van Den Driessche Department of Chemistry, Bioinformatics Research Center, North Carolina State University, Raleigh, North Carolina, USA
Alexandre Varnek Laboratoire de Chémoinformatique-UMR7140, University of Strasbourg/CNRS, Strasbourg, France
Zhongyu Wang School of Environmental Science and Technology, Dalian University of Technology, Dalian, China
Eva B Wedebye Division of Risk Assessment and Nutrition, National Food Institute, Technical University of Denmark, Copenhagen, Denmark
Antony J Williams National Center for Computational Toxicology, Office of Research and Development, U.S. Environmental Protection Agency (U.S. EPA), Research Triangle Park, North Carolina, USA
Hongbin Xie School of Environmental Science and Technology, Dalian University of Technology, Dalian, China
Alexey V Zakharov National Center for Advancing Translational Sciences, National Institutes of Health, Rockville, Maryland, USA
Ziye Zheng Chemistry Department, Umeå University, Umeå, Sweden
Richard S Judson National Center for Computational Toxicology, Office of Research and Development, U.S. Environmental Protection Agency (U.S. EPA), Research Triangle Park, North Carolina, USA

Mansouri K, Kleinstreuer N, Abdelaziz AM, Alberga D, Alves VM, Andersson PL, Andrade CH, Bai F, Balabin I, Ballabio D, Benfenati E, Bhhatarai B, Boyer S, Chen J, Consonni V, Farag S, Fourches D, García-Sosa AT, Gramatica P, Grisoni F, Grulke CM, Hong H, Horvath D, Hu X, Huang R, Jeliazkova N, Li J, Li X, Liu H, Manganelli S, Mangiatordi GF, Maran U, Marcou G, Martin T, Muratov E, Nguyen DT, Nicolotti O, Nikolov NG, Norinder U, Papa E, Petitjean M, Piir G, Pogodin P, Poroikov V, Qiao X, Richard AM, Roncaglioni A, Ruiz P, Rupakheti C, Sakkiah S, Sangion A, Schramm KW, Selvaraj C, Shah I, Sild S, Sun L, Taboureau O, Tang Y, Tetko IV, Todeschini R, Tong W, Trisciuzzi D, Tropsha A, Van Den Driessche G, Varnek A, Wang Z, Wedebye EB, Williams AJ, Xie H, Zakharov AV, Zheng Z, Judson RS. CoMPARA: Collaborative Modeling Project for Androgen Receptor Activity. Environ Health Perspect 2020;128:27002. [PMID: 32074470 PMCID: PMC7064318 DOI: 10.1289/ehp5580] [Citation(s) in RCA: 96] [Impact Index Per Article: 24.0] [Received: 05/06/2019] [Revised: 11/27/2019] [Accepted: 12/05/2019] [Indexed: 05/04/2023]

Ambure P, Cordeiro MNDS. Importance of Data Curation in QSAR Studies Especially While Modeling Large-Size Datasets. Methods Pharmacol Toxicol 2020. [DOI: 10.1007/978-1-0716-0150-1_5] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 12/16/2022]

Grulke CM, Williams AJ, Thillanadarajah I, Richard AM. EPA's DSSTox database: History of development of a curated chemistry resource supporting computational toxicology research. Comput Toxicol 2019;12. [PMID: 33426407 PMCID: PMC7787967 DOI: 10.1016/j.comtox.2019.100096] [Citation(s) in RCA: 93] [Impact Index Per Article: 18.6] [Indexed: 12/13/2022]

Abstract

The US Environmental Protection Agency's (EPA) Distributed Structure-Searchable Toxicity (DSSTox) database, launched publicly in 2004, currently exceeds 875 K substances spanning hundreds of lists of interest to EPA and environmental researchers. From its inception, DSSTox has focused curation efforts on resolving chemical identifier errors and conflicts in the public domain towards the goal of assigning accurate chemical structures to data and lists of importance to the environmental research and regulatory community. Accurate structure-data associations, in turn, are necessary inputs to structure-based predictive models supporting hazard and risk assessments. In 2014, the legacy, manually curated DSSTox_V1 content was migrated to a MySQL data model, with modern cheminformatics tools supporting both manual and automated curation processes to increase efficiencies. This was followed by sequential auto-loads of filtered portions of three public datasets: EPA's Substance Registry Services (SRS), the National Library of Medicine's ChemID, and PubChem. This process was constrained by a key requirement of uniquely mapped identifiers (i.e., CAS RN, name and structure) for each substance, rejecting content where any two identifiers were conflicted either within or across datasets. This rejected content highlighted the degree of conflicting, inaccurate substance-structure ID mappings in the public domain, ranging from 12% (within EPA SRS) to 49% (across ChemID and PubChem). Substances successfully added to DSSTox from each auto-load were assigned to one of five qc_levels, conveying curator confidence in each dataset. This process enabled a significant expansion of DSSTox content to provide better coverage of the chemical landscape of interest to environmental scientists, while retaining focus on the accuracy of substance-structure-data associations. Currently, DSSTox serves as the core foundation of EPA's CompTox Chemicals Dashboard [https://comptox.epa.gov/dashboard], which provides public access to DSSTox content in support of a broad range of modeling and research activities within EPA and, increasingly, across the field of computational toxicology.
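
The unique-mapping requirement described in this abstract can be illustrated with a minimal sketch. This is not the DSSTox implementation: the record layout, the field names (`casrn`, `name`, `structure`), and the function `filter_conflicts` are hypothetical, and the rule is simplified to "reject every record that shares an identifier value with a record carrying a different identifier triple".

```python
from collections import defaultdict

# Hypothetical field names; the abstract only lists CAS RN, name and structure.
FIELDS = ("casrn", "name", "structure")


def filter_conflicts(records):
    """Split records into (accepted, rejected) under a strict one-to-one rule.

    A record is rejected when any of its identifier values also appears in a
    record with a different (casrn, name, structure) triple.
    """
    # For each identifier value, collect the distinct triples it occurs in.
    seen = {field: defaultdict(set) for field in FIELDS}
    for rec in records:
        triple = tuple(rec[field] for field in FIELDS)
        for field in FIELDS:
            seen[field][rec[field]].add(triple)

    accepted, rejected = [], []
    for rec in records:
        conflicted = any(len(seen[field][rec[field]]) > 1 for field in FIELDS)
        (rejected if conflicted else accepted).append(rec)
    return accepted, rejected


records = [
    {"casrn": "50-00-0", "name": "formaldehyde", "structure": "C=O"},
    # Same CAS RN and name mapped to two different structures: a conflict.
    {"casrn": "64-17-5", "name": "ethanol", "structure": "CCO"},
    {"casrn": "64-17-5", "name": "ethanol", "structure": "CO"},
]
accepted, rejected = filter_conflicts(records)
```

In this toy input only the formaldehyde record is accepted; both ethanol records are rejected because neither mapping can be trusted without curation.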

Fan F, Toledo Warshaviak D, Hamadeh HK, Dunn RT. The integration of pharmacophore-based 3D QSAR modeling and virtual screening in safety profiling: A case study to identify antagonistic activities against adenosine receptor, A2A, using 1,897 known drugs. PLoS One 2019;14:e0204378. [PMID: 30605479 PMCID: PMC6317804 DOI: 10.1371/journal.pone.0204378] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Received: 09/02/2018] [Accepted: 12/12/2018] [Indexed: 12/23/2022] Open

Abstract

Safety pharmacology screening against a wide range of unintended vital targets using in vitro assays is crucial to understand off-target interactions with drug candidates. With the increasing demand for in vitro assays, ligand- and structure-based virtual screening approaches have been evaluated for potential utilization in safety profiling. Although ligand-based approaches have been actively applied in retrospective analysis or prospectively within well-defined chemical space during the early discovery stage (i.e., HTS screening and lead optimization), virtual screening is rarely implemented in the later stages of drug discovery (i.e., safety). Here we present a case study to evaluate ligand-based 3D QSAR models built on in vitro antagonistic activity data against adenosine receptor 2A (A2A). The resulting models, obtained from 268 chemically diverse compounds, were used to test a set of 1,897 chemically distinct drugs, simulating the real-world challenge of safety screening when presented with novel chemistry and a limited training set. Due to the unique requirements of safety screening versus discovery screening, the limitations of 3D QSAR methods (i.e., chemotypes, dependence on a large training set, and susceptibility to false positives) are less critical than in early discovery screening. We demonstrated that 3D QSAR modeling can be effectively applied in safety assessment prior to in vitro assays, even with chemotypes that are drastically different from training compounds. It is also worth noting that our model is able to adequately make the mechanistic distinction between agonists and antagonists, which is important to inform subsequent in vivo studies. Overall, we present an in-depth analysis of the appropriate utilization and interpretation of pharmacophore-based 3D QSAR models for safety screening.

Neves BJ, Braga RC, Melo-Filho CC, Moreira-Filho JT, Muratov EN, Andrade CH. QSAR-Based Virtual Screening: Advances and Applications in Drug Discovery. Front Pharmacol 2018;9:1275. [PMID: 30524275 PMCID: PMC6262347 DOI: 10.3389/fphar.2018.01275] [Citation(s) in RCA: 190] [Impact Index Per Article: 31.7] [Received: 08/11/2018] [Accepted: 10/18/2018] [Indexed: 02/03/2023] Open

Sobus JR, Wambaugh JF, Isaacs KK, Williams AJ, McEachran AD, Richard AM, Grulke CM, Ulrich EM, Rager JE, Strynar MJ, Newton SR. Integrating tools for non-targeted analysis research and chemical safety evaluations at the US EPA. J Expo Sci Environ Epidemiol 2018;28:411-426. [PMID: 29288256 PMCID: PMC6661898 DOI: 10.1038/s41370-017-0012-y] [Citation(s) in RCA: 134] [Impact Index Per Article: 22.3] [Received: 05/27/2017] [Revised: 08/04/2017] [Accepted: 08/25/2017] [Indexed: 05/18/2023]

Affiliation(s)

Jon R Sobus U.S. Environmental Protection Agency, Office of Research and Development, National Exposure Research Laboratory, 109 T.W. Alexander Drive, Research Triangle Park, NC, 27709, USA
John F Wambaugh U.S. Environmental Protection Agency, Office of Research and Development, National Center for Computational Toxicology, 109 T.W. Alexander Drive, Research Triangle Park, NC, 27709, USA
Kristin K Isaacs U.S. Environmental Protection Agency, Office of Research and Development, National Exposure Research Laboratory, 109 T.W. Alexander Drive, Research Triangle Park, NC, 27709, USA
Antony J Williams U.S. Environmental Protection Agency, Office of Research and Development, National Center for Computational Toxicology, 109 T.W. Alexander Drive, Research Triangle Park, NC, 27709, USA
Andrew D McEachran Oak Ridge Institute for Science and Education (ORISE) Participant, 109 T.W. Alexander Drive, Research Triangle Park, NC, 27709, USA
Ann M Richard U.S. Environmental Protection Agency, Office of Research and Development, National Center for Computational Toxicology, 109 T.W. Alexander Drive, Research Triangle Park, NC, 27709, USA
Christopher M Grulke U.S. Environmental Protection Agency, Office of Research and Development, National Center for Computational Toxicology, 109 T.W. Alexander Drive, Research Triangle Park, NC, 27709, USA
Elin M Ulrich U.S. Environmental Protection Agency, Office of Research and Development, National Exposure Research Laboratory, 109 T.W. Alexander Drive, Research Triangle Park, NC, 27709, USA
Julia E Rager Oak Ridge Institute for Science and Education (ORISE) Participant, 109 T.W. Alexander Drive, Research Triangle Park, NC, 27709, USA; ToxStrategies, Inc., 9390 Research Blvd., Suite 100, Austin, TX, 78759, USA
Mark J Strynar U.S. Environmental Protection Agency, Office of Research and Development, National Exposure Research Laboratory, 109 T.W. Alexander Drive, Research Triangle Park, NC, 27709, USA
Seth R Newton U.S. Environmental Protection Agency, Office of Research and Development, National Exposure Research Laboratory, 109 T.W. Alexander Drive, Research Triangle Park, NC, 27709, USA

Basith S, Cui M, Macalino SJY, Park J, Clavio NAB, Kang S, Choi S. Exploring G Protein-Coupled Receptors (GPCRs) Ligand Space via Cheminformatics Approaches: Impact on Rational Drug Design. Front Pharmacol 2018;9:128. [PMID: 29593527 PMCID: PMC5854945 DOI: 10.3389/fphar.2018.00128] [Citation(s) in RCA: 74] [Impact Index Per Article: 12.3] [Received: 12/08/2017] [Accepted: 02/06/2018] [Indexed: 01/14/2023] Open

Patel M, Chilton ML, Sartini A, Gibson L, Barber C, Covey-Crump L, Przybylak KR, Cronin MTD, Madden JC. Assessment and Reproducibility of Quantitative Structure–Activity Relationship Models by the Nonexpert. J Chem Inf Model 2018;58:673-682. [DOI: 10.1021/acs.jcim.7b00523] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.0] [Indexed: 12/20/2022]

Williams AJ, Grulke CM, Edwards J, McEachran AD, Mansouri K, Baker NC, Patlewicz G, Shah I, Wambaugh JF, Judson RS, Richard AM. The CompTox Chemistry Dashboard: a community data resource for environmental chemistry. J Cheminform 2017;9:61. [PMID: 29185060 PMCID: PMC5705535 DOI: 10.1186/s13321-017-0247-6] [Citation(s) in RCA: 554] [Impact Index Per Article: 79.1] [Received: 09/30/2017] [Accepted: 11/18/2017] [Indexed: 11/10/2022] Open

Abstract

Despite an abundance of online databases providing access to chemical data, there is increasing demand for high-quality, structure-curated, open data to meet the various needs of the environmental sciences and computational toxicology communities. The U.S. Environmental Protection Agency's (EPA) web-based CompTox Chemistry Dashboard is addressing these needs by integrating diverse types of relevant domain data through a cheminformatics layer, built upon a database of curated substances linked to chemical structures. These data include physicochemical, environmental fate and transport, exposure, usage, in vivo toxicity, and in vitro bioassay data, surfaced through an integration hub with link-outs to additional EPA data and public domain online resources. Batch searching allows for direct chemical identifier (ID) mapping and downloading of multiple data streams in several different formats. This facilitates fast access to available structure, property, toxicity, and bioassay data for collections of chemicals (hundreds to thousands at a time). Advanced search capabilities are available to support, for example, non-targeted analysis and identification of chemicals using mass spectrometry. The contents of the chemistry database, presently containing ~760,000 substances, are available as public domain data for download. The chemistry content underpinning the Dashboard has been aggregated over the past 15 years by both manual and auto-curation techniques within EPA's DSSTox project. DSSTox chemical content is subject to strict quality controls to enforce consistency among chemical substance-structure identifiers, as well as list curation review to ensure accurate linkages of DSSTox substances to chemical lists and associated data. The Dashboard, publicly launched in April 2016, has expanded considerably in content and user traffic over the past year. It is continuously evolving with the growth of DSSTox into high-interest or data-rich domains of interest to EPA, such as chemicals on the Toxic Substances Control Act listing, while providing the user community with a flexible and dynamic web-based platform for integration, processing, visualization and delivery of data and resources. The Dashboard provides support for a broad array of research and regulatory programs across the worldwide community of toxicologists and environmental scientists.

Minkiewicz P, Iwaniak A, Darewicz M. Annotation of Peptide Structures Using SMILES and Other Chemical Codes-Practical Solutions. Molecules 2017;22:molecules22122075. [PMID: 29186902 PMCID: PMC6149970 DOI: 10.3390/molecules22122075] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/16/2017] [Revised: 11/15/2017] [Accepted: 11/25/2017] [Indexed: 12/20/2022] Open

Liu J, Patlewicz G, Williams AJ, Thomas RS, Shah I. Predicting Organ Toxicity Using in Vitro Bioactivity Data and Chemical Structure. Chem Res Toxicol 2017;30:2046-2059. [PMID: 28768096 DOI: 10.1021/acs.chemrestox.7b00084] [Citation(s) in RCA: 37] [Impact Index Per Article: 5.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/22/2022]

Abstract

Animal testing alone cannot practically evaluate the health hazard posed by tens of thousands of environmental chemicals. Computational approaches making use of high-throughput experimental data may provide more efficient means to predict chemical toxicity. Here, we use a supervised machine learning strategy to systematically investigate the relative importance of study type, machine learning algorithm, and type of descriptor on predicting in vivo repeat-dose toxicity at the organ level. A total of 985 compounds were represented using chemical structural descriptors, ToxPrint chemotype descriptors, and bioactivity descriptors from ToxCast in vitro high-throughput screening assays. Using ToxRefDB, a total of 35 target organ outcomes were identified that contained at least 100 chemicals (50 positive and 50 negative). Supervised machine learning was performed using Naïve Bayes, k-nearest neighbor, random forest, classification and regression trees, and support vector classification approaches. Model performance was assessed based on F1 scores using 5-fold cross-validation with balanced bootstrap replicates. Fixed effects modeling showed the variance in F1 scores was explained mostly by target organ outcome, followed by descriptor type, machine learning algorithm, and interactions between these three factors. A combination of bioactivity and chemical structure or chemotype descriptors was the most predictive. Model performance improved with more chemicals (up to a maximum of 24%), and these gains were correlated (ρ = 0.92) with the number of chemicals. Overall, the results demonstrate that a combination of bioactivity and chemical descriptors can accurately predict a range of target organ toxicity outcomes in repeat-dose studies, but specific experimental and methodologic improvements may increase predictivity.
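
As a rough illustration of the evaluation metric named in this abstract, the sketch below computes an F1 score for binary labels and generates 5-fold train/test index splits. It is not the authors' pipeline: the function names `f1_score` and `k_fold_indices` are hypothetical, and the balanced bootstrap replicates mentioned in the abstract are not implemented here.

```python
import random


def f1_score(y_true, y_pred):
    """F1 for binary labels (1 = positive): 2*TP / (2*TP + FP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    # Convention assumed here: return 0.0 when there are no positives at all.
    return 2 * tp / denom if denom else 0.0


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal shuffled indices round-robin into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


if __name__ == "__main__":
    print(f1_score([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))
    for train, test in k_fold_indices(10, k=5):
        print(len(train), len(test))
```

In practice each fold's model would be fitted on the `train` indices and scored with `f1_score` on the `test` indices, with the per-fold scores then averaged.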

Zhao L, Wang W, Sedykh A, Zhu H. Experimental Errors in QSAR Modeling Sets: What We Can Do and What We Cannot Do. ACS OMEGA 2017;2:2805-2812. [PMID: 28691113 PMCID: PMC5494643 DOI: 10.1021/acsomega.7b00274] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.7] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 03/08/2017] [Accepted: 04/27/2017] [Indexed: 05/04/2023]

Abstract

Numerous chemical data sets have become available for quantitative structure-activity relationship (QSAR) modeling studies. However, the quality of different data sources may differ depending on the nature of experimental protocols. Therefore, potential experimental errors in the modeling sets may lead to the development of poor QSAR models and further affect the predictions of new compounds. In this study, we explored the relationship between the ratio of questionable data in the modeling sets, which was obtained by simulating experimental errors, and the QSAR modeling performance. To this end, we used eight data sets (four continuous endpoints and four categorical endpoints) that have been extensively curated both in-house and by our collaborators to create over 1800 various QSAR models. Each data set was duplicated to create several new modeling sets with different ratios of simulated experimental errors (i.e., randomizing the activities of part of the compounds) in the modeling process. A fivefold cross-validation process was used to evaluate the modeling performance, which deteriorates when the ratio of experimental errors increases. All of the resulting models were also used to predict external sets of new compounds, which were excluded at the beginning of the modeling process. The modeling results showed that the compounds with relatively large prediction errors in cross-validation processes are likely to be those with simulated experimental errors. However, after removing a certain number of compounds with large prediction errors in the cross-validation process, the external predictions of new compounds did not show improvement. Our conclusion is that the QSAR predictions, especially consensus predictions, can identify compounds with potential experimental errors. However, removing those compounds by the cross-validation procedure is not a reasonable means to improve model predictivity due to overfitting.
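
One way to picture the error-simulation step described in this abstract ("randomizing the activities of part of the compounds") is the sketch below. It is only an assumption about how such a step could be written, not the authors' code: the name `randomize_activities` is hypothetical, and the choice to shuffle the selected activities among themselves is one of several plausible readings of "randomizing".

```python
import random


def randomize_activities(activities, ratio, seed=0):
    """Return (noisy_copy, chosen_indices).

    A fraction `ratio` of the compounds is picked at random and their
    activity values are shuffled among themselves, so the overall
    distribution of activities is preserved while the selected
    compound-activity pairs are scrambled.
    """
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be between 0 and 1")
    rng = random.Random(seed)
    n_noisy = round(ratio * len(activities))
    chosen = rng.sample(range(len(activities)), n_noisy)
    values = [activities[i] for i in chosen]
    rng.shuffle(values)
    noisy = list(activities)
    for i, value in zip(chosen, values):
        noisy[i] = value
    return noisy, sorted(chosen)


if __name__ == "__main__":
    original = [float(i) for i in range(20)]
    for ratio in (0.1, 0.25, 0.5):
        noisy, chosen = randomize_activities(original, ratio, seed=1)
        changed = sum(1 for a, b in zip(original, noisy) if a != b)
        print(ratio, len(chosen), changed)
```

A shuffle can leave some selected compounds with their original value, so the number of activities that actually change may be smaller than the number selected.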

Krallinger M, Rabal O, Lourenço A, Oyarzabal J, Valencia A. Information Retrieval and Text Mining Technologies for Chemistry. Chem Rev 2017;117:7673-7761. [PMID: 28475312 DOI: 10.1021/acs.chemrev.6b00851] [Citation(s) in RCA: 111] [Impact Index Per Article: 15.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/03/2023]

Richard AM, Judson RS, Houck KA, Grulke CM, Volarath P, Thillainadarajah I, Yang C, Rathman J, Martin MT, Wambaugh JF, Knudsen TB, Kancherla J, Mansouri K, Patlewicz G, Williams AJ, Little SB, Crofton KM, Thomas RS. ToxCast Chemical Landscape: Paving the Road to 21st Century Toxicology. Chem Res Toxicol 2016;29:1225-51. [PMID: 27367298 DOI: 10.1021/acs.chemrestox.6b00135] [Citation(s) in RCA: 386] [Impact Index Per Article: 48.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/27/2022]

Abstract

The U.S. Environmental Protection Agency's (EPA) ToxCast program is testing a large library of Agency-relevant chemicals using in vitro high-throughput screening (HTS) approaches to support the development of improved toxicity prediction models. Launched in 2007, Phase I of the program screened 310 chemicals, mostly pesticides, across hundreds of ToxCast assay end points. In Phase II, the ToxCast library was expanded to 1878 chemicals, culminating in the public release of screening data at the end of 2013. Subsequent expansion in Phase III has resulted in more than 3800 chemicals actively undergoing ToxCast screening, 96% of which are also being screened in the multi-Agency Tox21 project. The chemical library underpinning these efforts plays a central role in defining the scope and potential application of ToxCast HTS results. The history of the phased construction of EPA's ToxCast library is reviewed, followed by a survey of the library contents from several different vantage points. CAS Registry Numbers are used to assess ToxCast library coverage of important toxicity, regulatory, and exposure inventories. Structure-based representations of ToxCast chemicals are then used to compute physicochemical properties, substructural features, and structural alerts for toxicity and biotransformation. Cheminformatics approaches using these varied representations are applied to defining the boundaries of HTS testability, evaluating chemical diversity, and comparing the ToxCast library to potential target application inventories, such as those used in EPA's Endocrine Disruption Screening Program (EDSP). Through several examples, the ToxCast chemical library is demonstrated to provide comprehensive coverage of the knowledge domains and target inventories of potential interest to EPA. Furthermore, the varied representations and approaches presented here define local chemistry domains potentially worthy of further investigation (e.g., not currently covered in the testing library or defined by toxicity "alerts") to strategically support data mining and predictive toxicology modeling moving forward.

Affiliation(s)

Ann M Richard National Center for Computational Toxicology, Office of Research & Development, U.S. Environmental Protection Agency, Mail Code B205-01, Research Triangle Park, Durham, North Carolina 27711, United States
Richard S Judson National Center for Computational Toxicology, Office of Research & Development, U.S. Environmental Protection Agency, Mail Code B205-01, Research Triangle Park, Durham, North Carolina 27711, United States
Keith A Houck National Center for Computational Toxicology, Office of Research & Development, U.S. Environmental Protection Agency, Mail Code B205-01, Research Triangle Park, Durham, North Carolina 27711, United States
Christopher M Grulke National Center for Computational Toxicology, Office of Research & Development, U.S. Environmental Protection Agency, Mail Code B205-01, Research Triangle Park, Durham, North Carolina 27711, United States
Patra Volarath Center for Food Safety and Nutrition, U.S. Food and Drug Administration, 5100 Paint Branch Parkway, College Park, Maryland 20740, United States
Inthirany Thillainadarajah Senior Environmental Employment Program, U.S. Environmental Protection Agency, Research Triangle Park, Durham, North Carolina 27711, United States
Chihae Yang Molecular Networks GmbH, Henkestraße 91, 91052 Erlangen, Germany; Altamira, LLC, 1455 Candlewood Drive, Columbus, Ohio 43235, United States
James Rathman Altamira, LLC, 1455 Candlewood Drive, Columbus, Ohio 43235, United States; Department of Chemical and Biomolecular Engineering, The Ohio State University, 151 W. Woodruff Avenue, Columbus, Ohio 43210, United States
Matthew T Martin National Center for Computational Toxicology, Office of Research & Development, U.S. Environmental Protection Agency, Mail Code B205-01, Research Triangle Park, Durham, North Carolina 27711, United States
John F Wambaugh National Center for Computational Toxicology, Office of Research & Development, U.S. Environmental Protection Agency, Mail Code B205-01, Research Triangle Park, Durham, North Carolina 27711, United States
Thomas B Knudsen National Center for Computational Toxicology, Office of Research & Development, U.S. Environmental Protection Agency, Mail Code B205-01, Research Triangle Park, Durham, North Carolina 27711, United States
Jayaram Kancherla ORISE Fellow, U.S. Environmental Protection Agency, Research Triangle Park, Durham, North Carolina 27711, United States
Kamel Mansouri ORISE Fellow, U.S. Environmental Protection Agency, Research Triangle Park, Durham, North Carolina 27711, United States
Grace Patlewicz National Center for Computational Toxicology, Office of Research & Development, U.S. Environmental Protection Agency, Mail Code B205-01, Research Triangle Park, Durham, North Carolina 27711, United States
Antony J Williams National Center for Computational Toxicology, Office of Research & Development, U.S. Environmental Protection Agency, Mail Code B205-01, Research Triangle Park, Durham, North Carolina 27711, United States
Stephen B Little National Center for Computational Toxicology, Office of Research & Development, U.S. Environmental Protection Agency, Mail Code B205-01, Research Triangle Park, Durham, North Carolina 27711, United States
Kevin M Crofton National Center for Computational Toxicology, Office of Research & Development, U.S. Environmental Protection Agency, Mail Code B205-01, Research Triangle Park, Durham, North Carolina 27711, United States
Russell S Thomas National Center for Computational Toxicology, Office of Research & Development, U.S. Environmental Protection Agency, Mail Code B205-01, Research Triangle Park, Durham, North Carolina 27711, United States

Neves BJ, Muratov E, Machado RB, Andrade CH, Cravo PVL. Modern approaches to accelerate discovery of new antischistosomal drugs. Expert Opin Drug Discov 2016;11:557-67. [PMID: 27073973 PMCID: PMC6534417 DOI: 10.1080/17460441.2016.1178230] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/13/2022]

Oprea TI, Overington JP. Computational and Practical Aspects of Drug Repositioning. Assay Drug Dev Technol 2016;13:299-306. [PMID: 26241209 DOI: 10.1089/adt.2015.29011.tiodrrr] [Citation(s) in RCA: 64] [Impact Index Per Article: 8.0] [Reference Citation Analysis] [Abstract] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/10/2023] Open

Ball N, Cronin MTD, Shen J, Blackburn K, Booth ED, Bouhifd M, Donley E, Egnash L, Hastings C, Juberg DR, Kleensang A, Kleinstreuer N, Kroese ED, Lee AC, Luechtefeld T, Maertens A, Marty S, Naciff JM, Palmer J, Pamies D, Penman M, Richarz AN, Russo DP, Stuard SB, Patlewicz G, van Ravenzwaay B, Wu S, Zhu H, Hartung T. Toward Good Read-Across Practice (GRAP) guidance. ALTEX-ALTERNATIVES TO ANIMAL EXPERIMENTATION 2016;33:149-66. [PMID: 26863606 PMCID: PMC5581000 DOI: 10.14573/altex.1601251] [Citation(s) in RCA: 111] [Impact Index Per Article: 13.9] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Subscribe] [Scholar Register] [Received: 01/21/2016] [Accepted: 02/11/2016] [Indexed: 12/04/2022]

Affiliation(s)

Nicholas Ball The Dow Chemical Company, Midland, MI, USA
Mark T D Cronin School of Pharmacy and Biomolecular Sciences, Liverpool John Moores University, Liverpool, UK
Jie Shen Research Institute for Fragrance Materials, Inc., Woodcliff Lake, NJ, USA
Karen Blackburn The Procter and Gamble Co., Cincinnati, OH, USA
Ewan D Booth Syngenta Ltd, Jealott's Hill International Research Centre, Bracknell, Berkshire, UK
Mounir Bouhifd Johns Hopkins Bloomberg School of Public Health, Center for Alternatives to Animal Testing (CAAT), Baltimore, MD, USA
Elizabeth Donley Stemina Biomarker Discovery Inc., Madison, WI, USA
Laura Egnash Stemina Biomarker Discovery Inc., Madison, WI, USA
Charles Hastings BASF SE, Ludwigshafen am Rhein, Germany, and Research Triangle Park, NC, USA
Daland R Juberg The Dow Chemical Company, Midland, MI, USA
Andre Kleensang Johns Hopkins Bloomberg School of Public Health, Center for Alternatives to Animal Testing (CAAT), Baltimore, MD, USA
Nicole Kleinstreuer National Toxicology Program Interagency Center for the Evaluation of Alternative Toxicological Methods, National Institute of Environmental Health Sciences, Research Triangle Park, NC, USA
E Dinant Kroese Risk Analysis for Products in Development, TNO Zeist, The Netherlands
Adam C Lee DuPont Haskell Global Centers for Health and Environmental Sciences, Newark, DE, USA
Thomas Luechtefeld Johns Hopkins Bloomberg School of Public Health, Center for Alternatives to Animal Testing (CAAT), Baltimore, MD, USA
Alexandra Maertens Johns Hopkins Bloomberg School of Public Health, Center for Alternatives to Animal Testing (CAAT), Baltimore, MD, USA
Sue Marty The Dow Chemical Company, Midland, MI, USA
Jorge M Naciff The Procter and Gamble Co., Cincinnati, OH, USA
Jessica Palmer Stemina Biomarker Discovery Inc., Madison, WI, USA
David Pamies Johns Hopkins Bloomberg School of Public Health, Center for Alternatives to Animal Testing (CAAT), Baltimore, MD, USA
Mike Penman Penman Consulting, Brussels, Belgium
Andrea-Nicole Richarz School of Pharmacy and Biomolecular Sciences, Liverpool John Moores University, Liverpool, UK
Daniel P Russo Department of Chemistry and Center for Computational and Integrative Biology, Rutgers University, Camden, NJ, USA
Sharon B Stuard The Procter and Gamble Co., Cincinnati, OH, USA
Grace Patlewicz US EPA/ORD, National Center for Computational Toxicology, Research Triangle Park, NC, USA
Bennard van Ravenzwaay Risk Analysis for Products in Development, TNO Zeist, The Netherlands
Shengde Wu The Procter and Gamble Co., Cincinnati, OH, USA
Hao Zhu Department of Chemistry and Center for Computational and Integrative Biology, Rutgers University, Camden, NJ, USA
Thomas Hartung Johns Hopkins Bloomberg School of Public Health, Center for Alternatives to Animal Testing (CAAT), Baltimore, MD, USA; University of Konstanz, CAAT-Europe, Konstanz, Germany

Tales from the war on error: the art and science of curating QSAR data. J Comput Aided Mol Des 2015;29:897-910. [PMID: 26290258 DOI: 10.1007/s10822-015-9865-0] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/16/2015] [Accepted: 08/07/2015] [Indexed: 10/23/2022]

Activity, assay and target data curation and quality in the ChEMBL database. J Comput Aided Mol Des 2015. [PMID: 26201396 PMCID: PMC4607714 DOI: 10.1007/s10822-015-9860-5] [Citation(s) in RCA: 87] [Impact Index Per Article: 9.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/05/2022]

Ai N, Fan X, Ekins S. In silico methods for predicting drug-drug interactions with cytochrome P-450s, transporters and beyond. Adv Drug Deliv Rev 2015;86:46-60. [PMID: 25796619 DOI: 10.1016/j.addr.2015.03.006] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/19/2014] [Revised: 01/05/2015] [Accepted: 03/11/2015] [Indexed: 12/13/2022]

Karapetyan K, Batchelor C, Sharpe D, Tkachenko V, Williams AJ. The Chemical Validation and Standardization Platform (CVSP): large-scale automated validation of chemical structure datasets. J Cheminform 2015;7:30. [PMID: 26155308 PMCID: PMC4494041 DOI: 10.1186/s13321-015-0072-8] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/28/2014] [Accepted: 04/28/2015] [Indexed: 11/10/2022] Open

Abstract

BACKGROUND

There are presently hundreds of online databases hosting millions of chemical compounds and associated data. As a result of the number of cheminformatics software tools that can be used to produce the data, subtle differences between the various cheminformatics platforms, as well as the naivety of the software users, there are a myriad of issues that can exist with chemical structure representations online. In order to help facilitate validation and standardization of chemical structure datasets from various sources we have delivered a freely available internet-based platform to the community for the processing of chemical compound datasets.

RESULTS

The chemical validation and standardization platform (CVSP) both validates and standardizes chemical structure representations according to sets of systematic rules. The chemical validation algorithms detect issues with submitted molecular representations using pre-defined or user-defined dictionary-based molecular patterns that are chemically suspicious or potentially requiring manual review. Each identified issue is assigned one of three levels of severity - Information, Warning, and Error - in order to conveniently inform the user of the need to browse and review subsets of their data. The validation process includes validation of atoms and bonds (e.g., flagging query atoms and bonds), valences, and stereo. The standard form of submission of collections of data, the SDF file, allows the user to map the data fields to predefined CVSP fields for the purpose of cross-validating associated SMILES and InChIs with the connection tables contained within the SDF file. This platform has been applied to the analysis of a large number of data sets prepared for deposition to our ChemSpider database and in preparation of data for the Open PHACTS project. In this work we review the results of the automated validation of the DrugBank dataset, a popular drug and drug target database utilized by the community, and the ChEMBL 17 data set. The CVSP web site is located at http://cvsp.chemspider.com/.
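
The three-level severity scheme described in this abstract (Information, Warning, Error) can be illustrated with a small rule-based check. The sketch below is a toy under stated assumptions and is not CVSP's rule set or API: the names `Severity`, `Issue`, `validate_atoms`, and `worst_severity`, the valence table, and the list of query-atom symbols are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    INFORMATION = 1
    WARNING = 2
    ERROR = 3


@dataclass(frozen=True)
class Issue:
    severity: Severity
    message: str


# Hypothetical toy limits for neutral atoms; a real rule set is far richer.
MAX_VALENCE = {"H": 1, "C": 4, "N": 3, "O": 2}
# Hypothetical set of symbols treated as query atoms in this sketch.
QUERY_SYMBOLS = {"*", "A", "Q", "R"}


def validate_atoms(atoms):
    """Check a list of (element_symbol, bond_order_sum) pairs.

    Query atoms are reported as warnings, elements without a rule as
    information, and atoms exceeding their toy valence limit as errors.
    """
    issues = []
    for idx, (symbol, bond_sum) in enumerate(atoms):
        if symbol in QUERY_SYMBOLS:
            issues.append(Issue(Severity.WARNING, f"atom {idx}: query atom '{symbol}'"))
        elif symbol not in MAX_VALENCE:
            issues.append(Issue(Severity.INFORMATION, f"atom {idx}: no valence rule for '{symbol}'"))
        elif bond_sum > MAX_VALENCE[symbol]:
            issues.append(Issue(Severity.ERROR, f"atom {idx}: valence {bond_sum} exceeds {MAX_VALENCE[symbol]} for '{symbol}'"))
    return issues


def worst_severity(issues):
    """Return the highest severity among the issues, or None if there are none."""
    if not issues:
        return None
    return max(issues, key=lambda issue: issue.severity.value).severity


if __name__ == "__main__":
    for issue in validate_atoms([("C", 5), ("*", 1), ("Xe", 0), ("O", 2)]):
        print(issue.severity.name, issue.message)
```

Grouping records by `worst_severity` is one simple way a user could browse only the subset of a data set that needs manual review.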

CONCLUSION

A platform for the validation and standardization of chemical structure representations of various formats has been developed and made available to the community to assist and encourage the processing of chemical structure files to produce more homogeneous compound representations for exchange and interchange between online databases. While the CVSP platform is designed with flexibility inherent to the rules that can be used for processing the data, we have produced a recommended rule set based on our own experiences with large data sets such as DrugBank, ChEMBL, and data sets from ChemSpider.

Clark AM, Williams AJ, Ekins S. Machines first, humans second: on the importance of algorithmic interpretation of open chemistry data. J Cheminform 2015;7:9. [PMID: 25798198 PMCID: PMC4369291 DOI: 10.1186/s13321-015-0057-7] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/24/2014] [Accepted: 02/23/2015] [Indexed: 11/12/2022] Open

Lab notebook entries must target both visualisation by scientists and use by machine learning algorithms

Alex M Clark Molecular Materials Informatics, 1900 St. Jacques #302, Montreal, H3J 2S1, QC Canada
Antony J Williams Royal Society of Chemistry, 904 Tamaras Circle, Wake Forest, NC 27587 USA
Sean Ekins Collaborations in Chemistry, 5616 Hilltop Needmore Road, Fuquay-Varina, NC 27526 USA; Collaborative Drug Discovery, 1633 Bayshore Highway, Suite 342, Burlingame, CA 94010 USA

Oprea TI, Overington JP. Computational and Practical Aspects of Drug Repositioning. ACTA ACUST UNITED AC 2015. [DOI: 10.1089/drrr.2014.0009] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/31/2022]

Nantasenamat C, Prachayasittikul V. Maximizing computational tools for successful drug discovery. Expert Opin Drug Discov 2015;10:321-9. [PMID: 25693813 DOI: 10.1517/17460441.2015.1016497] [Citation(s) in RCA: 42] [Impact Index Per Article: 4.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/20/2023]

Lipinski CA, Litterman NK, Southan C, Williams AJ, Clark AM, Ekins S. Parallel worlds of public and commercial bioactive chemistry data. J Med Chem 2014;58:2068-76. [PMID: 25415348 PMCID: PMC4360371 DOI: 10.1021/jm5011308] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/27/2022]

Extending in silico mechanism-of-action analysis by annotating targets with pathways: application to cellular cytotoxicity readouts. Future Med Chem 2014;6:2029-56. [DOI: 10.4155/fmc.14.137] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/24/2022] Open

Ekins S, Clark AM, Swamidass SJ, Litterman N, Williams AJ. Bigger data, collaborative tools and the future of predictive drug discovery. J Comput Aided Mol Des 2014;28:997-1008. [PMID: 24943138 PMCID: PMC4198464 DOI: 10.1007/s10822-014-9762-y] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.9] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/29/2014] [Accepted: 06/09/2014] [Indexed: 12/31/2022]

Ekins S, Freundlich JS, Reynolds RC. Are bigger data sets better for machine learning? Fusing single-point and dual-event dose response data for Mycobacterium tuberculosis. J Chem Inf Model 2014;54:2157-65. [PMID: 24968215 DOI: 10.1021/ci500264r] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/07/2023]

Cherkasov A, Muratov EN, Fourches D, Varnek A, Baskin II, Cronin M, Dearden J, Gramatica P, Martin YC, Todeschini R, Consonni V, Kuz'min VE, Cramer R, Benigni R, Yang C, Rathman J, Terfloth L, Gasteiger J, Richard A, Tropsha A. QSAR modeling: where have you been? Where are you going to? J Med Chem 2014;57:4977-5010. [PMID: 24351051 PMCID: PMC4074254 DOI: 10.1021/jm4004285] [Citation(s) in RCA: 1040] [Impact Index Per Article: 104.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/08/2023]

Warr WA. Data sharing matters. J Comput Aided Mol Des 2014;28:1-4. [PMID: 24435495 DOI: 10.1007/s10822-013-9705-z] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/18/2013] [Accepted: 12/26/2013] [Indexed: 10/25/2022]

Scientific Lenses to Support Multiple Views over Linked Chemistry Data. THE SEMANTIC WEB – ISWC 2014 2014. [DOI: 10.1007/978-3-319-11964-9_7] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/23/2022]

Lagunin AA, Goel RK, Gawande DY, Pahwa P, Gloriozova TA, Dmitriev AV, Ivanov SM, Rudik AV, Konova VI, Pogodin PV, Druzhilovsky DS, Poroikov VV. Chemo- and bioinformatics resources for in silico drug discovery from medicinal plants beyond their traditional use: a critical review. Nat Prod Rep 2014;31:1585-611. [DOI: 10.1039/c4np00068d] [Citation(s) in RCA: 87] [Impact Index Per Article: 8.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/06/2023]

Ekins S. Progress in computational toxicology. J Pharmacol Toxicol Methods 2013;69:115-40. [PMID: 24361690 DOI: 10.1016/j.vascn.2013.12.003] [Citation(s) in RCA: 62] [Impact Index Per Article: 5.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/08/2013] [Accepted: 12/08/2013] [Indexed: 01/02/2023]

Mosca R, Pons T, Céol A, Valencia A, Aloy P. Towards a detailed atlas of protein–protein interactions. Curr Opin Struct Biol 2013;23:929-40. [DOI: 10.1016/j.sbi.2013.07.005] [Citation(s) in RCA: 87] [Impact Index Per Article: 7.9] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/27/2013] [Revised: 07/04/2013] [Accepted: 07/08/2013] [Indexed: 12/30/2022]

Sean Ekins Collaborations in Chemistry, 5616 Hilltop Needmore Road, Fuquay-Varina, NC, 27526, USA
Alex M Clark
S Joshua Swamidass
Nadia Litterman
Antony J Williams

Artem Cherkasov Vancouver Prostate Centre, University of British Columbia, Vancouver, BC, V6H3Z6, Canada
Eugene N. Muratov Laboratory for Molecular Modeling, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, NC, 27599, USA; Department of Molecular Structure and Cheminformatics, A.V. Bogatsky Physical-Chemical Institute National Academy of Sciences of Ukraine, Odessa, 65080, Ukraine
Denis Fourches Laboratory for Molecular Modeling, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, NC, 27599, USA
Alexandre Varnek Department of Chemistry, L. Pasteur University of Strasbourg, Strasbourg, 67000, France
Igor I. Baskin Department of Physics, Lomonosov Moscow State University, Moscow, 119991, Russia
Mark Cronin School of Pharmacy and Biomolecular Sciences, Liverpool John Moores University, Liverpool L33AF, UK
John Dearden School of Pharmacy and Biomolecular Sciences, Liverpool John Moores University, Liverpool L33AF, UK
Paola Gramatica Department of Structural and Functional Biology, University of Insubria, Varese, 21100, Italy
Yvonne C. Martin Martin Consulting, Waukegan, IL, 60079, USA
Roberto Todeschini Milano Chemometrics and QSAR Research Group, University of Milano-Bicocca, Milan, 20126, Italy
Viviana Consonni Milano Chemometrics and QSAR Research Group, University of Milano-Bicocca, Milan, 20126, Italy
Victor E. Kuz'min Department of Molecular Structure and Cheminformatics, A.V. Bogatsky Physical-Chemical Institute National Academy of Sciences of Ukraine, Odessa, 65080, Ukraine
Richard Cramer Tripos, Inc., St. Louis, MO, 63144, USA
Romualdo Benigni Environment and Health Department, Istituto Superiore di Sanità, Rome, 00161, Italy
Chihae Yang Altamira LLC, Columbus OH 43235, USA
James Rathman Altamira LLC, Columbus OH 43235, USA; Department of Chemical and Biomolecular Engineering, the Ohio State University, Columbus, OH 43215, USA
Lothar Terfloth Molecular Networks GmbH, 91052 Erlangen, Germany
Johann Gasteiger Molecular Networks GmbH, 91052 Erlangen, Germany
Ann Richard National Center for Computational Toxicology, U.S. Environmental Protection Agency, Research Triangle Park, NC, 27519, USA
Alexander Tropsha Laboratory for Molecular Modeling, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, NC, 27599, USA

Alexey A. Lagunin Orekhovich Institute of Biomedical Chemistry of Rus. Acad. Med. Sci Moscow, Russia; Russian National Research Medical University Medico-Biologic Faculty Moscow, Russia
Rajesh K. Goel Department of Pharmaceutical Sciences and Drug Research Punjabi University Patiala-147002, India
Dinesh Y. Gawande Department of Pharmaceutical Sciences and Drug Research Punjabi University Patiala-147002, India
Priynka Pahwa Department of Pharmaceutical Sciences and Drug Research Punjabi University Patiala-147002, India
Tatyana A. Gloriozova Orekhovich Institute of Biomedical Chemistry of Rus. Acad. Med. Sci Moscow, Russia
Alexander V. Dmitriev Orekhovich Institute of Biomedical Chemistry of Rus. Acad. Med. Sci Moscow, Russia
Sergey M. Ivanov Orekhovich Institute of Biomedical Chemistry of Rus. Acad. Med. Sci Moscow, Russia
Anastassia V. Rudik Orekhovich Institute of Biomedical Chemistry of Rus. Acad. Med. Sci Moscow, Russia
Varvara I. Konova Orekhovich Institute of Biomedical Chemistry of Rus. Acad. Med. Sci Moscow, Russia
Pavel V. Pogodin Orekhovich Institute of Biomedical Chemistry of Rus. Acad. Med. Sci Moscow, Russia; Russian National Research Medical University Medico-Biologic Faculty Moscow, Russia
Dmitry S. Druzhilovsky Orekhovich Institute of Biomedical Chemistry of Rus. Acad. Med. Sci Moscow, Russia
Vladimir V. Poroikov Orekhovich Institute of Biomedical Chemistry of Rus. Acad. Med. Sci Moscow, Russia; Russian National Research Medical University Medico-Biologic Faculty Moscow, Russia

Chanin Nantasenamat Mahidol University, Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, 10700 Bangkok, Thailand
Virapong Prachayasittikul

Christopher A Lipinski Christopher A. Lipinski, Ph.D., LLC, 10 Connshire Drive, Waterford, Connecticut 06385-4122, United States
Nadia K Litterman
Christopher Southan
Antony J Williams
Alex M Clark
Sean Ekins

Sean Ekins Collaborations in Chemistry, 5616 Hilltop Needmore Road, Fuquay-Varina, North Carolina 27526, United States
Joel S Freundlich
Robert C Reynolds