1. Kim S, Yu B, Li Q, Bolton EE. PubChem synonym filtering process using crowdsourcing. J Cheminform 2024; 16:69. [PMID: 38880887 PMCID: PMC11181558 DOI: 10.1186/s13321-024-00868-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/01/2024] [Accepted: 06/09/2024] [Indexed: 06/18/2024]
Abstract
PubChem (https://pubchem.ncbi.nlm.nih.gov) is a public chemical information resource containing more than 100 million unique chemical structures. One of the most requested tasks in PubChem and other chemical databases is to search chemicals by name (also commonly called a "chemical synonym"). PubChem performs this task by looking up chemical synonym-structure associations provided by individual depositors to PubChem. In addition, these synonyms are used for many purposes, including creating links between chemicals and PubMed articles (using Medical Subject Headings (MeSH) terms). However, these depositor-provided name-structure associations are subject to substantial discrepancies within and between depositors, making it difficult to unambiguously map a chemical name to a specific chemical structure. The present paper describes PubChem's crowdsourcing-based synonym filtering strategy, which resolves inter- and intra-depositor discrepancies in synonym-structure associations as well as in chemical-MeSH associations. The PubChem synonym filtering process was developed based on an analysis of four crowd-voting strategies, which differ in the consistency threshold employed (60% vs. 70%) and in how intra-depositor discrepancies are resolved (a single vote vs. multiple votes per depositor) prior to inter-depositor crowd-voting. Voting agreement was determined at six levels of chemical equivalency, which consider varying isotopic composition, stereochemistry, and connectivity of chemical structures and their primary components. While all four strategies showed comparable results, Strategy I (one vote per depositor with a 60% consistency threshold) resulted in the most synonyms assigned to a single chemical structure as well as the most synonym-structure associations disambiguated at the six chemical equivalency contexts. 
Based on the results of this study, Strategy I was implemented in PubChem's filtering process that cleans up synonym-structure associations as well as chemical-MeSH associations. This consistency-based filtering process is designed to look for a consensus in name-structure associations but cannot attest to their correctness. As a result, it can fail to recognize correct name-structure associations (or incorrect ones), for example, when a synonym is provided by only one depositor or when many contributors are incorrect. However, this filtering process is an important starting point for quality control in name-structure associations in large chemical databases like PubChem.
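The abstract describes Strategy I only at a high level: collapse each depositor's name-structure associations into a single vote, then accept a structure for a synonym when at least 60% of depositor votes agree. The following is a minimal sketch under stated assumptions, not PubChem's implementation. The record layout `(depositor, synonym, structure_id)`, the rule that a depositor whose own entries tie abstains, and the function names are all hypothetical, and the six chemical-equivalency levels are not modeled (structure IDs are compared as plain strings).

```python
from collections import Counter, defaultdict
from typing import Iterable, Optional

Record = tuple[str, str, str]  # (depositor, synonym, structure_id); hypothetical layout


def one_vote_per_depositor(records: Iterable[Record]) -> dict[str, dict[str, str]]:
    """Collapse each depositor's associations for a synonym into a single vote."""
    counts = defaultdict(lambda: defaultdict(Counter))
    for depositor, synonym, structure in records:
        counts[synonym][depositor][structure] += 1

    votes: dict[str, dict[str, str]] = defaultdict(dict)
    for synonym, per_depositor in counts.items():
        for depositor, tally in per_depositor.items():
            top = tally.most_common(2)
            # Assumption: a depositor whose own entries are tied abstains.
            if len(top) > 1 and top[0][1] == top[1][1]:
                continue
            votes[synonym][depositor] = top[0][0]
    return dict(votes)


def crowd_vote(records: Iterable[Record], threshold: float = 0.6) -> dict[str, Optional[str]]:
    """Map each synonym to its consensus structure, or None if agreement < threshold."""
    result: dict[str, Optional[str]] = {}
    for synonym, depositor_votes in one_vote_per_depositor(records).items():
        tally = Counter(depositor_votes.values())
        structure, count = tally.most_common(1)[0]
        share = count / sum(tally.values())
        result[synonym] = structure if share >= threshold else None
    return result


records = [
    ("A", "aspirin", "S1"), ("A", "aspirin", "S1"),  # depositor A repeats itself: still one vote
    ("B", "aspirin", "S1"),
    ("C", "aspirin", "S2"),
    ("A", "foo", "X"), ("B", "foo", "Y"),            # 50/50 split between depositors
]
print(crowd_vote(records))  # {'aspirin': 'S1', 'foo': None}
```

Passing `threshold=0.7` corresponds to the stricter threshold the study compared against; with it, `aspirin` above is left unassigned, because 2 of 3 votes is below 70%.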
Affiliation(s)
- Sunghwan Kim
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, 20894, USA
- Bo Yu
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, 20894, USA
- Qingliang Li
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, 20894, USA
- Evan E Bolton
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, 20894, USA

2. Eriksen CA, Andersen JL, Fagerberg R, Merkle D. Toward the Reconciliation of Inconsistent Molecular Structures from Biochemical Databases. J Comput Biol 2024; 31:498-512. [PMID: 38758924 DOI: 10.1089/cmb.2024.0520] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/19/2024]
Abstract
Information on the structure of molecules, retrieved via biochemical databases, plays a pivotal role in various disciplines, including metabolomics, systems biology, and drug discovery. No such database can be complete, and it is often necessary to incorporate data from several sources. However, the molecular structure for a given compound is not necessarily consistent between databases. This article presents StructRecon, a novel tool for resolving unique molecular structures from database identifiers. Currently, identifiers from BiGG, ChEBI, Escherichia coli Metabolome Database (ECMDB), MetaNetX, and PubChem are supported. StructRecon traverses the cross-links between entries in different databases to construct what we call identifier graphs. The goal of these graphs is to offer a more complete view of the total information available on a given compound across all the supported databases. To reconcile discrepancies encountered during the traversal of the databases, we develop an extensible model for molecular structure supporting multiple independent levels of detail, which allows standardization of the structure to be applied iteratively. In some cases, our standardization approach results in multiple candidate structures for a given compound, in which case a random walk-based algorithm is used to select the most likely structure among incompatible alternatives. As a case study, we applied StructRecon to the EColiCore2 model. We found at least one structure for 98.66% of its compounds, which is more than twice as many as is possible when the databases are used in more standard ways that do not consider the complex network of cross-database references captured by our identifier graphs. StructRecon is open-source and modular, which enables support for more databases in the future.
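The abstract names two ingredients without giving their details: a traversal of cross-database links that builds an identifier graph, and a random-walk-based choice among incompatible candidate structures. The sketch below is not StructRecon's algorithm. It assumes a breadth-first traversal that treats every cross-link as bidirectional, and it ranks candidates by a random walk with restart (personalized PageRank, computed by power iteration) from the query identifier. The identifiers, the `cross_links` and `structures` dictionaries, and the restart probability of 0.15 are hypothetical.

```python
from collections import deque


def identifier_graph(start, cross_links):
    """Breadth-first traversal of cross-references; returns an undirected adjacency map."""
    adj = {start: set()}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nbr in cross_links.get(node, []):
            if nbr == node:
                continue
            if nbr not in adj:
                adj[nbr] = set()
                queue.append(nbr)
            adj[node].add(nbr)
            adj[nbr].add(node)  # assumption: every cross-link is treated as bidirectional
    return adj


def rank_structures(start, cross_links, structures, restart=0.15, iterations=200):
    """Score candidate structures by random-walk-with-restart probability from `start`.

    Identifiers are assumed to be strings; structure nodes use tuple keys so they
    never collide with identifier names.
    """
    adj = identifier_graph(start, cross_links)
    for ident in list(adj):
        for s in structures.get(ident, []):
            key = ("structure", s)
            adj.setdefault(key, set()).add(ident)
            adj[ident].add(key)

    prob = {node: 0.0 for node in adj}
    prob[start] = 1.0
    for _ in range(iterations):
        nxt = {node: 0.0 for node in adj}
        nxt[start] += restart
        for node, p in prob.items():
            neighbours = adj[node]
            if not neighbours:  # isolated start node: walk mass returns to start
                nxt[start] += (1 - restart) * p
                continue
            share = (1 - restart) * p / len(neighbours)
            for nbr in neighbours:
                nxt[nbr] += share
        prob = nxt

    scores = {node[1]: p for node, p in prob.items() if isinstance(node, tuple)}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical identifiers: S1 is asserted by two entries, S2 by one more distant entry.
cross_links = {
    "bigg:A": ["chebi:B", "metanetx:C"],
    "metanetx:C": ["pubchem:D", "ecmdb:E"],
    "chebi:B": ["pubchem:D"],
}
structures = {"chebi:B": ["S1"], "pubchem:D": ["S1"], "ecmdb:E": ["S2"]}
ranking = rank_structures("bigg:A", cross_links, structures)
print(ranking[0][0])  # S1
```

A random walk with restart favours structures that are reachable from the query through many short paths, which is one plausible reading of "most likely structure"; the paper's own scoring may differ.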
Affiliation(s)
- Casper Asbjørn Eriksen
- Department of Mathematics and Computer Science, University of Southern Denmark, Odense, Denmark
- Jakob Lykke Andersen
- Department of Mathematics and Computer Science, University of Southern Denmark, Odense, Denmark
- Rolf Fagerberg
- Department of Mathematics and Computer Science, University of Southern Denmark, Odense, Denmark
- Daniel Merkle
- Department of Mathematics and Computer Science, University of Southern Denmark, Odense, Denmark
- Faculty of Technology, Bielefeld University, Bielefeld, Germany

3. Du Y. Binding Curve Viewer: Visualizing the Equilibrium and Kinetics of Protein-Ligand Binding and Competitive Binding. J Chem Inf Model 2024; 64:4180-4192. [PMID: 38720179 PMCID: PMC11134506 DOI: 10.1021/acs.jcim.4c00130] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/24/2024] [Revised: 04/21/2024] [Accepted: 04/25/2024] [Indexed: 05/28/2024]
Abstract
Understanding the thermodynamics and kinetics of protein-ligand interactions is essential for biologists and pharmacologists. To visualize the equilibrium and kinetics of binding reactions with 1:1 stoichiometry and no cooperativity, we derived the exact relationship between the concentration of the protein-ligand complex and time in the second-order binding process and numerically simulated the process of competitive binding. First, we focused on two common concerns in measuring protein-ligand interactions: how to avoid the titration regime and how to establish an appropriate incubation time. Then, we gave examples of how the commonly used experimental conditions [L]₀ ≫ [P]₀ and [I]₀ ≫ [P]₀ affect the estimation of kinetic and thermodynamic properties. Theoretical inhibition curves were calculated, and the apparent IC50 and the IC50 were estimated accordingly under predefined conditions. Using the estimated apparent IC50, we compared the apparent Ki and Ki values calculated using the Cheng-Prusoff equation, the Lin-Riggs equation, and Wang's group equation. We also applied our tools to simulate high-throughput screening and compared the results with those of real experiments. The visualization tool for simulating saturation experiments, kinetic experiments of binding and competitive binding, and inhibition curves, "Binding Curve Viewer," is available at www.eplatton.net/binding-curve-viewer.
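For the 1:1, non-cooperative case described above, the complex concentration x = [PL] obeys d[PL]/dt = kon([P]₀ - x)([L]₀ - x) - koff·x. Its right-hand side factors as kon(x - x1)(x - x2), where x1 < x2 are the roots of x² - ([P]₀ + [L]₀ + Kd)x + [P]₀[L]₀ = 0, so the time course has a closed form that does not rely on the [L]₀ ≫ [P]₀ approximation. The sketch below is a standard derivation, not code taken from Binding Curve Viewer. It also includes the textbook Cheng-Prusoff correction for comparison; the Lin-Riggs and Wang equations are omitted. Units are assumed to be consistent, for example µM for concentrations, µM⁻¹ s⁻¹ for kon, and s⁻¹ for koff, and all example values are hypothetical.

```python
import math


def _roots(p0, l0, kd):
    """Roots x1 < x2 of x**2 - (p0 + l0 + kd)*x + p0*l0 = 0; x1 is the equilibrium [PL]."""
    b = p0 + l0 + kd
    disc = math.sqrt(b * b - 4.0 * p0 * l0)
    return (b - disc) / 2.0, (b + disc) / 2.0


def equilibrium_complex(p0, l0, kd):
    """Exact equilibrium [PL] for 1:1 binding, with no ligand-depletion approximation."""
    return _roots(p0, l0, kd)[0]


def complex_at_time(t, p0, l0, kon, koff):
    """Exact [PL](t) for P + L <-> PL, starting from [PL](0) = 0."""
    x1, x2 = _roots(p0, l0, koff / kon)
    decay = math.exp(-kon * (x2 - x1) * t)
    return x1 * x2 * (1.0 - decay) / (x2 - x1 * decay)


def cheng_prusoff_ki(ic50, ligand, kd):
    """Ki from IC50 for a competitive inhibitor; assumes free [L] is close to total [L]."""
    return ic50 / (1.0 + ligand / kd)


p0, l0, kon, koff = 1.0, 2.0, 1.0, 0.5  # hypothetical: µM, µM, µM^-1 s^-1, s^-1
print(round(equilibrium_complex(p0, l0, koff / kon), 4))  # 0.7192
for t in (0.0, 0.5, 1.0, 2.0):
    print(t, round(complex_at_time(t, p0, l0, kon, koff), 4))
```

The observed rate constant of the approach to equilibrium is kon(x2 - x1), which reduces to the familiar kon[L]₀ + koff only when [L]₀ ≫ [P]₀; this is one reason incubation times chosen under that approximation can be too short.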
Affiliation(s)
- Yu Du
- Department of Clinical Laboratory, The Second Affiliated Hospital of Jiaxing University, Huancheng North Road 1518, Jiaxing, Zhejiang 314000, China
- The Key Laboratory, The Second Affiliated Hospital of Jiaxing University, Huancheng North Road 1518, Jiaxing, Zhejiang 314000, China

4. Mansouri K, Moreira-Filho JT, Lowe CN, Charest N, Martin T, Tkachenko V, Judson R, Conway M, Kleinstreuer NC, Williams AJ. Free and open-source QSAR-ready workflow for automated standardization of chemical structures in support of QSAR modeling. J Cheminform 2024; 16:19. [PMID: 38378618 PMCID: PMC10880251 DOI: 10.1186/s13321-024-00814-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/29/2023] [Accepted: 02/10/2024] [Indexed: 02/22/2024]
Abstract
The rapid increase of publicly available chemical structures and associated experimental data presents a valuable opportunity to build robust QSAR models for applications in different fields. However, a common concern is the quality of both the chemical structure information and the associated experimental data. This is especially true when those data are collected from multiple sources, as chemical substance mappings can contain many duplicate structures and molecular inconsistencies. Such issues can impact the resulting molecular descriptors and their mappings to experimental data and, subsequently, the quality of the derived models in terms of accuracy, repeatability, and reliability. Herein we describe the development of an automated workflow to standardize chemical structures according to a set of standard rules and generate two- and/or three-dimensional "QSAR-ready" forms prior to the calculation of molecular descriptors. The workflow was designed in the KNIME workflow environment and consists of three high-level steps. First, a structure encoding is read, and then the resulting in-memory representation is cross-referenced with any existing identifiers for consistency. Finally, the structure is standardized using a series of operations including desalting, stripping of stereochemistry (for two-dimensional structures), standardization of tautomers and nitro groups, valence correction, neutralization when possible, and then removal of duplicates. This workflow was initially developed to support collaborative QSAR modeling projects by ensuring consistency among the results of the different participants. It was then updated and generalized for other modeling applications. This included modifying the "QSAR-ready" workflow to generate "MS-ready structures" that support the generation of substance mappings and searches in software applications for non-targeted analysis by mass spectrometry. 
Both QSAR and MS-ready workflows are freely available in KNIME, via standalone versions on GitHub, and as docker container resources for the scientific community. Scientific contribution: This work pioneers an automated workflow in KNIME, systematically standardizing chemical structures to ensure their readiness for QSAR modeling and broader scientific applications. By addressing data quality concerns through desalting, stereochemistry stripping, and normalization, it optimizes molecular descriptors' accuracy and reliability. The freely available resources in KNIME, GitHub, and docker containers democratize access, benefiting collaborative research and advancing diverse modeling endeavors in chemistry and mass spectrometry.
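The order of operations matters for the last step: duplicates are removed only after desalting and stereochemistry stripping, so salt forms and stereoisomers of one parent collapse into a single QSAR-ready entry whose experimental values must then be merged. The toy sketch below illustrates only that idea on SMILES strings and is not the KNIME workflow. It keeps the longest dot-separated component as a crude stand-in for "largest fragment", removes only `@`/`@@` and `/`/`\` stereo marks, and groups on the resulting strings. Tautomer, nitro-group, valence, and charge normalization, as well as canonicalization, require a cheminformatics toolkit and are not attempted, so equivalent molecules written differently will not merge. The function names are hypothetical.

```python
from collections import defaultdict


def desalt(smiles: str) -> str:
    """Keep the longest dot-separated component (a crude proxy for the largest fragment)."""
    return max(smiles.split("."), key=len)


def strip_stereo(smiles: str) -> str:
    """Remove tetrahedral (@, @@) and double-bond (/, \\) stereo marks.

    Extended chirality classes such as @TH1 or @SP1 are not handled.
    """
    return smiles.replace("@", "").replace("/", "").replace("\\", "")


def group_qsar_ready(records):
    """Group (substance_id, smiles) pairs by their desalted, stereo-stripped SMILES."""
    groups = defaultdict(list)
    for substance_id, smiles in records:
        groups[strip_stereo(desalt(smiles))].append(substance_id)
    return dict(groups)


records = [
    ("s1", "C[C@H](N)C(=O)O"),     # alanine, one enantiomer
    ("s2", "C[C@@H](N)C(=O)O"),    # alanine, the other enantiomer
    ("s3", "C[C@H](N)C(=O)O.Cl"),  # hydrochloride salt of s1
    ("s4", "F/C=C/F"),             # (E)-1,2-difluoroethene
]
print(group_qsar_ready(records))
# {'C[CH](N)C(=O)O': ['s1', 's2', 's3'], 'FC=CF': ['s4']}
```

In a real pipeline the grouping key would be a canonical identifier (for example a canonical SMILES or InChIKey) computed after all standardization steps, rather than the raw edited string.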
Affiliation(s)
- Kamel Mansouri
- National Toxicology Program Interagency Center for the Evaluation of Alternative Toxicological Methods, National Institute of Environmental Health Sciences, Research Triangle Park, NC, 27709, USA
- José T Moreira-Filho
- National Toxicology Program Interagency Center for the Evaluation of Alternative Toxicological Methods, National Institute of Environmental Health Sciences, Research Triangle Park, NC, 27709, USA
- Charles N Lowe
- Center for Computational Toxicology and Exposure, Office of Research and Development, U.S. Environmental Protection Agency, Research Triangle Park, NC, 27711, USA
- Nathaniel Charest
- Center for Computational Toxicology and Exposure, Office of Research and Development, U.S. Environmental Protection Agency, Research Triangle Park, NC, 27711, USA
- Todd Martin
- Center for Computational Toxicology and Exposure, Office of Research and Development, U.S. Environmental Protection Agency, Research Triangle Park, NC, 27711, USA
- Richard Judson
- Center for Computational Toxicology and Exposure, Office of Research and Development, U.S. Environmental Protection Agency, Research Triangle Park, NC, 27711, USA
- Mike Conway
- National Institute of Environmental Health Sciences, Research Triangle Park, NC, 27709, USA
- Nicole C Kleinstreuer
- National Toxicology Program Interagency Center for the Evaluation of Alternative Toxicological Methods, National Institute of Environmental Health Sciences, Research Triangle Park, NC, 27709, USA
- Antony J Williams
- Center for Computational Toxicology and Exposure, Office of Research and Development, U.S. Environmental Protection Agency, Research Triangle Park, NC, 27711, USA

5. Gupta MK, Gouda G, Sultana S, Punekar SM, Vadde R, Ravikiran T. Structure-related relationship: Plant-derived antidiabetic compounds. Studies in Natural Products Chemistry 2023:241-295. [DOI: 10.1016/b978-0-323-91294-5.00008-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/06/2023]

6. Li L, Zhang Z, Men Y, Baskaran S, Sangion A, Wang S, Arnot JA, Wania F. Retrieval, Selection, and Evaluation of Chemical Property Data for Assessments of Chemical Emissions, Fate, Hazard, Exposure, and Risks. ACS Environmental Au 2022; 2:376-395. [PMID: 37101455 PMCID: PMC10125307 DOI: 10.1021/acsenvironau.2c00010] [Citation(s) in RCA: 15] [Impact Index Per Article: 7.5] [Received: 03/02/2022] [Revised: 07/01/2022] [Accepted: 07/05/2022] [Indexed: 04/28/2023]
Abstract
Reliable chemical property data are the key to defensible and unbiased assessments of chemical emissions, fate, hazard, exposure, and risks. However, the retrieval, evaluation, and use of reliable chemical property data can often be a formidable challenge for chemical assessors and model users. This comprehensive review provides practical guidance for use of chemical property data in chemical assessments. We assemble available sources for obtaining experimentally derived and in silico predicted property data; we also elaborate strategies for evaluating and curating the obtained property data. We demonstrate that both experimentally derived and in silico predicted property data can be subject to considerable uncertainty and variability. Chemical assessors are encouraged to use property data derived through the harmonization of multiple carefully selected experimental data if a sufficient number of reliable laboratory measurements is available or through the consensus consolidation of predictions from multiple in silico tools if the data pool from laboratory measurements is not adequate.
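The closing recommendation amounts to a decision rule: harmonize experimental values when enough reliable measurements exist, and otherwise fall back to a consensus of in silico predictions. A minimal sketch, assuming the median as both the harmonization and the consensus statistic and a hypothetical cutoff of three measurements; the review's actual curation steps (selecting, screening, and weighting data by reliability) are not modeled and are assumed to have happened before this function is called.

```python
from statistics import median


def select_property_value(experimental, predicted, min_measurements=3):
    """Return (value, basis) following an experimental-first, consensus-fallback rule.

    `experimental` is assumed to be already curated; `min_measurements` is a
    hypothetical cutoff for "a sufficient number of reliable measurements".
    """
    if len(experimental) >= min_measurements:
        return median(experimental), "harmonized experimental"
    if predicted:
        return median(predicted), "in silico consensus"
    raise ValueError("no adequate experimental data and no predictions supplied")


# Hypothetical log Kow values for one chemical
print(select_property_value([3.1, 3.3, 3.2], [2.8, 3.6]))  # (3.2, 'harmonized experimental')
print(select_property_value([3.1], [2.9, 3.5, 3.0]))       # (3.0, 'in silico consensus')
```

The median is used here because it resists a single outlying measurement or model; a mean, or a reliability-weighted mean, would be an equally valid choice under a different curation policy.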
Affiliation(s)
- Li Li
- School of Public Health, University of Nevada Reno, Reno, Nevada 89557, United States
- Phone: +1 (775) 682 7077
- Zhizhen Zhang
- School of Public Health, University of Nevada Reno, Reno, Nevada 89557, United States
- Yujie Men
- Department of Chemical & Environmental Engineering, University of California Riverside, Riverside, California 92521, United States
- Sivani Baskaran
- Department of Physical and Environmental Sciences, University of Toronto Scarborough, Toronto, Ontario M1C 1A4, Canada
- Alessandro Sangion
- Department of Physical and Environmental Sciences, University of Toronto Scarborough, Toronto, Ontario M1C 1A4, Canada
- ARC Arnot Research & Consulting, Toronto, Ontario M4M 1W4, Canada
- Shenghong Wang
- School of Public Health, University of Nevada Reno, Reno, Nevada 89557, United States
- Jon A. Arnot
- Department of Physical and Environmental Sciences, University of Toronto Scarborough, Toronto, Ontario M1C 1A4, Canada
- ARC Arnot Research & Consulting, Toronto, Ontario M4M 1W4, Canada
- Department of Pharmacology and Toxicology, University of Toronto, Toronto, Ontario M5S 1A8, Canada
- Frank Wania
- Department of Physical and Environmental Sciences, University of Toronto Scarborough, Toronto, Ontario M1C 1A4, Canada

7. Dolciami D, Villasclaras-Fernandez E, Kannas C, Meniconi M, Al-Lazikani B, Antolin AA. canSAR chemistry registration and standardization pipeline. J Cheminform 2022; 14:28. [PMID: 35643512 PMCID: PMC9148294 DOI: 10.1186/s13321-022-00606-7] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 02/04/2022] [Accepted: 04/04/2022] [Indexed: 11/10/2022]
Abstract
Background
Integration of medicinal chemistry data from numerous public resources is an increasingly important part of academic drug discovery and translational research because it can bring a wealth of important knowledge related to compounds into one place. However, different data sources can report the same or related compounds in various forms (e.g., tautomers, racemates), highlighting the need to organise related compounds in hierarchies that alert the user to important bioactivity data that may be relevant. To generate these compound hierarchies, we have developed and implemented canSARchem, a new compound registration and standardization pipeline within the canSAR public knowledgebase. canSARchem builds on previously developed ChEMBL and PubChem pipelines and is implemented in KNIME. We describe the pipeline, which we make publicly available, and we provide examples of the strengths and limitations of using hierarchies for bioactivity data exploration. Finally, we identify canonicalization enrichment in FDA-approved drugs, illustrating the benefits of our approach.
Results
We created a chemical registration and standardization pipeline in KNIME and made it freely available to the research community. The pipeline consists of five steps to register the compounds and create the compound hierarchy: 1. Structure checker, 2. Standardization, 3. Generation of canonical tautomers and representative structures, 4. Salt strip, and 5. Generation of abstract structures to build the compound hierarchy. Unlike ChEMBL’s RDKit pipeline, we carry out compound canonicalization before deriving the parent structure, similar to PubChem’s OpenEye pipeline. canSARchem has a lower rejection rate than both the PubChem and ChEMBL pipelines. We use our pipeline to assess the impact of grouping compounds in hierarchies for bioactivity data exploration. We find that FDA-approved drugs show statistically significant sensitivity to canonicalization compared with the majority of bioactive compounds, which demonstrates the importance of this step.
Conclusions
We use canSARchem to standardize all the compounds uploaded in canSAR (> 3 million), enabling efficient data integration and the rapid identification of alternative compound forms with useful bioactivity data. Comparison with the PubChem and ChEMBL pipelines showed comparable performance in compound standardization, but only PubChem and canSAR canonicalize tautomers, and canSAR has a slightly lower rejection rate. Our results highlight the importance of compound hierarchies for bioactivity data exploration. We make canSARchem available under a Creative Commons Attribution-ShareAlike 4.0 International License (CC BY-SA 4.0) at https://gitlab.icr.ac.uk/cansar-public/compound-registration-pipeline.

8. Jacobs A, Williams D, Hickey K, Patrick N, Williams AJ, Chalk S, McEwen L, Willighagen E, Walker M, Bolton E, Sinclair G, Sanford A. CAS Common Chemistry in 2021: Expanding Access to Trusted Chemical Information for the Scientific Community. J Chem Inf Model 2022; 62:2737-2743. [PMID: 35559614 PMCID: PMC9199008 DOI: 10.1021/acs.jcim.2c00268] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Indexed: 11/29/2022]
Abstract
CAS Common Chemistry (https://commonchemistry.cas.org/) is an open web resource that provides access to reliable chemical substance information for the scientific community. Having served millions of visitors since its creation in 2009, the resource was extensively updated in 2021 with significant enhancements. The underlying dataset was expanded from 8000 to 500,000 chemical substances and includes additional associated information, such as basic properties and computer-readable chemical structure information. New use cases are supported with enhanced search capabilities and an integrated application programming interface. Reusable licensing of the content is provided through a Creative Commons Attribution-Non-Commercial (CC-BY-NC 4.0) license allowing other public resources to integrate the data into their systems. This paper provides an overview of the enhancements to data and functionality, discusses the benefits of the contribution to the chemistry community, and summarizes recent progress in leveraging this resource to strengthen other information sources.
Affiliation(s)
- Andrea Jacobs
- CAS, 2540 Olentangy River Rd, Columbus, Ohio 43202, United States
- Dustin Williams
- CAS, 2540 Olentangy River Rd, Columbus, Ohio 43202, United States
- Katherine Hickey
- CAS, 2540 Olentangy River Rd, Columbus, Ohio 43202, United States
- Nathan Patrick
- CAS, 2540 Olentangy River Rd, Columbus, Ohio 43202, United States
- Antony J Williams
- Center for Computational Toxicology and Exposure, Office of Research and Development, U.S. Environmental Protection Agency (U.S. EPA), Research Triangle Park, North Carolina 27711, United States
- Stuart Chalk
- Department of Chemistry, University of North Florida, Jacksonville, Florida 32224, United States
- Leah McEwen
- Physical Sciences Library, Cornell University, Ithaca, New York 14853, United States
- Egon Willighagen
- Department of Bioinformatics - BiGCaT, Maastricht University, 6229 ER Maastricht, The Netherlands
- Martin Walker
- Department of Chemistry, SUNY Potsdam, 44 Pierrepont Ave., Potsdam, New York 13676, United States
- Evan Bolton
- Department of Health and Human Services, National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, 8600 Rockville Pike, Bethesda, Maryland 20894, United States
- Gabriel Sinclair
- Center for Computational Toxicology and Exposure, Office of Research and Development, U.S. Environmental Protection Agency (U.S. EPA), Research Triangle Park, North Carolina 27711, United States
- Adam Sanford
- CAS, 2540 Olentangy River Rd, Columbus, Ohio 43202, United States

9. Chang X, Tan YM, Allen DG, Bell S, Brown PC, Browning L, Ceger P, Gearhart J, Hakkinen PJ, Kabadi SV, Kleinstreuer NC, Lumen A, Matheson J, Paini A, Pangburn HA, Petersen EJ, Reinke EN, Ribeiro AJS, Sipes N, Sweeney LM, Wambaugh JF, Wange R, Wetmore BA, Mumtaz M. IVIVE: Facilitating the Use of In Vitro Toxicity Data in Risk Assessment and Decision Making. Toxics 2022; 10:232. [PMID: 35622645 PMCID: PMC9143724 DOI: 10.3390/toxics10050232] [Citation(s) in RCA: 35] [Impact Index Per Article: 17.5] [Received: 03/25/2022] [Accepted: 04/24/2022] [Indexed: 02/04/2023]
Abstract
During the past few decades, the science of toxicology has been undergoing a transformation from an observational to a predictive science. New approach methodologies (NAMs), including in vitro assays, in silico models, read-across, and in vitro to in vivo extrapolation (IVIVE), are being developed to reduce, refine, or replace whole animal testing, encouraging the judicious use of time and resources. Some of these methods have advanced past the exploratory research stage and are beginning to gain acceptance for the risk assessment of chemicals. A review of the recent literature reveals a burst of IVIVE publications over the past decade. In this review, we propose operational definitions for IVIVE, present literature examples for several common toxicity endpoints, and highlight their implications in decision-making processes across various federal agencies, as well as international organizations, including those in the European Union (EU). The current challenges and future needs for IVIVE are also summarized. In addition to refining and reducing the number of animals in traditional toxicity testing protocols and being used for prioritizing chemical testing, the goal of using IVIVE to facilitate the replacement of animal models can be achieved through the continued evolution and development of IVIVE methods, including a strategic plan to qualify them for regulatory acceptance.
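A widely used IVIVE calculation is reverse dosimetry: assuming linear toxicokinetics, an in vitro bioactive concentration (for example an AC50) is converted to an administered (oral) equivalent dose by dividing it by the steady-state plasma concentration, Css, predicted for a constant dose of 1 mg/kg/day. The sketch below is a generic illustration of that arithmetic, not a method specified in this review; the function name and example numbers are hypothetical, and Css itself must come from a toxicokinetic model.

```python
def administered_equivalent_dose(bioactive_conc_um, css_um_per_unit_dose):
    """Reverse dosimetry under linear kinetics.

    bioactive_conc_um: in vitro point of departure (e.g. an AC50), in µM.
    css_um_per_unit_dose: predicted steady-state plasma concentration, in µM,
        for a constant dose of 1 mg/kg/day.
    Returns the dose in mg/kg/day expected to produce that plasma concentration.
    """
    if css_um_per_unit_dose <= 0:
        raise ValueError("Css must be positive")
    return bioactive_conc_um / css_um_per_unit_dose


# Hypothetical chemical: AC50 = 10 µM; Css = 2 µM at 1 mg/kg/day
print(administered_equivalent_dose(10.0, 2.0))  # 5.0 (mg/kg/day)
```

Because the dose scales linearly with the in vitro concentration, a chemical with a high predicted Css (slow clearance) maps to a lower equivalent dose, which is why toxicokinetics can reorder chemicals that look equally potent in vitro.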
Affiliation(s)
- Xiaoqing Chang
- Inotiv-RTP, 601 Keystone Park Drive, Suite 200, Morrisville, NC 27560, USA
- Yu-Mei Tan
- U.S. Environmental Protection Agency, Office of Pesticide Programs, 109 T.W. Alexander Drive, Durham, NC 27709, USA
- David G. Allen
- Inotiv-RTP, 601 Keystone Park Drive, Suite 200, Morrisville, NC 27560, USA
- Shannon Bell
- Inotiv-RTP, 601 Keystone Park Drive, Suite 200, Morrisville, NC 27560, USA
- Paul C. Brown
- U.S. Food and Drug Administration, Center for Drug Evaluation and Research, 10903 New Hampshire Avenue, Silver Spring, MD 20903, USA
- Lauren Browning
- Inotiv-RTP, 601 Keystone Park Drive, Suite 200, Morrisville, NC 27560, USA
- Patricia Ceger
- Inotiv-RTP, 601 Keystone Park Drive, Suite 200, Morrisville, NC 27560, USA
- Jeffery Gearhart
- The Henry M. Jackson Foundation, Air Force Research Laboratory, 711 Human Performance Wing, Wright-Patterson Air Force Base, OH 45433, USA
- Pertti J. Hakkinen
- National Library of Medicine, National Center for Biotechnology Information, 8600 Rockville Pike, Bethesda, MD 20894, USA
- Shruti V. Kabadi
- U.S. Food and Drug Administration, Center for Food Safety and Applied Nutrition, Office of Food Additive Safety, 5001 Campus Drive, HFS-275, College Park, MD 20740, USA
- Nicole C. Kleinstreuer
- National Institute of Environmental Health Sciences, National Toxicology Program Interagency Center for the Evaluation of Alternative Toxicological Methods, P.O. Box 12233, Research Triangle Park, NC 27709, USA
- Annie Lumen
- U.S. Food and Drug Administration, National Center for Toxicological Research, 3900 NCTR Road, Jefferson, AR 72079, USA
- Joanna Matheson
- U.S. Consumer Product Safety Commission, Division of Toxicology and Risk Assessment, 5 Research Place, Rockville, MD 20850, USA
- Alicia Paini
- European Commission, Joint Research Centre (JRC), 21027 Ispra, Italy
- Heather A. Pangburn
- Air Force Research Laboratory, 711 Human Performance Wing, 2729 R Street, Area B, Building 837, Wright-Patterson Air Force Base, OH 45433, USA
- Elijah J. Petersen
- U.S. Department of Commerce, National Institute of Standards and Technology, 100 Bureau Drive, Gaithersburg, MD 20899, USA
- Emily N. Reinke
- U.S. Army Public Health Center, 8252 Blackhawk Rd., Aberdeen Proving Ground, MD 21010, USA
- Alexandre J. S. Ribeiro
- U.S. Food and Drug Administration, Center for Drug Evaluation and Research, 10903 New Hampshire Avenue, Silver Spring, MD 20903, USA
- Nisha Sipes
- U.S. Environmental Protection Agency, Center for Computational Toxicology and Exposure, 109 TW Alexander Dr., Research Triangle Park, NC 27711, USA
- Lisa M. Sweeney
- UES, Inc., 4401 Dayton-Xenia Road, Beavercreek, OH 45432, Assigned to Air Force Research Laboratory, 711 Human Performance Wing, Wright-Patterson Air Force Base, OH 45433, USA
- John F. Wambaugh
- U.S. Environmental Protection Agency, Center for Computational Toxicology and Exposure, 109 TW Alexander Dr., Research Triangle Park, NC 27711, USA
- Ronald Wange
- U.S. Food and Drug Administration, Center for Drug Evaluation and Research, 10903 New Hampshire Avenue, Silver Spring, MD 20903, USA
- Barbara A. Wetmore
- U.S. Environmental Protection Agency, Center for Computational Toxicology and Exposure, 109 TW Alexander Dr., Research Triangle Park, NC 27711, USA
- Moiz Mumtaz
- Agency for Toxic Substances and Disease Registry, Office of the Associate Director for Science, 1600 Clifton Road, S102-2, Atlanta, GA 30333, USA

10. Kirstgen M, Müller SF, Lowjaga KAAT, Goldmann N, Lehmann F, Alakurtti S, Yli-Kauhaluoma J, Baringhaus KH, Krieg R, Glebe D, Geyer J. Identification of Novel HBV/HDV Entry Inhibitors by Pharmacophore- and QSAR-Guided Virtual Screening. Viruses 2021; 13:v13081489. [PMID: 34452354 PMCID: PMC8402622 DOI: 10.3390/v13081489] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 06/24/2021] [Revised: 07/19/2021] [Accepted: 07/24/2021] [Indexed: 12/17/2022]
Abstract
The hepatic bile acid transporter Na+/taurocholate co-transporting polypeptide (NTCP) was identified in 2012 as the high-affinity hepatic receptor for the hepatitis B and D viruses (HBV/HDV). Since then, this carrier has emerged as a promising drug target for HBV/HDV entry inhibitors, but the high-molecular-weight synthetic peptide Hepcludex® is so far the only approved HDV entry inhibitor. The present study aimed to identify small molecules as novel NTCP inhibitors with antiviral activity. A ligand-based bioinformatic approach was used to generate and validate appropriate pharmacophore and QSAR (quantitative structure–activity relationship) models. Half-maximal inhibitory concentrations (IC50) for binding inhibition of the HBV/HDV-derived preS1 peptide (as a surrogate parameter for virus binding to NTCP) were determined in NTCP-expressing HEK293 cells for 150 compounds of different chemical classes. IC50 values ranged from 2 µM up to >1000 µM. The generated pharmacophore and QSAR models were used for virtual screening of drug-like chemicals from the ZINC15 database (~11 million compounds). The 20 best-performing compounds were then experimentally tested for preS1-peptide binding inhibition in NTCP-HEK293 cells. Among them, four compounds were active, with experimental IC50 values for preS1-peptide binding inhibition of 9, 19, 20, and 35 µM, which were comparable to the QSAR-based predictions. All of these compounds also significantly inhibited in vitro HDV infection of NTCP-HepG2 cells without showing any cytotoxicity. The best-performing compound in all assays was ZINC000253533654. In conclusion, the present study demonstrates that virtual compound screening based on NTCP-specific pharmacophore and QSAR models can predict novel active hit compounds for the development of HBV/HDV entry inhibitors.
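QSAR potency models are commonly fit on pIC50 = -log10(IC50 in mol/L) rather than on raw IC50, which maps the 2 µM to >1000 µM range reported here onto roughly three log units (about 5.7 down to 3.0). The abstract does not state which response variable the authors modeled, so the conversion below is only an illustration, applied to the four hit IC50 values it reports; the function name is hypothetical.

```python
import math


def pic50_from_um(ic50_um: float) -> float:
    """pIC50 = -log10(IC50 in mol/L); since 1 µM = 1e-6 mol/L, pIC50 = 6 - log10(IC50 in µM)."""
    if ic50_um <= 0:
        raise ValueError("IC50 must be positive")
    return 6.0 - math.log10(ic50_um)


hits_um = [9, 19, 20, 35]  # preS1-binding IC50 values (µM) of the four active hits
print([round(pic50_from_um(x), 2) for x in hits_um])  # [5.05, 4.72, 4.7, 4.46]
```

On this scale the four hits span less than 0.6 log units, so differences that look large in µM are modest in the units a regression model would typically be trained on.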
Affiliation(s)
- Michael Kirstgen
- Institute of Pharmacology and Toxicology, Faculty of Veterinary Medicine, Justus Liebig University Giessen, 35392 Giessen, Germany
- Simon Franz Müller
- Institute of Pharmacology and Toxicology, Faculty of Veterinary Medicine, Justus Liebig University Giessen, 35392 Giessen, Germany
- Kira Alessandra Alicia Theresa Lowjaga
- Institute of Pharmacology and Toxicology, Faculty of Veterinary Medicine, Justus Liebig University Giessen, 35392 Giessen, Germany
- Nora Goldmann
- Institute of Medical Virology, National Reference Center for Hepatitis B Viruses and Hepatitis D Viruses, Justus Liebig University Giessen, 35392 Giessen, Germany
- Felix Lehmann
- Institute of Medical Virology, National Reference Center for Hepatitis B Viruses and Hepatitis D Viruses, Justus Liebig University Giessen, 35392 Giessen, Germany
- Sami Alakurtti
- Drug Research Program, Division of Pharmaceutical Chemistry and Technology, Faculty of Pharmacy, University of Helsinki, Viikinkaari 5 E, FI-00014 Helsinki, Finland
- VTT Technical Research Centre of Finland, Biologinkuja 7, FI-02044 Espoo, Finland
- Jari Yli-Kauhaluoma
- Drug Research Program, Division of Pharmaceutical Chemistry and Technology, Faculty of Pharmacy, University of Helsinki, Viikinkaari 5 E, FI-00014 Helsinki, Finland
- Reimar Krieg
- Institute of Anatomy II, University Hospital Jena, Teichgraben 7, 07743 Jena, Germany
- Dieter Glebe
- Institute of Medical Virology, National Reference Center for Hepatitis B Viruses and Hepatitis D Viruses, Justus Liebig University Giessen, 35392 Giessen, Germany
- German Center for Infection Research (DZIF), Partner Site Giessen-Marburg-Langen, 35392 Giessen, Germany
- Joachim Geyer
- Institute of Pharmacology and Toxicology, Faculty of Veterinary Medicine, Justus Liebig University Giessen, 35392 Giessen, Germany
- Correspondence: Tel.: +49-641-99-38404; Fax: +49-641-99-38409
11
Santana K, do Nascimento LD, Lima e Lima A, Damasceno V, Nahum C, Braga RC, Lameira J. Applications of Virtual Screening in Bioprospecting: Facts, Shifts, and Perspectives to Explore the Chemo-Structural Diversity of Natural Products. Front Chem 2021; 9:662688. [PMID: 33996755 PMCID: PMC8117418 DOI: 10.3389/fchem.2021.662688] [Citation(s) in RCA: 31] [Impact Index Per Article: 10.3] [Received: 02/01/2021] [Accepted: 02/25/2021] [Indexed: 12/22/2022]
Abstract
Natural products are continually explored in the development of new bioactive compounds with industrial applications, and they attract scientific research efforts because of their pharmacophore-like structures, pharmacokinetic properties, and unique chemical space. The systematic search of natural sources for valuable molecules from which to develop commercial and industrial products remains the most challenging task in bioprospecting. Virtual screening strategies have advanced the discovery of novel bioactive molecules by assessing large compound libraries in silico. This favors the analysis of their chemical space, pharmacodynamics, and pharmacokinetic properties, and it reduces the cost, infrastructure, and time involved in discovering new chemical entities. Herein, we discuss the computational approaches and methods developed to explore the chemo-structural diversity of natural products. We focus on the main paradigms involved in the discovery and screening of bioactive compounds from natural sources, with particular emphasis on artificial intelligence, cheminformatics methods, and big data analyses.
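Much of the chemical-space analysis surveyed in this review rests on fingerprint similarity. Below is a minimal sketch that computes the Tanimoto coefficient and runs a similarity search over a small natural-product library. It assumes binary fingerprints represented as sets of "on" bit indices. The bit sets are invented; a real workflow would generate them with a cheminformatics toolkit, which this sketch does not use.

```python
def tanimoto(a: frozenset[int], b: frozenset[int]) -> float:
    """Tanimoto coefficient |A ∩ B| / |A ∪ B| for two sets of on-bit indices."""
    if not a and not b:
        return 0.0  # convention chosen here: two empty fingerprints score 0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def rank_by_similarity(
    query: frozenset[int],
    library: dict[str, frozenset[int]],
    threshold: float = 0.0,
) -> list[tuple[str, float]]:
    """Return (name, similarity) pairs at or above threshold, most similar first."""
    scored = [(name, tanimoto(query, fp)) for name, fp in library.items()]
    hits = [pair for pair in scored if pair[1] >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# Invented fingerprints for illustration only.
query = frozenset({1, 4, 7, 9, 12})
library = {
    "np_001": frozenset({1, 4, 7, 9, 13}),
    "np_002": frozenset({2, 3, 5}),
    "np_003": frozenset({1, 4, 7, 9, 12, 20}),
}

for name, sim in rank_by_similarity(query, library, threshold=0.5):
    print(f"{name}: {sim:.3f}")
```

The union size is computed as |A| + |B| - |A ∩ B|, so no separate union set has to be built for each comparison.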
Affiliation(s)
- Kauê Santana
- Instituto de Biodiversidade, Universidade Federal do Oeste do Pará, Santarém, Brazil
- Anderson Lima e Lima
- Instituto de Ciências Exatas e Naturais, Universidade Federal do Pará, Belém, Brazil
- Vinícius Damasceno
- Instituto de Ciências Exatas e Naturais, Universidade Federal do Pará, Belém, Brazil
- Claudio Nahum
- Instituto de Ciências Exatas e Naturais, Universidade Federal do Pará, Belém, Brazil
- Jerônimo Lameira
- Instituto de Ciências Biológicas, Universidade Federal do Pará, Belém, Brazil
12
Vaitkus A, Merkys A, Gražulis S. Validation of the Crystallography Open Database using the Crystallographic Information Framework. J Appl Crystallogr 2021; 54:661-672. [PMID: 33953659 PMCID: PMC8056762 DOI: 10.1107/s1600576720016532] [Citation(s) in RCA: 42] [Impact Index Per Article: 14.0] [Received: 07/29/2020] [Accepted: 12/21/2020] [Indexed: 12/25/2022]
Abstract
Data curation practices of the Crystallography Open Database (COD) are described, with additional focus on formal validation using the Crystallographic Information Framework (CIF). The cif_validate program, which can validate CIF files against both the DDL1 and the DDLm dictionaries, is presented and used to process the entire COD. Validation results collected from over 450 000 CIF files are shown to be a useful resource both in the data maintenance process and in the development of the underlying ontologies. A set of programs intended to aid dictionary migration from DDL1 to DDLm is also presented.
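The sketch below makes dictionary-driven validation concrete by checking simple CIF data items against a toy dictionary of types, ranges and enumerations. It is a hypothetical illustration, not cif_validate. The dictionary is a hand-written stand-in for real DDL1/DDLm definitions. The parser reads only unquoted single-line "tag value" pairs; it does not handle loops, quoted strings, text fields or multi-block files. The tag names follow core CIF conventions, but the numeric ranges and case-insensitive enumeration matching are assumptions of this sketch.

```python
import re

# Toy dictionary: a hypothetical, heavily simplified stand-in for the DDL1/DDLm
# definitions a real validator reads from the official CIF dictionaries.
TOY_DICTIONARY = {
    "_cell_length_a": {"type": "numb", "min": 0.0},
    "_cell_angle_alpha": {"type": "numb", "min": 0.0, "max": 180.0},
    "_space_group_crystal_system": {
        "type": "char",
        "enum": {"triclinic", "monoclinic", "orthorhombic", "tetragonal",
                 "trigonal", "hexagonal", "cubic"},
    },
}

# A CIF number may carry a standard uncertainty in parentheses, e.g. 5.4309(2).
NUMB_RE = re.compile(r"^([+-]?(?:\d+\.?\d*|\.\d+)(?:[eE][+-]?\d+)?)(?:\(\d+\))?$")


def parse_simple_items(text: str) -> dict[str, str]:
    """Collect unquoted single-line 'tag value' pairs; other CIF syntax is skipped."""
    items = {}
    for line in text.splitlines():
        parts = line.split(maxsplit=1)
        if len(parts) == 2 and parts[0].startswith("_"):
            items[parts[0].lower()] = parts[1].strip()  # tags are case-insensitive
    return items


def validate(items: dict[str, str], dictionary: dict = TOY_DICTIONARY) -> list[str]:
    """Return human-readable problems; an empty list means none were found."""
    problems = []
    for tag, value in items.items():
        definition = dictionary.get(tag)
        if definition is None:
            problems.append(f"{tag}: not defined in dictionary")
            continue
        if value in ("?", "."):  # CIF markers for 'unknown' and 'not applicable'
            continue
        if definition["type"] == "numb":
            match = NUMB_RE.match(value)
            if match is None:
                problems.append(f"{tag}: '{value}' is not a number")
                continue
            number = float(match.group(1))  # uncertainty in parentheses is dropped
            low = definition.get("min", float("-inf"))
            high = definition.get("max", float("inf"))
            if not low <= number <= high:
                problems.append(f"{tag}: {number} out of range")
        elif "enum" in definition and value.lower() not in definition["enum"]:
            problems.append(f"{tag}: '{value}' not in enumeration")
    return problems


cif_text = """data_example
_cell_length_a 5.4309(2)
_cell_angle_alpha 190
_space_group_crystal_system cubic
_made_up_tag 1
"""

for problem in validate(parse_simple_items(cif_text)):
    print(problem)
```

Keeping the definitions in data rather than in code mirrors the general idea behind dictionary-driven validation: supporting a new item means adding a definition, not writing a new check.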
Affiliation(s)
- Antanas Vaitkus
- Department of Protein-DNA Interactions, Institute of Biotechnology, Life Sciences Center, Vilnius University, Saulėtekio al. 7, LT-10257, Vilnius, Lithuania
- Andrius Merkys
- Department of Protein-DNA Interactions, Institute of Biotechnology, Life Sciences Center, Vilnius University, Saulėtekio al. 7, LT-10257, Vilnius, Lithuania
- Saulius Gražulis
- Department of Protein-DNA Interactions, Institute of Biotechnology, Life Sciences Center, Vilnius University, Saulėtekio al. 7, LT-10257, Vilnius, Lithuania
- Faculty of Mathematics and Informatics, Vilnius University, Naugarduko g. 24, LT-03225, Vilnius, Lithuania
13
Chemoinformatics and QSAR. Adv Bioinformatics 2021. [DOI: 10.1007/978-981-33-6191-1_10] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/25/2022]
14
Achary PGR. Applications of Quantitative Structure-Activity Relationships (QSAR) based Virtual Screening in Drug Design: A Review. Mini Rev Med Chem 2020; 20:1375-1388. [DOI: 10.2174/1389557520666200429102334] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 09/10/2019] [Revised: 11/07/2019] [Accepted: 11/08/2019] [Indexed: 12/18/2022]
Abstract
Scientists and researchers around the globe generate a tremendous amount of information every day; for instance, more than 74 million molecules have so far been registered with Chemical Abstracts Service. According to a recent study, around 10^60 molecules can be classified as new drug-like molecules. The library of such molecules is now considered 'dark chemical space' or 'dark chemistry'. Many live and regularly updated databases (protein, cell, tissue, structure, drug, etc.) are available today for exploring such hidden molecules scientifically. The synchronization of three different sciences, 'genomics', 'proteomics' and 'in-silico simulation', will revolutionize the process of drug discovery. Screening a sizable number of drug-like molecules is a challenge and must be handled efficiently. Virtual screening (VS) is an important computational tool in the drug discovery process; however, experimental verification of the drugs is equally important for drug development. Quantitative structure-activity relationship (QSAR) analysis is one of the machine learning techniques extensively used in VS. QSAR is well known for high-throughput, fast screening with a satisfactory hit rate. QSAR model building involves four steps:
(i) collection of chemo-genomics data from a database or the literature;
(ii) calculation of the right descriptors from the molecular representation;
(iii) establishing a relationship (model) between biological activity and the selected descriptors;
(iv) application of the QSAR model to predict the biological property of the molecules.
All hits obtained by the VS technique need to be experimentally verified. The present mini-review highlights web-based machine learning tools, the role of QSAR in VS techniques, successful applications of QSAR-based VS leading to drug discovery, and the advantages and challenges of QSAR.
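The four QSAR model-building steps listed in this abstract can be illustrated with a minimal multiple linear regression written with the standard library only. Several assumptions apply. The descriptor values, labelled here as a hypothetical logP and scaled polar surface area, are taken as already computed in step (ii). The activities are synthetic, noise-free pIC50 values chosen so that the fit can be checked exactly. Ordinary least squares is only one of many model types used in QSAR studies.

```python
def solve(a: list[list[float]], b: list[float]) -> list[float]:
    """Solve the square system a·x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented copy; inputs untouched
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:  # absolute tolerance, adequate at this toy scale
            raise ValueError("singular system: descriptors may be collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_mlr(descriptors: list[list[float]], activities: list[float]) -> list[float]:
    """Step (iii): least squares via the normal equations (X^T X) b = X^T y.

    Returns the coefficients with the intercept first.
    """
    X = [[1.0] + list(row) for row in descriptors]
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * y for row, y in zip(X, activities)) for i in range(p)]
    return solve(xtx, xty)


def predict(coefs: list[float], row: list[float]) -> float:
    """Step (iv): apply the fitted model to a new descriptor vector."""
    return coefs[0] + sum(c * d for c, d in zip(coefs[1:], row))


def r_squared(coefs: list[float], descriptors: list[list[float]], activities: list[float]) -> float:
    """Coefficient of determination of the model on the given data."""
    mean = sum(activities) / len(activities)
    ss_res = sum((y - predict(coefs, d)) ** 2 for d, y in zip(descriptors, activities))
    ss_tot = sum((y - mean) ** 2 for y in activities)
    return 1.0 - ss_res / ss_tot


# Steps (i)-(ii), assumed done: hypothetical descriptors [logP, PSA/100] and
# synthetic pIC50 values generated from 4.0 + 0.8*logP - 1.5*(PSA/100).
train_X = [[1.0, 0.5], [2.0, 0.3], [3.0, 0.9], [4.0, 0.2], [2.5, 0.7]]
train_y = [4.05, 5.15, 5.05, 6.9, 4.95]

coefs = fit_mlr(train_X, train_y)
print("coefficients:", [round(c, 3) for c in coefs])
print("R^2 on training set:", round(r_squared(coefs, train_X, train_y), 3))
print("prediction for [3.5, 0.4]:", round(predict(coefs, [3.5, 0.4]), 3))
```

A training-set R^2 says little about predictive power. A real QSAR workflow would validate on held-out compounds, and the singular-system check above is a reminder that collinear descriptors make the model ill-defined.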
Affiliation(s)
- Patnala Ganga Raju Achary
- Department of Chemistry, Faculty of Engineering & Technology (ITER), Siksha ‘O’ Anusandhan, Deemed to be University, Khandagiri Square, Bhubaneswar-751030, India
15
Zhao L, Ciallella HL, Aleksunes LM, Zhu H. Advancing computer-aided drug discovery (CADD) by big data and data-driven machine learning modeling. Drug Discov Today 2020; 25:1624-1638. [PMID: 32663517 PMCID: PMC7572559 DOI: 10.1016/j.drudis.2020.07.005] [Citation(s) in RCA: 66] [Impact Index Per Article: 16.5] [Received: 02/19/2020] [Revised: 06/26/2020] [Accepted: 07/06/2020] [Indexed: 02/06/2023]
Abstract
Advancing a new drug to market requires substantial investments in time as well as financial resources. Crucial bioactivities for drug candidates, including their efficacy, pharmacokinetics (PK), and adverse effects, need to be investigated during drug development. With advancements in chemical synthesis and biological screening technologies over the past decade, a large number of biological data points for millions of small molecules have been generated and stored in various databases. These accumulated data, combined with new machine learning (ML) approaches such as deep learning, have shown great potential to provide insights into relevant chemical structures to predict in vitro, in vivo, and clinical outcomes, thereby advancing drug discovery and development in the big data era.
Affiliation(s)
- Linlin Zhao
- The Rutgers Center for Computational and Integrative Biology, Camden, NJ 08102, USA
- Heather L Ciallella
- The Rutgers Center for Computational and Integrative Biology, Camden, NJ 08102, USA
- Lauren M Aleksunes
- Department of Pharmacology and Toxicology, Ernest Mario School of Pharmacy, Rutgers University, Piscataway, NJ 08854, USA
- Hao Zhu
- The Rutgers Center for Computational and Integrative Biology, Camden, NJ 08102, USA
- Department of Chemistry, Rutgers University, Camden, NJ 08102, USA
16
Mansouri K, Kleinstreuer N, Abdelaziz AM, Alberga D, Alves VM, Andersson PL, Andrade CH, Bai F, Balabin I, Ballabio D, Benfenati E, Bhhatarai B, Boyer S, Chen J, Consonni V, Farag S, Fourches D, García-Sosa AT, Gramatica P, Grisoni F, Grulke CM, Hong H, Horvath D, Hu X, Huang R, Jeliazkova N, Li J, Li X, Liu H, Manganelli S, Mangiatordi GF, Maran U, Marcou G, Martin T, Muratov E, Nguyen DT, Nicolotti O, Nikolov NG, Norinder U, Papa E, Petitjean M, Piir G, Pogodin P, Poroikov V, Qiao X, Richard AM, Roncaglioni A, Ruiz P, Rupakheti C, Sakkiah S, Sangion A, Schramm KW, Selvaraj C, Shah I, Sild S, Sun L, Taboureau O, Tang Y, Tetko IV, Todeschini R, Tong W, Trisciuzzi D, Tropsha A, Van Den Driessche G, Varnek A, Wang Z, Wedebye EB, Williams AJ, Xie H, Zakharov AV, Zheng Z, Judson RS. CoMPARA: Collaborative Modeling Project for Androgen Receptor Activity. Environ Health Perspect 2020; 128:27002. [PMID: 32074470 DOI: 10.23645/epacomptox.5176876] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 05/24/2023]
Abstract
BACKGROUND Endocrine disrupting chemicals (EDCs) are xenobiotics that mimic the interaction of natural hormones and alter synthesis, transport, or metabolic pathways. The prospect of EDCs causing adverse health effects in humans and wildlife has led to the development of scientific and regulatory approaches for evaluating bioactivity. This need is being addressed using high-throughput screening (HTS) in vitro approaches and computational modeling. OBJECTIVES In support of the Endocrine Disruptor Screening Program, the U.S. Environmental Protection Agency (EPA) led two worldwide consortia to virtually screen chemicals for their potential estrogenic and androgenic activities. Here, we describe the Collaborative Modeling Project for Androgen Receptor Activity (CoMPARA) efforts, which follow the steps of the Collaborative Estrogen Receptor Activity Prediction Project (CERAPP). METHODS The CoMPARA list of screened chemicals built on CERAPP's list of 32,464 chemicals to include additional chemicals of interest, as well as simulated ToxCast™ metabolites, totaling 55,450 chemical structures. Computational toxicology scientists from 25 international groups contributed 91 predictive models for binding, agonist, and antagonist activity predictions. Models were underpinned by a common training set of 1,746 chemicals compiled from a combined data set of 11 ToxCast™/Tox21 HTS in vitro assays. RESULTS The resulting models were evaluated using curated literature data extracted from different sources. To overcome the limitations of single-model approaches, CoMPARA predictions were combined into consensus models that provided an average predictive accuracy of approximately 80% for the evaluation set.
DISCUSSION The strengths and limitations of the consensus predictions were discussed with example chemicals; then, the models were implemented into the free and open-source OPERA application to enable screening of new chemicals with a defined applicability domain and accuracy assessment. This implementation was used to screen the entire EPA DSSTox database of ∼875,000 chemicals, and their predicted AR activities have been made available on the EPA CompTox Chemicals dashboard and National Toxicology Program's Integrated Chemical Environment. https://doi.org/10.1289/EHP5580.
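The consensus idea in the RESULTS section can be illustrated with a plain majority vote. This is a hedged sketch and does not reproduce CoMPARA's scoring scheme. The chemical IDs, votes and labels are invented. A vote of None stands for a model declining to predict, for example for a chemical outside its applicability domain. Ties yield no call, and accuracy is computed only over chemicals that receive a call, with coverage reported alongside.

```python
from collections import Counter
from typing import Optional

Prediction = Optional[int]  # 1 = active, 0 = inactive, None = no prediction


def consensus(votes: list[Prediction], min_votes: int = 1) -> Prediction:
    """Unweighted majority vote over the models that returned a call."""
    counts = Counter(v for v in votes if v is not None)
    if sum(counts.values()) < min_votes or counts[1] == counts[0]:
        return None  # too few calls, or a tie
    return 1 if counts[1] > counts[0] else 0


def evaluate(
    votes_by_chemical: dict[str, list[Prediction]],
    labels: dict[str, int],
    min_votes: int = 1,
) -> tuple[float, float]:
    """Return (accuracy over chemicals with a consensus call, fraction called)."""
    called = correct = 0
    for chemical, votes in votes_by_chemical.items():
        call = consensus(votes, min_votes)
        if call is None:
            continue
        called += 1
        correct += int(call == labels[chemical])
    accuracy = correct / called if called else float("nan")
    return accuracy, called / len(votes_by_chemical)


# Invented votes from four hypothetical models and invented reference labels.
votes = {
    "chem_1": [1, 1, 0, None],
    "chem_2": [0, 0, 0, 1],
    "chem_3": [1, 0, None, None],        # tie -> no call
    "chem_4": [None, None, None, None],  # outside every model's domain
    "chem_5": [1, 1, 1, 0],
}
labels = {"chem_1": 1, "chem_2": 0, "chem_3": 1, "chem_4": 0, "chem_5": 0}

accuracy, coverage = evaluate(votes, labels)
print(f"accuracy = {accuracy:.2f}, coverage = {coverage:.2f}")
```

Reporting coverage next to accuracy matters because raising min_votes usually trades fewer calls for more reliable ones.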
Affiliation(s)
- Kamel Mansouri
- National Center for Computational Toxicology, Office of Research and Development, U.S. Environmental Protection Agency (U.S. EPA), Research Triangle Park, North Carolina, USA
- ScitoVation LLC, Research Triangle Park, North Carolina, USA
- Integrated Laboratory Systems, Inc., Morrisville, North Carolina, USA
- Nicole Kleinstreuer
- National Toxicology Program Interagency Center for the Evaluation of Alternative Toxicological Methods (NICEATM), National Institute of Environmental Health Sciences, Research Triangle Park, North Carolina, USA
- Ahmed M Abdelaziz
- Technische Universität München, Wissenschaftszentrum Weihenstephan für Ernährung, Landnutzung und Umwelt, Department für Biowissenschaftliche Grundlagen, Weihenstephaner Steig 23, 85350 Freising, Germany
- Domenico Alberga
- Department of Pharmacy-Drug Sciences, University of Bari, Bari, Italy
- Vinicius M Alves
- Laboratory for Molecular Modeling and Drug Design, Faculty of Pharmacy, Federal University of Goiás, Goiânia, Brazil
- Laboratory for Molecular Modeling, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Carolina H Andrade
- Laboratory for Molecular Modeling and Drug Design, Faculty of Pharmacy, Federal University of Goiás, Goiânia, Brazil
- Fang Bai
- School of Pharmacy, Lanzhou University, China
- Ilya Balabin
- Information Systems & Global Solutions (IS&GS), Lockheed Martin, USA
- Davide Ballabio
- Milano Chemometrics and QSAR Research Group, Department of Earth and Environmental Sciences, University of Milano-Bicocca, Milan, Italy
- Emilio Benfenati
- Istituto di Ricerche Farmacologiche "Mario Negri", IRCCS, Milan, Italy
- Barun Bhhatarai
- QSAR Research Unit in Environmental Chemistry and Ecotoxicology, Department of Theoretical and Applied Sciences, University of Insubria, Varese, Italy
- Scott Boyer
- Swedish Toxicology Sciences Research Center, Karolinska Institutet, Södertälje, Sweden
- Jingwen Chen
- School of Environmental Science and Technology, Dalian University of Technology, Dalian, China
- Viviana Consonni
- Milano Chemometrics and QSAR Research Group, Department of Earth and Environmental Sciences, University of Milano-Bicocca, Milan, Italy
- Sherif Farag
- Laboratory for Molecular Modeling, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Denis Fourches
- Department of Chemistry, Bioinformatics Research Center, North Carolina State University, Raleigh, North Carolina, USA
- Paola Gramatica
- QSAR Research Unit in Environmental Chemistry and Ecotoxicology, Department of Theoretical and Applied Sciences, University of Insubria, Varese, Italy
- Francesca Grisoni
- Milano Chemometrics and QSAR Research Group, Department of Earth and Environmental Sciences, University of Milano-Bicocca, Milan, Italy
- Chris M Grulke
- National Center for Computational Toxicology, Office of Research and Development, U.S. Environmental Protection Agency (U.S. EPA), Research Triangle Park, North Carolina, USA
- Huixiao Hong
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, Arkansas, USA
- Dragos Horvath
- Laboratoire de Chémoinformatique-UMR7140, University of Strasbourg/CNRS, Strasbourg, France
- Xin Hu
- National Center for Advancing Translational Sciences, National Institutes of Health, Rockville, Maryland, USA
- Ruili Huang
- National Center for Advancing Translational Sciences, National Institutes of Health, Rockville, Maryland, USA
- Jiazhong Li
- School of Pharmacy, Lanzhou University, China
- Xuehua Li
- School of Environmental Science and Technology, Dalian University of Technology, Dalian, China
- Serena Manganelli
- Istituto di Ricerche Farmacologiche "Mario Negri", IRCCS, Milan, Italy
- Uko Maran
- Institute of Chemistry, University of Tartu, Tartu, Estonia
- Gilles Marcou
- Laboratoire de Chémoinformatique-UMR7140, University of Strasbourg/CNRS, Strasbourg, France
- Todd Martin
- National Risk Management Research Laboratory, U.S. EPA, Cincinnati, Ohio, USA
- Eugene Muratov
- Laboratory for Molecular Modeling, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Dac-Trung Nguyen
- National Center for Advancing Translational Sciences, National Institutes of Health, Rockville, Maryland, USA
- Orazio Nicolotti
- Department of Pharmacy-Drug Sciences, University of Bari, Bari, Italy
- Nikolai G Nikolov
- Division of Risk Assessment and Nutrition, National Food Institute, Technical University of Denmark, Copenhagen, Denmark
- Ulf Norinder
- Swedish Toxicology Sciences Research Center, Karolinska Institutet, Södertälje, Sweden
- Ester Papa
- QSAR Research Unit in Environmental Chemistry and Ecotoxicology, Department of Theoretical and Applied Sciences, University of Insubria, Varese, Italy
- Michel Petitjean
- Computational Modeling of Protein-Ligand Interactions (CMPLI)-INSERM UMR 8251, INSERM ERL U1133, Functional and Adaptative Biology (BFA), Université de Paris, Paris, France
- Geven Piir
- Institute of Chemistry, University of Tartu, Tartu, Estonia
- Pavel Pogodin
- Institute of Biomedical Chemistry IBMC, 10 Building 8, Pogodinskaya st., Moscow 119121, Russia
- Vladimir Poroikov
- Institute of Biomedical Chemistry IBMC, 10 Building 8, Pogodinskaya st., Moscow 119121, Russia
- Xianliang Qiao
- School of Environmental Science and Technology, Dalian University of Technology, Dalian, China
- Ann M Richard
- National Center for Computational Toxicology, Office of Research and Development, U.S. Environmental Protection Agency (U.S. EPA), Research Triangle Park, North Carolina, USA
- Patricia Ruiz
- Computational Toxicology and Methods Development Laboratory, Division of Toxicology and Human Health Sciences, Agency for Toxic Substances and Disease Registry, Centers for Disease Control and Prevention, Atlanta, Georgia, USA
- Chetan Rupakheti
- National Risk Management Research Laboratory, U.S. EPA, Cincinnati, Ohio, USA
- Department of Biochemistry and Molecular Biophysics, University of Chicago, Chicago, Illinois, USA
- Sugunadevi Sakkiah
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, Arkansas, USA
- Alessandro Sangion
- QSAR Research Unit in Environmental Chemistry and Ecotoxicology, Department of Theoretical and Applied Sciences, University of Insubria, Varese, Italy
- Karl-Werner Schramm
- Technische Universität München, Wissenschaftszentrum Weihenstephan für Ernährung, Landnutzung und Umwelt, Department für Biowissenschaftliche Grundlagen, Weihenstephaner Steig 23, 85350 Freising, Germany
- Chandrabose Selvaraj
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, Arkansas, USA
- Imran Shah
- National Center for Computational Toxicology, Office of Research and Development, U.S. Environmental Protection Agency (U.S. EPA), Research Triangle Park, North Carolina, USA
- Sulev Sild
- Institute of Chemistry, University of Tartu, Tartu, Estonia
- Lixia Sun
- Department of Pharmaceutical Sciences, School of Pharmacy, East China University of Science and Technology, Shanghai, China
- Olivier Taboureau
- Computational Modeling of Protein-Ligand Interactions (CMPLI)-INSERM UMR 8251, INSERM ERL U1133, Functional and Adaptative Biology (BFA), Université de Paris, Paris, France
- Yun Tang
- Department of Pharmaceutical Sciences, School of Pharmacy, East China University of Science and Technology, Shanghai, China
- Igor V Tetko
- BIGCHEM GmbH, Neuherberg, Germany
- Helmholtz Zentrum Muenchen - German Research Center for Environmental Health (GmbH), Neuherberg, Germany
- Roberto Todeschini
- Milano Chemometrics and QSAR Research Group, Department of Earth and Environmental Sciences, University of Milano-Bicocca, Milan, Italy
- Weida Tong
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, Arkansas, USA
- Alexander Tropsha
- Laboratory for Molecular Modeling, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- George Van Den Driessche
- Department of Chemistry, Bioinformatics Research Center, North Carolina State University, Raleigh, North Carolina, USA
- Alexandre Varnek
- Laboratoire de Chémoinformatique-UMR7140, University of Strasbourg/CNRS, Strasbourg, France
- Zhongyu Wang
- School of Environmental Science and Technology, Dalian University of Technology, Dalian, China
- Eva B Wedebye
- Division of Risk Assessment and Nutrition, National Food Institute, Technical University of Denmark, Copenhagen, Denmark
- Antony J Williams
- National Center for Computational Toxicology, Office of Research and Development, U.S. Environmental Protection Agency (U.S. EPA), Research Triangle Park, North Carolina, USA
- Hongbin Xie
- School of Environmental Science and Technology, Dalian University of Technology, Dalian, China
- Alexey V Zakharov
- National Center for Advancing Translational Sciences, National Institutes of Health, Rockville, Maryland, USA
- Ziye Zheng
- Chemistry Department, Umeå University, Umeå, Sweden
- Richard S Judson
- National Center for Computational Toxicology, Office of Research and Development, U.S. Environmental Protection Agency (U.S. EPA), Research Triangle Park, North Carolina, USA
17
Mansouri K, Kleinstreuer N, Abdelaziz AM, Alberga D, Alves VM, Andersson PL, Andrade CH, Bai F, Balabin I, Ballabio D, Benfenati E, Bhhatarai B, Boyer S, Chen J, Consonni V, Farag S, Fourches D, García-Sosa AT, Gramatica P, Grisoni F, Grulke CM, Hong H, Horvath D, Hu X, Huang R, Jeliazkova N, Li J, Li X, Liu H, Manganelli S, Mangiatordi GF, Maran U, Marcou G, Martin T, Muratov E, Nguyen DT, Nicolotti O, Nikolov NG, Norinder U, Papa E, Petitjean M, Piir G, Pogodin P, Poroikov V, Qiao X, Richard AM, Roncaglioni A, Ruiz P, Rupakheti C, Sakkiah S, Sangion A, Schramm KW, Selvaraj C, Shah I, Sild S, Sun L, Taboureau O, Tang Y, Tetko IV, Todeschini R, Tong W, Trisciuzzi D, Tropsha A, Van Den Driessche G, Varnek A, Wang Z, Wedebye EB, Williams AJ, Xie H, Zakharov AV, Zheng Z, Judson RS. CoMPARA: Collaborative Modeling Project for Androgen Receptor Activity. Environ Health Perspect 2020; 128:27002. [PMID: 32074470 PMCID: PMC7064318 DOI: 10.1289/ehp5580] [Citation(s) in RCA: 96] [Impact Index Per Article: 24.0] [Received: 05/06/2019] [Revised: 11/27/2019] [Accepted: 12/05/2019] [Indexed: 05/04/2023]
Abstract
BACKGROUND Endocrine disrupting chemicals (EDCs) are xenobiotics that mimic the interaction of natural hormones and alter synthesis, transport, or metabolic pathways. The prospect of EDCs causing adverse health effects in humans and wildlife has led to the development of scientific and regulatory approaches for evaluating bioactivity. This need is being addressed using high-throughput screening (HTS) in vitro approaches and computational modeling. OBJECTIVES In support of the Endocrine Disruptor Screening Program, the U.S. Environmental Protection Agency (EPA) led two worldwide consortia to virtually screen chemicals for their potential estrogenic and androgenic activities. Here, we describe the Collaborative Modeling Project for Androgen Receptor Activity (CoMPARA) efforts, which follow the steps of the Collaborative Estrogen Receptor Activity Prediction Project (CERAPP). METHODS The CoMPARA list of screened chemicals built on CERAPP's list of 32,464 chemicals to include additional chemicals of interest, as well as simulated ToxCast™ metabolites, totaling 55,450 chemical structures. Computational toxicology scientists from 25 international groups contributed 91 predictive models for binding, agonist, and antagonist activity predictions. Models were underpinned by a common training set of 1,746 chemicals compiled from a combined data set of 11 ToxCast™/Tox21 HTS in vitro assays. RESULTS The resulting models were evaluated using curated literature data extracted from different sources. To overcome the limitations of single-model approaches, CoMPARA predictions were combined into consensus models that provided an average predictive accuracy of approximately 80% for the evaluation set.
DISCUSSION The strengths and limitations of the consensus predictions were discussed with example chemicals; then, the models were implemented into the free and open-source OPERA application to enable screening of new chemicals with a defined applicability domain and accuracy assessment. This implementation was used to screen the entire EPA DSSTox database of ∼875,000 chemicals, and their predicted AR activities have been made available on the EPA CompTox Chemicals dashboard and National Toxicology Program's Integrated Chemical Environment. https://doi.org/10.1289/EHP5580.
Affiliation(s)
- Kamel Mansouri
- National Center for Computational Toxicology, Office of Research and Development, U.S. Environmental Protection Agency (U.S. EPA), Research Triangle Park, North Carolina, USA
- ScitoVation LLC, Research Triangle Park, North Carolina, USA
- Integrated Laboratory Systems, Inc., Morrisville, North Carolina, USA
| | - Nicole Kleinstreuer
- National Toxicology Program Interagency Center for the Evaluation of Alternative Toxicological Methods (NICEATM), National Institute of Environmental Health Sciences, Research Triangle Park, North Carolina, USA
| | - Ahmed M. Abdelaziz
- Technische Universität München, Wissenschaftszentrum Weihenstephan für Ernährung, Landnutzung und Umwelt, Department für Biowissenschaftliche Grundlagen, Weihenstephaner Steig 23, 85350 Freising, Germany
| | - Domenico Alberga
- Department of Pharmacy-Drug Sciences, University of Bari, Bari, Italy
| | - Vinicius M. Alves
- Laboratory for Molecular Modeling and Drug Design, Faculty of Pharmacy, Federal University of Goiás, Goiânia, Brazil
- Laboratory for Molecular Modeling, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
| | | | - Carolina H. Andrade
- Laboratory for Molecular Modeling and Drug Design, Faculty of Pharmacy, Federal University of Goiás, Goiânia, Brazil
| | - Fang Bai
- School of Pharmacy, Lanzhou University, China
| | - Ilya Balabin
- Information Systems & Global Solutions (IS&GS), Lockheed Martin, USA
| | - Davide Ballabio
- Milano Chemometrics and QSAR Research Group, Department of Earth and Environmental Sciences, University of Milano-Bicocca, Milan, Italy
| | - Emilio Benfenati
- Istituto di Ricerche Farmacologiche “Mario Negri”, IRCCS, Milan, Italy
| | - Barun Bhhatarai
- QSAR Research Unit in Environmental Chemistry and Ecotoxicology, Department of Theoretical and Applied Sciences, University of Insubria, Varese, Italy
| | - Scott Boyer
- Swedish Toxicology Sciences Research Center, Karolinska Institutet, Södertälje, Sweden
| | - Jingwen Chen
- School of Environmental Science and Technology, Dalian University of Technology, Dalian, China
| | - Viviana Consonni
- Milano Chemometrics and QSAR Research Group, Department of Earth and Environmental Sciences, University of Milano-Bicocca, Milan, Italy
| | - Sherif Farag
- Laboratory for Molecular Modeling, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
| | - Denis Fourches
- Department of Chemistry, Bioinformatics Research Center, North Carolina State University, Raleigh, North Carolina, USA
| | | | - Paola Gramatica
- QSAR Research Unit in Environmental Chemistry and Ecotoxicology, Department of Theoretical and Applied Sciences, University of Insubria, Varese, Italy
| | - Francesca Grisoni
- Milano Chemometrics and QSAR Research Group, Department of Earth and Environmental Sciences, University of Milano-Bicocca, Milan, Italy
| | - Chris M. Grulke
- National Center for Computational Toxicology, Office of Research and Development, U.S. Environmental Protection Agency (U.S. EPA), Research Triangle Park, North Carolina, USA
| | - Huixiao Hong
- Division of Bioinformatics and Biostatistics, National Center for Toxicology Research, U.S. Food and Drug Administration, Jefferson, Arkansas, USA
| | - Dragos Horvath
- Laboratoire de Chémoinformatique—UMR7140, University of Strasbourg/CNRS, Strasbourg, France
- Xin Hu
- National Center for Advancing Translational Sciences, National Institutes of Health, Rockville, Maryland, USA
- Ruili Huang
- National Center for Advancing Translational Sciences, National Institutes of Health, Rockville, Maryland, USA
- Jiazhong Li
- School of Pharmacy, Lanzhou University, China
- Xuehua Li
- School of Environmental Science and Technology, Dalian University of Technology, Dalian, China
- Serena Manganelli
- Istituto di Ricerche Farmacologiche “Mario Negri”, IRCCS, Milan, Italy
- Uko Maran
- Institute of Chemistry, University of Tartu, Tartu, Estonia
- Gilles Marcou
- Laboratoire de Chémoinformatique—UMR7140, University of Strasbourg/CNRS, Strasbourg, France
- Todd Martin
- National Risk Management Research Laboratory, U.S. EPA, Cincinnati, Ohio, USA
- Eugene Muratov
- Laboratory for Molecular Modeling, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Dac-Trung Nguyen
- National Center for Advancing Translational Sciences, National Institutes of Health, Rockville, Maryland, USA
- Orazio Nicolotti
- Department of Pharmacy-Drug Sciences, University of Bari, Bari, Italy
- Nikolai G. Nikolov
- Division of Risk Assessment and Nutrition, National Food Institute, Technical University of Denmark, Copenhagen, Denmark
- Ulf Norinder
- Swedish Toxicology Sciences Research Center, Karolinska Institutet, Södertälje, Sweden
- Ester Papa
- QSAR Research Unit in Environmental Chemistry and Ecotoxicology, Department of Theoretical and Applied Sciences, University of Insubria, Varese, Italy
- Michel Petitjean
- Computational Modeling of Protein-Ligand Interactions (CMPLI)–INSERM UMR 8251, INSERM ERL U1133, Functional and Adaptative Biology (BFA), Université de Paris, Paris, France
- Geven Piir
- Institute of Chemistry, University of Tartu, Tartu, Estonia
- Pavel Pogodin
- Institute of Biomedical Chemistry IBMC, 10 Building 8, Pogodinskaya st., Moscow 119121, Russia
- Vladimir Poroikov
- Institute of Biomedical Chemistry IBMC, 10 Building 8, Pogodinskaya st., Moscow 119121, Russia
- Xianliang Qiao
- School of Environmental Science and Technology, Dalian University of Technology, Dalian, China
- Ann M. Richard
- National Center for Computational Toxicology, Office of Research and Development, U.S. Environmental Protection Agency (U.S. EPA), Research Triangle Park, North Carolina, USA
- Patricia Ruiz
- Computational Toxicology and Methods Development Laboratory, Division of Toxicology and Human Health Sciences, Agency for Toxic Substances and Disease Registry, Centers for Disease Control and Prevention, Atlanta, Georgia, USA
- Chetan Rupakheti
- National Risk Management Research Laboratory, U.S. EPA, Cincinnati, Ohio, USA
- Department of Biochemistry and Molecular Biophysics, University of Chicago, Chicago, Illinois, USA
- Sugunadevi Sakkiah
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, Arkansas, USA
- Alessandro Sangion
- QSAR Research Unit in Environmental Chemistry and Ecotoxicology, Department of Theoretical and Applied Sciences, University of Insubria, Varese, Italy
- Karl-Werner Schramm
- Technische Universität München, Wissenschaftszentrum Weihenstephan für Ernährung, Landnutzung und Umwelt, Department für Biowissenschaftliche Grundlagen, Weihenstephaner Steig 23, 85350 Freising, Germany
- Chandrabose Selvaraj
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, Arkansas, USA
- Imran Shah
- National Center for Computational Toxicology, Office of Research and Development, U.S. Environmental Protection Agency (U.S. EPA), Research Triangle Park, North Carolina, USA
- Sulev Sild
- Institute of Chemistry, University of Tartu, Tartu, Estonia
- Lixia Sun
- Department of Pharmaceutical Sciences, School of Pharmacy, East China University of Science and Technology, Shanghai, China
- Olivier Taboureau
- Computational Modeling of Protein-Ligand Interactions (CMPLI)–INSERM UMR 8251, INSERM ERL U1133, Functional and Adaptative Biology (BFA), Université de Paris, Paris, France
- Yun Tang
- Department of Pharmaceutical Sciences, School of Pharmacy, East China University of Science and Technology, Shanghai, China
- Igor V. Tetko
- BIGCHEM GmbH, Neuherberg, Germany
- Helmholtz Zentrum Muenchen – German Research Center for Environmental Health (GmbH), Neuherberg, Germany
- Roberto Todeschini
- Milano Chemometrics and QSAR Research Group, Department of Earth and Environmental Sciences, University of Milano-Bicocca, Milan, Italy
- Weida Tong
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, Arkansas, USA
- Alexander Tropsha
- Laboratory for Molecular Modeling, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- George Van Den Driessche
- Department of Chemistry, Bioinformatics Research Center, North Carolina State University, Raleigh, North Carolina, USA
- Alexandre Varnek
- Laboratoire de Chémoinformatique—UMR7140, University of Strasbourg/CNRS, Strasbourg, France
- Zhongyu Wang
- School of Environmental Science and Technology, Dalian University of Technology, Dalian, China
- Eva B. Wedebye
- Division of Risk Assessment and Nutrition, National Food Institute, Technical University of Denmark, Copenhagen, Denmark
- Antony J. Williams
- National Center for Computational Toxicology, Office of Research and Development, U.S. Environmental Protection Agency (U.S. EPA), Research Triangle Park, North Carolina, USA
- Hongbin Xie
- School of Environmental Science and Technology, Dalian University of Technology, Dalian, China
- Alexey V. Zakharov
- National Center for Advancing Translational Sciences, National Institutes of Health, Rockville, Maryland, USA
- Ziye Zheng
- Chemistry Department, Umeå University, Umeå, Sweden
- Richard S. Judson
- National Center for Computational Toxicology, Office of Research and Development, U.S. Environmental Protection Agency (U.S. EPA), Research Triangle Park, North Carolina, USA
18
Ambure P, Cordeiro MNDS. Importance of Data Curation in QSAR Studies Especially While Modeling Large-Size Datasets. Methods in Pharmacology and Toxicology 2020. [DOI: 10.1007/978-1-0716-0150-1_5]
19
Grulke CM, Williams AJ, Thillanadarajah I, Richard AM. EPA's DSSTox database: History of development of a curated chemistry resource supporting computational toxicology research. Comput Toxicol 2019; 12:100096. [PMID: 33426407 PMCID: PMC7787967 DOI: 10.1016/j.comtox.2019.100096]
Abstract
The US Environmental Protection Agency's (EPA) Distributed Structure-Searchable Toxicity (DSSTox) database, launched publicly in 2004, currently exceeds 875 K substances spanning hundreds of lists of interest to EPA and environmental researchers. From its inception, DSSTox has focused curation efforts on resolving chemical identifier errors and conflicts in the public domain towards the goal of assigning accurate chemical structures to data and lists of importance to the environmental research and regulatory community. Accurate structure-data associations, in turn, are necessary inputs to structure-based predictive models supporting hazard and risk assessments. In 2014, the legacy, manually curated DSSTox_V1 content was migrated to a MySQL data model, with modern cheminformatics tools supporting both manual and automated curation processes to increase efficiencies. This was followed by sequential auto-loads of filtered portions of three public datasets: EPA's Substance Registry Services (SRS), the National Library of Medicine's ChemID, and PubChem. This process was constrained by a key requirement of uniquely mapped identifiers (i.e., CAS RN, name and structure) for each substance, rejecting content where any two identifiers were conflicted either within or across datasets. This rejected content highlighted the degree of conflicting, inaccurate substance-structure ID mappings in the public domain, ranging from 12% (within EPA SRS) to 49% (across ChemID and PubChem). Substances successfully added to DSSTox from each auto-load were assigned to one of five qc_levels, conveying curator confidence in each dataset. This process enabled a significant expansion of DSSTox content to provide better coverage of the chemical landscape of interest to environmental scientists, while retaining focus on the accuracy of substance-structure-data associations. 
Currently, DSSTox serves as the core foundation of EPA's CompTox Chemicals Dashboard [https://comptox.epa.gov/dashboard], which provides public access to DSSTox content in support of a broad range of modeling and research activities within EPA and, increasingly, across the field of computational toxicology.
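The auto-load constraint described in this abstract (each substance must have a uniquely mapped CAS RN, name, and structure, and content is rejected wherever two identifiers conflict, within or across datasets) can be illustrated with a minimal Python sketch. It is not the DSSTox implementation: the `Record` layout and the `find_conflicts` and `auto_load` helpers are hypothetical, and structures are compared as plain strings, whereas a real pipeline would first standardize them.

```python
from collections import defaultdict
from typing import List, NamedTuple, Set, Tuple


class Record(NamedTuple):
    """One substance record from a source list (hypothetical layout)."""
    source: str      # dataset label, e.g. "SRS", "ChemID", "PubChem"
    casrn: str       # CAS Registry Number
    name: str        # chemical name
    structure: str   # structure string, compared verbatim (no standardization)


ID_FIELDS = ("casrn", "name", "structure")


def find_conflicts(records: List[Record]) -> Set[int]:
    """Return indices of records whose identifiers are not uniquely mapped.

    Records from all datasets are pooled, so a conflict is detected whether it
    occurs within one dataset or across datasets.
    """
    # For each ordered pair of identifier types (a, b), collect every b-value
    # that co-occurs with each a-value.
    partners = {
        (a, b): defaultdict(set)
        for a in ID_FIELDS
        for b in ID_FIELDS
        if a != b
    }
    for rec in records:
        for (a, b), mapping in partners.items():
            mapping[getattr(rec, a)].add(getattr(rec, b))

    conflicted = set()
    for i, rec in enumerate(records):
        # Reject the record if any of its identifiers co-occurs with more than
        # one value of another identifier type.
        if any(len(mapping[getattr(rec, a)]) > 1
               for (a, _b), mapping in partners.items()):
            conflicted.add(i)
    return conflicted


def auto_load(records: List[Record]) -> Tuple[List[Record], List[Record]]:
    """Split records into (accepted, rejected) under the unique-mapping rule."""
    conflicted = find_conflicts(records)
    accepted = [r for i, r in enumerate(records) if i not in conflicted]
    rejected = [r for i, r in enumerate(records) if i in conflicted]
    return accepted, rejected


if __name__ == "__main__":
    records = [
        Record("SRS", "50-00-0", "formaldehyde", "C=O"),
        Record("PubChem", "50-00-0", "formaldehyde", "C=O"),  # consistent duplicate
        Record("ChemID", "64-17-5", "ethanol", "CCO"),
        # "ethanol" paired with methanol's CAS RN and structure: a name conflict
        Record("PubChem", "67-56-1", "ethanol", "CO"),
    ]
    accepted, rejected = auto_load(records)
    print(f"accepted: {len(accepted)}, rejected: {len(rejected)}")
```

Pooling every source before the check is what lets a single pass catch both within- and across-dataset conflicts. Note that the rule is deliberately conservative: every record sharing a conflicting identifier is rejected, not only the one that looks wrong, so in the example both "ethanol" records are set aside while the two consistent formaldehyde records are accepted.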
Affiliation(s)
- Christopher M Grulke
- National Center for Computational Toxicology, Office of Research & Development, US Environmental Protection Agency, Mail Drop D143-02, Research Triangle Park, NC 27711, USA
- Antony J Williams
- National Center for Computational Toxicology, Office of Research & Development, US Environmental Protection Agency, Mail Drop D143-02, Research Triangle Park, NC 27711, USA
- Inthirany Thillanadarajah
- Senior Environmental Employment Program, US Environmental Protection Agency, Research Triangle Park, NC 27711, USA
- Ann M Richard
- National Center for Computational Toxicology, Office of Research & Development, US Environmental Protection Agency, Mail Drop D143-02, Research Triangle Park, NC 27711, USA
20
Fan F, Toledo Warshaviak D, Hamadeh HK, Dunn RT. The integration of pharmacophore-based 3D QSAR modeling and virtual screening in safety profiling: A case study to identify antagonistic activities against adenosine receptor, A2A, using 1,897 known drugs. PLoS One 2019; 14:e0204378. [PMID: 30605479 PMCID: PMC6317804 DOI: 10.1371/journal.pone.0204378] [Received: 09/02/2018] [Accepted: 12/12/2018]
Abstract
Safety pharmacology screening against a wide range of unintended vital targets using in vitro assays is crucial to understand off-target interactions with drug candidates. With the increasing demand for in vitro assays, ligand- and structure-based virtual screening approaches have been evaluated for potential utilization in safety profiling. Although ligand-based approaches have been actively applied in retrospective analysis or prospectively within well-defined chemical space during the early discovery stage (i.e., HTS screening and lead optimization), virtual screening is rarely implemented in later stages of drug discovery (i.e., safety). Here we present a case study to evaluate ligand-based 3D QSAR models built on in vitro antagonistic activity data against adenosine receptor 2A (A2A). The resulting models, obtained from 268 chemically diverse compounds, were used to test a set of 1,897 chemically distinct drugs, simulating the real-world challenge of safety screening when presented with novel chemistry and a limited training set. Due to the unique requirements of safety screening versus discovery screening, the limitations of 3D QSAR methods (i.e., chemotype dependence, reliance on large training sets, and proneness to false positives) are less critical than in early discovery screening. We demonstrated that 3D QSAR modeling can be effectively applied in safety assessment prior to in vitro assays, even with chemotypes that are drastically different from training compounds. It is also worth noting that our model is able to adequately make the mechanistic distinction between agonists and antagonists, which is important to inform subsequent in vivo studies. Overall, we present an in-depth analysis of the appropriate utilization and interpretation of pharmacophore-based 3D QSAR models for safety screening.
Affiliation(s)
- Fan Fan
- Amgen Research, Department of Comparative Biology and Safety Sciences, Thousand Oaks, CA, United States of America
- Dora Toledo Warshaviak
- Schrodinger Inc., San Diego, CA, United States of America
- Department of Molecular Engineering, Amgen Inc., Thousand Oaks, CA, United States of America
- Hisham K. Hamadeh
- Amgen Research, Department of Comparative Biology and Safety Sciences, Thousand Oaks, CA, United States of America
- Robert T. Dunn
- Amgen Research, Department of Comparative Biology and Safety Sciences, Thousand Oaks, CA, United States of America
21
Neves BJ, Braga RC, Melo-Filho CC, Moreira-Filho JT, Muratov EN, Andrade CH. QSAR-Based Virtual Screening: Advances and Applications in Drug Discovery. Front Pharmacol 2018; 9:1275. [PMID: 30524275 PMCID: PMC6262347 DOI: 10.3389/fphar.2018.01275] [Received: 08/11/2018] [Accepted: 10/18/2018]
Abstract
Virtual screening (VS) has emerged in drug discovery as a powerful computational approach to screen large libraries of small molecules for new hits with desired properties that can then be tested experimentally. Like other computational approaches, VS is not intended to replace in vitro or in vivo assays, but to speed up the discovery process, to reduce the number of candidates to be tested experimentally, and to rationalize their choice. Moreover, VS has become very popular in pharmaceutical companies and academic organizations because it saves time, cost, resources, and labor. Among the VS approaches, quantitative structure–activity relationship (QSAR) analysis is the most powerful method due to its high and fast throughput and good hit rate. As the first step of QSAR model development, relevant chemogenomics data are collected from databases and the literature. Then, chemical descriptors are calculated on different levels of representation of molecular structure, ranging from 1D to nD, and then correlated with the biological property using machine learning techniques. Once developed and validated, QSAR models are applied to predict the biological property of novel compounds. Although the experimental testing of computational hits is not an inherent part of QSAR methodology, it is highly desired and should be performed as an ultimate validation of developed models. In this mini-review, we summarize and critically analyze the recent trends of QSAR-based VS in drug discovery and demonstrate successful applications in identifying promising compounds with desired properties. Moreover, we provide some recommendations about the best practices for QSAR-based VS along with the future perspectives of this approach.
Affiliation(s)
- Bruno J Neves
- LabMol - Laboratory for Molecular Modeling and Drug Design, Faculdade de Farmácia, Universidade Federal de Goiás, Goiânia, Brazil; Laboratory of Cheminformatics, Centro Universitário de Anápolis (UniEVANGÉLICA), Anápolis, Brazil
- Rodolpho C Braga
- LabMol - Laboratory for Molecular Modeling and Drug Design, Faculdade de Farmácia, Universidade Federal de Goiás, Goiânia, Brazil
- Cleber C Melo-Filho
- LabMol - Laboratory for Molecular Modeling and Drug Design, Faculdade de Farmácia, Universidade Federal de Goiás, Goiânia, Brazil
- José Teófilo Moreira-Filho
- LabMol - Laboratory for Molecular Modeling and Drug Design, Faculdade de Farmácia, Universidade Federal de Goiás, Goiânia, Brazil
- Eugene N Muratov
- Laboratory for Molecular Modeling, Division of Chemical Biology and Medicinal Chemistry, Eshelman School of Pharmacy, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States; Department of Chemical Technology, Odessa National Polytechnic University, Odessa, Ukraine
- Carolina Horta Andrade
- LabMol - Laboratory for Molecular Modeling and Drug Design, Faculdade de Farmácia, Universidade Federal de Goiás, Goiânia, Brazil
22
Sobus JR, Wambaugh JF, Isaacs KK, Williams AJ, McEachran AD, Richard AM, Grulke CM, Ulrich EM, Rager JE, Strynar MJ, Newton SR. Integrating tools for non-targeted analysis research and chemical safety evaluations at the US EPA. J Expo Sci Environ Epidemiol 2018; 28:411-426. [PMID: 29288256 PMCID: PMC6661898 DOI: 10.1038/s41370-017-0012-y] [Received: 05/27/2017] [Revised: 08/04/2017] [Accepted: 08/25/2017]
Abstract
Tens of thousands of chemicals are registered in the U.S. for use in countless processes and products. Recent evidence suggests that many of these chemicals are measurable in environmental and/or biological systems, indicating the potential for widespread exposures. Traditional public health research tools, including in vivo studies and targeted analytical chemistry methods, have been unable to meet the needs of screening programs designed to evaluate chemical safety. As such, new tools have been developed to enable rapid assessment of potentially harmful chemical exposures and their attendant biological responses. One group of tools, known as "non-targeted analysis" (NTA) methods, allows the rapid characterization of thousands of never-before-studied compounds in a wide variety of environmental, residential, and biological media. This article discusses current applications of NTA methods, challenges to their effective use in chemical screening studies, and ways in which shared resources (e.g., chemical standards, databases, model predictions, and media measurements) can advance their use in risk-based chemical prioritization. A brief review is provided of resources and projects within EPA's Office of Research and Development (ORD) that provide benefit to, and receive benefits from, NTA research endeavors. A summary of EPA's Non-Targeted Analysis Collaborative Trial (ENTACT) is also given, which makes direct use of ORD resources to benefit the global NTA research community. Finally, a research framework is described that shows how NTA methods will bridge chemical prioritization efforts within ORD. This framework exists as a guide for institutions seeking to understand the complexity of chemical exposures, and the impact of these exposures on living systems.
Affiliation(s)
- Jon R Sobus
- U.S. Environmental Protection Agency, Office of Research and Development, National Exposure Research Laboratory, 109 T.W. Alexander Drive, Research Triangle Park, NC, 27709, USA.
- John F Wambaugh
- U.S. Environmental Protection Agency, Office of Research and Development, National Center for Computational Toxicology, 109 T.W. Alexander Drive, Research Triangle Park, NC, 27709, USA
- Kristin K Isaacs
- U.S. Environmental Protection Agency, Office of Research and Development, National Exposure Research Laboratory, 109 T.W. Alexander Drive, Research Triangle Park, NC, 27709, USA
- Antony J Williams
- U.S. Environmental Protection Agency, Office of Research and Development, National Center for Computational Toxicology, 109 T.W. Alexander Drive, Research Triangle Park, NC, 27709, USA
- Andrew D McEachran
- Oak Ridge Institute for Science and Education (ORISE) Participant, 109 T.W. Alexander Drive, Research Triangle Park, NC, 27709, USA
- Ann M Richard
- U.S. Environmental Protection Agency, Office of Research and Development, National Center for Computational Toxicology, 109 T.W. Alexander Drive, Research Triangle Park, NC, 27709, USA
- Christopher M Grulke
- U.S. Environmental Protection Agency, Office of Research and Development, National Center for Computational Toxicology, 109 T.W. Alexander Drive, Research Triangle Park, NC, 27709, USA
- Elin M Ulrich
- U.S. Environmental Protection Agency, Office of Research and Development, National Exposure Research Laboratory, 109 T.W. Alexander Drive, Research Triangle Park, NC, 27709, USA
- Julia E Rager
- Oak Ridge Institute for Science and Education (ORISE) Participant, 109 T.W. Alexander Drive, Research Triangle Park, NC, 27709, USA
- ToxStrategies, Inc., 9390 Research Blvd., Suite 100, Austin, TX, 78759, USA
- Mark J Strynar
- U.S. Environmental Protection Agency, Office of Research and Development, National Exposure Research Laboratory, 109 T.W. Alexander Drive, Research Triangle Park, NC, 27709, USA
- Seth R Newton
- U.S. Environmental Protection Agency, Office of Research and Development, National Exposure Research Laboratory, 109 T.W. Alexander Drive, Research Triangle Park, NC, 27709, USA
23
Basith S, Cui M, Macalino SJY, Park J, Clavio NAB, Kang S, Choi S. Exploring G Protein-Coupled Receptors (GPCRs) Ligand Space via Cheminformatics Approaches: Impact on Rational Drug Design. Front Pharmacol 2018; 9:128. [PMID: 29593527 PMCID: PMC5854945 DOI: 10.3389/fphar.2018.00128] [Received: 12/08/2017] [Accepted: 02/06/2018]
Abstract
The primary goal of rational drug discovery is the identification of selective ligands which act on single or multiple drug targets to achieve the desired clinical outcome through the exploration of total chemical space. To identify such desired compounds, computational approaches are necessary for predicting their drug-like properties. G Protein-Coupled Receptors (GPCRs) represent one of the largest and most important integral membrane protein families. These receptors serve as increasingly attractive drug targets due to their relevance in the treatment of various diseases, such as inflammatory disorders, metabolic imbalances, cardiac disorders, cancer, monogenic disorders, etc. In the last decade, multitudes of three-dimensional (3D) structures were solved for diverse GPCRs, leading this period to be called the "golden age for GPCR structural biology." Moreover, accumulation of data about the chemical properties of GPCR ligands has garnered much interest toward the exploration of GPCR chemical space. Due to the steady increase in the structural, ligand, and functional data of GPCRs, several cheminformatics approaches have been implemented in the GPCR drug discovery pipeline. In this review, we mainly focus on the cheminformatics-based paradigms in GPCR drug discovery. We provide a comprehensive view of the ligand- and structure-based cheminformatics approaches, which are best illustrated via GPCR case studies. Furthermore, an appropriate combination of ligand-based knowledge with structure-based knowledge, i.e., an integrated approach, which is emerging as a promising strategy for cheminformatics-based GPCR drug design, is also discussed.
Affiliation(s)
- Soosung Kang
- College of Pharmacy and Graduate School of Pharmaceutical Sciences, Ewha Womans University, Seoul, South Korea
- Sun Choi
- College of Pharmacy and Graduate School of Pharmaceutical Sciences, Ewha Womans University, Seoul, South Korea
24
Patel M, Chilton ML, Sartini A, Gibson L, Barber C, Covey-Crump L, Przybylak KR, Cronin MTD, Madden JC. Assessment and Reproducibility of Quantitative Structure–Activity Relationship Models by the Nonexpert. J Chem Inf Model 2018; 58:673-682. [DOI: 10.1021/acs.jcim.7b00523]
Affiliation(s)
- Mukesh Patel
- Lhasa Limited, Granary Wharf House, 2 Canal Wharf, Leeds LS11 5PS, England
- Martyn L. Chilton
- Lhasa Limited, Granary Wharf House, 2 Canal Wharf, Leeds LS11 5PS, England
- Andrea Sartini
- Lhasa Limited, Granary Wharf House, 2 Canal Wharf, Leeds LS11 5PS, England
- Laura Gibson
- Lhasa Limited, Granary Wharf House, 2 Canal Wharf, Leeds LS11 5PS, England
- Chris Barber
- Lhasa Limited, Granary Wharf House, 2 Canal Wharf, Leeds LS11 5PS, England
- Liz Covey-Crump
- Lhasa Limited, Granary Wharf House, 2 Canal Wharf, Leeds LS11 5PS, England
- Katarzyna R. Przybylak
- School of Pharmacy and Chemistry, Liverpool John Moores University, Byrom Street, Liverpool L3 3AF, England
- Mark T. D. Cronin
- School of Pharmacy and Chemistry, Liverpool John Moores University, Byrom Street, Liverpool L3 3AF, England
- Judith C. Madden
- School of Pharmacy and Chemistry, Liverpool John Moores University, Byrom Street, Liverpool L3 3AF, England
25
Williams AJ, Grulke CM, Edwards J, McEachran AD, Mansouri K, Baker NC, Patlewicz G, Shah I, Wambaugh JF, Judson RS, Richard AM. The CompTox Chemistry Dashboard: a community data resource for environmental chemistry. J Cheminform 2017; 9:61. [PMID: 29185060 PMCID: PMC5705535 DOI: 10.1186/s13321-017-0247-6] [Received: 09/30/2017] [Accepted: 11/18/2017]
Abstract
Despite an abundance of online databases providing access to chemical data, there is increasing demand for high-quality, structure-curated, open data to meet the various needs of the environmental sciences and computational toxicology communities. The U.S. Environmental Protection Agency's (EPA) web-based CompTox Chemistry Dashboard is addressing these needs by integrating diverse types of relevant domain data through a cheminformatics layer, built upon a database of curated substances linked to chemical structures. These data include physicochemical, environmental fate and transport, exposure, usage, in vivo toxicity, and in vitro bioassay data, surfaced through an integration hub with link-outs to additional EPA data and public domain online resources. Batch searching allows for direct chemical identifier (ID) mapping and downloading of multiple data streams in several different formats. This facilitates fast access to available structure, property, toxicity, and bioassay data for collections of chemicals (hundreds to thousands at a time). Advanced search capabilities are available to support, for example, non-targeted analysis and identification of chemicals using mass spectrometry. The contents of the chemistry database, presently containing ~ 760,000 substances, are available as public domain data for download. The chemistry content underpinning the Dashboard has been aggregated over the past 15 years by both manual and auto-curation techniques within EPA's DSSTox project. DSSTox chemical content is subject to strict quality controls to enforce consistency among chemical substance-structure identifiers, as well as list curation review to ensure accurate linkages of DSSTox substances to chemical lists and associated data. The Dashboard, publicly launched in April 2016, has expanded considerably in content and user traffic over the past year. 
It is continuously evolving with the growth of DSSTox into high-interest or data-rich domains of interest to EPA, such as chemicals on the Toxic Substances Control Act listing, while providing the user community with a flexible and dynamic web-based platform for integration, processing, visualization and delivery of data and resources. The Dashboard provides support for a broad array of research and regulatory programs across the worldwide community of toxicologists and environmental scientists.
Affiliation(s)
- Antony J. Williams
- National Center for Computational Toxicology, Office of Research and Development, U.S. Environmental Protection Agency, Research Triangle Park, NC USA
- Christopher M. Grulke
- National Center for Computational Toxicology, Office of Research and Development, U.S. Environmental Protection Agency, Research Triangle Park, NC USA
- Jeff Edwards
- National Center for Computational Toxicology, Office of Research and Development, U.S. Environmental Protection Agency, Research Triangle Park, NC USA
- Kamel Mansouri
- National Center for Computational Toxicology, Office of Research and Development, U.S. Environmental Protection Agency, Research Triangle Park, NC USA
- Oak Ridge Institute for Science and Education, Oak Ridge, TN USA
- ScitoVation LLC, Research Triangle Park, NC USA
- Grace Patlewicz
- National Center for Computational Toxicology, Office of Research and Development, U.S. Environmental Protection Agency, Research Triangle Park, NC USA
- Imran Shah
- National Center for Computational Toxicology, Office of Research and Development, U.S. Environmental Protection Agency, Research Triangle Park, NC USA
- John F. Wambaugh
- National Center for Computational Toxicology, Office of Research and Development, U.S. Environmental Protection Agency, Research Triangle Park, NC USA
- Richard S. Judson
- National Center for Computational Toxicology, Office of Research and Development, U.S. Environmental Protection Agency, Research Triangle Park, NC USA
- Ann M. Richard
- National Center for Computational Toxicology, Office of Research and Development, U.S. Environmental Protection Agency, Research Triangle Park, NC USA
26
Minkiewicz P, Iwaniak A, Darewicz M. Annotation of Peptide Structures Using SMILES and Other Chemical Codes-Practical Solutions. Molecules 2017; 22:2075. [PMID: 29186902 PMCID: PMC6149970 DOI: 10.3390/molecules22122075] [Received: 10/16/2017] [Revised: 11/15/2017] [Accepted: 11/25/2017]
Abstract
Contemporary peptide science exploits methods and tools of bioinformatics and cheminformatics. These approaches use different languages to describe peptide structures: amino acid sequences and chemical codes (especially SMILES), respectively. The latter may be applied, e.g., in comparative studies involving structures and properties of peptides and peptidomimetics. Progress in peptide science “in silico” may be achieved via better communication between biologists and chemists, involving the translation of peptide representation from amino acid sequence into SMILES code. Recent recommendations concerning good practice in chemical information include careful verification of data and their annotation. This publication discusses the generation of SMILES representations of peptides using existing software. Construction of peptide structures containing unnatural and modified amino acids (with special attention paid to glycosylated peptides) is also included. Special attention is paid to the detection of typical errors occurring in SMILES representations of peptides and their correction using molecular editors. Brief recommendations for training of staff working on peptide annotations are discussed as well.
Affiliation(s)
- Piotr Minkiewicz
- Chair of Food Biochemistry, Faculty of Food Science, University of Warmia and Mazury in Olsztyn, Plac Cieszyński 1, 10-726 Olsztyn-Kortowo, Poland.
- Anna Iwaniak
- Chair of Food Biochemistry, Faculty of Food Science, University of Warmia and Mazury in Olsztyn, Plac Cieszyński 1, 10-726 Olsztyn-Kortowo, Poland.
- Małgorzata Darewicz
- Chair of Food Biochemistry, Faculty of Food Science, University of Warmia and Mazury in Olsztyn, Plac Cieszyński 1, 10-726 Olsztyn-Kortowo, Poland.
27
Liu J, Patlewicz G, Williams AJ, Thomas RS, Shah I. Predicting Organ Toxicity Using in Vitro Bioactivity Data and Chemical Structure. Chem Res Toxicol 2017; 30:2046-2059. [PMID: 28768096 DOI: 10.1021/acs.chemrestox.7b00084] [Citation(s) in RCA: 37] [Impact Index Per Article: 5.3] [Indexed: 12/22/2022]
Abstract
Animal testing alone cannot practically evaluate the health hazard posed by tens of thousands of environmental chemicals. Computational approaches making use of high-throughput experimental data may provide more efficient means to predict chemical toxicity. Here, we use a supervised machine learning strategy to systematically investigate the relative importance of study type, machine learning algorithm, and type of descriptor on predicting in vivo repeat-dose toxicity at the organ-level. A total of 985 compounds were represented using chemical structural descriptors, ToxPrint chemotype descriptors, and bioactivity descriptors from ToxCast in vitro high-throughput screening assays. Using ToxRefDB, a total of 35 target organ outcomes were identified that contained at least 100 chemicals (50 positive and 50 negative). Supervised machine learning was performed using Naïve Bayes, k-nearest neighbor, random forest, classification and regression trees, and support vector classification approaches. Model performance was assessed based on F1 scores using 5-fold cross-validation with balanced bootstrap replicates. Fixed effects modeling showed the variance in F1 scores was explained mostly by target organ outcome, followed by descriptor type, machine learning algorithm, and interactions between these three factors. A combination of bioactivity and chemical structure or chemotype descriptors were the most predictive. Model performance improved with more chemicals (up to a maximum of 24%), and these gains were correlated (ρ = 0.92) with the number of chemicals. Overall, the results demonstrate that a combination of bioactivity and chemical descriptors can accurately predict a range of target organ toxicity outcomes in repeat-dose studies, but specific experimental and methodologic improvements may increase predictivity.
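For readers unfamiliar with the evaluation scheme mentioned in this abstract, here is a minimal sketch of estimating F1 under 5-fold cross-validation with balanced bootstrap replicates of each training set. It assumes binary labels (1 = positive organ outcome), stratified assignment of compounds to folds, and a toy 1-nearest-neighbour classifier standing in for the algorithms compared in the study. All function names are hypothetical, and the authors' actual resampling details may differ.

```python
import random
from typing import Callable, List, Sequence

Vector = Sequence[float]


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 of the positive class (label 1); 0.0 when there are no true positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def stratified_folds(y: Sequence[int], k: int, rng: random.Random) -> List[List[int]]:
    """Assumption: deal negatives and positives out separately so every fold holds both."""
    folds: List[List[int]] = [[] for _ in range(k)]
    for label in (0, 1):
        members = [i for i, v in enumerate(y) if v == label]
        rng.shuffle(members)
        for j, i in enumerate(members):
            folds[j % k].append(i)
    return folds


def balanced_bootstrap(pos: List[int], neg: List[int], rng: random.Random) -> List[int]:
    """Sample, with replacement, the same number of positive and negative indices."""
    n = min(len(pos), len(neg))
    return [rng.choice(pos) for _ in range(n)] + [rng.choice(neg) for _ in range(n)]


def nearest_neighbour(train_X: List[Vector], train_y: List[int],
                      test_X: List[Vector]) -> List[int]:
    """Toy 1-NN classifier (squared Euclidean distance), a stand-in for any model."""
    preds = []
    for x in test_X:
        dists = [sum((a - b) ** 2 for a, b in zip(x, t)) for t in train_X]
        preds.append(train_y[dists.index(min(dists))])
    return preds


def cv_f1(
    X: Sequence[Vector],
    y: Sequence[int],
    fit_predict: Callable[[List[Vector], List[int], List[Vector]], List[int]] = nearest_neighbour,
    k: int = 5,
    n_boot: int = 10,
    seed: int = 0,
) -> float:
    """Mean F1 over k held-out folds x n_boot balanced bootstrap training sets."""
    rng = random.Random(seed)
    scores = []
    for test_idx in stratified_folds(y, k, rng):
        held_out = set(test_idx)
        train = [i for i in range(len(y)) if i not in held_out]
        pos = [i for i in train if y[i] == 1]
        neg = [i for i in train if y[i] == 0]
        for _ in range(n_boot):
            sample = balanced_bootstrap(pos, neg, rng)
            preds = fit_predict([X[i] for i in sample], [y[i] for i in sample],
                                [X[i] for i in test_idx])
            scores.append(f1_score([y[i] for i in test_idx], preds))
    return sum(scores) / len(scores)
```

Balancing each bootstrap replicate keeps the classifier from simply favouring the majority class, and averaging over replicates reduces the variance of any single resampled training set. Any function with the signature `fit_predict(train_X, train_y, test_X)` can replace `nearest_neighbour` without changing the evaluation loop.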
Affiliation(s)
- Jie Liu
- Department of Information Science, University of Arkansas at Little Rock, Arkansas 72204, United States; Oak Ridge Institute for Science Education, National Center for Computational Toxicology, Office of Research and Development, U.S. Environmental Protection Agency, Research Triangle Park, Durham, North Carolina 27711, United States
- Grace Patlewicz
- National Center for Computational Toxicology, Office of Research and Development, U.S. Environmental Protection Agency, Research Triangle Park, Durham, North Carolina 27711, United States
- Antony J Williams
- National Center for Computational Toxicology, Office of Research and Development, U.S. Environmental Protection Agency, Research Triangle Park, Durham, North Carolina 27711, United States
- Russell S Thomas
- National Center for Computational Toxicology, Office of Research and Development, U.S. Environmental Protection Agency, Research Triangle Park, Durham, North Carolina 27711, United States
- Imran Shah
- National Center for Computational Toxicology, Office of Research and Development, U.S. Environmental Protection Agency, Research Triangle Park, Durham, North Carolina 27711, United States
28
Zhao L, Wang W, Sedykh A, Zhu H. Experimental Errors in QSAR Modeling Sets: What We Can Do and What We Cannot Do. ACS Omega 2017; 2:2805-2812. [PMID: 28691113 PMCID: PMC5494643 DOI: 10.1021/acsomega.7b00274] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.7] [Received: 03/08/2017] [Accepted: 04/27/2017] [Indexed: 05/04/2023]
Abstract
Numerous chemical data sets have become available for quantitative structure-activity relationship (QSAR) modeling studies. However, the quality of different data sources may be different based on the nature of experimental protocols. Therefore, potential experimental errors in the modeling sets may lead to the development of poor QSAR models and further affect the predictions of new compounds. In this study, we explored the relationship between the ratio of questionable data in the modeling sets, which was obtained by simulating experimental errors, and the QSAR modeling performance. To this end, we used eight data sets (four continuous endpoints and four categorical endpoints) that have been extensively curated both in-house and by our collaborators to create over 1800 various QSAR models. Each data set was duplicated to create several new modeling sets with different ratios of simulated experimental errors (i.e., randomizing the activities of part of the compounds) in the modeling process. A fivefold cross-validation process was used to evaluate the modeling performance, which deteriorates when the ratio of experimental errors increases. All of the resulting models were also used to predict external sets of new compounds, which were excluded at the beginning of the modeling process. The modeling results showed that the compounds with relatively large prediction errors in cross-validation processes are likely to be those with simulated experimental errors. However, after removing a certain number of compounds with large prediction errors in the cross-validation process, the external predictions of new compounds did not show improvement. Our conclusion is that the QSAR predictions, especially consensus predictions, can identify compounds with potential experimental errors. But removing those compounds by the cross-validation procedure is not a reasonable means to improve model predictivity due to overfitting.
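The error-simulation design described in this abstract can be sketched as follows. The sketch assumes that "randomizing the activities" means permuting the activity values among a randomly chosen fraction of compounds; the function name and that interpretation are illustrative assumptions, not the authors' published procedure.

```python
import random
from typing import List, Sequence, Tuple


def simulate_experimental_errors(
    activities: Sequence[float], error_ratio: float, seed: int = 0
) -> Tuple[List[float], List[int]]:
    """Hypothetical helper: scramble the activities of a fraction of compounds.

    Returns the perturbed activity list and the sorted indices of the
    compounds whose values were randomized. Compounds outside that subset
    keep their original activity.
    """
    if not 0.0 <= error_ratio <= 1.0:
        raise ValueError("error_ratio must lie in [0, 1]")
    rng = random.Random(seed)
    noisy = list(activities)
    n_errors = round(error_ratio * len(noisy))
    # Choose which compounds receive a simulated experimental error.
    chosen = sorted(rng.sample(range(len(noisy)), n_errors))
    # Permute their activity values among themselves (a value may, by chance,
    # land back on its original compound).
    values = [noisy[i] for i in chosen]
    rng.shuffle(values)
    for i, v in zip(chosen, values):
        noisy[i] = v
    return noisy, chosen


if __name__ == "__main__":
    acts = [float(i) for i in range(10)]
    for ratio in (0.0, 0.2, 0.5):
        noisy, idx = simulate_experimental_errors(acts, ratio, seed=42)
        print(ratio, idx, noisy)
```

Returning the perturbed indices makes it possible to check, as the study did, whether the compounds with large cross-validation prediction errors coincide with those that received simulated errors.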
Affiliation(s)
- Linlin Zhao
- The Rutgers Center for Computational and Integrative Biology, Camden, New Jersey 08102, United States
- Wenyi Wang
- The Rutgers Center for Computational and Integrative Biology, Camden, New Jersey 08102, United States
- Alexander Sedykh
- Sciome LLC, Durham, North Carolina 27709, United States
- Hao Zhu
- The Rutgers Center for Computational and Integrative Biology, Camden, New Jersey 08102, United States
- Department of Chemistry, Rutgers University, Camden, New Jersey 08102, United States
- Tel: (856) 225-6781 (H.Z.)
29
Krallinger M, Rabal O, Lourenço A, Oyarzabal J, Valencia A. Information Retrieval and Text Mining Technologies for Chemistry. Chem Rev 2017; 117:7673-7761. [PMID: 28475312 DOI: 10.1021/acs.chemrev.6b00851] [Citation(s) in RCA: 111] [Impact Index Per Article: 15.9] [Indexed: 01/03/2023]
Abstract
Efficient access to chemical information contained in scientific literature, patents, technical reports, or the web is a pressing need shared by researchers and patent attorneys from different chemical disciplines. Retrieval of important chemical information in most cases starts with finding relevant documents for a particular chemical compound or family. Targeted retrieval of chemical documents is closely connected to the automatic recognition of chemical entities in the text, which commonly involves the extraction of the entire list of chemicals mentioned in a document, including any associated information. In this Review, we provide a comprehensive and in-depth description of fundamental concepts, technical implementations, and current technologies for meeting these information demands. A strong focus is placed on community challenges addressing systems performance, more particularly CHEMDNER and CHEMDNER patents tasks of BioCreative IV and V, respectively. Considering the growing interest in the construction of automatically annotated chemical knowledge bases that integrate chemical information and biological data, cheminformatics approaches for mapping the extracted chemical names into chemical structures and their subsequent annotation together with text mining applications for linking chemistry with biological information are also presented. Finally, future trends and current challenges are highlighted as a roadmap proposal for research in this emerging field.
Affiliation(s)
- Martin Krallinger
- Structural Computational Biology Group, Structural Biology and BioComputing Programme, Spanish National Cancer Research Centre, C/Melchor Fernández Almagro 3, Madrid E-28029, Spain
- Obdulia Rabal
- Small Molecule Discovery Platform, Molecular Therapeutics Program, Center for Applied Medical Research (CIMA), University of Navarra, Avenida Pio XII 55, Pamplona E-31008, Spain
- Anália Lourenço
- ESEI - Department of Computer Science, University of Vigo, Edificio Politécnico, Campus Universitario As Lagoas s/n, Ourense E-32004, Spain; Centro de Investigaciones Biomédicas (Centro Singular de Investigación de Galicia), Campus Universitario Lagoas-Marcosende, Vigo E-36310, Spain; CEB-Centre of Biological Engineering, University of Minho, Campus de Gualtar, Braga 4710-057, Portugal
- Julen Oyarzabal
- Small Molecule Discovery Platform, Molecular Therapeutics Program, Center for Applied Medical Research (CIMA), University of Navarra, Avenida Pio XII 55, Pamplona E-31008, Spain
- Alfonso Valencia
- Life Science Department, Barcelona Supercomputing Centre (BSC-CNS), C/Jordi Girona, 29-31, Barcelona E-08034, Spain; Joint BSC-IRB-CRG Program in Computational Biology, Parc Científic de Barcelona, C/ Baldiri Reixac 10, Barcelona E-08028, Spain; Institució Catalana de Recerca i Estudis Avançats (ICREA), Passeig de Lluís Companys 23, Barcelona E-08010, Spain
30
Richard AM, Judson RS, Houck KA, Grulke CM, Volarath P, Thillainadarajah I, Yang C, Rathman J, Martin MT, Wambaugh JF, Knudsen TB, Kancherla J, Mansouri K, Patlewicz G, Williams AJ, Little SB, Crofton KM, Thomas RS. ToxCast Chemical Landscape: Paving the Road to 21st Century Toxicology. Chem Res Toxicol 2016; 29:1225-51. [PMID: 27367298 DOI: 10.1021/acs.chemrestox.6b00135] [Citation(s) in RCA: 386] [Impact Index Per Article: 48.3] [Indexed: 11/27/2022]
Abstract
The U.S. Environmental Protection Agency's (EPA) ToxCast program is testing a large library of Agency-relevant chemicals using in vitro high-throughput screening (HTS) approaches to support the development of improved toxicity prediction models. Launched in 2007, Phase I of the program screened 310 chemicals, mostly pesticides, across hundreds of ToxCast assay end points. In Phase II, the ToxCast library was expanded to 1878 chemicals, culminating in the public release of screening data at the end of 2013. Subsequent expansion in Phase III has resulted in more than 3800 chemicals actively undergoing ToxCast screening, 96% of which are also being screened in the multi-Agency Tox21 project. The chemical library underpinning these efforts plays a central role in defining the scope and potential application of ToxCast HTS results. The history of the phased construction of EPA's ToxCast library is reviewed, followed by a survey of the library contents from several different vantage points. CAS Registry Numbers are used to assess ToxCast library coverage of important toxicity, regulatory, and exposure inventories. Structure-based representations of ToxCast chemicals are then used to compute physicochemical properties, substructural features, and structural alerts for toxicity and biotransformation. Cheminformatics approaches using these varied representations are applied to defining the boundaries of HTS testability, evaluating chemical diversity, and comparing the ToxCast library to potential target application inventories, such as those used in EPA's Endocrine Disruption Screening Program (EDSP). Through several examples, the ToxCast chemical library is demonstrated to provide comprehensive coverage of the knowledge domains and target inventories of potential interest to EPA. Furthermore, the varied representations and approaches presented here define local chemistry domains potentially worthy of further investigation (e.g., not currently covered in the testing library or defined by toxicity "alerts") to strategically support data mining and predictive toxicology modeling moving forward.
Affiliation(s)
- Ann M Richard
- National Center for Computational Toxicology, Office of Research & Development, U.S. Environmental Protection Agency, Mail Code B205-01, Research Triangle Park, Durham, North Carolina 27711, United States
- Richard S Judson
- National Center for Computational Toxicology, Office of Research & Development, U.S. Environmental Protection Agency, Mail Code B205-01, Research Triangle Park, Durham, North Carolina 27711, United States
- Keith A Houck
- National Center for Computational Toxicology, Office of Research & Development, U.S. Environmental Protection Agency, Mail Code B205-01, Research Triangle Park, Durham, North Carolina 27711, United States
- Christopher M Grulke
- National Center for Computational Toxicology, Office of Research & Development, U.S. Environmental Protection Agency, Mail Code B205-01, Research Triangle Park, Durham, North Carolina 27711, United States
- Patra Volarath
- Center for Food Safety and Nutrition, U.S. Food and Drug Administration, 5100 Paint Branch Parkway, College Park, Maryland 20740, United States
- Inthirany Thillainadarajah
- Senior Environmental Employment Program, U.S. Environmental Protection Agency, Research Triangle Park, Durham, North Carolina 27711, United States
- Chihae Yang
- Molecular Networks GmbH, Henkestraße 91, 91052 Erlangen, Germany; Altamira, LLC, 1455 Candlewood Drive, Columbus, Ohio 43235, United States
- James Rathman
- Altamira, LLC, 1455 Candlewood Drive, Columbus, Ohio 43235, United States; Department of Chemical and Biomolecular Engineering, The Ohio State University, 151 W. Woodruff Avenue, Columbus, Ohio 43210, United States
- Matthew T Martin
- National Center for Computational Toxicology, Office of Research & Development, U.S. Environmental Protection Agency, Mail Code B205-01, Research Triangle Park, Durham, North Carolina 27711, United States
- John F Wambaugh
- National Center for Computational Toxicology, Office of Research & Development, U.S. Environmental Protection Agency, Mail Code B205-01, Research Triangle Park, Durham, North Carolina 27711, United States
- Thomas B Knudsen
- National Center for Computational Toxicology, Office of Research & Development, U.S. Environmental Protection Agency, Mail Code B205-01, Research Triangle Park, Durham, North Carolina 27711, United States
- Jayaram Kancherla
- ORISE Fellow, U.S. Environmental Protection Agency, Research Triangle Park, Durham, North Carolina 27711, United States
- Kamel Mansouri
- ORISE Fellow, U.S. Environmental Protection Agency, Research Triangle Park, Durham, North Carolina 27711, United States
- Grace Patlewicz
- National Center for Computational Toxicology, Office of Research & Development, U.S. Environmental Protection Agency, Mail Code B205-01, Research Triangle Park, Durham, North Carolina 27711, United States
- Antony J Williams
- National Center for Computational Toxicology, Office of Research & Development, U.S. Environmental Protection Agency, Mail Code B205-01, Research Triangle Park, Durham, North Carolina 27711, United States
- Stephen B Little
- National Center for Computational Toxicology, Office of Research & Development, U.S. Environmental Protection Agency, Mail Code B205-01, Research Triangle Park, Durham, North Carolina 27711, United States
- Kevin M Crofton
- National Center for Computational Toxicology, Office of Research & Development, U.S. Environmental Protection Agency, Mail Code B205-01, Research Triangle Park, Durham, North Carolina 27711, United States
- Russell S Thomas
- National Center for Computational Toxicology, Office of Research & Development, U.S. Environmental Protection Agency, Mail Code B205-01, Research Triangle Park, Durham, North Carolina 27711, United States
31
Neves BJ, Muratov E, Machado RB, Andrade CH, Cravo PVL. Modern approaches to accelerate discovery of new antischistosomal drugs. Expert Opin Drug Discov 2016; 11:557-67. [PMID: 27073973 PMCID: PMC6534417 DOI: 10.1080/17460441.2016.1178230] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Indexed: 12/13/2022]
Abstract
INTRODUCTION The almost exclusive use of praziquantel for the treatment of schistosomiasis has raised concerns about the possible emergence of drug-resistant schistosomes. Consequently, there is an urgent need for new antischistosomal drugs. The identification of leads and the generation of high quality data are crucial steps in the early stages of schistosome drug discovery projects. AREAS COVERED Herein, the authors focus on the current developments in antischistosomal lead discovery, specifically referring to the use of automated in vitro target-based and whole-organism screens and virtual screening of chemical databases. They highlight the strengths and pitfalls of each of the above-mentioned approaches, and suggest possible roadmaps towards the integration of several strategies, which may contribute to optimizing research outputs and lead to more successful and cost-effective drug discovery endeavors. EXPERT OPINION Increasing partnerships and access to funding for drug discovery have strengthened the battle against schistosomiasis in recent years. However, the authors believe this battle also includes innovative strategies to overcome scientific challenges. In this context, significant advances in in vitro screening as well as computer-aided drug discovery have contributed to increasing the success rate and reducing the costs of drug discovery campaigns. Although some of these approaches are already used in current antischistosomal lead discovery pipelines, the integration of these strategies in a solid workflow should allow the production of new treatments for schistosomiasis in the near future.
Affiliation(s)
- Bruno Junior Neves
- LabMol - Laboratory for Molecular Modeling and Drug Design, Faculdade de Farmácia, Universidade Federal de Goiás, Goiânia, Brazil
- Eugene Muratov
- Laboratory for Molecular Modeling, Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, NC, USA
- Renato Beilner Machado
- GenoBio - Laboratory of Genomics and Biotechnology, Instituto de Patologia Tropical e Saúde Pública, Universidade Federal de Goiás, Goiânia, Brazil
- Carolina Horta Andrade
- LabMol - Laboratory for Molecular Modeling and Drug Design, Faculdade de Farmácia, Universidade Federal de Goiás, Goiânia, Brazil
- Pedro Vitor Lemos Cravo
- GenoBio - Laboratory of Genomics and Biotechnology, Instituto de Patologia Tropical e Saúde Pública, Universidade Federal de Goiás, Goiânia, Brazil
- Instituto de Higiene e Medicina Tropical, Universidade Nova de Lisboa, Lisbon, Portugal
32
Oprea TI, Overington JP. Computational and Practical Aspects of Drug Repositioning. Assay Drug Dev Technol 2016; 13:299-306. [PMID: 26241209 DOI: 10.1089/adt.2015.29011.tiodrrr] [Citation(s) in RCA: 64] [Impact Index Per Article: 8.0] [Indexed: 01/10/2023]
Abstract
The concept of the hypothesis-driven or observational-based expansion of the therapeutic application of drugs is very seductive. This is due to a number of factors, such as lower cost of development, higher probability of success, near-term clinical potential, patient and societal benefit, and also the ability to apply the approach to rare, orphan, and underresearched diseases. Another highly attractive aspect is that the "barrier to entry" is low, at least in comparison to a full drug discovery operation. The availability of high-performance computing and databases of various forms has also enhanced the ability to pose reasonable and testable hypotheses for drug repurposing, rescue, and repositioning. In this article we discuss several factors that are currently underdeveloped, or could benefit from clearer definition in articles presenting such work. We propose a classification scheme, the drug repositioning evidence level (DREL), for all drug repositioning projects, according to the level of scientific evidence. DREL ranges from zero, which refers to predictions that lack any experimental support, to four, which refers to drugs approved for the new indication. We also present a set of simple concepts that can allow rapid and effective filtering of hypotheses, leading to a focus on those that are most likely to lead to practical safe applications of an existing drug. Some promising repurposing leads for malaria (DREL-1) and amoebic dysentery (DREL-2) are discussed.
Affiliation(s)
- Tudor I Oprea
- Translational Informatics Division, Department of Internal Medicine, University of New Mexico School of Medicine, Albuquerque, New Mexico
- John P Overington
- European Molecular Biology Laboratory-European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, United Kingdom
33
Ball N, Cronin MTD, Shen J, Blackburn K, Booth ED, Bouhifd M, Donley E, Egnash L, Hastings C, Juberg DR, Kleensang A, Kleinstreuer N, Kroese ED, Lee AC, Luechtefeld T, Maertens A, Marty S, Naciff JM, Palmer J, Pamies D, Penman M, Richarz AN, Russo DP, Stuard SB, Patlewicz G, van Ravenzwaay B, Wu S, Zhu H, Hartung T. Toward Good Read-Across Practice (GRAP) guidance. ALTEX 2016; 33:149-66. [PMID: 26863606 PMCID: PMC5581000 DOI: 10.14573/altex.1601251] [Citation(s) in RCA: 111] [Impact Index Per Article: 13.9] [Received: 01/21/2016] [Accepted: 02/11/2016] [Indexed: 12/04/2022]
Abstract
Grouping of substances and utilizing read-across of data within those groups represents an important data gap filling technique for chemical safety assessments. Categories/analogue groups are typically developed based on structural similarity and, increasingly often, also on mechanistic (biological) similarity. While read-across can play a key role in complying with legislation such as the European REACH regulation, the lack of consensus regarding the extent and type of evidence necessary to support it often hampers its successful application and acceptance by regulatory authorities. Despite a potentially broad user community, expertise is still concentrated across a handful of organizations and individuals. In order to facilitate the effective use of read-across, this document presents the state of the art, summarizes insights learned from reviewing ECHA published decisions regarding the relative successes/pitfalls surrounding read-across under REACH, and compiles the relevant activities and guidance documents. Special emphasis is given to the available existing tools and approaches, an analysis of ECHA's published final decisions associated with all levels of compliance checks and testing proposals, the consideration and expression of uncertainty, the use of biological support data, and the impact of the ECHA Read-Across Assessment Framework (RAAF) published in 2015.
Affiliation(s)
- Mark T D Cronin
- School of Pharmacy and Biomolecular Sciences, Liverpool John Moores University, Liverpool, UK
- Jie Shen
- Research Institute for Fragrance Materials, Inc., Woodcliff Lake, NJ, USA
- Ewan D Booth
- Syngenta Ltd, Jealott's Hill International Research Centre, Bracknell, Berkshire, UK
- Mounir Bouhifd
- Johns Hopkins Bloomberg School of Public Health, Center for Alternatives to Animal Testing (CAAT), Baltimore, MD, USA
- Laura Egnash
- Stemina Biomarker Discovery Inc., Madison, WI, USA
- Charles Hastings
- BASF SE, Ludwigshafen am Rhein, Germany, and Research Triangle Park, NC, USA
- Andre Kleensang
- Johns Hopkins Bloomberg School of Public Health, Center for Alternatives to Animal Testing (CAAT), Baltimore, MD, USA
- Nicole Kleinstreuer
- National Toxicology Program Interagency Center for the Evaluation of Alternative Toxicological Methods, National Institute of Environmental Health Sciences, Research Triangle Park, NC, USA
- E Dinant Kroese
- Risk Analysis for Products in Development, TNO Zeist, The Netherlands
- Adam C Lee
- DuPont Haskell Global Centers for Health and Environmental Sciences, Newark, DE, USA
- Thomas Luechtefeld
- Johns Hopkins Bloomberg School of Public Health, Center for Alternatives to Animal Testing (CAAT), Baltimore, MD, USA
- Alexandra Maertens
- Johns Hopkins Bloomberg School of Public Health, Center for Alternatives to Animal Testing (CAAT), Baltimore, MD, USA
- Sue Marty
- The Dow Chemical Company, Midland, MI, USA
- David Pamies
- Johns Hopkins Bloomberg School of Public Health, Center for Alternatives to Animal Testing (CAAT), Baltimore, MD, USA
- Andrea-Nicole Richarz
- School of Pharmacy and Biomolecular Sciences, Liverpool John Moores University, Liverpool, UK
- Daniel P Russo
- Department of Chemistry and Center for Computational and Integrative Biology, Rutgers University, Camden, NJ, USA
- Grace Patlewicz
- US EPA/ORD, National Center for Computational Toxicology, Research Triangle Park, NC, USA
- Shengde Wu
- The Procter and Gamble Co., Cincinnati, OH, USA
- Hao Zhu
- Department of Chemistry and Center for Computational and Integrative Biology, Rutgers University, Camden, NJ, USA
- Thomas Hartung
- Johns Hopkins Bloomberg School of Public Health, Center for Alternatives to Animal Testing (CAAT), Baltimore, MD, USA; University of Konstanz, CAAT-Europe, Konstanz, Germany
34
Tales from the war on error: the art and science of curating QSAR data. J Comput Aided Mol Des 2015; 29:897-910. [PMID: 26290258 DOI: 10.1007/s10822-015-9865-0] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Received: 04/16/2015] [Accepted: 08/07/2015] [Indexed: 10/23/2022]
Abstract
Curating the data underlying quantitative structure-activity relationship models is a never-ending struggle. Some curation can now be automated but much cannot, especially where data as complex as those pertaining to molecular absorption, distribution, metabolism, excretion, and toxicity are concerned (vide infra). The authors discuss some particularly challenging problem areas in terms of specific examples involving experimental context, incompleteness of data, confusion of units, problematic nomenclature, tautomerism, and misapplication of automated structure recognition tools.
35
Abstract
The emergence of a number of publicly available bioactivity databases, such as ChEMBL, PubChem BioAssay and BindingDB, has raised awareness about the topics of data curation, quality and integrity. Here we provide an overview and discussion of the current and future approaches to activity, assay and target data curation of the ChEMBL database. This curation process involves several manual and automated steps and aims to: (1) maximise data accessibility and comparability; (2) improve data integrity and flag outliers, ambiguities and potential errors; and (3) add further curated annotations and mappings thus increasing the usefulness and accuracy of the ChEMBL data for all users and modellers in particular. Issues related to activity, assay and target data curation and integrity along with their potential impact for users of the data are discussed, alongside robust selection and filter strategies in order to avoid or minimise these, depending on the desired application.
36
Ai N, Fan X, Ekins S. In silico methods for predicting drug-drug interactions with cytochrome P-450s, transporters and beyond. Adv Drug Deliv Rev 2015; 86:46-60. [PMID: 25796619 DOI: 10.1016/j.addr.2015.03.006] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.7] [Received: 09/19/2014] [Revised: 01/05/2015] [Accepted: 03/11/2015] [Indexed: 12/13/2022]
Abstract
Drug-drug interactions (DDIs) are associated with severe adverse effects that may lead to the patient requiring alternative therapeutics and could ultimately lead to drug withdrawal from the market if they are severe. To prevent the occurrence of DDI in the clinic, experimental systems to evaluate drug interaction have been integrated into the various stages of the drug discovery and development process. A large body of knowledge about DDI has also accumulated through these studies and pharmacovigilance systems. Much of this work to date has focused on the drug metabolizing enzymes such as cytochrome P-450s as well as drug transporters, ion channels and occasionally other proteins. This combined knowledge provides a foundation for a hypothesis-driven in silico approach, using either cheminformatics or physiologically based pharmacokinetics (PK) modeling methods to assess DDI potential. Here we review recent advances in these approaches with emphasis on hypothesis-driven mechanistic models for important protein targets involved in PK-based DDI. Recent efforts with other informatics approaches to detect DDI are highlighted. Besides DDI, we also briefly introduce drug interactions with other substances, such as Traditional Chinese Medicines, to illustrate how in silico modeling can be useful in this domain. We also summarize valuable data sources and web-based tools that are available for DDI prediction. Finally, we explore the challenges faced by in silico approaches for predicting DDI and propose future directions to make these computational models more reliable, accurate, and publicly accessible.
Affiliation(s)
- Ni Ai
- Pharmaceutical Informatics Institute, College of Pharmaceutical Sciences, Zhejiang University, 866 Yuhangtang Road, Hangzhou, Zhejiang 310058, PR China
- Xiaohui Fan
- Pharmaceutical Informatics Institute, College of Pharmaceutical Sciences, Zhejiang University, 866 Yuhangtang Road, Hangzhou, Zhejiang 310058, PR China
- Sean Ekins
- Collaborations in Chemistry, 5616 Hilltop Needmore Road, Fuquay-Varina, NC 27526, USA.
37
Karapetyan K, Batchelor C, Sharpe D, Tkachenko V, Williams AJ. The Chemical Validation and Standardization Platform (CVSP): large-scale automated validation of chemical structure datasets. J Cheminform 2015; 7:30. [PMID: 26155308 PMCID: PMC4494041 DOI: 10.1186/s13321-015-0072-8] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.0] [Received: 10/28/2014] [Accepted: 04/28/2015] [Indexed: 11/10/2022]
Abstract
BACKGROUND There are presently hundreds of online databases hosting millions of chemical compounds and associated data. As a result of the number of cheminformatics software tools that can be used to produce the data, subtle differences between the various cheminformatics platforms, as well as the naivety of the software users, there are a myriad of issues that can exist with chemical structure representations online. In order to help facilitate validation and standardization of chemical structure datasets from various sources, we have delivered a freely available internet-based platform to the community for the processing of chemical compound datasets. RESULTS The chemical validation and standardization platform (CVSP) both validates and standardizes chemical structure representations according to sets of systematic rules. The chemical validation algorithms detect issues with submitted molecular representations using pre-defined or user-defined dictionary-based molecular patterns that are chemically suspicious or potentially require manual review. Each identified issue is assigned one of three levels of severity (Information, Warning, and Error) in order to conveniently inform the user of the need to browse and review subsets of their data. The validation process includes validation of atoms and bonds (e.g., flagging query atoms and bonds), valences, and stereo. The standard form of submission of collections of data, the SDF file, allows the user to map the data fields to predefined CVSP fields for the purpose of cross-validating associated SMILES and InChIs with the connection tables contained within the SDF file. This platform has been applied to the analysis of a large number of data sets prepared for deposition to our ChemSpider database and in preparation of data for the Open PHACTS project. In this work we review the results of the automated validation of the DrugBank dataset, a popular drug and drug target database utilized by the community, and of the ChEMBL 17 data set. The CVSP web site is located at http://cvsp.chemspider.com/. CONCLUSION A platform for the validation and standardization of chemical structure representations of various formats has been developed and made available to the community to assist and encourage the processing of chemical structure files to produce more homogeneous compound representations for exchange and interchange between online databases. While the CVSP platform is designed with flexibility inherent to the rules that can be used for processing the data, we have produced a recommended rule set based on our own experiences with large data sets such as DrugBank, ChEMBL, and data sets from ChemSpider.
Affiliation(s)
- Karen Karapetyan
- Royal Society of Chemistry, US Office, 904 Tamaras Circle, Wake Forest, NC 27587 USA
- Colin Batchelor
- Thomas Graham House, Science Park, 290 Milton Road, Cambridge, UK
- David Sharpe
- Thomas Graham House, Science Park, 290 Milton Road, Cambridge, UK
- Valery Tkachenko
- Royal Society of Chemistry, US Office, 904 Tamaras Circle, Wake Forest, NC 27587 USA
- Antony J Williams
- Royal Society of Chemistry, US Office, 904 Tamaras Circle, Wake Forest, NC 27587 USA
- Environmental Protection Agency, Research Triangle Park, NC USA
38
Clark AM, Williams AJ, Ekins S. Machines first, humans second: on the importance of algorithmic interpretation of open chemistry data. J Cheminform 2015; 7:9. [PMID: 25798198 PMCID: PMC4369291 DOI: 10.1186/s13321-015-0057-7] [Received: 11/24/2014] [Accepted: 02/23/2015]
Abstract
The current rise in the use of open lab notebook techniques means that an increasing number of scientists make chemical information freely and openly available to the entire community as a series of micropublications released shortly after the conclusion of each experiment. We propose that this trend be accompanied by a thorough examination of data sharing priorities. We argue that the most significant immediate beneficiary of open data is in fact chemical algorithms, which are capable of absorbing vast quantities of data and using it to present concise insights to working chemists, on a scale that could not be achieved by traditional publication methods. Making this goal practically achievable will require a paradigm shift in the way individual scientists translate their data into digital form, since most contemporary methods of data entry are designed for presentation to humans rather than consumption by machine learning algorithms. We discuss some of the complex issues involved in fixing current methods, as well as some of the immediate benefits that can be gained when open data is published correctly using unambiguous machine-readable formats. Lab notebook entries must target both visualisation by scientists and use by machine learning algorithms.
Affiliation(s)
- Alex M Clark
- Molecular Materials Informatics, 1900 St. Jacques #302, Montreal, H3J 2S1, QC Canada
- Antony J Williams
- Royal Society of Chemistry, 904 Tamaras Circle, Wake Forest, NC 27587 USA
- Sean Ekins
- Collaborations in Chemistry, 5616 Hilltop Needmore Road, Fuquay-Varina, NC 27526 USA; Collaborative Drug Discovery, 1633 Bayshore Highway, Suite 342, Burlingame, CA 94010 USA
39
Oprea TI, Overington JP. Computational and Practical Aspects of Drug Repositioning. ACTA ACUST UNITED AC 2015. [DOI: 10.1089/drrr.2014.0009]
40
Nantasenamat C, Prachayasittikul V. Maximizing computational tools for successful drug discovery. Expert Opin Drug Discov 2015; 10:321-9. [PMID: 25693813 DOI: 10.1517/17460441.2015.1016497]
Abstract
Drug discovery is an iterative cycle of identifying promising hits followed by lead optimization via bioisosteric replacements. In the search for compounds affording good bioactivity, equal importance should also be placed on achieving favorable pharmacokinetic properties. Thus, balancing and realizing both key properties is an intricate problem that requires great caution. In this editorial, the authors explore the available computational tools in the context of the big data that has emerged with the advent of the Omics revolution. As such, selecting appropriate computational tools for analyzing the vast number of chemical libraries, target proteins and interactomes is the first step toward maximizing the chance of success. However, realizing this also requires a solid foundation in the big concepts of drug discovery, as well as knowledge of which tools are available, in order to give drug discovery scientists the best opportunity.
Affiliation(s)
- Chanin Nantasenamat
- Mahidol University, Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, 10700 Bangkok, Thailand
41
Lipinski CA, Litterman NK, Southan C, Williams AJ, Clark AM, Ekins S. Parallel worlds of public and commercial bioactive chemistry data. J Med Chem 2014; 58:2068-76. [PMID: 25415348 PMCID: PMC4360371 DOI: 10.1021/jm5011308]
Abstract
The availability of structures and linked bioactivity data in databases is powerfully enabling for drug discovery and chemical biology. However, we now review some confounding issues with the divergent expansions of public and commercial sources of chemical structures. These are associated not only with expanding patent extraction but also with increasingly large vendor collections amassed via different selection criteria between SciFinder from Chemical Abstracts Service (CAS) and major public sources such as PubChem, ChemSpider, UniChem, and others. These increasingly massive collections may include both real and virtual compounds, as well as so-called prophetic compounds from patents. We address a range of issues raised by the challenges faced in resolving the NIH probe compounds. In addition, we highlight how virtual compounds can confound prior-art searching, which could impact the composition-of-matter patentability of a new medicinal chemistry lead. Finally, we propose some potential solutions.
Affiliation(s)
- Christopher A Lipinski
- Christopher A. Lipinski, Ph.D., LLC, 10 Connshire Drive, Waterford, Connecticut 06385-4122, United States
42
Extending in silico mechanism-of-action analysis by annotating targets with pathways: application to cellular cytotoxicity readouts. Future Med Chem 2014; 6:2029-56. [DOI: 10.4155/fmc.14.137]
Abstract
Background: An in silico mechanism-of-action analysis protocol was developed, comprising molecule bioactivity profiling, annotation of predicted targets with pathways and calculation of enrichment factors to highlight targets and pathways more likely to be implicated in the studied phenotype. Results: The method was applied to a cytotoxicity phenotypic endpoint, with enriched targets/pathways found to be statistically significant when compared with 100 random datasets. Application to a smaller apoptotic set (10 molecules) did not yield statistically relevant results, suggesting that the protocol requires modification, such as analysis of the most frequently predicted targets/annotated pathways. Conclusion: Pathway annotations improved the mechanism-of-action information gained by target prediction alone, allowing a better interpretation of the predictions and providing better mapping of targets onto pathways.
43
Ekins S, Clark AM, Swamidass SJ, Litterman N, Williams AJ. Bigger data, collaborative tools and the future of predictive drug discovery. J Comput Aided Mol Des 2014; 28:997-1008. [PMID: 24943138 PMCID: PMC4198464 DOI: 10.1007/s10822-014-9762-y] [Received: 03/29/2014] [Accepted: 06/09/2014]
Abstract
Over the past decade we have seen a growth in the provision of chemistry data and cheminformatics tools, either as free websites or as commercial software-as-a-service offerings. These have transformed how we find molecule-related data and use such tools in our research. There have also been efforts to improve collaboration between researchers, either openly or through secure transactions using commercial tools. A major challenge in the future will be how such databases and software approaches handle larger amounts of data as it accumulates from high-throughput screening, and enable the user to draw insights, make predictions and move projects forward. We now discuss how information from some drug discovery datasets can be made more accessible and how privacy of data should not overwhelm the desire to share it at an appropriate time with collaborators. We also discuss additional software tools that could be made available and provide our thoughts on the future of predictive drug discovery in this age of big data. We use some examples from our own research on neglected diseases, collaborations, mobile apps and algorithm development to illustrate these ideas.
Affiliation(s)
- Sean Ekins
- Collaborations in Chemistry, 5616 Hilltop Needmore Road, Fuquay-Varina, NC, 27526, USA
44
Ekins S, Freundlich JS, Reynolds RC. Are bigger data sets better for machine learning? Fusing single-point and dual-event dose response data for Mycobacterium tuberculosis. J Chem Inf Model 2014; 54:2157-65. [PMID: 24968215 DOI: 10.1021/ci500264r]
Abstract
Tuberculosis is a major, neglected disease for which the quest to find new treatments continues. There is an abundance of data in the public domain from large phenotypic screens against Mycobacterium tuberculosis (Mtb). Since machine learning methods can learn from past data, we were interested in addressing whether more data builds better models. We now describe using Bayesian machine learning to assess whether we can improve our models by combining the large quantities of single-point data with the much smaller (higher quality) dual-event data sets, which use dose-response data for both whole-cell antitubercular activity and Vero cell cytotoxicity. We have evaluated 12 models built from single-point, dual-event dose-response, and combined single-point and dual-event dose-response data for three distinct data sets from the same laboratory. We used a fourth data set of active and inactive compounds from the same group, as well as a smaller set of 177 active compounds from GlaxoSmithKline, as test sets. Our data suggest that combining single-point with dual-event dose-response data does not diminish the internal or external predictive ability of the models based on the receiver operating characteristic (ROC) curve (internal ROC range 0.83-0.91, external ROC range 0.62-0.83) compared to the orders-of-magnitude smaller dual-event models (internal ROC range 0.6-0.83 and external ROC range 0.54-0.83). In conclusion, models developed with 1200-5000 compounds appear to be as predictive as those generated with 25 000-350 000 molecules. Our results have implications for justifying further high-throughput screening versus focused testing based on model predictions.
Affiliation(s)
- Sean Ekins
- Collaborations in Chemistry, 5616 Hilltop Needmore Road, Fuquay-Varina, North Carolina 27526, United States
45
Cherkasov A, Muratov EN, Fourches D, Varnek A, Baskin II, Cronin M, Dearden J, Gramatica P, Martin YC, Todeschini R, Consonni V, Kuz'min VE, Cramer R, Benigni R, Yang C, Rathman J, Terfloth L, Gasteiger J, Richard A, Tropsha A. QSAR modeling: where have you been? Where are you going to? J Med Chem 2014; 57:4977-5010. [PMID: 24351051 PMCID: PMC4074254 DOI: 10.1021/jm4004285]
Abstract
Quantitative structure-activity relationship (QSAR) modeling is one of the major computational tools employed in medicinal chemistry. However, throughout its entire history it has drawn both praise and criticism concerning its reliability, limitations, successes, and failures. In this paper, we discuss (i) the development and evolution of QSAR; (ii) the current trends, unsolved problems, and pressing challenges; and (iii) several novel and emerging applications of QSAR modeling. Throughout this discussion, we provide guidelines for QSAR development, validation, and application, which are summarized in best practices for building rigorously validated and externally predictive QSAR models. We hope that this Perspective will help communication between computational and experimental chemists toward collaborative development and use of QSAR models. We also believe that the guidelines presented here will help journal editors and reviewers apply more stringent scientific standards to manuscripts reporting new QSAR studies, as well as encourage the use of high quality, validated QSARs for regulatory decision making.
Affiliation(s)
- Artem Cherkasov
- Vancouver Prostate Centre, University of British Columbia, Vancouver, BC, V6H3Z6, Canada
- Eugene N. Muratov
- Laboratory for Molecular Modeling, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, NC, 27599, USA
- Department of Molecular Structure and Cheminformatics, A.V. Bogatsky Physical-Chemical Institute National Academy of Sciences of Ukraine, Odessa, 65080, Ukraine
- Denis Fourches
- Laboratory for Molecular Modeling, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, NC, 27599, USA
- Alexandre Varnek
- Department of Chemistry, L. Pasteur University of Strasbourg, Strasbourg, 67000, France
- Igor I. Baskin
- Department of Physics, Lomonosov Moscow State University, Moscow, 119991, Russia
- Mark Cronin
- School of Pharmacy and Biomolecular Sciences, Liverpool John Moores University, Liverpool L33AF, UK
- John Dearden
- School of Pharmacy and Biomolecular Sciences, Liverpool John Moores University, Liverpool L33AF, UK
- Paola Gramatica
- Department of Structural and Functional Biology, University of Insubria, Varese, 21100, Italy
- Roberto Todeschini
- Milano Chemometrics and QSAR Research Group, University of Milano-Bicocca, Milan, 20126, Italy
- Viviana Consonni
- Milano Chemometrics and QSAR Research Group, University of Milano-Bicocca, Milan, 20126, Italy
- Victor E. Kuz'min
- Department of Molecular Structure and Cheminformatics, A.V. Bogatsky Physical-Chemical Institute National Academy of Sciences of Ukraine, Odessa, 65080, Ukraine
- Romualdo Benigni
- Environment and Health Department, Istituto Superiore di Sanità, Rome, 00161, Italy
- James Rathman
- Altamira LLC, Columbus OH 43235, USA
- Department of Chemical and Biomolecular Engineering, The Ohio State University, Columbus, OH 43215, USA
- Ann Richard
- National Center for Computational Toxicology, U.S. Environmental Protection Agency, Research Triangle Park, NC, 27519, USA
- Alexander Tropsha
- Laboratory for Molecular Modeling, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, NC, 27599, USA
46
Warr WA. Data sharing matters. J Comput Aided Mol Des 2014; 28:1-4. [PMID: 24435495 DOI: 10.1007/s10822-013-9705-z] [Received: 12/18/2013] [Accepted: 12/26/2013]
47
Scientific Lenses to Support Multiple Views over Linked Chemistry Data. The Semantic Web – ISWC 2014. 2014. [DOI: 10.1007/978-3-319-11964-9_7]
48
Lagunin AA, Goel RK, Gawande DY, Pahwa P, Gloriozova TA, Dmitriev AV, Ivanov SM, Rudik AV, Konova VI, Pogodin PV, Druzhilovsky DS, Poroikov VV. Chemo- and bioinformatics resources for in silico drug discovery from medicinal plants beyond their traditional use: a critical review. Nat Prod Rep 2014; 31:1585-611. [DOI: 10.1039/c4np00068d]
Abstract
An overview of databases and in silico tools for discovery of the hidden therapeutic potential of medicinal plants.
Affiliation(s)
- Alexey A. Lagunin
- Orekhovich Institute of Biomedical Chemistry of Rus. Acad. Med. Sci, Moscow, Russia
- Russian National Research Medical University, Medico-Biologic Faculty, Moscow, Russia
- Rajesh K. Goel
- Department of Pharmaceutical Sciences and Drug Research, Punjabi University, Patiala-147002, India
- Dinesh Y. Gawande
- Department of Pharmaceutical Sciences and Drug Research, Punjabi University, Patiala-147002, India
- Priynka Pahwa
- Department of Pharmaceutical Sciences and Drug Research, Punjabi University, Patiala-147002, India
- Sergey M. Ivanov
- Orekhovich Institute of Biomedical Chemistry of Rus. Acad. Med. Sci, Moscow, Russia
- Anastassia V. Rudik
- Orekhovich Institute of Biomedical Chemistry of Rus. Acad. Med. Sci, Moscow, Russia
- Varvara I. Konova
- Orekhovich Institute of Biomedical Chemistry of Rus. Acad. Med. Sci, Moscow, Russia
- Pavel V. Pogodin
- Orekhovich Institute of Biomedical Chemistry of Rus. Acad. Med. Sci, Moscow, Russia
- Russian National Research Medical University, Medico-Biologic Faculty, Moscow, Russia
- Vladimir V. Poroikov
- Orekhovich Institute of Biomedical Chemistry of Rus. Acad. Med. Sci, Moscow, Russia
- Russian National Research Medical University, Medico-Biologic Faculty, Moscow, Russia
49
Ekins S. Progress in computational toxicology. J Pharmacol Toxicol Methods 2013; 69:115-40. [PMID: 24361690 DOI: 10.1016/j.vascn.2013.12.003] [Received: 10/08/2013] [Accepted: 12/08/2013]
Abstract
INTRODUCTION Computational methods have been widely applied to toxicology across pharmaceutical, consumer product and environmental fields over the past decade. Progress in computational toxicology is now reviewed. METHODS A literature review was performed on computational models for hepatotoxicity (e.g. for drug-induced liver injury (DILI)), cardiotoxicity, renal toxicity and genotoxicity. In addition, various publications that use machine learning methods have been highlighted. Several computational toxicology model datasets from past publications were used to compare Bayesian and Support Vector Machine (SVM) learning methods. RESULTS The increasing amounts of data for defined toxicology endpoints have enabled machine learning models that have been increasingly used for predictions. It is shown that across many different models, Bayesian and SVM methods perform similarly based on cross-validation data. DISCUSSION Considerable progress has been made in computational toxicology in a decade, in both model development and the availability of larger scale or 'big data' models. Future efforts in toxicology data generation will likely provide us with hundreds of thousands of compounds that are readily accessible for machine learning models. These models will cover relevant chemistry space for pharmaceutical, consumer product and environmental applications.
Affiliation(s)
- Sean Ekins
- Collaborations in Chemistry, 5616 Hilltop Needmore Road, Fuquay-Varina, NC 27526, USA; Department of Pharmaceutical Sciences, University of Maryland, 20 Penn Street, Baltimore, MD 21201, USA; Department of Pharmacology, Rutgers University-Robert Wood Johnson Medical School, 675 Hoes Lane, Piscataway, NJ 08854, USA; Division of Chemical Biology and Medicinal Chemistry, UNC Eshelman School of Pharmacy, University of North Carolina at Chapel Hill, NC 27599-7355, USA.
50
Mosca R, Pons T, Céol A, Valencia A, Aloy P. Towards a detailed atlas of protein–protein interactions. Curr Opin Struct Biol 2013; 23:929-40. [DOI: 10.1016/j.sbi.2013.07.005] [Received: 06/27/2013] [Revised: 07/04/2013] [Accepted: 07/08/2013]