1
Mulero-Hernández J, Mironov V, Miñarro-Giménez JA, Kuiper M, Fernández-Breis JT. Integration of chromosome locations and functional aspects of enhancers and topologically associating domains in knowledge graphs enables versatile queries about gene regulation. Nucleic Acids Res 2024:gkae566. [PMID: 38967009 DOI: 10.1093/nar/gkae566] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/12/2023] [Revised: 06/12/2024] [Accepted: 06/19/2024] [Indexed: 07/06/2024] Open
Abstract
Knowledge about transcription factor binding and regulation, target genes, cis-regulatory modules and topologically associating domains is not only defined by functional associations like biological processes or diseases but also has a determinative genome location aspect. Here, we exploit these location and functional aspects together to develop new strategies to enable advanced data querying. Many databases have been developed to provide information about enhancers, but a schema that allows the standardized representation of data, securing interoperability between resources, has been lacking. In this work, we use knowledge graphs for the standardized representation of enhancers and topologically associating domains, together with data about their target genes, transcription factors, location on the human genome, and functional data about diseases and gene ontology annotations. We used this schema to integrate twenty-five enhancer datasets and two domain datasets, creating the most powerful integrative resource in this field to date. The knowledge graphs have been implemented using the Resource Description Framework and integrated within the open-access BioGateway knowledge network, generating a resource that contains an interoperable set of knowledge graphs (enhancers, TADs, genes, proteins, diseases, GO terms, and interactions between domains). We show how advanced queries, which combine functional and location restrictions, can be used to develop new hypotheses about functional aspects of gene expression regulation.
Affiliation(s)
- Juan Mulero-Hernández
  - Departamento de Informática y Sistemas, Universidad de Murcia, CEIR Campus Mare Nostrum, Instituto Murciano de Investigación Biosanitaria (IMIB), 30100 Murcia, Spain
- Vladimir Mironov
  - Department of Biology, Norwegian University of Science and Technology, NO-7491 Trondheim, Norway
- José Antonio Miñarro-Giménez
  - Departamento de Informática y Sistemas, Universidad de Murcia, CEIR Campus Mare Nostrum, Instituto Murciano de Investigación Biosanitaria (IMIB), 30100 Murcia, Spain
- Martin Kuiper
  - Department of Biology, Norwegian University of Science and Technology, NO-7491 Trondheim, Norway
- Jesualdo Tomás Fernández-Breis
  - Departamento de Informática y Sistemas, Universidad de Murcia, CEIR Campus Mare Nostrum, Instituto Murciano de Investigación Biosanitaria (IMIB), 30100 Murcia, Spain
2
Börner K, Blood PD, Silverstein JC, Ruffalo M, Teichmann SA, Pryhuber G, Misra R, Purkerson J, Fan J, Hickey JW, Molla G, Xu C, Zhang Y, Weber G, Jain Y, Qaurooni D, Kong Y, Bueckle A, Herr BW. Human BioMolecular Atlas Program (HuBMAP): 3D Human Reference Atlas Construction and Usage. bioRxiv 2024:2024.03.27.587041. [PMID: 38826261 PMCID: PMC11142047 DOI: 10.1101/2024.03.27.587041] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 06/04/2024]
Abstract
The Human BioMolecular Atlas Program (HuBMAP) aims to construct a reference 3D structural, cellular, and molecular atlas of the healthy adult human body. The HuBMAP Data Portal (https://portal.hubmapconsortium.org) serves experimental datasets and supports data processing, search, filtering, and visualization. The Human Reference Atlas (HRA) Portal (https://humanatlas.io) provides open access to atlas data, code, procedures, and instructional materials. Experts from more than 20 consortia are collaborating to construct the HRA's Common Coordinate Framework (CCF), knowledge graphs, and tools that describe the multiscale structure of the human body (from organs and tissues down to cells, genes, and biomarkers) and to use the HRA to understand changes that occur at each of these levels with aging, disease, and other perturbations. The 6th release of the HRA v2.0 covers 36 organs with 4,499 unique anatomical structures, 1,195 cell types, and 2,089 biomarkers (e.g., genes, proteins, lipids) linked to ontologies. In addition, three workflows were developed to map new experimental data into the HRA's CCF. This paper describes the HRA user stories, terminology, data formats, ontology validation, unified analysis workflows, user interfaces, instructional materials, application programming interfaces (APIs), and flexible hybrid cloud infrastructure, and demonstrates first atlas usage applications and previews.
Affiliation(s)
- Katy Börner
  - Department of Intelligent Systems Engineering, Luddy School of Informatics, Computing, and Engineering, Indiana University, Bloomington, IN, USA
  - CIFAR MacMillan Multiscale Human program, CIFAR, Toronto, ON, Canada
- Philip D. Blood
  - Pittsburgh Supercomputing Center, Carnegie Mellon University, Pittsburgh, PA, USA
- Jonathan C. Silverstein
  - Department of Biomedical Informatics, University of Pittsburgh School of Medicine, Pittsburgh, PA, USA
- Matthew Ruffalo
  - Ray and Stephanie Lane Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA, USA
- Sarah A. Teichmann
  - CIFAR MacMillan Multiscale Human program, CIFAR, Toronto, ON, Canada
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Ravi Misra
  - University of Rochester Medical Center, Rochester, NY, USA
- Jean Fan
  - Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
- John W. Hickey
  - Department of Biomedical Engineering, Duke University, Durham, NC, USA
- Chuan Xu
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Yun Zhang
  - J. Craig Venter Institute, La Jolla, CA, USA
- Griffin Weber
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Yashvardhan Jain
  - Department of Intelligent Systems Engineering, Luddy School of Informatics, Computing, and Engineering, Indiana University, Bloomington, IN, USA
- Danial Qaurooni
  - Department of Intelligent Systems Engineering, Luddy School of Informatics, Computing, and Engineering, Indiana University, Bloomington, IN, USA
- Yongxin Kong
  - Department of Intelligent Systems Engineering, Luddy School of Informatics, Computing, and Engineering, Indiana University, Bloomington, IN, USA
- Andreas Bueckle
  - Department of Intelligent Systems Engineering, Luddy School of Informatics, Computing, and Engineering, Indiana University, Bloomington, IN, USA
- Bruce W. Herr
  - Department of Intelligent Systems Engineering, Luddy School of Informatics, Computing, and Engineering, Indiana University, Bloomington, IN, USA
3
van Rijn JPM, Martens M, Ammar A, Cimpan MR, Fessard V, Hoet P, Jeliazkova N, Murugadoss S, Vinković Vrček I, Willighagen EL. From papers to RDF-based integration of physicochemical data and adverse outcome pathways for nanomaterials. J Cheminform 2024; 16:49. [PMID: 38693555 PMCID: PMC11064368 DOI: 10.1186/s13321-024-00833-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/29/2023] [Accepted: 03/23/2024] [Indexed: 05/03/2024] Open
Abstract
Adverse Outcome Pathways (AOPs) have been proposed to facilitate mechanistic understanding of interactions of chemicals/materials with biological systems. Each AOP starts with a molecular initiating event (MIE) and possibly ends with adverse outcome(s) (AOs) via a series of key events (KEs). So far, the interaction of engineered nanomaterials (ENMs) with biomolecules, biomembranes, cells, and biological structures, in general, is not yet fully elucidated. There is also a huge lack of information on which AOPs are ENMs-relevant or -specific, despite numerous published data on toxicological endpoints they trigger, such as oxidative stress and inflammation. We propose to integrate related data and knowledge recently collected. Our approach combines the annotation of nanomaterials and their MIEs with ontology annotation to demonstrate how we can then query AOPs and biological pathway information for these materials. We conclude that a FAIR (Findable, Accessible, Interoperable, Reusable) representation of the ENM-MIE knowledge simplifies integration with other knowledge. SCIENTIFIC CONTRIBUTION: This study introduces a new database linking nanomaterial stressors to the first known MIE or KE. Second, it presents a reproducible workflow to analyze and summarize this knowledge. Third, this work extends the use of semantic web technologies to the field of nanoinformatics and nanosafety.
Affiliation(s)
- Jeaphianne P M van Rijn
  - Dept of Bioinformatics, BiGCaT, NUTRIM, FHML, Maastricht University, Maastricht, The Netherlands
- Marvin Martens
  - Dept of Bioinformatics, BiGCaT, NUTRIM, FHML, Maastricht University, Maastricht, The Netherlands
- Ammar Ammar
  - Dept of Bioinformatics, BiGCaT, NUTRIM, FHML, Maastricht University, Maastricht, The Netherlands
- Mihaela Roxana Cimpan
  - Department of Clinical Dentistry, Faculty of Medicine, University of Bergen, Bergen, Norway
- Valerie Fessard
  - Fougères Laboratory, Anses, French Agency for Food, Environmental and Occupational Health and Safety, Toxicology of Contaminants Unit, Fougères, France
- Peter Hoet
  - Laboratory of Toxicology, Unit of Environment and Health, Department of Public Health and Primary Care, KU Leuven, Leuven, Belgium
- Sivakumar Murugadoss
  - Laboratory of Toxicology, Unit of Environment and Health, Department of Public Health and Primary Care, KU Leuven, Leuven, Belgium
  - SD Chemical and Physical Health Risks, Brussels, Belgium
- Egon L Willighagen
  - Dept of Bioinformatics, BiGCaT, NUTRIM, FHML, Maastricht University, Maastricht, The Netherlands
4
Behr AS, Borgelt H, Kockmann N. Ontologies4Cat: investigating the landscape of ontologies for catalysis research data management. J Cheminform 2024; 16:16. [PMID: 38326906 PMCID: PMC10851519 DOI: 10.1186/s13321-024-00807-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/25/2023] [Accepted: 01/22/2024] [Indexed: 02/09/2024] Open
Abstract
As scientific digitization advances, it is imperative to ensure that data is Findable, Accessible, Interoperable, and Reusable (FAIR) for machine processing. Ontologies play a vital role in enhancing data FAIRness by explicitly representing knowledge in a machine-understandable format. Research data in catalysis research often exhibits complexity and diversity, necessitating a correspondingly broad collection of ontologies. While ontology portals such as EBI OLS and BioPortal aid in ontology discovery, they lack deep classification, and quality metrics for ontology reusability and domains are absent for catalysis research. Thus, this work provides an approach for systematic collection of ontology metadata with a focus on the catalysis research data value chain. By classifying ontologies by subdomains of catalysis research, the approach offers efficient comparison across ontologies. Furthermore, a workflow and codebase are presented, facilitating representation of the metadata on GitHub. Finally, a method is presented to automatically map the classes contained in the ontologies of the metadata collection against each other, providing further insights into the relatedness of the ontologies listed. The presented methodology is designed for reusability, enabling its adaptation to other ontology collections or domains of knowledge. The ontology metadata taken up for this work and the code developed and described in this work are available in a GitHub repository at: https://github.com/nfdi4cat/Ontology-Overview-of-NFDI4Cat.
Affiliation(s)
- Alexander S Behr
  - Laboratory of Equipment Design, Faculty of Biochemical and Chemical Engineering, TU-Dortmund University, Emil-Figge-Strasse 68, 44139, Dortmund, NRW, Germany
- Hendrik Borgelt
  - Laboratory of Equipment Design, Faculty of Biochemical and Chemical Engineering, TU-Dortmund University, Emil-Figge-Strasse 68, 44139, Dortmund, NRW, Germany
- Norbert Kockmann
  - Laboratory of Equipment Design, Faculty of Biochemical and Chemical Engineering, TU-Dortmund University, Emil-Figge-Strasse 68, 44139, Dortmund, NRW, Germany
5
Miranda-Escalada A, Mehryary F, Luoma J, Estrada-Zavala D, Gasco L, Pyysalo S, Valencia A, Krallinger M. Overview of DrugProt task at BioCreative VII: data and methods for large-scale text mining and knowledge graph generation of heterogenous chemical-protein relations. Database (Oxford) 2023; 2023:baad080. [PMID: 38015956 PMCID: PMC10683943 DOI: 10.1093/database/baad080] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/25/2022] [Revised: 09/22/2023] [Accepted: 10/30/2023] [Indexed: 11/30/2023]
Abstract
It is getting increasingly challenging to efficiently exploit drug-related information described in the growing amount of scientific literature. Indeed, for drug-gene/protein interactions, the challenge is even bigger, considering the scattered information sources and types of interactions. However, their systematic, large-scale exploitation is key for developing tools, impacting knowledge fields as diverse as drug design or metabolic pathway research. Previous efforts in the extraction of drug-gene/protein interactions from the literature did not address these scalability and granularity issues. To tackle them, we have organized the DrugProt track at BioCreative VII. In the context of the track, we have released the DrugProt Gold Standard corpus, a collection of 5000 PubMed abstracts, manually annotated with granular drug-gene/protein interactions. We have proposed a novel large-scale track to evaluate the capacity of natural language processing systems to scale to the range of millions of documents, and generate with their predictions a silver standard knowledge graph of 53 993 602 nodes and 19 367 406 edges. Its use exceeds the shared task and points toward pharmacological and biological applications such as drug discovery or continuous database curation. Finally, we have created a persistent evaluation scenario on CodaLab to continuously evaluate new relation extraction systems that may arise. Thirty teams from four continents, which involved 110 people, sent 107 submission runs for the Main DrugProt track, and nine teams submitted 21 runs for the Large Scale DrugProt track. Most participants implemented deep learning approaches based on pretrained transformer-like language models (LMs) such as BERT or BioBERT, reaching precision and recall values as high as 0.9167 and 0.9542 for some relation types. 
Finally, some initial explorations of the applicability of the knowledge graph have shown its potential to explore the chemical-protein relations described in the literature, or chemical compound-enzyme interactions. Database URL: https://doi.org/10.5281/zenodo.4955410.
Affiliation(s)
- Farrokh Mehryary
  - TurkuNLP Group, Department of Computing, University of Turku, Turku 20014, Finland
- Jouni Luoma
  - TurkuNLP Group, Department of Computing, University of Turku, Turku 20014, Finland
- Luis Gasco
  - Life Sciences Department, Barcelona Supercomputing Center, Barcelona 08034, Spain
- Sampo Pyysalo
  - TurkuNLP Group, Department of Computing, University of Turku, Turku 20014, Finland
- Alfonso Valencia
  - Life Sciences Department, Barcelona Supercomputing Center, Barcelona 08034, Spain
- Martin Krallinger
  - Life Sciences Department, Barcelona Supercomputing Center, Barcelona 08034, Spain
6
Mullowney MW, Duncan KR, Elsayed SS, Garg N, van der Hooft JJJ, Martin NI, Meijer D, Terlouw BR, Biermann F, Blin K, Durairaj J, Gorostiola González M, Helfrich EJN, Huber F, Leopold-Messer S, Rajan K, de Rond T, van Santen JA, Sorokina M, Balunas MJ, Beniddir MA, van Bergeijk DA, Carroll LM, Clark CM, Clevert DA, Dejong CA, Du C, Ferrinho S, Grisoni F, Hofstetter A, Jespers W, Kalinina OV, Kautsar SA, Kim H, Leao TF, Masschelein J, Rees ER, Reher R, Reker D, Schwaller P, Segler M, Skinnider MA, Walker AS, Willighagen EL, Zdrazil B, Ziemert N, Goss RJM, Guyomard P, Volkamer A, Gerwick WH, Kim HU, Müller R, van Wezel GP, van Westen GJP, Hirsch AKH, Linington RG, Robinson SL, Medema MH. Artificial intelligence for natural product drug discovery. Nat Rev Drug Discov 2023; 22:895-916. [PMID: 37697042 DOI: 10.1038/s41573-023-00774-7] [Citation(s) in RCA: 16] [Impact Index Per Article: 16.0] [Accepted: 07/20/2023] [Indexed: 09/13/2023]
Abstract
Developments in computational omics technologies have provided new means to access the hidden diversity of natural products, unearthing new potential for drug discovery. In parallel, artificial intelligence approaches such as machine learning have led to exciting developments in the computational drug design field, facilitating biological activity prediction and de novo drug design for molecular targets of interest. Here, we describe current and future synergies between these developments to effectively identify drug candidates from the plethora of molecules produced by nature. We also discuss how to address key challenges in realizing the potential of these synergies, such as the need for high-quality datasets to train deep learning algorithms and appropriate strategies for algorithm validation.
Affiliation(s)
- Katherine R Duncan
  - Strathclyde Institute of Pharmacy and Biomedical Sciences, University of Strathclyde, Glasgow, UK
- Somayah S Elsayed
  - Department of Molecular Biotechnology, Institute of Biology, Leiden University, Leiden, The Netherlands
- Neha Garg
  - School of Chemistry and Biochemistry, Center for Microbial Dynamics and Infection, Georgia Institute of Technology, Atlanta, GA, USA
- Justin J J van der Hooft
  - Bioinformatics Group, Wageningen University, Wageningen, The Netherlands
  - Department of Biochemistry, University of Johannesburg, Johannesburg, South Africa
- Nathaniel I Martin
  - Biological Chemistry Group, Institute of Biology, Leiden University, Leiden, The Netherlands
- David Meijer
  - Bioinformatics Group, Wageningen University, Wageningen, The Netherlands
- Barbara R Terlouw
  - Bioinformatics Group, Wageningen University, Wageningen, The Netherlands
- Friederike Biermann
  - Bioinformatics Group, Wageningen University, Wageningen, The Netherlands
  - Institute of Molecular Bio Science, Goethe-University Frankfurt, Frankfurt am Main, Germany
  - LOEWE Center for Translational Biodiversity Genomics (TBG), Frankfurt am Main, Germany
- Kai Blin
  - The Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Kongens Lyngby, Denmark
- Marina Gorostiola González
  - Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Leiden, The Netherlands
  - ONCODE institute, Leiden, The Netherlands
- Eric J N Helfrich
  - Institute of Molecular Bio Science, Goethe-University Frankfurt, Frankfurt am Main, Germany
  - LOEWE Center for Translational Biodiversity Genomics (TBG), Frankfurt am Main, Germany
- Florian Huber
  - Center for Digitalization and Digitality, Hochschule Düsseldorf, Düsseldorf, Germany
- Stefan Leopold-Messer
  - Institut für Mikrobiologie, Eidgenössische Technische Hochschule (ETH) Zürich, Zürich, Switzerland
- Kohulan Rajan
  - Institute for Inorganic and Analytical Chemistry, Friedrich-Schiller-University Jena, Jena, Germany
- Tristan de Rond
  - School of Chemical Sciences, University of Auckland, Auckland, New Zealand
- Jeffrey A van Santen
  - Department of Chemistry, Simon Fraser University, Burnaby, British Columbia, Canada
- Maria Sorokina
  - Institute for Inorganic and Analytical Chemistry, Friedrich-Schiller University, Jena, Germany
  - Pharmaceuticals R&D, Bayer AG, Berlin, Germany
- Marcy J Balunas
  - Department of Microbiology and Immunology, University of Michigan, Ann Arbor, MI, USA
  - Department of Medicinal Chemistry, University of Michigan, Ann Arbor, MI, USA
- Mehdi A Beniddir
  - Équipe "Chimie des Substances Naturelles", Université Paris-Saclay, CNRS, BioCIS, Orsay, France
- Doris A van Bergeijk
  - Department of Molecular Biotechnology, Institute of Biology, Leiden University, Leiden, The Netherlands
- Laura M Carroll
  - Structural and Computational Biology Unit, EMBL, Heidelberg, Germany
- Chase M Clark
  - Division of Pharmaceutical Sciences, School of Pharmacy, University of Wisconsin-Madison, Madison, WI, USA
- Chao Du
  - Department of Molecular Biotechnology, Institute of Biology, Leiden University, Leiden, The Netherlands
- Francesca Grisoni
  - Institute for Complex Molecular Systems, Department of Biomedical Engineering, Eindhoven University of Technology, Eindhoven, The Netherlands
  - Centre for Living Technologies, Alliance TU/e, WUR, UU, UMC Utrecht, Utrecht, The Netherlands
- Willem Jespers
  - Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Leiden, The Netherlands
- Olga V Kalinina
  - Helmholtz Institute for Pharmaceutical Research Saarland (HIPS), Helmholtz Centre for Infection Research (HZI), Saarbrücken, Germany
  - Drug Bioinformatics, Medical Faculty, Saarland University, Homburg, Germany
  - Center for Bioinformatics, Saarland University, Saarbrücken, Germany
- Hyunwoo Kim
  - College of Pharmacy and Integrated Research Institute for Drug Development, Dongguk University Seoul, Goyang-si, Republic of Korea
- Tiago F Leao
  - Center for Nuclear Energy in Agriculture, University of São Paulo, Piracicaba, Brazil
- Joleen Masschelein
  - Center for Microbiology, VIB-KU Leuven, Heverlee, Belgium
  - Department of Biology, KU Leuven, Heverlee, Belgium
- Evan R Rees
  - Division of Pharmaceutical Sciences, School of Pharmacy, University of Wisconsin-Madison, Madison, WI, USA
- Raphael Reher
  - Institute of Pharmaceutical Biology and Biotechnology, University of Marburg, Marburg, Germany
  - Institute of Pharmacy, Martin-Luther-University Halle-Wittenberg, Halle (Saale), Germany
- Daniel Reker
  - Department of Biomedical Engineering, Duke University, Durham, NC, USA
  - Duke Microbiome Center, Duke University, Durham, NC, USA
- Philippe Schwaller
  - Laboratory of Artificial Chemical Intelligence, Institut des Sciences et Ingénierie Chimiques, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- Michael A Skinnider
  - Adapsyn Bioscience, Hamilton, Ontario, Canada
  - Michael Smith Laboratories, University of British Columbia, Vancouver, British Columbia, Canada
- Allison S Walker
  - Department of Chemistry, Vanderbilt University, Nashville, TN, USA
  - Department of Biological Sciences, Vanderbilt University, Nashville, TN, USA
- Egon L Willighagen
  - Department of Bioinformatics - BiGCaT, NUTRIM, Maastricht University, Maastricht, The Netherlands
- Barbara Zdrazil
  - European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Cambridgeshire, UK
- Nadine Ziemert
  - Interfaculty Institute for Microbiology and Infection Medicine Tuebingen (IMIT), Institute for Bioinformatics and Medical Informatics (IBMI), University of Tuebingen, Tuebingen, Germany
- Pierre Guyomard
  - Bonsai team, CRIStAL - Centre de Recherche en Informatique Signal et Automatique de Lille, Université de Lille, Villeneuve d'Ascq Cedex, France
- Andrea Volkamer
  - Center for Bioinformatics, Saarland University, Saarbrücken, Germany
  - In silico Toxicology and Structural Bioinformatics, Institute of Physiology, Charité - Universitätsmedizin Berlin, Berlin, Germany
- William H Gerwick
  - Scripps Institution of Oceanography, University of California San Diego, La Jolla, CA, USA
- Hyun Uk Kim
  - Department of Chemical and Biomolecular Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Korea
- Rolf Müller
  - Helmholtz Institute for Pharmaceutical Research Saarland (HIPS), Helmholtz Centre for Infection Research (HZI), Saarbrücken, Germany
  - Department of Pharmacy, Saarland University, Saarbrücken, Germany
  - German Center for infection research (DZIF), Braunschweig, Germany
  - Helmholtz International Lab for Anti-Infectives, Saarbrücken, Germany
- Gilles P van Wezel
  - Department of Molecular Biotechnology, Institute of Biology, Leiden University, Leiden, The Netherlands
  - Netherlands Institute of Ecology, NIOO-KNAW, Wageningen, The Netherlands
- Gerard J P van Westen
  - Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Leiden, The Netherlands
- Anna K H Hirsch
  - Helmholtz Institute for Pharmaceutical Research Saarland (HIPS), Helmholtz Centre for Infection Research (HZI), Saarbrücken, Germany
  - Department of Pharmacy, Saarland University, Saarbrücken, Germany
  - German Center for infection research (DZIF), Braunschweig, Germany
  - Helmholtz International Lab for Anti-Infectives, Saarbrücken, Germany
- Roger G Linington
  - Department of Chemistry, Simon Fraser University, Burnaby, British Columbia, Canada
- Serina L Robinson
  - Department of Environmental Microbiology, Eawag: Swiss Federal Institute for Aquatic Science and Technology, Dübendorf, Switzerland
- Marnix H Medema
  - Bioinformatics Group, Wageningen University, Wageningen, The Netherlands
  - Institute of Biology, Leiden University, Leiden, The Netherlands
7
Penn S, Lomax J, Karlsson A, Antonucci V, Zachmann CD, Kanza S, Schurer S, Turner J. An extension of the BioAssay Ontology to include pharmacokinetic/pharmacodynamic terminology for the enrichment of scientific workflows. J Biomed Semantics 2023; 14:10. [PMID: 37568227 PMCID: PMC10416407 DOI: 10.1186/s13326-023-00288-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/25/2022] [Accepted: 04/29/2023] [Indexed: 08/13/2023] Open
Abstract
With the capacity to produce and record data electronically, scientific research and the data associated with it have grown at an unprecedented rate. However, despite a decent amount of data now existing in electronic form, it is still common for scientific research to be recorded in an unstructured text format with inconsistent context (vocabularies), which vastly reduces the potential for direct intelligent analysis. Research has demonstrated that the use of semantic technologies such as ontologies to structure and enrich scientific data can greatly improve this potential. However, whilst there are many ontologies that can be used for this purpose, there is still a vast quantity of scientific terminology that does not have adequate semantic representation. A key area for expansion identified by the authors was the pharmacokinetic/pharmacodynamic (PK/PD) domain due to its high usage across many areas of Pharma. As such, we have produced a set of these terms and other bioassay-related terms to be incorporated into the BioAssay Ontology (BAO), which was identified as the most relevant ontology for this work. A number of use cases developed by experts in the field were used to demonstrate how these new ontology terms can be used, and to set the scene for the continuation of this work with a view to expanding it into further relevant domains. The work done in this paper was part of Phase 1 of the SEED project (Semantically Enriching electronic laboratory notebook (eLN) Data).
Affiliation(s)
- Steve Penn
  - Pfizer Inc, 1 Portland Street, Cambridge, MA 02139 USA
- Jane Lomax
  - Scibite an Elsevier Company, Scibite Ltd, Biodata Innovation Centre, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1DR UK
- Anneli Karlsson
  - Scibite an Elsevier Company, Scibite Ltd, Biodata Innovation Centre, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1DR UK
- Carl-Dieter Zachmann
  - Sanofi-Aventis Deutschland GmbH, R&D / Integrated Drug Discovery, Industriepark Hoechst, Frankfurt am Main, H831 C.0156, 65926 Germany
- Samantha Kanza
  - Department of Chemistry, University of Southampton, Highfield Campus, University Road, Southampton, SO17 1BJ UK
- Stephan Schurer
  - Department of Cellular and Molecular Pharmacology, Miller School of Medicine, University of Miami, Miami, FL 33136 USA
- John Turner
  - Department of Cellular and Molecular Pharmacology, Miller School of Medicine, University of Miami, Miami, FL 33136 USA
8
Kim S, Chen J, Cheng T, Gindulyte A, He J, He S, Li Q, Shoemaker BA, Thiessen PA, Yu B, Zaslavsky L, Zhang J, Bolton EE. PubChem 2023 update. Nucleic Acids Res 2022; 51:D1373-D1380. [PMID: 36305812 PMCID: PMC9825602 DOI: 10.1093/nar/gkac956] [Citation(s) in RCA: 523] [Impact Index Per Article: 261.5] [Received: 09/15/2022] [Revised: 10/06/2022] [Accepted: 10/13/2022] [Indexed: 01/30/2023] Open
Abstract
PubChem (https://pubchem.ncbi.nlm.nih.gov) is a popular chemical information resource that serves a wide range of use cases. In the past two years, a number of changes were made to PubChem. Data from more than 120 data sources was added to PubChem. Some major highlights include: the integration of Google Patents data into PubChem, which greatly expanded the coverage of the PubChem Patent data collection; the creation of the Cell Line and Taxonomy data collections, which provide quick and easy access to chemical information for a given cell line and taxon, respectively; and the update of the bioassay data model. In addition, new functionalities were added to the PubChem programmatic access protocols, PUG-REST and PUG-View, including support for target-centric data download for a given protein, gene, pathway, cell line, and taxon and the addition of the 'standardize' option to PUG-REST, which returns the standardized form of an input chemical structure. A significant update was also made to PubChemRDF. The present paper provides an overview of these changes.
Collapse
Affiliation(s)
- Sunghwan Kim
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, Bethesda, MD, 20894, USA
| | - Jie Chen
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, Bethesda, MD, 20894, USA
| | - Tiejun Cheng
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, Bethesda, MD, 20894, USA
| | - Asta Gindulyte
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, Bethesda, MD, 20894, USA
| | - Jia He
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, Bethesda, MD, 20894, USA
| | - Siqian He
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, Bethesda, MD, 20894, USA
| | - Qingliang Li
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, Bethesda, MD, 20894, USA
| | - Benjamin A Shoemaker
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, Bethesda, MD, 20894, USA
| | - Paul A Thiessen
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, Bethesda, MD, 20894, USA
| | - Bo Yu
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, Bethesda, MD, 20894, USA
| | - Leonid Zaslavsky
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, Bethesda, MD, 20894, USA
| | - Jian Zhang
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, Bethesda, MD, 20894, USA
| | - Evan E Bolton
- To whom correspondence should be addressed. Tel: +1 301 451 1811; Fax: +1 301 480 4559;
| |
|
9
|
Zheng S, Aldahdooh J, Shadbahr T, Wang Y, Aldahdooh D, Bao J, Wang W, Tang J. DrugComb update: a more comprehensive drug sensitivity data repository and analysis portal. Nucleic Acids Res 2021; 49:W174-W184. [PMID: 34060634 PMCID: PMC8218202 DOI: 10.1093/nar/gkab438] [Citation(s) in RCA: 43] [Impact Index Per Article: 14.3] [Received: 03/24/2021] [Revised: 04/18/2021] [Accepted: 05/06/2021] [Indexed: 02/06/2023] Open
Abstract
Combinatorial therapies that target multiple pathways have shown great promise for treating complex diseases. DrugComb (https://drugcomb.org/) is a web-based portal for the deposition and analysis of drug combination screening datasets. Since its first release, DrugComb has received continuous updates on the coverage of data resources, as well as on the functionality of the web server to improve the analysis, visualization and interpretation of drug combination screens. Here, we report significant updates of DrugComb, including: (i) manual curation and harmonization of more comprehensive drug combination and monotherapy screening data, not only for cancers but also for other diseases such as malaria and COVID-19; (ii) enhanced algorithms for assessing the sensitivity and synergy of drug combinations; (iii) network modelling tools to visualize the mechanisms of action of drugs or drug combinations for a given cancer sample and (iv) state-of-the-art machine learning models to predict drug combination sensitivity and synergy. These improvements have been provided with a more user-friendly graphical interface and a faster database infrastructure, which make DrugComb the most comprehensive web-based resource for the study of drug sensitivities for multiple diseases.
Affiliation(s)
- Shuyu Zheng
- Research Program in Systems Oncology, Faculty of Medicine, University of Helsinki, Helsinki FI-00290, Finland
| | - Jehad Aldahdooh
- Research Program in Systems Oncology, Faculty of Medicine, University of Helsinki, Helsinki FI-00290, Finland
| | - Tolou Shadbahr
- Research Program in Systems Oncology, Faculty of Medicine, University of Helsinki, Helsinki FI-00290, Finland
| | - Yinyin Wang
- Research Program in Systems Oncology, Faculty of Medicine, University of Helsinki, Helsinki FI-00290, Finland
| | - Dalal Aldahdooh
- Research Program in Systems Oncology, Faculty of Medicine, University of Helsinki, Helsinki FI-00290, Finland
| | - Jie Bao
- Research Program in Systems Oncology, Faculty of Medicine, University of Helsinki, Helsinki FI-00290, Finland
- Institute for Molecular Medicine Finland, University of Helsinki, Helsinki FI-00290, Finland
| | - Wenyu Wang
- Research Program in Systems Oncology, Faculty of Medicine, University of Helsinki, Helsinki FI-00290, Finland
| | - Jing Tang
- Research Program in Systems Oncology, Faculty of Medicine, University of Helsinki, Helsinki FI-00290, Finland
| |
|
10
|
Korotkevich EI, Rudik AV, Dmitriev AV, Lagunin AA, Filimonov DA. [Predict of metabolic stability of xenobiotics by the PASS and GUSAR programs]. BIOMEDIT︠S︡INSKAI︠A︡ KHIMII︠A︡ 2021; 67:295-299. [PMID: 34142537 DOI: 10.18097/pbmc20216703295] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/23/2022]
Abstract
Metabolic stability refers to the susceptibility of compounds to biotransformation; it is characterized by such pharmacokinetic parameters as half-life (T1/2) and clearance (CL). Generally, these parameters are estimated by in vitro assays, which are based on cells or subcellular fractions (mainly liver microsomal enzymes) and serve as models of the processes occurring in living organisms. Data obtained from the experiments are used to build QSAR (Quantitative Structure-Activity Relationship) models. More than 8000 compounds with known CL and/or T1/2 values obtained in vitro using human liver microsomes were selected from the freely available ChEMBL v.27 database. The GUSAR (General Unrestricted Structure-Activity Relationships) and PASS (Prediction of Activity Spectra for Substances) programs were used to build quantitative and classification models. The quality of the models was evaluated using 5-fold cross-validation. Compounds were subdivided into "stable" and "unstable" by means of the following threshold parameters: T1/2 = 30 minutes, CL = 20 ml/min/kg. The accuracy of the models ranged from 0.5 (calculated in 5-fold CV on the test set for the half-life prediction quantitative model) to 0.96 (calculated in 5-fold CV on the test set for the clearance prediction classification model).
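As a rough illustration of the thresholding step described in this abstract, the following Python sketch labels a compound from either parameter. The abstract gives only the threshold values; the direction of each comparison (longer half-life and lower clearance meaning "stable"), the handling of values exactly at the threshold, and the function names are assumptions made here for illustration, not details taken from the paper.

```python
# Threshold values quoted in the abstract.
T_HALF_THRESHOLD_MIN = 30.0   # half-life, minutes
CL_THRESHOLD = 20.0           # clearance, ml/min/kg


def label_by_half_life(t_half_min: float) -> str:
    """Label a compound from its half-life.

    Assumption: a half-life at or above the threshold counts as "stable".
    """
    return "stable" if t_half_min >= T_HALF_THRESHOLD_MIN else "unstable"


def label_by_clearance(cl: float) -> str:
    """Label a compound from its clearance.

    Assumption: a clearance at or below the threshold counts as "stable".
    """
    return "stable" if cl <= CL_THRESHOLD else "unstable"


# Hypothetical values, not data from the study.
examples = {
    "half_life_45": label_by_half_life(45.0),
    "half_life_12": label_by_half_life(12.0),
    "clearance_8": label_by_clearance(8.0),
    "clearance_35": label_by_clearance(35.0),
}
```

The two parameters are kept in separate functions because the abstract reports separate half-life and clearance models.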
Affiliation(s)
- E I Korotkevich
- Institute of Biomedical Chemistry, Moscow, Russia; Medico-biological Faculty, Pirogov Russian National Research Medical University, Moscow, Russia
| | - A V Rudik
- Institute of Biomedical Chemistry, Moscow, Russia
| | - A V Dmitriev
- Institute of Biomedical Chemistry, Moscow, Russia
| | - A A Lagunin
- Institute of Biomedical Chemistry, Moscow, Russia; Medico-biological Faculty, Pirogov Russian National Research Medical University, Moscow, Russia
| | | |
|
11
|
Kundu K, Darden L, Moult J. MecCog: A knowledge representation framework for genetic disease mechanism. Bioinformatics 2021; 37:4180-4186. [PMID: 34117883 DOI: 10.1093/bioinformatics/btab432] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/08/2020] [Revised: 03/11/2021] [Accepted: 06/11/2021] [Indexed: 12/16/2022] Open
Abstract
MOTIVATION: Experimental findings on genetic disease mechanisms are scattered throughout the literature and represented in many ways, including unstructured text, cartoons, pathway diagrams, and network graphs. Integration and structuring of such mechanistic information greatly enhances its utility. RESULTS: MecCog is a graphical framework for building integrated representations (mechanism schemas) of mechanisms by which a genetic variant causes a disease phenotype. A MecCog mechanism schema displays the propagation of system perturbations across stages of biological organization, using graphical notations to symbolize perturbed entities and activities, hyperlinked evidence tagging, a mechanism ontology, and depiction of knowledge gaps, ambiguities, and uncertainties. The web platform enables a user to construct, store, publish, browse, query, and comment on schemas. MecCog facilitates the identification of potential biomarkers, therapeutic intervention sites, and critical future experiments. AVAILABILITY: The MecCog framework is freely available at http://www.meccog.org. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Kunal Kundu
- Computational Biology, Bioinformatics and Genomics, Biological Sciences Graduate Program, University of Maryland, College Park, MD, 20742, USA.,Institute for Bioscience and Biotechnology Research, University of Maryland, 9600 Gudelsky Drive, Rockville, MD, 20850, USA
| | - Lindley Darden
- Department of Philosophy, University of Maryland, College Park, MD, 20742, USA
| | - John Moult
- Institute for Bioscience and Biotechnology Research, University of Maryland, 9600 Gudelsky Drive, Rockville, MD, 20850, USA.,Department of Cell Biology and Molecular Genetics, University of Maryland, College Park, MD, 20742, USA
| |
|
12
|
The Life of a Trailing Spouse. J Neurosci 2021; 41:3-10. [PMID: 33408132 DOI: 10.1523/jneurosci.2874-20.2020] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/17/2020] [Revised: 11/22/2020] [Accepted: 11/24/2020] [Indexed: 11/21/2022] Open
Abstract
In 1981, I published a paper in the first issue of the Journal of Neuroscience with my postdoctoral mentor, Alan Pearlman. It reported a quantitative analysis of the receptive field properties of neurons in reeler mouse visual cortex and the surprising conclusion that although the neuronal somas were strikingly malpositioned, their receptive fields were unchanged. This suggested that in mouse cortex at least, neuronal circuits have very robust systems in place to ensure the proper formation of connections. This had the unintended consequence of transforming me from an electrophysiologist into a cellular and molecular neuroscientist who studied cell adhesion molecules and the molecular mechanisms they use to regulate axon growth. It took me a surprisingly long time to appreciate that your science is driven by the people around you and by the technologies that are locally available. As a professional puzzler, I like all different kinds of puzzles, but the most fun puzzles involve playing with other puzzlers. This is my story of learning how to find like-minded puzzlers to solve riddles about axon growth and regeneration.
|
13
|
Kanza S, Graham Frey J. Semantic Technologies in Drug Discovery. SYSTEMS MEDICINE 2021. [DOI: 10.1016/b978-0-12-801238-3.11520-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/24/2022] Open
|
14
|
Issa NT, Stathias V, Schürer S, Dakshanamurthy S. Machine and deep learning approaches for cancer drug repurposing. Semin Cancer Biol 2021; 68:132-142. [PMID: 31904426 PMCID: PMC7723306 DOI: 10.1016/j.semcancer.2019.12.011] [Citation(s) in RCA: 103] [Impact Index Per Article: 34.3] [Received: 07/18/2019] [Revised: 10/31/2019] [Accepted: 12/15/2019] [Indexed: 02/07/2023]
Abstract
Knowledge of the underpinnings of cancer initiation, progression and metastasis has increased exponentially in recent years. Advanced "omics" coupled with machine learning and artificial intelligence (deep learning) methods have helped elucidate targets and pathways critical to those processes that may be amenable to pharmacologic modulation. However, the current anti-cancer therapeutic armamentarium continues to lag behind. As the cost of developing a new drug remains prohibitively expensive, repurposing of existing approved and investigational drugs is sought after given known safety profiles and reduction in the cost barrier. Notably, successes in oncologic drug repurposing have been infrequent. Computational in-silico strategies have been developed to aid in modeling biological processes to find new disease-relevant targets and discovering novel drug-target and drug-phenotype associations. Machine and deep learning methods have especially enabled leaps in those successes. This review will discuss these methods as they pertain to cancer biology as well as immunomodulation for drug repurposing opportunities in oncologic diseases.
Affiliation(s)
- Naiem T Issa
- Dr. Phillip Frost Department of Dermatology and Cutaneous Surgery, University of Miami School of Medicine, Miami, FL, USA
| | - Vasileios Stathias
- Department of Molecular and Cellular Pharmacology, University of Miami School of Medicine, Miami, FL, USA
| | - Stephan Schürer
- Department of Molecular and Cellular Pharmacology, University of Miami School of Medicine, Miami, FL, USA
| | - Sivanesan Dakshanamurthy
- Department of Oncology, Lombardi Comprehensive Cancer Center, Georgetown University Medical Center, Washington, DC, USA.
| |
|
15
|
Giblin KA, Basili D, Afzal AM, Rosenbrier-Ribeiro L, Greene N, Barrett I, Hughes SJ, Bender A. New Associations between Drug-Induced Adverse Events in Animal Models and Humans Reveal Novel Candidate Safety Targets. Chem Res Toxicol 2020; 34:438-451. [PMID: 33338378 DOI: 10.1021/acs.chemrestox.0c00311] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 01/02/2023]
Abstract
To improve our ability to extrapolate preclinical toxicity to humans, there is a need to understand and quantify the concordance of adverse events (AEs) between animal models and clinical studies. In the present work, we discovered 3011 statistically significant associations between preclinical and clinical AEs caused by drugs reported in the PharmaPendium database of which 2952 were new associations between toxicities encoded by different Medical Dictionary for Regulatory Activities terms across species. To find plausible and testable candidate off-target drug activities for the derived associations, we investigated the genetic overlap between the genes linked to both a preclinical and a clinical AE and the protein targets found to interact with one or more drugs causing both AEs. We discuss three associations from the analysis in more detail for which novel candidate off-target drug activities could be identified, namely, the association of preclinical mutagenicity readouts with clinical teratospermia and ovarian failure, the association of preclinical reflexes abnormal with clinical poor-quality sleep, and the association of preclinical psychomotor hyperactivity with clinical drug withdrawal syndrome. Our analysis successfully identified a total of 77% of known safety targets currently tested in in vitro screening panels plus an additional 431 genes which were proposed for investigation as future safety targets for different clinical toxicities. This work provides new translational toxicity relationships beyond AE term-matching, the results of which can be used for risk profiling of future new chemical entities for clinical studies and for the development of future in vitro safety panels.
Affiliation(s)
- Kathryn A Giblin
- Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, United Kingdom.,Medicinal Chemistry, Research and Early Development, Oncology R&D, AstraZeneca, Cambridge CB4 0WG, United Kingdom
| | - Danilo Basili
- Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, United Kingdom
| | - Avid M Afzal
- Data Sciences and Quantitative Biology, Discovery Sciences, R&D, AstraZeneca, Cambridge CB4 0WG, United Kingdom
| | - Lyn Rosenbrier-Ribeiro
- Safety Platforms, Clinical Pharmacology and Safety Sciences, R&D, AstraZeneca, Cambridge CB4 0WG, United Kingdom
| | - Nigel Greene
- Data Science and Artificial Intelligence, Clinical Pharmacology and Safety Sciences, R&D, AstraZeneca, Boston, Massachusetts 02451, United States
| | - Ian Barrett
- Data Sciences and Quantitative Biology, Discovery Sciences, R&D, AstraZeneca, Cambridge CB4 0WG, United Kingdom
| | - Samantha J Hughes
- Medicinal Chemistry, Research and Early Development, Oncology R&D, AstraZeneca, Cambridge CB4 0WG, United Kingdom
| | - Andreas Bender
- Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, United Kingdom
| |
|
16
|
Cáceres EL, Mew NC, Keiser MJ. Adding Stochastic Negative Examples into Machine Learning Improves Molecular Bioactivity Prediction. J Chem Inf Model 2020; 60:5957-5970. [PMID: 33245237 DOI: 10.1021/acs.jcim.0c00565] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Indexed: 01/30/2023]
Abstract
Multitask deep neural networks learn to predict ligand-target binding by example, yet public pharmacological data sets are sparse, imbalanced, and approximate. We constructed two hold-out benchmarks to approximate temporal and drug-screening test scenarios, whose characteristics differ from a random split of conventional training data sets. We developed a pharmacological data set augmentation procedure, Stochastic Negative Addition (SNA), which randomly assigns untested molecule-target pairs as transient negative examples during training. Under the SNA procedure, drug-screening benchmark performance increases from R2 = 0.1926 ± 0.0186 to 0.4269 ± 0.0272 (122%). This gain was accompanied by a modest decrease in the temporal benchmark (13%). SNA increases in drug-screening performance were consistent for classification and regression tasks and outperformed y-randomized controls. Our results highlight where data and feature uncertainty may be problematic and how leveraging uncertainty into training improves predictions of drug-target relationships.
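The augmentation idea in this abstract, drawing untested molecule-target pairs at random and treating them as negatives for the duration of one training pass, can be sketched in a few lines of Python. This is a minimal sketch of the general idea only, not the authors' implementation: the function names, the toy identifiers, the use of 0.0 as the negative label, and the per-epoch resampling schedule are all assumptions made for illustration.

```python
import random


def sample_transient_negatives(known, molecules, targets, n_neg, rng):
    """Draw up to n_neg untested (molecule, target) pairs and label them negative.

    `known` is a list of (molecule, target, activity) tuples that were measured.
    Assumption: 0.0 stands in for "inactive"; a real pipeline would choose its own label.
    """
    tested = {(m, t) for m, t, _ in known}
    untested = [(m, t) for m in molecules for t in targets if (m, t) not in tested]
    picked = rng.sample(untested, min(n_neg, len(untested)))
    return [(m, t, 0.0) for m, t in picked]


def epoch_examples(known, molecules, targets, n_neg, rng):
    """Measured examples plus negatives that are redrawn on every call.

    The negatives are transient: they are never written back into `known`.
    """
    return list(known) + sample_transient_negatives(known, molecules, targets, n_neg, rng)


# Hypothetical toy data, not taken from the paper.
known = [("mol_a", "target_1", 7.2), ("mol_b", "target_2", 6.1)]
molecules = ["mol_a", "mol_b", "mol_c"]
targets = ["target_1", "target_2"]

rng = random.Random(0)
batch = epoch_examples(known, molecules, targets, n_neg=2, rng=rng)

# Relative gain implied by the two drug-screening scores quoted in the abstract.
relative_gain = (0.4269 - 0.1926) / 0.1926
```

Calling `epoch_examples` once per epoch gives each pass a different set of sampled negatives while the measured data stay unchanged. The `relative_gain` line reproduces the reported improvement: about 1.22, that is, roughly 122%.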
Affiliation(s)
- Elena L Cáceres
- Department of Pharmaceutical Chemistry, Department of Bioengineering and Therapeutic Sciences, Bakar Computational Health Sciences Institute, Kavli Institute for Fundamental Neuroscience, Institute for Neurodegenerative Diseases, University of California, San Francisco, 675 Nelson Rising Ln NS 416A, San Francisco, California 94143, United States
| | - Nicholas C Mew
- Department of Pharmaceutical Chemistry, Department of Bioengineering and Therapeutic Sciences, Bakar Computational Health Sciences Institute, Kavli Institute for Fundamental Neuroscience, Institute for Neurodegenerative Diseases, University of California, San Francisco, 675 Nelson Rising Ln NS 416A, San Francisco, California 94143, United States
| | - Michael J Keiser
- Department of Pharmaceutical Chemistry, Department of Bioengineering and Therapeutic Sciences, Bakar Computational Health Sciences Institute, Kavli Institute for Fundamental Neuroscience, Institute for Neurodegenerative Diseases, University of California, San Francisco, 675 Nelson Rising Ln NS 416A, San Francisco, California 94143, United States
| |
|
17
|
Kochev N, Jeliazkova N, Paskaleva V, Tancheva G, Iliev L, Ritchie P, Jeliazkov V. Your Spreadsheets Can Be FAIR: A Tool and FAIRification Workflow for the eNanoMapper Database. NANOMATERIALS 2020; 10:nano10101908. [PMID: 32987901 PMCID: PMC7601422 DOI: 10.3390/nano10101908] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 08/29/2020] [Revised: 09/17/2020] [Accepted: 09/20/2020] [Indexed: 11/30/2022]
Abstract
The field of nanoinformatics is rapidly developing and provides data driven solutions in the area of nanomaterials (NM) safety. Safe by Design approaches are encouraged and promoted through regulatory initiatives and multiple scientific projects. Experimental data is at the core of nanoinformatics processing workflows for risk assessment. The nanosafety data is predominantly recorded in Excel spreadsheet files. Although the spreadsheets are quite convenient for the experimentalists, they also pose great challenges for the consequent processing into databases due to variability of the templates used, specific details provided by each laboratory and the need for proper metadata documentation and formatting. In this paper, we present a workflow to facilitate the conversion of spreadsheets into a FAIR (Findable, Accessible, Interoperable, and Reusable) database, with the pivotal aid of the NMDataParser tool, developed to streamline the mapping of the original file layout into the eNanoMapper semantic data model. The NMDataParser is an open source Java library and application, making use of a JSON configuration to define the mapping. We describe the JSON configuration syntax and the approaches applied for parsing different spreadsheet layouts used by the nanosafety community. Examples of using the NMDataParser tool in nanoinformatics workflows are given. Challenging cases are discussed and appropriate solutions are proposed.
Affiliation(s)
- Nikolay Kochev
- Department of Analytical Chemistry and Computer Chemistry, Faculty of Chemistry, University of Plovdiv, 24 Tsar Assen St, 4000 Plovdiv, Bulgaria; (V.P.); (G.T.)
- Ideaconsult Ltd., 4 Angel Kanchev St, 1000 Sofia, Bulgaria; (L.I.); (V.J.)
- Correspondence: (N.K.); (N.J.)
| | - Nina Jeliazkova
- Ideaconsult Ltd., 4 Angel Kanchev St, 1000 Sofia, Bulgaria; (L.I.); (V.J.)
- Correspondence: (N.K.); (N.J.)
| | - Vesselina Paskaleva
- Department of Analytical Chemistry and Computer Chemistry, Faculty of Chemistry, University of Plovdiv, 24 Tsar Assen St, 4000 Plovdiv, Bulgaria; (V.P.); (G.T.)
| | - Gergana Tancheva
- Department of Analytical Chemistry and Computer Chemistry, Faculty of Chemistry, University of Plovdiv, 24 Tsar Assen St, 4000 Plovdiv, Bulgaria; (V.P.); (G.T.)
| | - Luchesar Iliev
- Ideaconsult Ltd., 4 Angel Kanchev St, 1000 Sofia, Bulgaria; (L.I.); (V.J.)
| | - Peter Ritchie
- Institute of Occupational Medicine, Research Avenue North, Riccarton, Edinburgh EH14 4AP, UK;
| | - Vedrin Jeliazkov
- Ideaconsult Ltd., 4 Angel Kanchev St, 1000 Sofia, Bulgaria; (L.I.); (V.J.)
| |
|
18
|
Valsecchi C, Grisoni F, Motta S, Bonati L, Ballabio D. NURA: A curated dataset of nuclear receptor modulators. Toxicol Appl Pharmacol 2020; 407:115244. [PMID: 32961130 DOI: 10.1016/j.taap.2020.115244] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 05/30/2020] [Revised: 08/27/2020] [Accepted: 09/14/2020] [Indexed: 01/10/2023]
Abstract
Nuclear receptors (NRs) are key regulators of human health and constitute a relevant target for medicinal chemistry applications as well as for toxicological risk assessment. Several open databases dedicated to small molecules that modulate NRs exist; however, depending on their final aim (i.e., adverse effect assessment or drug design), these databases contain a different amount and type of annotated molecules, along with a different distribution of experimental bioactivity values. Stemming from these considerations, in this work we aim to provide a unified dataset, NURA (NUclear Receptor Activity) dataset, collecting curated information on small molecules that modulate NRs, to be intended for both pharmacological and toxicological applications. NURA contains bioactivity annotations for 15,247 molecules and 11 selected NRs, and it was obtained by integrating and curating data from toxicological and pharmacological databases (i.e., Tox21, ChEMBL, NR-DBIND and BindingDB). Our results show that NURA dataset is a useful tool to bridge the gap between toxicology- and medicinal-chemistry-related databases, as it is enriched in terms of number of molecules, structural diversity and covered atomic scaffolds compared to the single sources. To the best of our knowledge, NURA dataset is the most exhaustive collection of small molecules annotated for their modulation of the chosen nuclear receptors. NURA dataset is intended to support decision-making in pharmacology and toxicology, as well as to contribute to data-driven applications, such as machine learning. The dataset and the data curation pipeline can be downloaded free of charge on Zenodo at the following DOI: https://doi.org/10.5281/zenodo.3991561.
Affiliation(s)
- Cecile Valsecchi
- Department of Earth and Environmental Sciences, University of Milano-Bicocca, P.za della Scienza 1, 20126 Milano, Italy
| | - Francesca Grisoni
- ETH Zurich, Department of Chemistry and Applied Biosciences, RETHINK, Vladimir-Prelog-Weg 4, 8049 Zurich, Switzerland.
| | - Stefano Motta
- Department of Earth and Environmental Sciences, University of Milano-Bicocca, P.za della Scienza 1, 20126 Milano, Italy
| | - Laura Bonati
- Department of Earth and Environmental Sciences, University of Milano-Bicocca, P.za della Scienza 1, 20126 Milano, Italy
| | - Davide Ballabio
- Department of Earth and Environmental Sciences, University of Milano-Bicocca, P.za della Scienza 1, 20126 Milano, Italy
| |
|
19
|
Minimum Information and Quality Standards for Conducting, Reporting, and Organizing In Vitro Research. Handb Exp Pharmacol 2020; 257:177-196. [PMID: 31628600 DOI: 10.1007/164_2019_284] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 12/12/2022]
Abstract
Insufficient description of experimental practices can contribute to difficulties in reproducing research findings. In response to this, "minimum information" guidelines have been developed for different disciplines. These standards help ensure that the complete experiment is described, including both experimental protocols and data processing methods, allowing a critical evaluation of the whole process and the potential recreation of the work. Selected examples of minimum information checklists with relevance for in vitro research are presented here and are collected by and registered at the MIBBI/FAIRsharing Information Resource portal. In addition, to support integrative research and to allow for comparisons and data sharing across studies, ontologies and vocabularies need to be defined and integrated across areas of in vitro research. As examples, this chapter addresses ontologies for cells and bioassays and discusses their importance for in vitro studies. Finally, specific quality requirements for important in vitro research tools (like chemical probes, antibodies, and cell lines) are suggested, and remaining issues are discussed.
|
20
|
Vos RA, Katayama T, Mishima H, Kawano S, Kawashima S, Kim JD, Moriya Y, Tokimatsu T, Yamaguchi A, Yamamoto Y, Wu H, Amstutz P, Antezana E, Aoki NP, Arakawa K, Bolleman JT, Bolton E, Bonnal RJP, Bono H, Burger K, Chiba H, Cohen KB, Deutsch EW, Fernández-Breis JT, Fu G, Fujisawa T, Fukushima A, García A, Goto N, Groza T, Hercus C, Hoehndorf R, Itaya K, Juty N, Kawashima T, Kim JH, Kinjo AR, Kotera M, Kozaki K, Kumagai S, Kushida T, Lütteke T, Matsubara M, Miyamoto J, Mohsen A, Mori H, Naito Y, Nakazato T, Nguyen-Xuan J, Nishida K, Nishida N, Nishide H, Ogishima S, Ohta T, Okuda S, Paten B, Perret JL, Prathipati P, Prins P, Queralt-Rosinach N, Shinmachi D, Suzuki S, Tabata T, Takatsuki T, Taylor K, Thompson M, Uchiyama I, Vieira B, Wei CH, Wilkinson M, Yamada I, Yamanaka R, Yoshitake K, Yoshizawa AC, Dumontier M, Kosaki K, Takagi T. BioHackathon 2015: Semantics of data for life sciences and reproducible research. F1000Res 2020; 9:136. [PMID: 32308977 PMCID: PMC7141167 DOI: 10.12688/f1000research.18236.1] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Accepted: 02/05/2020] [Indexed: 01/08/2023] Open
Abstract
We report on the activities of the 2015 edition of the BioHackathon, an annual event that brings together researchers and developers from around the world to develop tools and technologies that promote the reusability of biological data. We discuss issues surrounding the representation, publication, integration, mining and reuse of biological data and metadata across a wide range of biomedical data types of relevance for the life sciences, including chemistry, genotypes and phenotypes, orthology and phylogeny, proteomics, genomics, glycomics, and metabolomics. We describe our progress to address ongoing challenges to the reusability and reproducibility of research results, and identify outstanding issues that continue to impede the progress of bioinformatics research. We share our perspective on the state of the art, continued challenges, and goals for future research and development for the life sciences Semantic Web.
Affiliation(s)
- Rutger A. Vos
- Institute of Biology Leiden, Leiden University, Leiden, The Netherlands
- Naturalis Biodiversity Center, Leiden, The Netherlands
| | | | - Hiroyuki Mishima
- Department of Human Genetics, Nagasaki University Graduate School of Biomedical Sciences, Nagasaki, Japan
| | - Shin Kawano
- Database Center for Life Science, Tokyo, Japan
| | | | | | - Yuki Moriya
- Database Center for Life Science, Tokyo, Japan
| | | | | | | | - Hongyan Wu
- Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
| | | | - Erick Antezana
- Department of Biology, Norwegian University of Science and Technology, Trondheim, Norway
| | - Nobuyuki P. Aoki
- Faculty of Science and Engineering, SOKA University, Tokyo, Japan
| | - Kazuharu Arakawa
- Institute for Advanced Biosciences, Keio University, Tokyo, Japan
| | - Jerven T. Bolleman
- SIB Swiss Institute of Bioinformatics, Centre Medical Universitaire, Lausanne, Switzerland
| | - Evan Bolton
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, USA
| | - Raoul J. P. Bonnal
- Istituto Nazionale Genetica Molecolare, Romeo ed Enrica Invernizzi, Milan, Italy
| | | | - Kees Burger
- Dutch Techcentre for Life Sciences, Utrecht, The Netherlands
| | - Hirokazu Chiba
- National Institute for Basic Biology, National Institutes of Natural Sciences, Okazaki, Japan
| | - Kevin B. Cohen
- Computational Bioscience Program, University of Colorado School of Medicine, Denver, USA
- Université Paris-Saclay, LIMSI, CNRS, Paris, France
| | | | | | - Gang Fu
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, USA
| | | | | | | | - Naohisa Goto
- Research Institute for Microbial Diseases, Osaka University, Osaka, Japan
| | - Tudor Groza
- St Vincent's Clinical School, Faculty of Medicine, University of New South Wales, Darlinghurst, Australia
- Kinghorn Centre for Clinical Genomics, Garvan Institute of Medical Research, Darlinghurst, Australia
| | - Colin Hercus
- Novocraft Technologies Sdn. Bhd., Selangor, Malaysia
| | - Robert Hoehndorf
- Computational Bioscience Research Center, King Abdullah University of Science and Technology, Thuwal, Saudi Arabia
| | - Kotone Itaya
- Institute for Advanced Biosciences, Keio University, Tokyo, Japan
| | - Nick Juty
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
| | | | - Jee-Hyub Kim
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
| | - Akira R. Kinjo
- Institute for Protein Research, Osaka University, Osaka, Japan
| | - Masaaki Kotera
- School of Life Science and Technology, Tokyo Institute of Technology, Tokyo, Japan
| | - Kouji Kozaki
- The Institute of Scientific and Industrial Research, Osaka University, Osaka, Japan
| | | | - Tatsuya Kushida
- National Bioscience Database Center, Japan Science and Technology Agency, Tokyo, Japan
| | - Thomas Lütteke
- Institute of Veterinary Physiology and Biochemistry, Justus-Liebig University Giessen, Giessen, Germany
- Gesellschaft für innovative Personalwirtschaftssysteme mbH (GIP GmbH), Offenbach, Germany
| | | | | | - Attayeb Mohsen
- National Institutes of Biomedical Innovation, Health and Nutrition, Osaka, Japan
| | - Hiroshi Mori
- Center for Information Biology, National Institute of Genetics, Mishima, Japan
| | - Yuki Naito
- Database Center for Life Science, Tokyo, Japan
| | | | | | | | - Naoki Nishida
- Department of Systems Science, Osaka University, Osaka, Japan
| | - Hiroyo Nishide
- National Institute for Basic Biology, National Institutes of Natural Sciences, Okazaki, Japan
| | - Soichi Ogishima
- Tohoku Medical Megabank Organization, Tohoku University, Sendai, Japan
| | - Tazro Ohta
- Database Center for Life Science, Tokyo, Japan
| | - Shujiro Okuda
- Niigata University Graduate School of Medical and Dental Sciences, Niigata, Japan
| | - Benedict Paten
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, USA
| | | | - Philip Prathipati
- National Institutes of Biomedical Innovation, Health and Nutrition, Osaka, Japan
| | - Pjotr Prins
- University Medical Center Utrecht, Utrecht, The Netherlands
- University of Tennessee Health Science Center, Memphis, USA
| | - Núria Queralt-Rosinach
- Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts, USA
| | | | - Shinya Suzuki
- School of Life Science and Technology, Tokyo Institute of Technology, Tokyo, Japan
| | - Tsuyosi Tabata
- Graduate School of Pharmaceutical Sciences, Kyoto University, Kyoto, Japan
| | | | - Kieron Taylor
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
| | - Mark Thompson
- Leiden University Medical Center, Leiden, The Netherlands
| | - Ikuo Uchiyama
- National Institute for Basic Biology, National Institutes of Natural Sciences, Okazaki, Japan
| | - Bruno Vieira
- WurmLab, School of Biological & Chemical Sciences, Queen Mary University of London, London, UK
| | - Chih-Hsuan Wei
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, USA
| | - Mark Wilkinson
- Escuela Técnica Superior de Ingeniería Agronómica, Alimentaria y de Biosistemas, Universidad Politécnica de Madrid, Madrid, Spain
| | | | | | - Kazutoshi Yoshitake
- Graduate School of Agricultural and Life Sciences, The University of Tokyo, Tokyo, Japan
| | | | - Michel Dumontier
- Institute of Data Science, Maastricht University, Maastricht, The Netherlands
| | - Kenjiro Kosaki
- Center for Medical Genetics, Keio University School of Medicine, Tokyo, Japan
| | - Toshihisa Takagi
- National Bioscience Database Center, Japan Science and Technology Agency, Tokyo, Japan
- Department of Biological Sciences, Graduate School of Science, The University of Tokyo, Tokyo, Japan
|
21
|
David L, Walsh J, Sturm N, Feierberg I, Nissink JWM, Chen H, Bajorath J, Engkvist O. Identification of Compounds That Interfere with High-Throughput Screening Assay Technologies. ChemMedChem 2019; 14:1795-1802. [PMID: 31479198 PMCID: PMC6856845 DOI: 10.1002/cmdc.201900395] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Received: 07/02/2019] [Revised: 08/21/2019] [Indexed: 01/23/2023]
Abstract
A significant challenge in high-throughput screening (HTS) campaigns is the identification of assay technology interference compounds. A Compound Interfering with an Assay Technology (CIAT) gives false readouts in many assays. CIATs are often considered viable hits and investigated in follow-up studies, thus impeding research and wasting resources. In this study, we developed a machine-learning (ML) model to predict CIATs for three assay technologies. The model was trained on known CIATs and non-CIATs (NCIATs) identified in artefact assays and described by their 2D structural descriptors. Usual methods identifying CIATs are based on statistical analysis of historical primary screening data and do not consider experimental assays identifying CIATs. Our results show successful prediction of CIATs for existing and novel compounds and provide a complementary and wider set of predicted CIATs compared to BSF, a published structure-independent model, and to the PAINS substructural filters. Our analysis is an example of how well-curated datasets can provide powerful predictive models despite their relatively small size.
Affiliation(s)
- Laurianne David
- Hit Discovery, Discovery Sciences, R&D BioPharmaceuticals, AstraZeneca Goteborg, Pepparedsleden 1, 431 83 Mölndal, Sweden
- Department of Life Science Informatics, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität Bonn, Endenicher Allee 19c, 53115 Bonn, Germany
| | - Jarrod Walsh
- Hit Discovery, Discovery Sciences, R&D BioPharmaceuticals, AstraZeneca Cambridge, Alderley Park, Macclesfield SK10 4TG, UK
| | - Noé Sturm
- Data Science and AI, Drug Safety & Metabolism, R&D BioPharmaceuticals, AstraZeneca Gothenburg, Pepparedsleden 1, 431 83 Mölndal, Sweden
| | - Isabella Feierberg
- Hit Discovery, Discovery Sciences, R&D BioPharmaceuticals, AstraZeneca Boston, 35 Gatehouse Drive, Waltham, MA 02451, USA
| | - J. Willem M. Nissink
- Computational Chemistry, Oncology R&D, AstraZeneca, Cambridge Science Park, Milton Road, Cambridge CB4 0WG, UK
| | - Hongming Chen
- Hit Discovery, Discovery Sciences, R&D BioPharmaceuticals, AstraZeneca Goteborg, Pepparedsleden 1, 431 83 Mölndal, Sweden
| | - Jürgen Bajorath
- Department of Life Science Informatics, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität Bonn, Endenicher Allee 19c, 53115 Bonn, Germany
| | - Ola Engkvist
- Hit Discovery, Discovery Sciences, R&D BioPharmaceuticals, AstraZeneca Goteborg, Pepparedsleden 1, 431 83 Mölndal, Sweden
|
22
|
Tarasova OA, Biziukova NY, Filimonov DA, Poroikov VV, Nicklaus MC. Data Mining Approach for Extraction of Useful Information About Biologically Active Compounds from Publications. J Chem Inf Model 2019; 59:3635-3644. [PMID: 31453694 DOI: 10.1021/acs.jcim.9b00164] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Indexed: 02/07/2023]
Abstract
A lot of high quality data on the biological activity of chemical compounds are required throughout the whole drug discovery process: from development of computational models of the structure-activity relationship to experimental testing of lead compounds and their validation in clinics. Currently, a large amount of such data is available from databases, scientific publications, and patents. Biological data are characterized by incompleteness, uncertainty, and low reproducibility. Despite the existence of free and commercially available databases of biological activities of compounds, they usually lack unambiguous information about peculiarities of biological assays. On the other hand, scientific papers are the primary source of new data disclosed to the scientific community for the first time. In this study, we have developed and validated a data-mining approach for extraction of text fragments containing description of bioassays. We have used this approach to evaluate compounds and their biological activity reported in scientific publications. We have found that categorization of papers into relevant and irrelevant may be performed based on the machine-learning analysis of the abstracts. Text fragments extracted from the full texts of publications allow their further partitioning into several classes according to the peculiarities of bioassays. We demonstrate the applicability of our approach to the comparison of the endpoint values of biological activity and cytotoxicity of reference compounds.
Affiliation(s)
- Olga A Tarasova
- Department of Bioinformatics, Institute of Biomedical Chemistry, 10 Building 8, Pogodinskaya Street, Moscow 119121, Russia
| | - Nadezhda Yu Biziukova
- Department of Bioinformatics, Institute of Biomedical Chemistry, 10 Building 8, Pogodinskaya Street, Moscow 119121, Russia
| | - Dmitry A Filimonov
- Department of Bioinformatics, Institute of Biomedical Chemistry, 10 Building 8, Pogodinskaya Street, Moscow 119121, Russia
| | - Vladimir V Poroikov
- Department of Bioinformatics, Institute of Biomedical Chemistry, 10 Building 8, Pogodinskaya Street, Moscow 119121, Russia
| | - Marc C Nicklaus
- Computer-Aided Drug Design Group, Chemical Biology Laboratory, Center for Cancer Research, National Cancer Institute, Frederick, Maryland 21702, United States
|
23
|
Improving the Utility of the Tox21 Dataset by Deep Metadata Annotations and Constructing Reusable Benchmarked Chemical Reference Signatures. Molecules 2019; 24:molecules24081604. [PMID: 31018579 PMCID: PMC6515292 DOI: 10.3390/molecules24081604] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 03/20/2019] [Revised: 04/16/2019] [Accepted: 04/19/2019] [Indexed: 02/03/2023] Open
Abstract
The Toxicology in the 21st Century (Tox21) project seeks to develop and test methods for high-throughput examination of the effect certain chemical compounds have on biological systems. Although primary and toxicity assay data were readily available for multiple reporter gene modified cell lines, extensive annotation and curation was required to improve these datasets with respect to how FAIR (Findable, Accessible, Interoperable, and Reusable) they are. In this study, we fully annotated the Tox21 published data with relevant and accepted controlled vocabularies. After removing unreliable data points, we aggregated the results and created three sets of signatures reflecting activity in the reporter gene assays, cytotoxicity, and selective reporter gene activity, respectively. We benchmarked these signatures using the chemical structures of the tested compounds and obtained generally high receiver operating characteristic (ROC) scores, suggesting good quality and utility of these signatures and the underlying data. We analyzed the results to identify promiscuous individual compounds and chemotypes for the three signature categories and interpreted the results to illustrate the utility and re-usability of the datasets. With this study, we aimed to demonstrate the importance of data standards in reporting screening results and high-quality annotations to enable re-use and interpretation of these data. To improve the data with respect to all FAIR criteria, all assay annotations, cleaned and aggregate datasets, and signatures were made available as standardized dataset packages (Aggregated Tox21 bioactivity data, 2019).
|
24
|
Kanza S, Frey JG. A new wave of innovation in Semantic web tools for drug discovery. Expert Opin Drug Discov 2019; 14:433-444. [DOI: 10.1080/17460441.2019.1586880] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Indexed: 12/13/2022]
Affiliation(s)
- Samantha Kanza
- Department of Chemistry, Highfield Campus, University of Southampton, Southampton, UK
| | - Jeremy Graham Frey
- Department of Chemistry, Highfield Campus, University of Southampton, Southampton, UK
|
25
|
Küçük McGinty H, Visser U, Schürer S. How to Develop a Drug Target Ontology: KNowledge Acquisition and Representation Methodology (KNARM). Methods Mol Biol 2019; 1939:49-69. [PMID: 30848456 PMCID: PMC7257161 DOI: 10.1007/978-1-4939-9089-4_4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/06/2023]
Abstract
Technological advancements in many fields have led to huge increases in data production, including data volume, diversity, and the speed at which new data is becoming available. In accordance with this, there is a lack of conformity in the ways data is interpreted. This era of "big data" provides unprecedented opportunities for data-driven research and "big picture" models. However, in-depth analyses that make use of various data types and data sources and extract knowledge have become a more daunting task. This is especially the case in life sciences, where simplification and flattening of diverse data types often lead to incorrect predictions. Effective applications of big data approaches in life sciences require better, knowledge-based, semantic models that are suitable as a framework for big data integration, while avoiding oversimplifications, such as reducing various biological data types to the gene level. A huge hurdle in developing such semantic knowledge models, or ontologies, is the knowledge acquisition bottleneck. Automated methods are still very limited, and significant human expertise is required. In this chapter, we describe a methodology to systematize this knowledge acquisition and representation challenge, termed KNowledge Acquisition and Representation Methodology (KNARM). We then describe application of the methodology while implementing the Drug Target Ontology (DTO). We aimed to create an approach, involving domain experts and knowledge engineers, to build useful, comprehensive, consistent ontologies that will enable big data approaches in the domain of drug discovery, without the currently common simplifications.
Affiliation(s)
- Hande Küçük McGinty
- Department of Computer Science, University of Miami, Coral Gables, FL, USA
- Collaborative Drug Discovery, Inc., Burlingame, CA, USA
| | - Ubbo Visser
- Department of Computer Science, University of Miami, Coral Gables, FL, USA
| | - Stephan Schürer
- Department of Molecular and Cellular Pharmacology, Miller School of Medicine, University of Miami, Miami, FL, USA
- Center for Computational Science, University of Miami, Coral Gables, FL, USA
|
26
|
Zaritsky A. Sharing and reusing cell image data. Mol Biol Cell 2018; 29:1274-1280. [PMID: 29851565 PMCID: PMC5994892 DOI: 10.1091/mbc.e17-10-0606] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Received: 03/01/2018] [Revised: 04/02/2018] [Accepted: 04/06/2018] [Indexed: 01/19/2023] Open
Abstract
The rapid growth in content and complexity of cell image data creates an opportunity for synergy between experimental and computational scientists. Sharing microscopy data enables computational scientists to develop algorithms and tools for data analysis, integration, and mining. These tools can be applied by experimentalists to promote hypothesis-generation and discovery. We are now at the dawn of this revolution: infrastructure is being developed for data standardization, deposition, sharing, and analysis; some journals and funding agencies mandate data deposition; data journals publish high-content microscopy data sets; quantification becomes standard in scientific publications; new analytic tools are being developed and dispatched to the community; and huge data sets are being generated by individual labs and philanthropic initiatives. In this Perspective, I reflect on sharing and reusing cell image data and the opportunities that will come along with it.
|
27
|
Chen H, Bauer U, Engkvist O. Merged Multiple Ligands. METHODS AND PRINCIPLES IN MEDICINAL CHEMISTRY 2017. [DOI: 10.1002/9783527674381.ch9] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 01/30/2023]
Affiliation(s)
- Hongming Chen
- Discovery Sciences, Innovative Medicines and Early Development; AstraZeneca; Pepparedsleden 1 431 83 Mölndal Sweden
| | - Udo Bauer
- Cardiovascular and Metabolic Diseases, Innovative Medicines and Early Development; AstraZeneca; Pepparedsleden 1 431 83 Mölndal Sweden
| | - Ola Engkvist
- Discovery Sciences, Innovative Medicines and Early Development; AstraZeneca; Pepparedsleden 1 431 83 Mölndal Sweden
|
28
|
Bolgár B, Antal P. VB-MK-LMF: fusion of drugs, targets and interactions using variational Bayesian multiple kernel logistic matrix factorization. BMC Bioinformatics 2017; 18:440. [PMID: 28978313 PMCID: PMC5628496 DOI: 10.1186/s12859-017-1845-z] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.0] [Received: 06/30/2017] [Accepted: 09/21/2017] [Indexed: 12/20/2022] Open
Abstract
BACKGROUND Computational fusion approaches to drug-target interaction (DTI) prediction, capable of utilizing multiple sources of background knowledge, were reported to achieve superior predictive performance in multiple studies. Other studies showed that specificities of the DTI task, such as weighting the observations and focusing the side information, are also vital for reaching top performance. METHOD We present Variational Bayesian Multiple Kernel Logistic Matrix Factorization (VB-MK-LMF), which unifies the advantages of (1) multiple kernel learning, (2) weighted observations, (3) graph Laplacian regularization, and (4) explicit modeling of probabilities of binary drug-target interactions. RESULTS VB-MK-LMF achieves significantly better predictive performance in standard benchmarks compared to state-of-the-art methods, which can be traced back to multiple factors. The systematic evaluation of the effect of multiple kernels confirms their benefits, but also highlights the limitations of linear kernel combinations, already recognized in other fields. The analysis of the effect of prior kernels using varying sample sizes sheds light on the balance of data and knowledge in DTI tasks and on the rate at which the effect of priors vanishes. This also shows the existence of "small sample size" regions where using side information offers significant gains. Alongside favorable predictive performance, a notable property of MF methods is that they provide a unified space for drugs and targets using latent representations. Compared to earlier studies, the dimensionality of this space proved to be surprisingly low, which makes the latent representations constructed by VB-MK-LMF especially well-suited for visual analytics. The probabilistic nature of the predictions allows the calculation of the expected values of hits in functionally relevant sets, which we demonstrate by predicting drug promiscuity. The variational Bayesian approximation is also implemented for general purpose graphics processing units, yielding significantly improved computational time. CONCLUSION In standard benchmarks, VB-MK-LMF shows significantly improved predictive performance in a wide range of settings. Beyond these benchmarks, another contribution of our work is highlighting and providing estimates for further pharmaceutically relevant quantities, such as promiscuity, druggability and total number of interactions.
Affiliation(s)
- Bence Bolgár
- Department of Measurement and Information Systems, Budapest University of Technology and Economics, Magyar tudósok krt. 2., Budapest, 1117 Hungary
| | - Péter Antal
- Department of Measurement and Information Systems, Budapest University of Technology and Economics, Magyar tudósok krt. 2., Budapest, 1117 Hungary
|
29
|
Perspectives from the NanoSafety Modelling Cluster on the validation criteria for (Q)SAR models used in nanotechnology. Food Chem Toxicol 2017; 112:478-494. [PMID: 28943385 DOI: 10.1016/j.fct.2017.09.037] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.0] [Received: 11/23/2016] [Revised: 08/31/2017] [Accepted: 09/19/2017] [Indexed: 11/20/2022]
Abstract
Nanotechnology and the production of nanomaterials have been expanding rapidly in recent years. Since many types of engineered nanoparticles are suspected to be toxic to living organisms and to have a negative impact on the environment, the process of designing new nanoparticles and their applications must be accompanied by a thorough risk analysis. (Quantitative) Structure-Activity Relationship ([Q]SAR) modelling creates promising options among the available methods for the risk assessment. These in silico models can be used to predict a variety of properties, including the toxicity of newly designed nanoparticles. However, (Q)SAR models must be appropriately validated to ensure the clarity, consistency and reliability of predictions. This paper is a joint initiative from recently completed European research projects focused on developing (Q)SAR methodology for nanomaterials. The aim was to interpret and expand the guidance for the well-known "OECD Principles for the Validation, for Regulatory Purposes, of (Q)SAR Models", with reference to nano-(Q)SAR, and present our opinions on the criteria to be fulfilled for models developed for nanoparticles.
|
30
|
Cruz-Monteagudo M, Schürer S, Tejera E, Pérez-Castillo Y, Medina-Franco JL, Sánchez-Rodríguez A, Borges F. Systemic QSAR and phenotypic virtual screening: chasing butterflies in drug discovery. Drug Discov Today 2017; 22:994-1007. [PMID: 28274840 PMCID: PMC5487293 DOI: 10.1016/j.drudis.2017.02.004] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.9] [Received: 11/16/2016] [Revised: 02/02/2017] [Accepted: 02/27/2017] [Indexed: 12/20/2022]
Abstract
Current advances in systems biology suggest a new change of paradigm reinforcing the holistic nature of the drug discovery process. According to the principles of systems biology, a simple drug perturbing a network of targets can trigger complex reactions. Therefore, it is possible to connect initial events with final outcomes and consequently prioritize those events, leading to a desired effect. Here, we introduce a new concept, 'Systemic Chemogenomics/Quantitative Structure-Activity Relationship (QSAR)'. To elaborate on the concept, relevant information surrounding it is addressed. The concept is challenged by implementing a systemic QSAR approach for phenotypic virtual screening (VS) of candidate ligands acting as neuroprotective agents in Parkinson's disease (PD). The results support the suitability of the approach for the phenotypic prioritization of drug candidates.
Affiliation(s)
- Maykel Cruz-Monteagudo
- CIQUP/Departamento de Química e Bioquímica, Faculdade de Ciências, Universidade do Porto, Porto 4169-007, Portugal
| | - Stephan Schürer
- Department of Pharmacology, Miller School of Medicine and Center for Computational Science, University of Miami, Miami, FL 33136, USA
| | - Eduardo Tejera
- Instituto de Investigaciones Biomédicas (IIB), Universidad de Las Américas, 170513 Quito, Ecuador
| | - Yunierkis Pérez-Castillo
- Sección Físico Química y Matemáticas, Departamento de Química, Universidad Técnica Particular de Loja, San Cayetano Alto S/N, EC1101608 Loja, Ecuador
| | - José L Medina-Franco
- Universidad Nacional Autónoma de México, Departamento de Farmacia, Facultad de Química, Avenida Universidad 3000, Mexico City, 04510, Mexico
| | - Aminael Sánchez-Rodríguez
- Departamento de Ciencias Naturales, Universidad Técnica Particular de Loja, Calle París S/N, EC1101608 Loja, Ecuador
| | - Fernanda Borges
- CIQUP/Departamento de Química e Bioquímica, Faculdade de Ciências, Universidade do Porto, Porto 4169-007, Portugal
|
31
|
Eftimov T, Koroušić Seljak B, Korošec P. A rule-based named-entity recognition method for knowledge extraction of evidence-based dietary recommendations. PLoS One 2017. [PMID: 28644863 PMCID: PMC5482438 DOI: 10.1371/journal.pone.0179488] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.0] [Indexed: 02/06/2023] Open
Abstract
Evidence-based dietary information represented as unstructured text is crucial information that needs to be accessed in order to help dietitians follow the new knowledge that arrives daily with newly published scientific reports. Different named-entity recognition (NER) methods have been introduced previously to extract useful information from the biomedical literature. They focus on, for example, extracting gene mentions, protein mentions, relationships between genes and proteins, chemical concepts, and relationships between drugs and diseases. In this paper, we present a novel NER method, called drNER, for knowledge extraction of evidence-based dietary information. To the best of our knowledge, this is the first attempt at extracting dietary concepts. DrNER is a rule-based NER method that consists of two phases. The first one involves the detection and determination of the entity mentions, and the second one involves the selection and extraction of the entities. We evaluate the method by using text corpora from heterogeneous sources, including text from several scientifically validated web sites and text from scientific publications. Evaluation of the method showed that drNER gives good results and can be used for knowledge extraction of evidence-based dietary recommendations.
Affiliation(s)
- Tome Eftimov
- Computer Systems Department, Jožef Stefan Institute, Ljubljana, Slovenia
- Jožef Stefan International Postgraduate School, Ljubljana, Slovenia
| | | | - Peter Korošec
- Computer Systems Department, Jožef Stefan Institute, Ljubljana, Slovenia
- Faculty of Mathematics, Natural Science and Information Technologies, Koper, Slovenia
|
32
|
Krallinger M, Rabal O, Lourenço A, Oyarzabal J, Valencia A. Information Retrieval and Text Mining Technologies for Chemistry. Chem Rev 2017; 117:7673-7761. [PMID: 28475312 DOI: 10.1021/acs.chemrev.6b00851] [Citation(s) in RCA: 111] [Impact Index Per Article: 15.9] [Indexed: 01/03/2023]
Abstract
Efficient access to chemical information contained in scientific literature, patents, technical reports, or the web is a pressing need shared by researchers and patent attorneys from different chemical disciplines. Retrieval of important chemical information in most cases starts with finding relevant documents for a particular chemical compound or family. Targeted retrieval of chemical documents is closely connected to the automatic recognition of chemical entities in the text, which commonly involves the extraction of the entire list of chemicals mentioned in a document, including any associated information. In this Review, we provide a comprehensive and in-depth description of fundamental concepts, technical implementations, and current technologies for meeting these information demands. A strong focus is placed on community challenges addressing systems performance, more particularly CHEMDNER and CHEMDNER patents tasks of BioCreative IV and V, respectively. Considering the growing interest in the construction of automatically annotated chemical knowledge bases that integrate chemical information and biological data, cheminformatics approaches for mapping the extracted chemical names into chemical structures and their subsequent annotation together with text mining applications for linking chemistry with biological information are also presented. Finally, future trends and current challenges are highlighted as a roadmap proposal for research in this emerging field.
Affiliation(s)
- Martin Krallinger
- Structural Computational Biology Group, Structural Biology and BioComputing Programme, Spanish National Cancer Research Centre, C/Melchor Fernández Almagro 3, Madrid E-28029, Spain
| | - Obdulia Rabal
- Small Molecule Discovery Platform, Molecular Therapeutics Program, Center for Applied Medical Research (CIMA), University of Navarra, Avenida Pio XII 55, Pamplona E-31008, Spain
| | - Anália Lourenço
- ESEI - Department of Computer Science, University of Vigo, Edificio Politécnico, Campus Universitario As Lagoas s/n, Ourense E-32004, Spain
- Centro de Investigaciones Biomédicas (Centro Singular de Investigación de Galicia), Campus Universitario Lagoas-Marcosende, Vigo E-36310, Spain
- CEB-Centre of Biological Engineering, University of Minho, Campus de Gualtar, Braga 4710-057, Portugal
| | - Julen Oyarzabal
- Small Molecule Discovery Platform, Molecular Therapeutics Program, Center for Applied Medical Research (CIMA), University of Navarra, Avenida Pio XII 55, Pamplona E-31008, Spain
| | - Alfonso Valencia
- Life Science Department, Barcelona Supercomputing Centre (BSC-CNS), C/Jordi Girona, 29-31, Barcelona E-08034, Spain
- Joint BSC-IRB-CRG Program in Computational Biology, Parc Científic de Barcelona, C/ Baldiri Reixac 10, Barcelona E-08028, Spain
- Institució Catalana de Recerca i Estudis Avançats (ICREA), Passeig de Lluís Companys 23, Barcelona E-08010, Spain
|
33
|
Backman TWH, Evans DS, Girke T. Large-scale bioactivity analysis of the small-molecule assayed proteome. PLoS One 2017; 12:e0171413. [PMID: 28178331 PMCID: PMC5298297 DOI: 10.1371/journal.pone.0171413] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 09/02/2016] [Accepted: 01/20/2017] [Indexed: 12/12/2022] Open
Abstract
This study presents an analysis of the small molecule bioactivity profiles across large quantities of diverse protein families represented in PubChem BioAssay. We compared the bioactivity profiles of FDA approved drugs to non-FDA approved compounds, and report several distinct patterns characteristic of the approved drugs. We found that a large fraction of the previously reported higher target promiscuity among FDA approved compounds, compared to non-FDA approved bioactives, was frequently due to cross-reactivity within rather than across protein families. We identified 804 potentially novel protein target candidates for FDA approved drugs, as well as 901 potentially novel target candidates with active non-FDA approved compounds, but no FDA approved drugs with activity against these targets. We also identified 486348 potentially novel compounds active against the same targets as FDA approved drugs, as well as 153402 potentially novel compounds active against targets without active FDA approved drugs. By quantifying the agreement among replicated screens, we estimated that more than half of these novel outcomes are reproducible. Using biclustering, we identified many dense clusters of FDA approved drugs with enriched activity against a common set of protein targets. We also report the distribution of compound promiscuity using a Bayesian statistical model, and report the sensitivity and specificity of two common methods for identifying promiscuous compounds. Aggregator assays exhibited greater accuracy in identifying highly promiscuous compounds, while PAINS substructures were able to identify a much larger set of "middle range" promiscuous compounds. Additionally, we report a large number of promiscuous compounds not identified as aggregators or PAINS. In summary, the results of this study represent a rich reference for selecting novel drug and target protein candidates, as well as for eliminating candidate compounds with unselective activities.
Affiliation(s)
- Tyler William H. Backman
  - Department of Bioengineering, University of California Riverside, Riverside, California, United States of America
  - Institute for Integrative Genome Biology, University of California Riverside, Riverside, California, United States of America
- Daniel S. Evans
  - California Pacific Medical Center Research Institute, San Francisco, California, United States of America
- Thomas Girke
  - Institute for Integrative Genome Biology, University of California Riverside, Riverside, California, United States of America
34
Wang Y, Bryant SH, Cheng T, Wang J, Gindulyte A, Shoemaker BA, Thiessen PA, He S, Zhang J. PubChem BioAssay: 2017 update. Nucleic Acids Res 2016; 45:D955-D963. [PMID: 27899599 PMCID: PMC5210581 DOI: 10.1093/nar/gkw1118] [Citation(s) in RCA: 326] [Impact Index Per Article: 40.8] [Received: 09/29/2016] [Revised: 10/26/2016] [Accepted: 11/09/2016] [Indexed: 12/19/2022]
Abstract
PubChem's BioAssay database (https://pubchem.ncbi.nlm.nih.gov) has served as a public repository for small-molecule and RNAi screening data since 2004, providing open access to its data content for the community. PubChem accepts data submissions from researchers worldwide in academia, industry and government agencies. PubChem also collaborates with other chemical biology database stakeholders through data exchange. After more than a decade of development, it has become an important information resource supporting drug discovery and chemical biology research. To facilitate data discovery, PubChem is integrated with all other databases at NCBI. In this work, we provide an update for the PubChem BioAssay database describing several recent developments, including added sources of research data, a redesigned BioAssay record page, a new BioAssay classification browser and new features in the Upload system facilitating data sharing.
Affiliation(s)
- Yanli Wang
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- Stephen H Bryant
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- Tiejun Cheng
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- Jiyao Wang
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- Asta Gindulyte
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- Benjamin A Shoemaker
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- Paul A Thiessen
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- Siqian He
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- Jian Zhang
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
35
Gaulton A, Hersey A, Nowotka M, Bento AP, Chambers J, Mendez D, Mutowo P, Atkinson F, Bellis LJ, Cibrián-Uhalte E, Davies M, Dedman N, Karlsson A, Magariños MP, Overington JP, Papadatos G, Smit I, Leach AR. The ChEMBL database in 2017. Nucleic Acids Res 2016; 45:D945-D954. [PMID: 27899562 PMCID: PMC5210557 DOI: 10.1093/nar/gkw1074] [Citation(s) in RCA: 1356] [Impact Index Per Article: 169.5] [Received: 09/23/2016] [Revised: 10/21/2016] [Accepted: 10/30/2016] [Indexed: 11/14/2022]
Abstract
ChEMBL is an open large-scale bioactivity database (https://www.ebi.ac.uk/chembl), previously described in the 2012 and 2014 Nucleic Acids Research Database Issues. Since then, alongside the continued extraction of data from the medicinal chemistry literature, new sources of bioactivity data have also been added to the database. These include: deposited data sets from neglected disease screening; crop protection data; drug metabolism and disposition data and bioactivity data from patents. A number of improvements and new features have also been incorporated. These include the annotation of assays and targets using ontologies, the inclusion of targets and indications for clinical candidates, addition of metabolic pathways for drugs and calculation of structural alerts. The ChEMBL data can be accessed via a web-interface, RDF distribution, data downloads and RESTful web-services.
Affiliation(s)
- Anna Gaulton
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Anne Hersey
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Michał Nowotka
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- A Patrícia Bento
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
  - Open Targets, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Jon Chambers
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- David Mendez
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Prudence Mutowo
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Francis Atkinson
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Louisa J Bellis
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Elena Cibrián-Uhalte
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Mark Davies
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Nathan Dedman
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Anneli Karlsson
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- María Paula Magariños
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
  - Open Targets, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- John P Overington
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- George Papadatos
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Ines Smit
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Andrew R Leach
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
36
Burns GAPC, Dasigi P, de Waard A, Hovy EH. Automated detection of discourse segment and experimental types from the text of cancer pathway results sections. Database: The Journal of Biological Databases and Curation 2016; 2016:baw122. [PMID: 27580922 PMCID: PMC5006090 DOI: 10.1093/database/baw122] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Received: 11/06/2015] [Accepted: 08/04/2016] [Indexed: 12/20/2022]
Abstract
Automated machine-reading biocuration systems typically use sentence-by-sentence information extraction to construct meaning representations for use by curators. This does not directly reflect the typical discourse structure used by scientists to construct an argument from the experimental data available within an article, and is therefore less likely to correspond to representations typically used in biomedical informatics systems (let alone to the mental models that scientists have). In this study, we develop Natural Language Processing methods to locate, extract, and classify the individual passages of text from articles’ Results sections that refer to experimental data. In our domain of interest (molecular biology studies of cancer signal transduction pathways), individual articles may contain as many as 30 small-scale individual experiments describing a variety of findings, upon which authors base their overall research conclusions. Our system automatically classifies discourse segments in these texts into seven categories (fact, hypothesis, problem, goal, method, result, implication) with an F-score of 0.68. These segments describe the essential building blocks of scientific discourse to (i) provide context for each experiment, (ii) report experimental details and (iii) explain the data’s meaning in context. We evaluate our system on text passages from articles that were curated in molecular biology databases (the Pathway Logic Datum repository, the Molecular Interaction MINT and INTACT databases) linking individual experiments in articles to the type of assay used (coprecipitation, phosphorylation, translocation etc.). We use supervised machine learning techniques on text passages containing unambiguous references to experiments to obtain baseline F1 scores of 0.59 for MINT, 0.71 for INTACT and 0.63 for Pathway Logic. Although preliminary, these results support the notion that targeting information extraction methods to experimental results could provide accurate, automated methods for biocuration. We also suggest the need for finer-grained curation of experimental methods used when constructing molecular biology databases.
Affiliation(s)
- Gully A P C Burns
  - Information Sciences Institute, Viterbi School of Engineering, University of Southern California, Marina del Rey, CA 90292, USA
- Pradeep Dasigi
  - Carnegie Mellon University, Language Technologies Institute, 5000 Forbes Avenue, Pittsburgh, PA 15213, USA
- Eduard H Hovy
  - Carnegie Mellon University, Language Technologies Institute, 5000 Forbes Avenue, Pittsburgh, PA 15213, USA
37
Read WJ, Demetriou G, Nenadic G, Ruddock N, Stevens R, Winter J. The BioHub Knowledge Base: Ontology and Repository for Sustainable Biosourcing. J Biomed Semantics 2016; 7:30. [PMID: 27246819 PMCID: PMC4888476 DOI: 10.1186/s13326-016-0071-3] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 12/03/2015] [Accepted: 05/03/2016] [Indexed: 11/29/2022]
Abstract
Background: The motivation for the BioHub project is to create an Integrated Knowledge Management System (IKMS) that will enable chemists to source ingredients from bio-renewables, rather than from non-sustainable sources such as fossil oil and its derivatives. Method: The BioHubKB is the data repository of the IKMS; it employs Semantic Web technologies, especially OWL, to host data about chemical transformations, bio-renewable feedstocks, co-product streams and their chemical components. Access to this knowledge base is provided to other modules within the IKMS through a set of RESTful web services, driven by SPARQL queries to a Sesame back-end. The BioHubKB re-uses several bio-ontologies and bespoke extensions, primarily for chemical feedstocks and products, to form its knowledge organisation schema. Results: Parts of plants form feedstocks, while various processes generate co-product streams that contain certain chemicals. Both chemicals and transformations are associated with certain qualities, which the BioHubKB also attempts to capture. Of immediate commercial and industrial importance is estimating the cost of particular sets of chemical transformations (leading to candidate surfactants) performed in sequence, and these costs too are captured. Data are sourced from companies’ internal knowledge and document stores, and from the publicly available literature. Both text analytics and manual curation play their part in populating the ontology. We describe the prototype IKMS, the BioHubKB and the services that it supports for the IKMS. Availability: The BioHubKB can be found via http://biohub.cs.manchester.ac.uk/ontology/biohub-kb.owl.
Affiliation(s)
- Warren J Read
  - Unilever Research Port Sunlight, Bebington, Wirral, CH62 4ZD, UK
- George Demetriou
  - School of Computer Science, University of Manchester, Oxford Road, M13 9PL Manchester, UK
- Goran Nenadic
  - Manchester Institute of Biotechnology, Princess St, Manchester M1 7DN, UK
- Noel Ruddock
  - Unilever Research Port Sunlight, Bebington, Wirral, CH62 4ZD, UK
- Robert Stevens
  - School of Computer Science, University of Manchester, Oxford Road, M13 9PL Manchester, UK
- Jerry Winter
  - Unilever Research Port Sunlight, Bebington, Wirral, CH62 4ZD, UK
38
Callahan A, Abeyruwan SW, Al-Ali H, Sakurai K, Ferguson AR, Popovich PG, Shah NH, Visser U, Bixby JL, Lemmon VP. RegenBase: a knowledge base of spinal cord injury biology for translational research. Database: The Journal of Biological Databases and Curation 2016; 2016:baw040. [PMID: 27055827 PMCID: PMC4823819 DOI: 10.1093/database/baw040] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Received: 09/03/2015] [Accepted: 03/03/2016] [Indexed: 12/20/2022]
Abstract
Spinal cord injury (SCI) research is a data-rich field that aims to identify the biological mechanisms resulting in loss of function and mobility after SCI, as well as develop therapies that promote recovery after injury. SCI experimental methods, data and domain knowledge are locked in the largely unstructured text of scientific publications, making large scale integration with existing bioinformatics resources and subsequent analysis infeasible. The lack of standard reporting for experiment variables and results also makes experiment replicability a significant challenge. To address these challenges, we have developed RegenBase, a knowledge base of SCI biology. RegenBase integrates curated literature-sourced facts and experimental details, raw assay data profiling the effect of compounds on enzyme activity and cell growth, and structured SCI domain knowledge in the form of the first ontology for SCI, using Semantic Web representation languages and frameworks. RegenBase uses consistent identifier schemes and data representations that enable automated linking among RegenBase statements and also to other biological databases and electronic resources. By querying RegenBase, we have identified novel biological hypotheses linking the effects of perturbagens to observed behavioral outcomes after SCI. RegenBase is publicly available for browsing, querying and download. Database URL: http://regenbase.org
Affiliation(s)
- Alison Callahan
  - Stanford Center for Biomedical Informatics Research, Stanford University, Stanford, CA 94305
- Hassan Al-Ali
  - Miami Project to Cure Paralysis, University of Miami School of Medicine, Miami, FL 33136
- Kunie Sakurai
  - Miami Project to Cure Paralysis, University of Miami School of Medicine, Miami, FL 33136
- Adam R Ferguson
  - Brain and Spinal Injury Center (BASIC), Department of Neurological Surgery, University of California, San Francisco; San Francisco Veterans Affairs Medical Center, San Francisco, CA 94143
- Phillip G Popovich
  - Center for Brain and Spinal Cord Repair and the Department of Neuroscience, The Ohio State University, Columbus, OH 43210
- Nigam H Shah
  - Stanford Center for Biomedical Informatics Research, Stanford University, Stanford, CA 94305
- Ubbo Visser
  - Department of Computer Science, University of Miami, Coral Gables, FL 33146
- John L Bixby
  - Miami Project to Cure Paralysis, University of Miami School of Medicine, Miami, FL 33136
  - Center for Computational Science, University of Miami, Coral Gables, FL 33146
  - Department of Cellular and Molecular Pharmacology, University of Miami School of Medicine, Miami, FL 33136, USA
- Vance P Lemmon
  - Miami Project to Cure Paralysis, University of Miami School of Medicine, Miami, FL 33136
  - Center for Computational Science, University of Miami, Coral Gables, FL 33146
39
Fang Y. Compound annotation with real time cellular activity profiles to improve drug discovery. Expert Opin Drug Discov 2016; 11:269-80. [PMID: 26787137 DOI: 10.1517/17460441.2016.1143460] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 01/22/2023]
Abstract
INTRODUCTION: In the past decade, a range of innovative strategies have been developed to improve the productivity of pharmaceutical research and development. In particular, compound annotation, combined with informatics, has provided unprecedented opportunities for drug discovery. AREAS COVERED: In this review, a literature search from 2000 to 2015 was conducted to provide an overview of the compound annotation approaches currently used in drug discovery. Based on this, a framework related to a compound annotation approach using real-time cellular activity profiles for probe, drug, and biology discovery is proposed. EXPERT OPINION: Compound annotation with chemical structure, drug-like properties, bioactivities, genome-wide effects, clinical phenotypes, and textual abstracts has received significant attention in early drug discovery. However, these annotations are mostly associated with endpoint results. Advances in assay techniques have made it possible to obtain real-time cellular activity profiles of drug molecules under different phenotypes, so it is possible to generate compound annotation with real-time cellular activity profiles. Combining compound annotation with informatics, such as similarity analysis, presents a good opportunity to improve the rate of discovery of novel drugs and probes, and enhance our understanding of the underlying biology.
Affiliation(s)
- Ye Fang
  - Biochemical Technologies, Science and Technology Division, Corning Incorporated, Corning, NY, USA
40
Affiliation(s)
- O. Joseph Trask
  - Cellular Imaging Core, The Hamner Institutes for Health Sciences, Research Triangle Park, North Carolina
- Paul A. Johnston
  - Department of Pharmaceutical Sciences, School of Pharmacy, University of Pittsburgh, Pittsburgh, Pennsylvania
41
Abstract
The emergence of a number of publicly available bioactivity databases, such as ChEMBL, PubChem BioAssay and BindingDB, has raised awareness about the topics of data curation, quality and integrity. Here we provide an overview and discussion of the current and future approaches to activity, assay and target data curation of the ChEMBL database. This curation process involves several manual and automated steps and aims to: (1) maximise data accessibility and comparability; (2) improve data integrity and flag outliers, ambiguities and potential errors; and (3) add further curated annotations and mappings thus increasing the usefulness and accuracy of the ChEMBL data for all users and modellers in particular. Issues related to activity, assay and target data curation and integrity along with their potential impact for users of the data are discussed, alongside robust selection and filter strategies in order to avoid or minimise these, depending on the desired application.
42
Bolton E. Reporting biological assay screening results for maximum impact. Drug Discovery Today: Technologies 2015; 14:31-6. [PMID: 26194585 DOI: 10.1016/j.ddtec.2015.03.004] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 03/15/2015] [Revised: 03/18/2015] [Accepted: 03/29/2015] [Indexed: 11/19/2022]
Abstract
A very large corpus of biological assay screening results exists in the public domain. The ability to compare and analyze these data is hampered by missing details and the lack of a commonly used terminology to describe assay protocols and assay endpoints. Minimum reporting guidelines exist that, if followed, would greatly enhance the utility of biological assay screening data so that they may be independently reproduced, readily integrated, effectively compared, and rapidly analyzed.
Affiliation(s)
- Evan Bolton
  - National Center for Biotechnology Information, Bldg. 38A/8S810, National Library of Medicine, U.S. National Institutes of Health, 8600 Rockville Pike, Bethesda, MD 20894, USA
43
Abstract
ChEMBL is a large-scale drug discovery database containing bioactivity information primarily extracted from scientific literature. Due to the medicinal chemistry focus of the journals from which data are extracted, the data are currently of most direct value in the field of human health research. However, many of the scientific use-cases for the current data set are equally applicable in other fields, such as crop protection research: for example, identification of chemical scaffolds active against a particular target or endpoint, the de-convolution of the potential targets of a phenotypic assay, or the potential targets/pathways for safety liabilities. In order to broaden the applicability of the ChEMBL database and allow more widespread use in crop protection research, an extensive data set of bioactivity data of insecticidal, fungicidal and herbicidal compounds and assays was collated and added to the database.
44
Hastings J, Jeliazkova N, Owen G, Tsiliki G, Munteanu CR, Steinbeck C, Willighagen E. eNanoMapper: harnessing ontologies to enable data integration for nanomaterial risk assessment. J Biomed Semantics 2015; 6:10. [PMID: 25815161 PMCID: PMC4374589 DOI: 10.1186/s13326-015-0005-5] [Citation(s) in RCA: 44] [Impact Index Per Article: 4.9] [Received: 11/03/2014] [Accepted: 02/27/2015] [Indexed: 11/18/2022]
Abstract
Engineered nanomaterials (ENMs) are being developed to meet specific application needs in diverse domains across the engineering and biomedical sciences (e.g. drug delivery). However, accompanying the exciting proliferation of novel nanomaterials is a challenging race to understand and predict their possibly detrimental effects on human health and the environment. The eNanoMapper project (www.enanomapper.net) is creating a pan-European computational infrastructure for toxicological data management for ENMs, based on semantic web standards and ontologies. Here, we describe the development of the eNanoMapper ontology based on adopting and extending existing ontologies of relevance for the nanosafety domain. The resulting eNanoMapper ontology is available at http://purl.enanomapper.net/onto/enanomapper.owl. We aim to make the re-use of external ontology content seamless and thus we have developed a library to automate the extraction of subsets of ontology content and the assembly of the subsets into an integrated whole. The library is available (open source) at http://github.com/enanomapper/slimmer/. Finally, we give a comprehensive survey of the domain content and identify gap areas. ENM safety is at the boundary between engineering and the life sciences, and at the boundary between molecular granularity and bulk granularity. This creates challenges for the definition of key entities in the domain, which we also discuss.
Affiliation(s)
- Janna Hastings
  - European Molecular Biology Laboratory - European Bioinformatics Institute (EMBL-EBI), Cambridge, United Kingdom
- Gareth Owen
  - European Molecular Biology Laboratory - European Bioinformatics Institute (EMBL-EBI), Cambridge, United Kingdom
- Georgia Tsiliki
  - National Technical University of Athens (NTUA), Athens, Greece
- Cristian R Munteanu
  - Computer Science Faculty, University of A Coruña, A Coruña, Spain
  - Department of Bioinformatics - BiGCaT, NUTRIM, Maastricht University, Maastricht, Netherlands
- Christoph Steinbeck
  - European Molecular Biology Laboratory - European Bioinformatics Institute (EMBL-EBI), Cambridge, United Kingdom
- Egon Willighagen
  - Department of Bioinformatics - BiGCaT, NUTRIM, Maastricht University, Maastricht, Netherlands
45
Xu J, Shironoshita P, Visser U, John N, Kabuka M. Module Extraction for Efficient Object Queries over Ontologies with Large ABoxes. Artificial Intelligence and Applications (Commerce, Calif.) 2015; 2:8-31. [PMID: 26848490 PMCID: PMC4736732 DOI: 10.15764/aia.2015.01002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/05/2023]
Abstract
The extraction of logically-independent fragments out of an ontology ABox can be useful for solving the tractability problem of querying ontologies with large ABoxes. In this paper, we propose a formal definition of an ABox module, such that it guarantees complete preservation of facts about a given set of individuals, and thus can be reasoned independently w.r.t. the ontology TBox. With ABox modules of this type, isolated or distributed (parallel) ABox reasoning becomes feasible, and more efficient data retrieval from ontology ABoxes can be attained. To compute such an ABox module, we present a theoretical approach and also an approximation for SHIQ ontologies. Evaluation of the module approximation on different types of ontologies shows that, on average, extracted ABox modules are significantly smaller than the entire ABox, and the time for ontology reasoning based on ABox modules can be improved significantly.
Affiliation(s)
- Jia Xu
  - Electrical and Computer Engineering Department, University of Miami, Coral Gables, FL 33146
- Patrick Shironoshita
  - Electrical and Computer Engineering Department, University of Miami, Coral Gables, FL 33146
- Ubbo Visser
  - Department of Computer Science, University of Miami, Coral Gables, FL 33146
- Nigel John
  - Electrical and Computer Engineering Department, University of Miami, Coral Gables, FL 33146
- Mansur Kabuka
  - Electrical and Computer Engineering Department, University of Miami, Coral Gables, FL 33146
46
Zander Balderud L, Murray D, Larsson N, Vempati U, Schürer SC, Bjäreland M, Engkvist O. Using the BioAssay Ontology for analyzing high-throughput screening data. J Biomol Screen 2014; 20:402-15. [PMID: 25512330 DOI: 10.1177/1087057114563493] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.2] [Indexed: 01/11/2023]
Abstract
High-throughput screening (HTS) is the main starting point for hit identification in drug discovery programs. This has led to a rapid increase of available screening data both within pharmaceutical companies and the public domain. We have used the BioAssay Ontology (BAO) 2.0 for assay annotation within AstraZeneca to enable comparison with external HTS methods. The annotated assays have been analyzed to identify technology gaps, evaluate new methods, verify active hits, and compare compound activity between in-house and PubChem assays. As an example, the binding of a fluorescent ligand to formyl peptide receptor 1 (FPR1, involved in inflammation, for example) in an in-house HTS was measured by fluorescence intensity. In total, 155 active compounds were also tested in an external ligand binding flow cytometry assay, a method not used for in-house HTS detection. Twelve percent of the 155 compounds were found active in both assays. By the annotation of assay protocols using BAO terms, internal and external assays can easily be identified and method comparison facilitated. They can be used to evaluate the effectiveness of different assay methods, design appropriate confirmatory and counterassays, and analyze the activity of compounds for identification of technology artifacts.
Affiliation(s)
- David Murray
  - Discovery Sciences, AstraZeneca R&D Alderley Park, Alderley Park, UK
- Niklas Larsson
  - Discovery Sciences, AstraZeneca R&D Mölndal, Mölndal, Sweden
- Uma Vempati
  - Center for Computational Science, University of Miami, Miami, FL, USA
- Stephan C Schürer
  - Center for Computational Science, University of Miami, Miami, FL, USA
- Ola Engkvist
  - Discovery Sciences, AstraZeneca R&D Mölndal, Mölndal, Sweden
47
Howe EA, de Souza A, Lahr DL, Chatwin S, Montgomery P, Alexander BR, Nguyen DT, Cruz Y, Stonich DA, Walzer G, Rose JT, Picard SC, Liu Z, Rose JN, Xiang X, Asiedu J, Durkin D, Levine J, Yang JJ, Schürer SC, Braisted JC, Southall N, Southern MR, Chung TDY, Brudz S, Tanega C, Schreiber SL, Bittker JA, Guha R, Clemons PA. BioAssay Research Database (BARD): chemical biology and probe-development enabled by structured metadata and result types. Nucleic Acids Res 2014; 43:D1163-70. [PMID: 25477388 DOI: 10.1093/nar/gku1244] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.6] [Indexed: 01/31/2023]
Abstract
BARD, the BioAssay Research Database (https://bard.nih.gov/) is a public database and suite of tools developed to provide access to bioassay data produced by the NIH Molecular Libraries Program (MLP). Data from 631 MLP projects were migrated to a new structured vocabulary designed to capture bioassay data in a formalized manner, with particular emphasis placed on the description of assay protocols. New data can be submitted to BARD with a user-friendly set of tools that assist in the creation of appropriately formatted datasets and assay definitions. Data published through the BARD application program interface (API) can be accessed by researchers using web-based query tools or a desktop client. Third-party developers wishing to create new tools can use the API to produce stand-alone tools or new plug-ins that can be integrated into BARD. The entire BARD suite of tools therefore supports three classes of researcher: those who wish to publish data, those who wish to mine data for testable hypotheses, and those in the developer community who wish to build tools that leverage this carefully curated chemical biology resource.
Collapse
Affiliation(s)
- E A Howe
- Center for the Science of Therapeutics, Broad Institute, 415 Main Street, Cambridge, MA 02142, USA
- A de Souza
- Center for the Science of Therapeutics, Broad Institute, 415 Main Street, Cambridge, MA 02142, USA
- D L Lahr
- Center for the Science of Therapeutics, Broad Institute, 415 Main Street, Cambridge, MA 02142, USA
- S Chatwin
- Center for the Science of Therapeutics, Broad Institute, 415 Main Street, Cambridge, MA 02142, USA
- P Montgomery
- Center for the Science of Therapeutics, Broad Institute, 415 Main Street, Cambridge, MA 02142, USA
- B R Alexander
- Center for the Science of Therapeutics, Broad Institute, 415 Main Street, Cambridge, MA 02142, USA
- D-T Nguyen
- National Center for Advancing Translational Sciences (NCATS), National Institutes of Health (NIH), 9800 Medical Center Drive, Rockville, MD 20850, USA
- Y Cruz
- The Translational Research Institute, The Scripps Research Institute, 130 Scripps Way, Jupiter, FL 33458, USA
- D A Stonich
- Conrad Prebys Center for Chemical Genomics, Sanford-Burnham Medical Research Institute, 10901 N. Torrey Pines Road, La Jolla, CA 92037, USA
- G Walzer
- Center for the Science of Therapeutics, Broad Institute, 415 Main Street, Cambridge, MA 02142, USA
- J T Rose
- Center for the Science of Therapeutics, Broad Institute, 415 Main Street, Cambridge, MA 02142, USA
- S C Picard
- Center for the Science of Therapeutics, Broad Institute, 415 Main Street, Cambridge, MA 02142, USA
- Z Liu
- Center for the Science of Therapeutics, Broad Institute, 415 Main Street, Cambridge, MA 02142, USA
- J N Rose
- Center for the Science of Therapeutics, Broad Institute, 415 Main Street, Cambridge, MA 02142, USA
- X Xiang
- Center for the Science of Therapeutics, Broad Institute, 415 Main Street, Cambridge, MA 02142, USA
- J Asiedu
- Center for the Science of Therapeutics, Broad Institute, 415 Main Street, Cambridge, MA 02142, USA
- D Durkin
- Center for the Science of Therapeutics, Broad Institute, 415 Main Street, Cambridge, MA 02142, USA
- J Levine
- Center for the Science of Therapeutics, Broad Institute, 415 Main Street, Cambridge, MA 02142, USA
- J J Yang
- University of New Mexico Center for Molecular Discovery, University of New Mexico Health Sciences Center, 2500 Marble Avenue NE, Albuquerque, NM 87131, USA
- S C Schürer
- Center for Computational Science, University of Miami, 1320 S. Dixie Highway, Gables One Tower, Coral Gables, FL 33146, USA
- J C Braisted
- National Center for Advancing Translational Sciences (NCATS), National Institutes of Health (NIH), 9800 Medical Center Drive, Rockville, MD 20850, USA
- N Southall
- National Center for Advancing Translational Sciences (NCATS), National Institutes of Health (NIH), 9800 Medical Center Drive, Rockville, MD 20850, USA
- M R Southern
- The Translational Research Institute, The Scripps Research Institute, 130 Scripps Way, Jupiter, FL 33458, USA
- T D Y Chung
- Conrad Prebys Center for Chemical Genomics, Sanford-Burnham Medical Research Institute, 10901 N. Torrey Pines Road, La Jolla, CA 92037, USA
- S Brudz
- Center for the Science of Therapeutics, Broad Institute, 415 Main Street, Cambridge, MA 02142, USA
- C Tanega
- National Center for Advancing Translational Sciences (NCATS), National Institutes of Health (NIH), 9800 Medical Center Drive, Rockville, MD 20850, USA
- S L Schreiber
- Center for the Science of Therapeutics, Broad Institute, 415 Main Street, Cambridge, MA 02142, USA
- J A Bittker
- Center for the Science of Therapeutics, Broad Institute, 415 Main Street, Cambridge, MA 02142, USA
- R Guha
- National Center for Advancing Translational Sciences (NCATS), National Institutes of Health (NIH), 9800 Medical Center Drive, Rockville, MD 20850, USA
- P A Clemons
- Center for the Science of Therapeutics, Broad Institute, 415 Main Street, Cambridge, MA 02142, USA
|
48
|
Lipinski CA, Litterman NK, Southan C, Williams AJ, Clark AM, Ekins S. Parallel worlds of public and commercial bioactive chemistry data. J Med Chem 2014; 58:2068-76. [PMID: 25415348 PMCID: PMC4360371 DOI: 10.1021/jm5011308] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.3] [Indexed: 12/27/2022]
Abstract
The availability of structures and linked bioactivity data in databases is powerfully enabling for drug discovery and chemical biology. However, we now review some confounding issues with the divergent expansions of public and commercial sources of chemical structures. These are associated with not only expanding patent extraction but also increasingly large vendor collections amassed via different selection criteria between SciFinder from Chemical Abstracts Service (CAS) and major public sources such as PubChem, ChemSpider, UniChem, and others. These increasingly massive collections may include both real and virtual compounds, as well as so-called prophetic compounds from patents. We address a range of issues raised by the challenges faced resolving the NIH probe compounds. In addition we highlight the confounding of prior-art searching by virtual compounds that could impact the composition of matter patentability of a new medicinal chemistry lead. Finally, we propose some potential solutions.
Affiliation(s)
- Christopher A Lipinski
- Christopher A. Lipinski, Ph.D., LLC , 10 Connshire Drive, Waterford, Connecticut 06385-4122, United States
|
49
|
Wassermann AM, Lounkine E, Davies JW, Glick M, Camargo LM. The opportunities of mining historical and collective data in drug discovery. Drug Discov Today 2014; 20:422-34. [PMID: 25463034 DOI: 10.1016/j.drudis.2014.11.004] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.9] [Received: 06/19/2014] [Revised: 10/21/2014] [Accepted: 11/10/2014] [Indexed: 12/26/2022]
Abstract
Vast amounts of bioactivity data have been generated for small molecules across public and corporate domains. Biological signatures, either derived from systematic profiling efforts or from existing historical assay data, have been successfully employed for small molecule mechanism-of-action elucidation, drug repositioning, hit expansion and screening subset design. This article reviews different types of biological descriptors and applications, and we demonstrate how biological data can outlive the original purpose or project for which it was generated. By comparing 150 HTS campaigns run at Novartis over the past decade on the basis of their active and inactive chemical matter, we highlight the opportunities and challenges associated with cross-project learning in drug discovery.
Affiliation(s)
- Anne Mai Wassermann
- In Silico Lead Discovery, Novartis Institutes for Biomedical Research, 250 Massachusetts Avenue, Cambridge, MA 02139, USA
- Eugen Lounkine
- In Silico Lead Discovery, Novartis Institutes for Biomedical Research, 250 Massachusetts Avenue, Cambridge, MA 02139, USA
- John W Davies
- In Silico Lead Discovery, Novartis Institutes for Biomedical Research, 250 Massachusetts Avenue, Cambridge, MA 02139, USA
- Meir Glick
- In Silico Lead Discovery, Novartis Institutes for Biomedical Research, 250 Massachusetts Avenue, Cambridge, MA 02139, USA
- L Miguel Camargo
- In Silico Lead Discovery, Novartis Institutes for Biomedical Research, 250 Massachusetts Avenue, Cambridge, MA 02139, USA
|
50
|
Abstract
Within the last decade, open data concepts have been gaining increasing interest in the area of drug discovery. With the launch of ChEMBL and PubChem, an enormous amount of bioactivity data was made easily accessible to the public domain. In addition, platforms that semantically integrate those data, such as the Open PHACTS Discovery Platform, permit querying across different domains of open life science data beyond the concept of ligand-target-pharmacology. However, most public databases are compiled from literature sources and are thus heterogeneous in their coverage. In addition, assay descriptions are not uniform and most often lack relevant information in the primary literature and, consequently, in databases. This raises the question of how useful large public data sources are for deriving computational models. In this perspective, we highlight selected open-source initiatives and outline the possibilities and also the limitations when exploiting this huge amount of bioactivity data.
|