1
Hu J, Allen BK, Stathias V, Ayad NG, Schürer SC. Kinome-Wide Virtual Screening by Multi-Task Deep Learning. Int J Mol Sci 2024; 25:2538. [PMID: 38473785 PMCID: PMC10932040 DOI: 10.3390/ijms25052538] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/19/2023] [Revised: 02/04/2024] [Accepted: 02/17/2024] [Indexed: 03/14/2024]
Abstract
Deep learning is a machine learning technique that models high-level abstractions in data using a graph composed of multiple processing layers that apply various linear and non-linear transformations. This technique has been shown to perform well for applications in drug discovery, utilizing structural features of small molecules to predict activity. Here, we report a large-scale study to predict the activity of small molecules across the human kinome, a major family of drug targets, particularly in anti-cancer agents. While small-molecule kinase inhibitors exhibit impressive clinical efficacy in several different diseases, resistance often arises through adaptive kinome reprogramming or subpopulation diversity. Polypharmacology and combination therapies offer potential therapeutic strategies for patients with resistant diseases. Their development would benefit from a more comprehensive and dense knowledge of small-molecule inhibition across the human kinome. Leveraging over 650,000 bioactivity annotations for more than 300,000 small molecules, we evaluated multiple machine learning methods to predict the small-molecule inhibition of 342 kinases across the human kinome. Our results demonstrated that multi-task deep neural networks outperformed classical single-task methods, offering the potential for conducting large-scale virtual screening, predicting activity profiles, and bridging the gaps in the available data.
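As a rough illustration of how a multi-task model can learn from a sparse compound-by-kinase activity matrix, the sketch below computes a binary cross-entropy loss over only those (compound, kinase) pairs that have an annotation. It is not taken from the paper: the function name `masked_multitask_bce`, the 0/1 label encoding, and the use of a mask are all assumptions made for this example.

```python
import math

def masked_multitask_bce(logits, labels, mask):
    """Mean binary cross-entropy over observed (compound, task) pairs only.

    logits: rows of raw scores, one column per task (e.g. per kinase)
    labels: rows of 0/1 activity labels (ignored where mask is 0)
    mask:   rows of 1/0 flags, 1 where an annotation exists
    All three names are hypothetical, chosen for this sketch.
    """
    total, observed = 0.0, 0
    for row_x, row_y, row_m in zip(logits, labels, mask):
        for x, y, m in zip(row_x, row_y, row_m):
            if not m:
                continue  # no measurement for this pair: contributes nothing
            # Numerically stable form of BCE with logits.
            total += max(x, 0.0) - x * y + math.log1p(math.exp(-abs(x)))
            observed += 1
    return total / observed if observed else 0.0
```

Because missing entries are skipped, a compound measured against only a handful of kinases can still contribute to the shared layers without being penalized on the tasks it was never tested for.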
Affiliation(s)
- Jiaming Hu
  - Dr. John T. Macdonald Foundation Department of Human Genetics and John P. Hussman Institute for Human Genomics, Miller School of Medicine, University of Miami, Miami, FL 33136, USA
  - Department of Molecular and Cellular Pharmacology, Miller School of Medicine, University of Miami, Miami, FL 33136, USA
- Bryce K. Allen
  - Department of Molecular and Cellular Pharmacology, Miller School of Medicine, University of Miami, Miami, FL 33136, USA
  - Institute for Data Science & Computing, University of Miami, Miami, FL 33136, USA
- Vasileios Stathias
  - Department of Molecular and Cellular Pharmacology, Miller School of Medicine, University of Miami, Miami, FL 33136, USA
- Nagi G. Ayad
  - Center for Therapeutic Innovation Miller School of Medicine, University of Miami, Miami, FL 33136, USA
  - Miami Project to Cure Paralysis, Department of Psychiatry and Behavioral Sciences, Miller School of Medicine, University of Miami, Miami, FL 33136, USA
  - Sylvester Comprehensive Cancer Center, Miller School of Medicine, University of Miami, Miami, FL 33136, USA
- Stephan C. Schürer
  - Department of Molecular and Cellular Pharmacology, Miller School of Medicine, University of Miami, Miami, FL 33136, USA
  - Institute for Data Science & Computing, University of Miami, Miami, FL 33136, USA
  - Center for Therapeutic Innovation Miller School of Medicine, University of Miami, Miami, FL 33136, USA
  - Sylvester Comprehensive Cancer Center, Miller School of Medicine, University of Miami, Miami, FL 33136, USA
2
Dumschott K, Dörpholz H, Laporte MA, Brilhaus D, Schrader A, Usadel B, Neumann S, Arnaud E, Kranz A. Ontologies for increasing the FAIRness of plant research data. Frontiers in Plant Science 2023; 14:1279694. [PMID: 38098789 PMCID: PMC10720748 DOI: 10.3389/fpls.2023.1279694] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/18/2023] [Accepted: 11/15/2023] [Indexed: 12/17/2023]
Abstract
The importance of improving the FAIRness (findability, accessibility, interoperability, reusability) of research data is undeniable, especially in the face of large, complex datasets currently being produced by omics technologies. Facilitating the integration of a dataset with other types of data increases the likelihood of reuse, and the potential of answering novel research questions. Ontologies are a useful tool for semantically tagging datasets, as adding relevant metadata increases the understanding of how data was produced and increases its interoperability. Ontologies provide concepts for a particular domain as well as the relationships between concepts. By tagging data with ontology terms, data becomes both human- and machine-interpretable, allowing for increased reuse and interoperability. However, the task of identifying ontologies relevant to a particular research domain or technology is challenging, especially within the diverse realm of fundamental plant research. In this review, we outline the ontologies most relevant to the fundamental plant sciences and how they can be used to annotate data related to plant-specific experiments within metadata frameworks, such as Investigation-Study-Assay (ISA). We also outline repositories and platforms most useful for identifying applicable ontologies or finding ontology terms.
Affiliation(s)
- Kathryn Dumschott
  - Institute of Bio- and Geosciences (IBG-4: Bioinformatics) & Bioeconomy Science Center (BioSC), CEPLAS, Forschungszentrum Jülich, Jülich, Germany
- Hannah Dörpholz
  - Institute of Bio- and Geosciences (IBG-4: Bioinformatics) & Bioeconomy Science Center (BioSC), CEPLAS, Forschungszentrum Jülich, Jülich, Germany
- Marie-Angélique Laporte
  - Digital Solutions Team, Digital Inclusion Lever, Bioversity International, Montpellier Office, Montpellier, France
- Dominik Brilhaus
  - Data Science and Management & Cluster of Excellence on Plant Sciences (CEPLAS), Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Andrea Schrader
  - Data Science and Management & Cluster of Excellence on Plant Sciences (CEPLAS), University of Cologne, Cologne, Germany
- Björn Usadel
  - Institute of Bio- and Geosciences (IBG-4: Bioinformatics) & Bioeconomy Science Center (BioSC), CEPLAS, Forschungszentrum Jülich, Jülich, Germany
  - Institute for Biological Data Science & Cluster of Excellence on Plant Sciences (CEPLAS), Faculty of Mathematics and Life Sciences, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Steffen Neumann
  - Program Center MetaCom, Leibniz Institute of Plant Biochemistry, Halle, Germany
  - German Centre for Integrative Biodiversity Research (iDiv), Halle-Jena-Leipzig, Germany
- Elizabeth Arnaud
  - Digital Solutions Team, Digital Inclusion Lever, Bioversity International, Montpellier Office, Montpellier, France
- Angela Kranz
  - Institute of Bio- and Geosciences (IBG-4: Bioinformatics) & Bioeconomy Science Center (BioSC), CEPLAS, Forschungszentrum Jülich, Jülich, Germany
3
Penn S, Lomax J, Karlsson A, Antonucci V, Zachmann CD, Kanza S, Schurer S, Turner J. An extension of the BioAssay Ontology to include pharmacokinetic/pharmacodynamic terminology for the enrichment of scientific workflows. J Biomed Semantics 2023; 14:10. [PMID: 37568227 PMCID: PMC10416407 DOI: 10.1186/s13326-023-00288-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/25/2022] [Accepted: 04/29/2023] [Indexed: 08/13/2023]
Abstract
With the capacity to produce and record data electronically, scientific research and the data associated with it have grown at an unprecedented rate. However, despite a substantial amount of data now existing in electronic form, it is still common for scientific research to be recorded in an unstructured text format with inconsistent context (vocabularies), which vastly reduces the potential for direct intelligent analysis. Research has demonstrated that the use of semantic technologies such as ontologies to structure and enrich scientific data can greatly improve this potential. However, whilst there are many ontologies that can be used for this purpose, there is still a vast quantity of scientific terminology that does not have adequate semantic representation. A key area for expansion identified by the authors was the pharmacokinetic/pharmacodynamic (PK/PD) domain due to its high usage across many areas of Pharma. As such we have produced a set of these terms and other bioassay related terms to be incorporated into the BioAssay Ontology (BAO), which was identified as the most relevant ontology for this work. A number of use cases developed by experts in the field were used to demonstrate how these new ontology terms can be used, and to set the scene for the continuation of this work with a view to expanding it into further relevant domains. The work done in this paper was part of Phase 1 of the SEED project (Semantically Enriching electronic laboratory notebook (eLN) Data).
Affiliation(s)
- Steve Penn
  - Pfizer Inc, 1 Portland Street, Cambridge, MA 02139 USA
- Jane Lomax
  - Scibite an Elsevier Company, Scibite Ltd, Biodata Innovation Centre, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1DR UK
- Anneli Karlsson
  - Scibite an Elsevier Company, Scibite Ltd, Biodata Innovation Centre, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1DR UK
- Carl-Dieter Zachmann
  - Sanofi-Aventis Deutschland GmbH, R&D / Integrated Drug Discovery, Industriepark Hoechst, Frankfurt am Main, H831 C.0156, 65926 Germany
- Samantha Kanza
  - Department of Chemistry, University of Southampton, Highfield Campus, University Road, Southampton, SO17 1BJ UK
- Stephan Schurer
  - Department of Cellular and Molecular Pharmacology, Miller School of Medicine, University of Miami, Miami, FL 33136 USA
- John Turner
  - Department of Cellular and Molecular Pharmacology, Miller School of Medicine, University of Miami, Miami, FL 33136 USA
4
Mortensen HM, Martens M, Senn J, Levey T, Evelo CT, Willighagen EL, Exner T. The AOP-DB RDF: Applying FAIR Principles to the Semantic Integration of AOP Data Using the Research Description Framework. Frontiers in Toxicology 2022; 4:803983. [PMID: 35295213 PMCID: PMC8915825 DOI: 10.3389/ftox.2022.803983] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 10/28/2021] [Accepted: 01/13/2022] [Indexed: 01/12/2023]
Abstract
Computational toxicology is central to the current transformation occurring in toxicology and chemical risk assessment. There is a need for more efficient use of existing data to characterize human toxicological response data for environmental chemicals in the US and Europe. The Adverse Outcome Pathway (AOP) framework helps to organize existing mechanistic information and contributes to what is currently being described as New Approach Methodologies (NAMs). AOP knowledge and data are currently submitted directly by users and stored in the AOP-Wiki (https://aopwiki.org/). Automatic and systematic parsing of AOP-Wiki data is challenging, so we have created the EPA Adverse Outcome Pathway Database. The AOP-DB, developed by the US EPA to assist in the biological and mechanistic characterization of AOP data, provides a broad, systems-level overview of the biological context of AOPs. Here we describe the recent semantic mapping efforts for the AOP-DB, and how this process facilitates the integration of AOP-DB data with other toxicologically relevant datasets through a use case example.
Affiliation(s)
- Holly M. Mortensen
  - United States Environmental Protection Agency, Office of Research and Development, Center for Public Health and Environmental Assessment, Research Triangle Park, Durham, NC, United States
- Marvin Martens
  - Department of Bioinformatics (BiGCaT), Maastricht University, Maastricht, Netherlands
- Jonathan Senn
  - Oak Ridge Associated Universities, Oak Ridge, TN, United States
- Trevor Levey
  - Oak Ridge Associated Universities, Oak Ridge, TN, United States
  - SAS Institute, Cary, NC, United States
- Chris T. Evelo
  - Department of Bioinformatics (BiGCaT), Maastricht University, Maastricht, Netherlands
  - Maastricht Centre for Systems Biology, Maastricht University, Maastricht, Netherlands
- Egon L. Willighagen
  - Department of Bioinformatics (BiGCaT), Maastricht University, Maastricht, Netherlands
5
Tanoli Z, Aldahdooh J, Alam F, Wang Y, Seemab U, Fratelli M, Pavlis P, Hajduch M, Bietrix F, Gribbon P, Zaliani A, Hall MD, Shen M, Brimacombe K, Kulesskiy E, Saarela J, Wennerberg K, Vähä-Koskela M, Tang J. Minimal information for chemosensitivity assays (MICHA): a next-generation pipeline to enable the FAIRification of drug screening experiments. Brief Bioinform 2021; 23:6361039. [PMID: 34472587 PMCID: PMC8769689 DOI: 10.1093/bib/bbab350] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 06/11/2021] [Revised: 08/03/2021] [Accepted: 08/02/2021] [Indexed: 12/29/2022]
Abstract
Chemosensitivity assays are commonly used for preclinical drug discovery and clinical trial optimization. However, data from independent assays are often discordant, largely attributed to uncharacterized variation in the experimental materials and protocols. We report here the launch of Minimal Information for Chemosensitivity Assays (MICHA), accessed via https://micha-protocol.org. Distinguished from existing efforts that often lack support from data integration tools, MICHA can automatically extract publicly available information to facilitate the assay annotation including: 1) compounds, 2) samples, 3) reagents and 4) data processing methods. For example, MICHA provides an integrative web server and database to obtain compound annotation including chemical structures, targets and disease indications. In addition, the annotation of cell line samples, assay protocols and literature references can be greatly eased by retrieving manually curated catalogues. Once the annotation is complete, MICHA can export a report that conforms to the FAIR principles (Findable, Accessible, Interoperable and Reusable) of drug screening studies. To consolidate the utility of MICHA, we provide FAIRified protocols from five major cancer drug screening studies as well as six recently conducted COVID-19 studies. With the MICHA web server and database, we envisage a wider adoption of a community-driven effort to improve the open access of drug sensitivity assays.
Affiliation(s)
- Ziaurrehman Tanoli
  - Research Program in Systems Oncology, Faculty of Medicine, University of Helsinki, Finland
- Jehad Aldahdooh
  - Research Program in Systems Oncology, Faculty of Medicine, University of Helsinki, Finland
- Farhan Alam
  - Research Program in Systems Oncology, Faculty of Medicine, University of Helsinki, Finland
- Yinyin Wang
  - Research Program in Systems Oncology, Faculty of Medicine, University of Helsinki, Finland
- Umair Seemab
  - Research Program in Systems Oncology, Faculty of Medicine, University of Helsinki, Finland
- Petr Pavlis
  - Institute of Molecular and Translational Medicine, Czech
- Marian Hajduch
  - Institute of Molecular and Translational Medicine, Czech
- Philip Gribbon
  - Fraunhofer Institute for Molecular Biology and Applied Ecology, Germany
- Andrea Zaliani
  - Fraunhofer Institute for Molecular Biology and Applied Ecology, Germany
- Matthew D Hall
  - National Center for Advancing Translational Sciences, USA
- Min Shen
  - National Center for Advancing Translational Sciences, USA
- Evgeny Kulesskiy
  - Institute for Molecular Medicine Finland, University of Helsinki, Finland
- Jani Saarela
  - Institute for Molecular Medicine Finland, University of Helsinki, Finland
- Krister Wennerberg
  - Biotech Research & Innovation Centre (BRIC), University of Copenhagen, Denmark
- Jing Tang
  - Research Program in Systems Oncology, Faculty of Medicine, University of Helsinki, Finland
6
Choteau SA, Wagner A, Pierre P, Spinelli L, Brun C. MetamORF: a repository of unique short open reading frames identified by both experimental and computational approaches for gene and metagene analyses. Database: The Journal of Biological Databases and Curation 2021; 2021:6307706. [PMID: 34156446 PMCID: PMC8218702 DOI: 10.1093/database/baab032] [Citation(s) in RCA: 22] [Impact Index Per Article: 7.3] [Received: 12/23/2020] [Revised: 04/08/2021] [Accepted: 05/17/2021] [Indexed: 11/12/2022]
Abstract
The development of high-throughput technologies revealed the existence of non-canonical short open reading frames (sORFs) on most eukaryotic ribonucleic acids. They are ubiquitous genetic elements conserved across species and suspected to be involved in numerous cellular processes. MetamORF (https://metamorf.hb.univ-amu.fr/) aims to provide a repository of unique sORFs identified in the human and mouse genomes with both experimental and computational approaches. By gathering publicly available sORF data, normalizing them and summarizing redundant information, we were able to identify a total of 1 162 675 unique sORFs. Despite the usual characterization of ORFs as short, upstream or downstream, there is currently no clear consensus regarding the definition of these categories. Thus, the data have been reprocessed using a normalized nomenclature. MetamORF enables new analyses at locus, gene, transcript and ORF levels, which should offer the possibility to address new questions regarding sORF functions in the future. The repository is available through a user-friendly web interface, allowing easy browsing, visualization, filtering over multiple criteria and export possibilities. sORFs can be searched starting from a gene, a transcript or an ORF ID, by looking in a genome area, or by browsing the whole repository for a species. The database content has also been made available through track hubs at UCSC Genome Browser. Finally, we demonstrated an enrichment of genes harboring upstream ORFs among genes expressed in response to reticular stress. Database URL: https://metamorf.hb.univ-amu.fr/.
Affiliation(s)
- Sebastien A Choteau
  - Aix-Marseille University, INSERM, TAGC, Turing Centre for Living Systems, 163 Avenue de Luminy, Marseille 13009, France
  - Aix-Marseille University, INSERM, CNRS, CIML, Turing Centre for Living Systems, 163 Avenue de Luminy, Marseille 13009, France
- Audrey Wagner
  - Aix-Marseille University, INSERM, TAGC, Turing Centre for Living Systems, 163 Avenue de Luminy, Marseille 13009, France
- Philippe Pierre
  - Aix-Marseille University, INSERM, CNRS, CIML, Turing Centre for Living Systems, 163 Avenue de Luminy, Marseille 13009, France
  - Department of Medical Sciences, Institute for Research in Biomedicine (iBiMED) and Ilidio Pinho Foundation, University of Aveiro, Aveiro 3810-193, Portugal
  - Shanghai Institute of Immunology, School of Medicine, Shanghai Jiao Tong University, Shanghai, China
- Lionel Spinelli
  - Aix-Marseille University, INSERM, TAGC, Turing Centre for Living Systems, 163 Avenue de Luminy, Marseille 13009, France
  - Aix-Marseille University, INSERM, CNRS, CIML, Turing Centre for Living Systems, 163 Avenue de Luminy, Marseille 13009, France
- Christine Brun
  - Aix-Marseille University, INSERM, TAGC, Turing Centre for Living Systems, 163 Avenue de Luminy, Marseille 13009, France
  - CNRS, 31 Chemin Joseph Aiguier, Marseille 13009, France
7
Galgonek J, Vondrášek J. IDSM ChemWebRDF: SPARQLing small-molecule datasets. J Cheminform 2021; 13:38. [PMID: 33980298 PMCID: PMC8117646 DOI: 10.1186/s13321-021-00515-1] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 01/06/2021] [Accepted: 04/23/2021] [Indexed: 11/12/2022]
Abstract
The Resource Description Framework (RDF), together with well-defined ontologies, significantly increases data interoperability and usability. The SPARQL query language was introduced to retrieve requested RDF data and to explore links between them. Among other useful features, SPARQL supports federated queries that combine multiple independent data source endpoints. This allows users to obtain insights that are not possible using only a single data source. Owing to all of these useful features, many biological and chemical databases present their data in RDF, and support SPARQL querying. In our project, we primarily focused on PubChem, ChEMBL and ChEBI small-molecule datasets. These datasets are already being exported to RDF by their creators. However, none of them has an official and currently supported SPARQL endpoint. This omission makes it difficult to construct complex or federated queries that could access all of the datasets, thus underutilising the main advantage of the availability of RDF data. Our goal is to address this gap by integrating the datasets into one database called the Integrated Database of Small Molecules (IDSM) that will be accessible through a SPARQL endpoint. Beyond that, we will also focus on increasing mutual interoperability of the datasets. To realise the endpoint, we decided to implement an in-house developed SPARQL engine based on the PostgreSQL relational database for data storage. In our approach, data are stored in the traditional relational form, and the SPARQL engine translates incoming SPARQL queries into equivalent SQL queries. An important feature of the engine is that it optimises the resulting SQL queries. Together with optimisations performed by PostgreSQL, this allows efficient evaluations of SPARQL queries. The endpoint provides not only querying in the dataset, but also the compound substructure and similarity search supported by our Sachem project.
Although the endpoint is accessible from an internet browser, it is mainly intended to be used for programmatic access by other services, for example as a part of federated queries. For regular users, we offer a rich web application called ChemWebRDF using the endpoint. The application is publicly available at https://idsm.elixir-czech.cz/chemweb/.
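The idea of answering graph-pattern queries from relational storage can be illustrated with a toy translator. The sketch below is not the IDSM engine and does not parse SPARQL. It assumes a single hypothetical `triples(s, p, o)` table in SQLite, takes a list of triple patterns in which terms beginning with `?` are variables, and emits one SQL self-join. The `ex:` identifiers and the function name `bgp_to_sql` are invented for the example, and at least one variable is assumed to be present.

```python
import sqlite3

def bgp_to_sql(patterns):
    """Translate (s, p, o) triple patterns into one SQL query plus parameters.

    Each pattern becomes one alias of the triples table; a variable that
    appears in several patterns becomes a join condition between aliases.
    """
    select, where, params, seen = [], [], [], {}
    for i, pattern in enumerate(patterns):
        for col, term in zip("spo", pattern):
            ref = f"t{i}.{col}"
            if term.startswith("?"):
                if term in seen:
                    where.append(f"{ref} = {seen[term]}")  # shared variable: join
                else:
                    seen[term] = ref
                    select.append(f'{ref} AS "{term[1:]}"')
            else:
                where.append(f"{ref} = ?")  # constant: bound parameter
                params.append(term)
    tables = ", ".join(f"triples AS t{i}" for i in range(len(patterns)))
    sql = f"SELECT {', '.join(select)} FROM {tables}"
    if where:
        sql += " WHERE " + " AND ".join(where)
    return sql, params

# Toy data: which compounds inhibit which targets.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")
conn.executemany("INSERT INTO triples VALUES (?, ?, ?)", [
    ("ex:aspirin", "rdf:type", "ex:Compound"),
    ("ex:aspirin", "ex:inhibits", "ex:COX1"),
    ("ex:water", "rdf:type", "ex:Compound"),
])
sql, params = bgp_to_sql([("?c", "rdf:type", "ex:Compound"),
                          ("?c", "ex:inhibits", "?target")])
rows = conn.execute(sql, params).fetchall()
```

A production engine would map different predicates to dedicated relational tables and optimise the generated SQL, as the abstract describes. The single-table layout here is only the simplest arrangement that shows the join structure.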
Affiliation(s)
- Jakub Galgonek
  - Institute of Organic Chemistry and Biochemistry of the CAS, Flemingovo náměstí 2, 166 10, Prague 6, Czech Republic
- Jiří Vondrášek
  - Institute of Organic Chemistry and Biochemistry of the CAS, Flemingovo náměstí 2, 166 10, Prague 6, Czech Republic
8
The Life of a Trailing Spouse. J Neurosci 2021; 41:3-10. [PMID: 33408132 DOI: 10.1523/jneurosci.2874-20.2020] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/17/2020] [Revised: 11/22/2020] [Accepted: 11/24/2020] [Indexed: 11/21/2022]
Abstract
In 1981, I published a paper in the first issue of the Journal of Neuroscience with my postdoctoral mentor, Alan Pearlman. It reported a quantitative analysis of the receptive field properties of neurons in reeler mouse visual cortex and the surprising conclusion that although the neuronal somas were strikingly malpositioned, their receptive fields were unchanged. This suggested that in mouse cortex at least, neuronal circuits have very robust systems in place to ensure the proper formation of connections. This had the unintended consequence of transforming me from an electrophysiologist into a cellular and molecular neuroscientist who studied cell adhesion molecules and the molecular mechanisms they use to regulate axon growth. It took me a surprisingly long time to appreciate that your science is driven by the people around you and by the technologies that are locally available. As a professional puzzler, I like all different kinds of puzzles, but the most fun puzzles involve playing with other puzzlers. This is my story of learning how to find like-minded puzzlers to solve riddles about axon growth and regeneration.
9
Zhang R, Li X, Zhang X, Qin H, Xiao W. Machine learning approaches for elucidating the biological effects of natural products. Nat Prod Rep 2021; 38:346-361. [PMID: 32869826 DOI: 10.1039/d0np00043d] [Citation(s) in RCA: 42] [Impact Index Per Article: 14.0] [Indexed: 12/15/2022]
Abstract
Covering: 2000 to 2020. Machine learning (ML) is an efficient tool for the prediction of bioactivity and the study of structure-activity relationships. Over the past decade, a trend has emerged of combining these approaches with the study of natural products (NPs) to address the challenge of discovering bioactive NPs. In the present review, we introduce the basic principles and protocols for using the ML approach to investigate the bioactivity of NPs, citing a series of practical examples regarding the study of anti-microbial, anti-cancer, and anti-inflammatory NPs, etc. ML algorithms manage a variety of classification and regression problems associated with bioactive NPs, from those that are linear to non-linear and from pure compounds to plant extracts. Inspired by cases reported in the literature and our own experience, a number of key points have been emphasized for reducing modeling errors, including dataset preparation and applicability domain analysis.
Affiliation(s)
- Ruihan Zhang
  - Key Laboratory of Medicinal Chemistry for Natural Resource, Ministry of Education, Yunnan Research & Development Center for Natural Products, School of Chemical Science and Technology, Yunnan University, 2 Rd Cuihubei, P. R. China
- Xiaoli Li
  - Key Laboratory of Medicinal Chemistry for Natural Resource, Ministry of Education, Yunnan Research & Development Center for Natural Products, School of Chemical Science and Technology, Yunnan University, 2 Rd Cuihubei, P. R. China
- Xingjie Zhang
  - Key Laboratory of Medicinal Chemistry for Natural Resource, Ministry of Education, Yunnan Research & Development Center for Natural Products, School of Chemical Science and Technology, Yunnan University, 2 Rd Cuihubei, P. R. China
- Huayan Qin
  - Key Laboratory of Medicinal Chemistry for Natural Resource, Ministry of Education, Yunnan Research & Development Center for Natural Products, School of Chemical Science and Technology, Yunnan University, 2 Rd Cuihubei, P. R. China
- Weilie Xiao
  - Key Laboratory of Medicinal Chemistry for Natural Resource, Ministry of Education, Yunnan Research & Development Center for Natural Products, School of Chemical Science and Technology, Yunnan University, 2 Rd Cuihubei, P. R. China
10
Kanza S, Graham Frey J. Semantic Technologies in Drug Discovery. Systems Medicine 2021. [DOI: 10.1016/b978-0-12-801238-3.11520-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/24/2022]
11
Issa NT, Stathias V, Schürer S, Dakshanamurthy S. Machine and deep learning approaches for cancer drug repurposing. Semin Cancer Biol 2021; 68:132-142. [PMID: 31904426 PMCID: PMC7723306 DOI: 10.1016/j.semcancer.2019.12.011] [Citation(s) in RCA: 103] [Impact Index Per Article: 34.3] [Received: 07/18/2019] [Revised: 10/31/2019] [Accepted: 12/15/2019] [Indexed: 02/07/2023]
Abstract
Knowledge of the underpinnings of cancer initiation, progression and metastasis has increased exponentially in recent years. Advanced "omics" coupled with machine learning and artificial intelligence (deep learning) methods have helped elucidate targets and pathways critical to those processes that may be amenable to pharmacologic modulation. However, the current anti-cancer therapeutic armamentarium continues to lag behind. As the cost of developing a new drug remains prohibitively expensive, repurposing of existing approved and investigational drugs is sought after given known safety profiles and reduction in the cost barrier. Notably, successes in oncologic drug repurposing have been infrequent. Computational in-silico strategies have been developed to aid in modeling biological processes to find new disease-relevant targets and discovering novel drug-target and drug-phenotype associations. Machine and deep learning methods have especially enabled leaps in those successes. This review will discuss these methods as they pertain to cancer biology as well as immunomodulation for drug repurposing opportunities in oncologic diseases.
Affiliation(s)
- Naiem T Issa
  - Dr. Phillip Frost Department of Dermatology and Cutaneous Surgery, University of Miami School of Medicine, Miami, FL, USA
- Vasileios Stathias
  - Department of Molecular and Cellular Pharmacology, University of Miami School of Medicine, Miami, FL, USA
- Stephan Schürer
  - Department of Molecular and Cellular Pharmacology, University of Miami School of Medicine, Miami, FL, USA
- Sivanesan Dakshanamurthy
  - Department of Oncology, Lombardi Comprehensive Cancer Center, Georgetown University Medical Center, Washington, DC, USA
12
Kochev N, Jeliazkova N, Paskaleva V, Tancheva G, Iliev L, Ritchie P, Jeliazkov V. Your Spreadsheets Can Be FAIR: A Tool and FAIRification Workflow for the eNanoMapper Database. Nanomaterials 2020; 10:nano10101908. [PMID: 32987901 PMCID: PMC7601422 DOI: 10.3390/nano10101908] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 08/29/2020] [Revised: 09/17/2020] [Accepted: 09/20/2020] [Indexed: 11/30/2022]
Abstract
The field of nanoinformatics is rapidly developing and provides data-driven solutions in the area of nanomaterials (NM) safety. Safe by Design approaches are encouraged and promoted through regulatory initiatives and multiple scientific projects. Experimental data is at the core of nanoinformatics processing workflows for risk assessment. The nanosafety data is predominantly recorded in Excel spreadsheet files. Although the spreadsheets are quite convenient for the experimentalists, they also pose great challenges for the subsequent processing into databases due to variability of the templates used, specific details provided by each laboratory and the need for proper metadata documentation and formatting. In this paper, we present a workflow to facilitate the conversion of spreadsheets into a FAIR (Findable, Accessible, Interoperable, and Reusable) database, with the pivotal aid of the NMDataParser tool, developed to streamline the mapping of the original file layout into the eNanoMapper semantic data model. The NMDataParser is an open source Java library and application, making use of a JSON configuration to define the mapping. We describe the JSON configuration syntax and the approaches applied for parsing different spreadsheet layouts used by the nanosafety community. Examples of using the NMDataParser tool in nanoinformatics workflows are given. Challenging cases are discussed and appropriate solutions are proposed.
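To make the notion of a configuration-driven layout mapping concrete, here is a minimal sketch in Python. It does not reproduce the NMDataParser JSON syntax, which is not given in the abstract. The keys `header_row` and `columns`, the target field names, and the use of CSV text as a stand-in for a spreadsheet are all assumptions for illustration.

```python
import csv
import io
import json

# Hypothetical configuration: which row holds the headers, and which
# source column feeds which field of the target data model.
CONFIG = json.loads("""
{
  "header_row": 0,
  "columns": {
    "Material": "substance_name",
    "Size (nm)": "particle_size_nm",
    "Assay": "endpoint"
  }
}
""")

def parse_sheet(text, config):
    """Map rows of a delimited sheet onto records using the configuration."""
    rows = list(csv.reader(io.StringIO(text)))
    header = rows[config["header_row"]]
    # Resolve each configured column name to its position in this layout.
    index = {name: header.index(name) for name in config["columns"]}
    records = []
    for row in rows[config["header_row"] + 1:]:
        records.append({field: row[index[col]]
                        for col, field in config["columns"].items()})
    return records

sheet = "Lab,Material,Size (nm),Assay\nA,TiO2,21,cell viability\n"
records = parse_sheet(sheet, CONFIG)
```

Keeping the layout knowledge in the configuration rather than in code means a laboratory with a different template needs only a new JSON file, which is the general motivation the abstract gives for this style of tool.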
Affiliation(s)
- Nikolay Kochev
- Department of Analytical Chemistry and Computer Chemistry, Faculty of Chemistry, University of Plovdiv, 24 Tsar Assen St, 4000 Plovdiv, Bulgaria; (V.P.); (G.T.)
- Ideaconsult Ltd., 4 Angel Kanchev St, 1000 Sofia, Bulgaria; (L.I.); (V.J.)
- Correspondence: (N.K.); (N.J.)
| | - Nina Jeliazkova
- Ideaconsult Ltd., 4 Angel Kanchev St, 1000 Sofia, Bulgaria; (L.I.); (V.J.)
- Correspondence: (N.K.); (N.J.)
| | - Vesselina Paskaleva
- Department of Analytical Chemistry and Computer Chemistry, Faculty of Chemistry, University of Plovdiv, 24 Tsar Assen St, 4000 Plovdiv, Bulgaria; (V.P.); (G.T.)
| | - Gergana Tancheva
- Department of Analytical Chemistry and Computer Chemistry, Faculty of Chemistry, University of Plovdiv, 24 Tsar Assen St, 4000 Plovdiv, Bulgaria; (V.P.); (G.T.)
| | - Luchesar Iliev
- Ideaconsult Ltd., 4 Angel Kanchev St, 1000 Sofia, Bulgaria; (L.I.); (V.J.)
| | - Peter Ritchie
- Institute of Occupational Medicine, Research Avenue North, Riccarton, Edinburgh EH14 4AP, UK;
| | - Vedrin Jeliazkov
- Ideaconsult Ltd., 4 Angel Kanchev St, 1000 Sofia, Bulgaria; (L.I.); (V.J.)
|
13
|
Lunghini F, Marcou G, Azam P, Enrici MH, Van Miert E, Varnek A. Publicly available QSPR models for environmental media persistence. SAR and QSAR in Environmental Research 2020; 31:493-510. [PMID: 32588650 DOI: 10.1080/1062936x.2020.1776387] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/22/2020] [Accepted: 05/27/2020] [Indexed: 06/11/2023]
Abstract
The evaluation of the persistence of chemicals in environmental media (water, soil, sediment) is included in European Regulations, in the context of the Persistence, Bioaccumulation and Toxicity (PBT) assessment. In silico predictions are valuable alternatives for compound screening and prioritization. However, existing prediction tools have limitations: narrow applicability domains due to their relatively small training sets, and a lack of medium-specific models. A dataset of 1579 unique compounds has been collected, merging several persistence data sources annotated by at least one experimental dissipation half-life value for the given environmental medium. This dataset was used to train binary classification models discriminating persistent/non-persistent (P/nP) compounds based on REACH half-life thresholds for the sediment, water and soil compartments. Models were built using ISIDA (In SIlico design and Data Analysis) fragment descriptors and support vector regression, random forest and naïve Bayesian machine-learning methods. All models performed satisfactorily, with sediment performing best (BAext = 0.91), followed by water (BAext = 0.77) and soil (BAext = 0.76). The latter suffer from low detection of persistent ('P') compounds (Snext = 0.50), reflecting discrepancies in reported half-life measurements among the different data sources. Generated models and collected data are made publicly available.
Affiliation(s)
- F Lunghini
- Laboratory of Chemoinformatics, University of Strasbourg, Strasbourg, France
- Toxicological and Environmental Risk Assessment Unit, Solvay S.A., St. Fons, France
| | - G Marcou
- Laboratory of Chemoinformatics, University of Strasbourg, Strasbourg, France
| | - P Azam
- Toxicological and Environmental Risk Assessment Unit, Solvay S.A., St. Fons, France
| | - M H Enrici
- Toxicological and Environmental Risk Assessment Unit, Solvay S.A., St. Fons, France
| | - E Van Miert
- Toxicological and Environmental Risk Assessment Unit, Solvay S.A., St. Fons, France
| | - A Varnek
- Laboratory of Chemoinformatics, University of Strasbourg, Strasbourg, France
|
14
|
Mendez D, Gaulton A, Bento AP, Chambers J, De Veij M, Félix E, Magariños MP, Mosquera JF, Mutowo P, Nowotka M, Gordillo-Marañón M, Hunter F, Junco L, Mugumbate G, Rodriguez-Lopez M, Atkinson F, Bosc N, Radoux CJ, Segura-Cabrera A, Hersey A, Leach AR. ChEMBL: towards direct deposition of bioassay data. Nucleic Acids Res 2019; 47:D930-D940. [PMID: 30398643 PMCID: PMC6323927 DOI: 10.1093/nar/gky1075] [Citation(s) in RCA: 1103] [Impact Index Per Article: 275.8] [Received: 09/15/2018] [Accepted: 10/18/2018] [Indexed: 12/31/2022] Open
Abstract
ChEMBL is a large, open-access bioactivity database (https://www.ebi.ac.uk/chembl), previously described in the 2012, 2014 and 2017 Nucleic Acids Research Database Issues. In the last two years, several important improvements have been made to the database and are described here. These include more robust capture and representation of assay details; a new data deposition system, allowing updating of data sets and deposition of supplementary data; and a completely redesigned web interface, with enhanced search and filtering capabilities.
Affiliation(s)
- David Mendez
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
| | - Anna Gaulton
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
| | - A Patrícia Bento
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
| | - Jon Chambers
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
| | - Marleen De Veij
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
| | - Eloy Félix
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
| | - María Paula Magariños
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK.,Open Targets, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
| | - Juan F Mosquera
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
| | - Prudence Mutowo
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
| | - Michal Nowotka
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
| | - María Gordillo-Marañón
- Institute of Cardiovascular Science, University College London, Gower Street, London WC1E 6BT, UK
| | - Fiona Hunter
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
| | - Laura Junco
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
| | - Grace Mugumbate
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
| | - Milagros Rodriguez-Lopez
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
| | - Francis Atkinson
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
| | - Nicolas Bosc
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
| | - Chris J Radoux
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK.,Open Targets, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
| | - Aldo Segura-Cabrera
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
| | - Anne Hersey
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
| | - Andrew R Leach
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
|
15
|
Kanavy DM, McNulty SM, Jairath MK, Brnich SE, Bizon C, Powell BC, Berg JS. Comparative analysis of functional assay evidence use by ClinGen Variant Curation Expert Panels. Genome Med 2019; 11:77. [PMID: 31783775 PMCID: PMC6884856 DOI: 10.1186/s13073-019-0683-1] [Citation(s) in RCA: 27] [Impact Index Per Article: 5.4] [Received: 06/14/2019] [Accepted: 11/05/2019] [Indexed: 12/14/2022] Open
Abstract
BACKGROUND: The 2015 American College of Medical Genetics and Genomics (ACMG) and the Association for Molecular Pathology (AMP) guidelines for clinical sequence variant interpretation state that "well-established" functional studies can be used as evidence in variant classification. These guidelines articulated key attributes of functional data, including that assays should reflect the biological environment and be analytically sound; however, details of how to evaluate these attributes were left to expert judgment. The Clinical Genome Resource (ClinGen) designates Variant Curation Expert Panels (VCEPs) in specific disease areas to make gene-centric specifications to the ACMG/AMP guidelines, including more specific definitions of appropriate functional assays. We set out to evaluate the existing VCEP guidelines for functional assays.
METHODS: We evaluated the functional criteria (PS3/BS3) of six VCEPs (CDH1, Hearing Loss, Inherited Cardiomyopathy-MYH7, PAH, PTEN, RASopathy). We then established criteria for evaluating functional studies based on disease mechanism, general class of assay, and the characteristics of specific assay instances described in the primary literature. Using these criteria, we extensively curated assay instances cited by each VCEP in their pilot variant classification to analyze VCEP recommendations and their use in the interpretation of functional studies.
RESULTS: Unsurprisingly, our analysis highlighted the breadth of VCEP-approved assays, reflecting the diversity of disease mechanisms among VCEPs. We also noted substantial variability between VCEPs in the method used to select these assays and in the approach used to specify strength modifications, as well as differences in suggested validation parameters. Importantly, we observed discrepancies between the parameters VCEPs specified as required for approved assay instances and the fulfillment of these requirements in the individual assays cited in pilot variant interpretation.
CONCLUSIONS: Interpretation of the intricacies of functional assays often requires expert-level knowledge of the gene and disease, and current VCEP recommendations for functional assay evidence are a useful tool to improve the accessibility of functional data by providing a starting point for curators to identify approved functional assays and key metrics. However, our analysis suggests that further guidance is needed to standardize this process and ensure consistency in the application of functional evidence.
Affiliation(s)
- Dona M Kanavy
- Department of Genetics, School of Medicine, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| | - Shannon M McNulty
- Department of Genetics, School of Medicine, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| | - Meera K Jairath
- Department of Genetics, School of Medicine, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| | - Sarah E Brnich
- Department of Genetics, School of Medicine, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| | - Chris Bizon
- Renaissance Computing Institute, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| | - Bradford C Powell
- Department of Genetics, School of Medicine, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| | - Jonathan S Berg
- Department of Genetics, School of Medicine, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA.
|
16
|
David L, Walsh J, Sturm N, Feierberg I, Nissink JWM, Chen H, Bajorath J, Engkvist O. Identification of Compounds That Interfere with High-Throughput Screening Assay Technologies. ChemMedChem 2019; 14:1795-1802. [PMID: 31479198 PMCID: PMC6856845 DOI: 10.1002/cmdc.201900395] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Received: 07/02/2019] [Revised: 08/21/2019] [Indexed: 01/23/2023]
Abstract
A significant challenge in high-throughput screening (HTS) campaigns is the identification of assay technology interference compounds. A Compound Interfering with an Assay Technology (CIAT) gives false readouts in many assays. CIATs are often considered viable hits and investigated in follow-up studies, thus impeding research and wasting resources. In this study, we developed a machine-learning (ML) model to predict CIATs for three assay technologies. The model was trained on known CIATs and non-CIATs (NCIATs) identified in artefact assays and described by their 2D structural descriptors. Usual methods identifying CIATs are based on statistical analysis of historical primary screening data and do not consider experimental assays identifying CIATs. Our results show successful prediction of CIATs for existing and novel compounds and provide a complementary and wider set of predicted CIATs compared to BSF, a published structure-independent model, and to the PAINS substructural filters. Our analysis is an example of how well-curated datasets can provide powerful predictive models despite their relatively small size.
Affiliation(s)
- Laurianne David
- Hit Discovery, Discovery Sciences, R&D BioPharmaceuticals, AstraZeneca Goteborg, Pepparedsleden 1, 431 83 Mölndal, Sweden
- Department of Life Science Informatics, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität Bonn, Endenicher Allee 19c, 53115 Bonn, Germany
| | - Jarrod Walsh
- Hit Discovery, Discovery Sciences, R&D BioPharmaceuticals, AstraZeneca Cambridge, Alderley Park, Macclesfield SK10 4TG, UK
| | - Noé Sturm
- Data Science and AI, Drug Safety & Metabolism, R&D BioPharmaceuticals, AstraZeneca Gothenburg, Pepparedsleden 1, 431 83 Mölndal, Sweden
| | - Isabella Feierberg
- Hit Discovery, Discovery Sciences, R&D BioPharmaceuticals, AstraZeneca Boston, 35 Gatehouse Drive, Waltham, MA 02451, USA
| | - J. Willem M. Nissink
- Computational Chemistry, Oncology R&D, AstraZeneca, Cambridge Science Park, Milton Road, Cambridge CB4 0WG, UK
| | - Hongming Chen
- Hit Discovery, Discovery Sciences, R&D BioPharmaceuticals, AstraZeneca Goteborg, Pepparedsleden 1, 431 83 Mölndal, Sweden
| | - Jürgen Bajorath
- Department of Life Science Informatics, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität Bonn, Endenicher Allee 19c, 53115 Bonn, Germany
| | - Ola Engkvist
- Hit Discovery, Discovery Sciences, R&D BioPharmaceuticals, AstraZeneca Goteborg, Pepparedsleden 1, 431 83 Mölndal, Sweden
|
17
|
Watford S, Edwards S, Angrish M, Judson RS, Paul Friedman K. Progress in data interoperability to support computational toxicology and chemical safety evaluation. Toxicol Appl Pharmacol 2019; 380:114707. [PMID: 31404555 PMCID: PMC7705611 DOI: 10.1016/j.taap.2019.114707] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Received: 05/28/2019] [Revised: 07/29/2019] [Accepted: 08/06/2019] [Indexed: 12/20/2022]
Abstract
New approach methodologies (NAMs) in chemical safety evaluation are being explored to address the current public health implications of human environmental exposures to chemicals with limited or no data for assessment. For over a decade since a push toward "Toxicity Testing in the 21st Century," the field has focused on massive data generation efforts to inform computational approaches for preliminary hazard identification, adverse outcome pathways that link molecular initiating events and key events to apical outcomes, and high-throughput approaches to risk-based ratios of bioactivity and exposure to inform relative priority and safety assessment. Projects like the interagency Tox21 program and the US EPA ToxCast program have generated dose-response information on thousands of chemicals, identified and aggregated information from legacy systems, and created tools for access and analysis. The resulting information has been used to develop computational models as viable options for regulatory applications. This progress has introduced challenges in data management that are new, but not unique, to toxicology. Some of the key questions require critical thinking and solutions to promote semantic interoperability, including: (1) identification of bioactivity information from NAMs that might be related to a biological process; (2) identification of legacy hazard information that might be related to a key event or apical outcomes of interest; and, (3) integration of these NAM and traditional data for computational modeling and prediction of complex apical outcomes such as carcinogenesis. This work reviews a number of toxicology-related efforts specifically related to bioactivity and toxicological data interoperability based on the goals established by Findable, Accessible, Interoperable, and Reusable (FAIR) Data Principles. These efforts are essential to enable better integration of NAM and traditional toxicology information to support data-driven toxicology applications.
Affiliation(s)
- Sean Watford
- Booz Allen Hamilton, Rockville, MD 20852, USA; National Center for Computational Toxicology, Office of Research and Development, U.S. Environmental Protection Agency, Research Triangle Park, NC 27711, USA
| | - Stephen Edwards
- Research Triangle Institute International, Research Triangle Park, NC 27709, USA
| | - Michelle Angrish
- National Center for Environmental Assessment, Office of Research and Development, U.S. Environmental Protection Agency, Research Triangle Park, NC 27711, USA
| | - Richard S Judson
- National Center for Computational Toxicology, Office of Research and Development, U.S. Environmental Protection Agency, Research Triangle Park, NC 27711, USA
| | - Katie Paul Friedman
- National Center for Computational Toxicology, Office of Research and Development, U.S. Environmental Protection Agency, Research Triangle Park, NC 27711, USA.
|
18
|
Koleti A, Terryn R, Stathias V, Chung C, Cooper DJ, Turner JP, Vidovic D, Forlin M, Kelley TT, D'Urso A, Allen BK, Torre D, Jagodnik KM, Wang L, Jenkins SL, Mader C, Niu W, Fazel M, Mahi N, Pilarczyk M, Clark N, Shamsaei B, Meller J, Vasiliauskas J, Reichard J, Medvedovic M, Ma'ayan A, Pillai A, Schürer SC. Data Portal for the Library of Integrated Network-based Cellular Signatures (LINCS) program: integrated access to diverse large-scale cellular perturbation response data. Nucleic Acids Res 2018; 46:D558-D566. [PMID: 29140462 PMCID: PMC5753343 DOI: 10.1093/nar/gkx1063] [Citation(s) in RCA: 107] [Impact Index Per Article: 21.4] [Received: 08/15/2017] [Accepted: 10/19/2017] [Indexed: 11/21/2022] Open
Abstract
The Library of Integrated Network-based Cellular Signatures (LINCS) program is a national consortium funded by the NIH to generate a diverse and extensive reference library of cell-based perturbation-response signatures, along with novel data analytics tools to improve our understanding of human diseases at the systems level. In contrast to other large-scale data generation efforts, LINCS Data and Signature Generation Centers (DSGCs) employ a wide range of assay technologies cataloging diverse cellular responses. Integration of, and unified access to, LINCS data has therefore been particularly challenging. The Big Data to Knowledge (BD2K) LINCS Data Coordination and Integration Center (DCIC) has developed data standards specifications, data processing pipelines, and a suite of end-user software tools to integrate and annotate LINCS-generated data, to make LINCS signatures searchable and usable for different types of users. Here, we describe the LINCS Data Portal (LDP) (http://lincsportal.ccs.miami.edu/), a unified web interface to access datasets generated by the LINCS DSGCs, and its underlying database, LINCS Data Registry (LDR). LINCS data served on the LDP contain extensive metadata and curated annotations. We highlight the features of the LDP user interface, which is designed to enable search, browsing, exploration, download and analysis of LINCS data and related curated content.
Affiliation(s)
- Amar Koleti
- Center for Computational Science, University of Miami, FL, USA.,BD2K LINCS Data Coordination and Integration Center, Icahn School of Medicine at Mount Sinai, University of Miami, University of Cincinnati, New York NY, Miami FL, Cincinnati OH, USA
| | - Raymond Terryn
- Center for Computational Science, University of Miami, FL, USA.,BD2K LINCS Data Coordination and Integration Center, Icahn School of Medicine at Mount Sinai, University of Miami, University of Cincinnati, New York NY, Miami FL, Cincinnati OH, USA.,Department of Molecular and Cellular Pharmacology, Miller School of Medicine, University of Miami, FL, USA
| | - Vasileios Stathias
- BD2K LINCS Data Coordination and Integration Center, Icahn School of Medicine at Mount Sinai, University of Miami, University of Cincinnati, New York NY, Miami FL, Cincinnati OH, USA.,Department of Molecular and Cellular Pharmacology, Miller School of Medicine, University of Miami, FL, USA.,Department of Human Genetics and Genomics, Miller School of Medicine, University of Miami, FL, USA
| | - Caty Chung
- Center for Computational Science, University of Miami, FL, USA.,BD2K LINCS Data Coordination and Integration Center, Icahn School of Medicine at Mount Sinai, University of Miami, University of Cincinnati, New York NY, Miami FL, Cincinnati OH, USA
| | - Daniel J Cooper
- BD2K LINCS Data Coordination and Integration Center, Icahn School of Medicine at Mount Sinai, University of Miami, University of Cincinnati, New York NY, Miami FL, Cincinnati OH, USA.,Department of Molecular and Cellular Pharmacology, Miller School of Medicine, University of Miami, FL, USA
| | - John P Turner
- Center for Computational Science, University of Miami, FL, USA.,BD2K LINCS Data Coordination and Integration Center, Icahn School of Medicine at Mount Sinai, University of Miami, University of Cincinnati, New York NY, Miami FL, Cincinnati OH, USA.,Department of Molecular and Cellular Pharmacology, Miller School of Medicine, University of Miami, FL, USA
| | - Dušica Vidovic
- Center for Computational Science, University of Miami, FL, USA.,BD2K LINCS Data Coordination and Integration Center, Icahn School of Medicine at Mount Sinai, University of Miami, University of Cincinnati, New York NY, Miami FL, Cincinnati OH, USA.,Department of Molecular and Cellular Pharmacology, Miller School of Medicine, University of Miami, FL, USA
| | - Michele Forlin
- Center for Computational Science, University of Miami, FL, USA.,BD2K LINCS Data Coordination and Integration Center, Icahn School of Medicine at Mount Sinai, University of Miami, University of Cincinnati, New York NY, Miami FL, Cincinnati OH, USA.,Department of Molecular and Cellular Pharmacology, Miller School of Medicine, University of Miami, FL, USA
| | - Tanya T Kelley
- BD2K LINCS Data Coordination and Integration Center, Icahn School of Medicine at Mount Sinai, University of Miami, University of Cincinnati, New York NY, Miami FL, Cincinnati OH, USA.,Department of Molecular and Cellular Pharmacology, Miller School of Medicine, University of Miami, FL, USA
| | - Alessandro D'Urso
- Center for Computational Science, University of Miami, FL, USA.,BD2K LINCS Data Coordination and Integration Center, Icahn School of Medicine at Mount Sinai, University of Miami, University of Cincinnati, New York NY, Miami FL, Cincinnati OH, USA
| | - Bryce K Allen
- Center for Computational Science, University of Miami, FL, USA.,BD2K LINCS Data Coordination and Integration Center, Icahn School of Medicine at Mount Sinai, University of Miami, University of Cincinnati, New York NY, Miami FL, Cincinnati OH, USA.,Department of Molecular and Cellular Pharmacology, Miller School of Medicine, University of Miami, FL, USA
| | - Denis Torre
- BD2K LINCS Data Coordination and Integration Center, Icahn School of Medicine at Mount Sinai, University of Miami, University of Cincinnati, New York NY, Miami FL, Cincinnati OH, USA.,Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Kathleen M Jagodnik
- BD2K LINCS Data Coordination and Integration Center, Icahn School of Medicine at Mount Sinai, University of Miami, University of Cincinnati, New York NY, Miami FL, Cincinnati OH, USA.,Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Lily Wang
- BD2K LINCS Data Coordination and Integration Center, Icahn School of Medicine at Mount Sinai, University of Miami, University of Cincinnati, New York NY, Miami FL, Cincinnati OH, USA.,Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Sherry L Jenkins
- BD2K LINCS Data Coordination and Integration Center, Icahn School of Medicine at Mount Sinai, University of Miami, University of Cincinnati, New York NY, Miami FL, Cincinnati OH, USA.,Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Christopher Mader
- Center for Computational Science, University of Miami, FL, USA.,BD2K LINCS Data Coordination and Integration Center, Icahn School of Medicine at Mount Sinai, University of Miami, University of Cincinnati, New York NY, Miami FL, Cincinnati OH, USA
| | - Wen Niu
- BD2K LINCS Data Coordination and Integration Center, Icahn School of Medicine at Mount Sinai, University of Miami, University of Cincinnati, New York NY, Miami FL, Cincinnati OH, USA.,Division of Biostatistics and Bioinformatics, Department of Environmental Health, University of Cincinnati, Cincinnati, OH, USA
| | - Mehdi Fazel
- BD2K LINCS Data Coordination and Integration Center, Icahn School of Medicine at Mount Sinai, University of Miami, University of Cincinnati, New York NY, Miami FL, Cincinnati OH, USA.,Division of Biostatistics and Bioinformatics, Department of Environmental Health, University of Cincinnati, Cincinnati, OH, USA
| | - Naim Mahi
- BD2K LINCS Data Coordination and Integration Center, Icahn School of Medicine at Mount Sinai, University of Miami, University of Cincinnati, New York NY, Miami FL, Cincinnati OH, USA.,Division of Biostatistics and Bioinformatics, Department of Environmental Health, University of Cincinnati, Cincinnati, OH, USA
| | - Marcin Pilarczyk
- BD2K LINCS Data Coordination and Integration Center, Icahn School of Medicine at Mount Sinai, University of Miami, University of Cincinnati, New York NY, Miami FL, Cincinnati OH, USA.,Division of Biostatistics and Bioinformatics, Department of Environmental Health, University of Cincinnati, Cincinnati, OH, USA
| | - Nicholas Clark
- BD2K LINCS Data Coordination and Integration Center, Icahn School of Medicine at Mount Sinai, University of Miami, University of Cincinnati, New York NY, Miami FL, Cincinnati OH, USA.,Division of Biostatistics and Bioinformatics, Department of Environmental Health, University of Cincinnati, Cincinnati, OH, USA
| | - Behrouz Shamsaei
- BD2K LINCS Data Coordination and Integration Center, Icahn School of Medicine at Mount Sinai, University of Miami, University of Cincinnati, New York NY, Miami FL, Cincinnati OH, USA.,Division of Biostatistics and Bioinformatics, Department of Environmental Health, University of Cincinnati, Cincinnati, OH, USA
| | - Jarek Meller
- BD2K LINCS Data Coordination and Integration Center, Icahn School of Medicine at Mount Sinai, University of Miami, University of Cincinnati, New York NY, Miami FL, Cincinnati OH, USA.,Division of Biostatistics and Bioinformatics, Department of Environmental Health, University of Cincinnati, Cincinnati, OH, USA
| | - Juozas Vasiliauskas
- BD2K LINCS Data Coordination and Integration Center, Icahn School of Medicine at Mount Sinai, University of Miami, University of Cincinnati, New York NY, Miami FL, Cincinnati OH, USA.,Division of Biostatistics and Bioinformatics, Department of Environmental Health, University of Cincinnati, Cincinnati, OH, USA
| | - John Reichard
- BD2K LINCS Data Coordination and Integration Center, Icahn School of Medicine at Mount Sinai, University of Miami, University of Cincinnati, New York NY, Miami FL, Cincinnati OH, USA.,Division of Biostatistics and Bioinformatics, Department of Environmental Health, University of Cincinnati, Cincinnati, OH, USA
| | - Mario Medvedovic
- BD2K LINCS Data Coordination and Integration Center, Icahn School of Medicine at Mount Sinai, University of Miami, University of Cincinnati, New York NY, Miami FL, Cincinnati OH, USA.,Division of Biostatistics and Bioinformatics, Department of Environmental Health, University of Cincinnati, Cincinnati, OH, USA
| | - Avi Ma'ayan
- BD2K LINCS Data Coordination and Integration Center, Icahn School of Medicine at Mount Sinai, University of Miami, University of Cincinnati, New York NY, Miami FL, Cincinnati OH, USA.,Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Ajay Pillai
- Division of Genome Sciences, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Stephan C Schürer
- Center for Computational Science, University of Miami, FL, USA.,BD2K LINCS Data Coordination and Integration Center, Icahn School of Medicine at Mount Sinai, University of Miami, University of Cincinnati, New York NY, Miami FL, Cincinnati OH, USA.,Department of Molecular and Cellular Pharmacology, Miller School of Medicine, University of Miami, FL, USA
|
19
|
Liechti R, George N, Götz L, El-Gebali S, Chasapi A, Crespo I, Xenarios I, Lemberger T. SourceData: a semantic platform for curating and searching figures. Nat Methods 2017; 14:1021-1022. [PMID: 29088127 DOI: 10.1038/nmeth.4471] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Indexed: 11/09/2022]
Affiliation(s)
- Robin Liechti
- Vital-IT, Swiss Institute of Bioinformatics, Lausanne, Switzerland
| | - Nancy George
- EMBO, Heidelberg, Germany.,EMBL-EBI, Hinxton, UK
| | - Lou Götz
- Vital-IT, Swiss Institute of Bioinformatics, Lausanne, Switzerland
| | | | | | - Isaac Crespo
- Vital-IT, Swiss Institute of Bioinformatics, Lausanne, Switzerland
| | - Ioannis Xenarios
- Vital-IT, Swiss Institute of Bioinformatics, Lausanne, Switzerland
|
20
|
Improving the Utility of the Tox21 Dataset by Deep Metadata Annotations and Constructing Reusable Benchmarked Chemical Reference Signatures. Molecules 2019; 24:molecules24081604. [PMID: 31018579 PMCID: PMC6515292 DOI: 10.3390/molecules24081604] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 03/20/2019] [Revised: 04/16/2019] [Accepted: 04/19/2019] [Indexed: 02/03/2023] Open
Abstract
The Toxicology in the 21st Century (Tox21) project seeks to develop and test methods for high-throughput examination of the effect certain chemical compounds have on biological systems. Although primary and toxicity assay data were readily available for multiple reporter gene modified cell lines, extensive annotation and curation was required to improve these datasets with respect to how FAIR (Findable, Accessible, Interoperable, and Reusable) they are. In this study, we fully annotated the Tox21 published data with relevant and accepted controlled vocabularies. After removing unreliable data points, we aggregated the results and created three sets of signatures reflecting activity in the reporter gene assays, cytotoxicity, and selective reporter gene activity, respectively. We benchmarked these signatures using the chemical structures of the tested compounds and obtained generally high receiver operating characteristic (ROC) scores, suggesting good quality and utility of these signatures and the underlying data. We analyzed the results to identify promiscuous individual compounds and chemotypes for the three signature categories and interpreted the results to illustrate the utility and re-usability of the datasets. With this study, we aimed to demonstrate the importance of data standards in reporting screening results and high-quality annotations to enable re-use and interpretation of these data. To improve the data with respect to all FAIR criteria, all assay annotations, cleaned and aggregate datasets, and signatures were made available as standardized dataset packages (Aggregated Tox21 bioactivity data, 2019).
|
21
|
Kanza S, Frey JG. A new wave of innovation in Semantic web tools for drug discovery. Expert Opin Drug Discov 2019; 14:433-444. [DOI: 10.1080/17460441.2019.1586880] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/13/2022]
Affiliation(s)
- Samantha Kanza
- Department of Chemistry, Highfield Campus, University of Southampton, Southampton, UK
| | - Jeremy Graham Frey
- Department of Chemistry, Highfield Campus, University of Southampton, Southampton, UK
| |
|
22
|
Nikitina AA, Orlov AA, Kozlovskaya LI, Palyulin VA, Osolodkin DI. Enhanced taxonomy annotation of antiviral activity data from ChEMBL. DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION 2019; 2019:5308407. [PMID: 30753475 PMCID: PMC6367519 DOI: 10.1093/database/bay139] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 07/21/2018] [Accepted: 12/09/2018] [Indexed: 11/14/2022]
Abstract
The discovery of antiviral drugs is a rapidly developing area of medicinal chemistry research. The emergence of resistant variants and outbreaks of poorly studied viral diseases keep this area in constant development. The amount of antiviral activity data available in ChEMBL grows consistently, but the virus taxonomy annotation of these data is not sufficient for thorough studies of antiviral chemical space. We developed a procedure for semi-automatic extraction of antiviral activity data from ChEMBL and mapped them to the virus taxonomy developed by the International Committee on Taxonomy of Viruses (ICTV). The procedure is based on the lists of virus-related values of ChEMBL annotation fields and a dictionary of virus names and acronyms mapped to ICTV taxa. Application of this data extraction procedure retrieves from ChEMBL 1.6 times more assays linked to 2.5 times more compounds and data points than the ChEMBL web interface allows. Mapping of these data to ICTV taxa allows analyzing all the compounds tested against each viral species. Activity values and structures of the compounds were standardized, and the antiviral activity profile was created for each standard structure. The data set compiled using this algorithm was called ViralChEMBL. As case studies, we compared descriptor and scaffold distributions for the full ChEMBL and its 'viral' and 'non-viral' subsets, identified the most studied compounds, and created a self-organizing map for ViralChEMBL. Our approach to data annotation appeared to be a very efficient tool for the study of antiviral chemical space.
Affiliation(s)
- Anastasia A Nikitina
- FSBSI "Chumakov FSC R&D IBP RAS", Moscow, Russia.,Department of Chemistry, Lomonosov Moscow State University, Moscow, Russia
| | - Alexey A Orlov
- FSBSI "Chumakov FSC R&D IBP RAS", Moscow, Russia.,Department of Chemistry, Lomonosov Moscow State University, Moscow, Russia
| | - Liubov I Kozlovskaya
- FSBSI "Chumakov FSC R&D IBP RAS", Moscow, Russia.,Institute of Translational Medicine and Biotechnology, Sechenov First Moscow State Medical University, Moscow, Russia
| | | | - Dmitry I Osolodkin
- FSBSI "Chumakov FSC R&D IBP RAS", Moscow, Russia.,Department of Chemistry, Lomonosov Moscow State University, Moscow, Russia.,Institute of Translational Medicine and Biotechnology, Sechenov First Moscow State Medical University, Moscow, Russia
| |
|
23
|
Stanford NJ, Scharm M, Dobson PD, Golebiewski M, Hucka M, Kothamachu VB, Nickerson D, Owen S, Pahle J, Wittig U, Waltemath D, Goble C, Mendes P, Snoep J. Data Management in Computational Systems Biology: Exploring Standards, Tools, Databases, and Packaging Best Practices. Methods Mol Biol 2019; 2049:285-314. [PMID: 31602618 DOI: 10.1007/978-1-4939-9736-7_17] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/14/2022]
Abstract
Computational systems biology involves integrating heterogeneous datasets in order to generate models. These models can assist with understanding and prediction of biological phenomena. Generating datasets and integrating them into models involves a wide range of scientific expertise. As a result, these datasets are often collected by one set of researchers and exchanged with other researchers for constructing the models. For this process to run smoothly, the data and models must be FAIR: findable, accessible, interoperable, and reusable. In order for data and models to be FAIR, they must be structured in consistent and predictable ways, and described sufficiently for other researchers to understand them. Furthermore, these data and models must be shared with other researchers, with appropriately controlled sharing permissions, before and after publication. In this chapter we explore the different data and model standards that assist with structuring, describing, and sharing. We also highlight the popular standards and sharing databases within computational systems biology.
Affiliation(s)
| | - Martin Scharm
- Department of Systems Biology and Bioinformatics, University of Rostock, Rostock, Germany
| | - Paul D Dobson
- School of Computer Science, University of Manchester, Manchester, UK
| | - Martin Golebiewski
- Heidelberg Institute for Theoretical Studies (HITS), Heidelberg, Germany
| | - Michael Hucka
- Computing and Mathematical Sciences, California Institute of Technology, Pasadena, CA, USA
| | | | - David Nickerson
- Auckland Bioengineering Institute, University of Auckland, Auckland, New Zealand
| | - Stuart Owen
- School of Computer Science, University of Manchester, Manchester, UK
| | - Jürgen Pahle
- BIOMS/BioQuant, Heidelberg University, Heidelberg, Germany.
| | - Ulrike Wittig
- Heidelberg Institute for Theoretical Studies (HITS), Heidelberg, Germany
| | - Dagmar Waltemath
- Medical Informatics, University Medicine Greifswald, Greifswald, Germany
| | - Carole Goble
- School of Computer Science, University of Manchester, Manchester, UK
| | - Pedro Mendes
- Centre for Quantitative Medicine, University of Connecticut, Farmington, CT, USA
| | - Jacky Snoep
- School of Computer Science, University of Manchester, Manchester, UK.,Biochemistry, Stellenbosch University, Stellenbosch, South Africa
| |
|
24
|
Küçük McGinty H, Visser U, Schürer S. How to Develop a Drug Target Ontology: KNowledge Acquisition and Representation Methodology (KNARM). Methods Mol Biol 2019; 1939:49-69. [PMID: 30848456 PMCID: PMC7257161 DOI: 10.1007/978-1-4939-9089-4_4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 09/06/2023]
Abstract
Technological advancements in many fields have led to huge increases in data production, including data volume, diversity, and the speed at which new data is becoming available. In accordance with this, there is a lack of conformity in the ways data is interpreted. This era of "big data" provides unprecedented opportunities for data-driven research and "big picture" models. However, in-depth analyses that make use of various data types and data sources and extract knowledge have become a more daunting task. This is especially the case in life sciences, where simplification and flattening of diverse data types often lead to incorrect predictions. Effective applications of big data approaches in life sciences require better, knowledge-based, semantic models that are suitable as a framework for big data integration, while avoiding oversimplifications, such as reducing various biological data types to the gene level. A huge hurdle in developing such semantic knowledge models, or ontologies, is the knowledge acquisition bottleneck. Automated methods are still very limited, and significant human expertise is required. In this chapter, we describe a methodology to systematize this knowledge acquisition and representation challenge, termed KNowledge Acquisition and Representation Methodology (KNARM). We then describe application of the methodology while implementing the Drug Target Ontology (DTO). We aimed to create an approach, involving domain experts and knowledge engineers, to build useful, comprehensive, consistent ontologies that will enable big data approaches in the domain of drug discovery, without the currently common simplifications.
Affiliation(s)
- Hande Küçük McGinty
- Department of Computer Science, University of Miami, Coral Gables, FL, USA
- Collaborative Drug Discovery, Inc., Burlingame, CA, USA
| | - Ubbo Visser
- Department of Computer Science, University of Miami, Coral Gables, FL, USA
| | - Stephan Schürer
- Department of Molecular and Cellular Pharmacology, Miller School of Medicine, University of Miami, Miami, FL, USA.
- Center for Computational Science, University of Miami, Coral Gables, FL, USA.
| |
|
25
|
Hunter FMI, L Atkinson F, Bento AP, Bosc N, Gaulton A, Hersey A, Leach AR. A large-scale dataset of in vivo pharmacology assay results. Sci Data 2018; 5:180230. [PMID: 30351302 PMCID: PMC6206617 DOI: 10.1038/sdata.2018.230] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/28/2018] [Accepted: 09/03/2018] [Indexed: 12/17/2022] Open
Abstract
ChEMBL is a large-scale, open-access drug discovery resource containing bioactivity information primarily extracted from scientific literature. A substantial dataset of more than 135,000 in vivo assays has been collated as a key resource of animal models for translational medicine within drug discovery. To improve the utility of the in vivo data, an extensive data curation task has been undertaken that allows the assays to be grouped by animal disease model or phenotypic endpoint. The dataset contains previously unavailable information about compounds or drugs tested in animal models and, in conjunction with assay data on protein targets or cell- or tissue-based systems, allows the investigation of the effects of compounds at differing levels of biological complexity. Equally, it enables researchers to identify compounds that have been investigated for a group of disease-, pharmacology- or toxicity-relevant assays.
Affiliation(s)
- Fiona M I Hunter
- European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
| | - Francis L Atkinson
- European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
| | - A Patrícia Bento
- European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
| | - Nicolas Bosc
- European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
| | - Anna Gaulton
- European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
| | - Anne Hersey
- European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
| | - Andrew R Leach
- European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
| |
|
26
|
Sustainable data and metadata management at the BD2K-LINCS Data Coordination and Integration Center. Sci Data 2018; 5:180117. [PMID: 29917015 PMCID: PMC6007090 DOI: 10.1038/sdata.2018.117] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/02/2017] [Accepted: 05/11/2018] [Indexed: 12/18/2022] Open
Abstract
The NIH-funded LINCS Consortium is creating an extensive reference library of cell-based perturbation response signatures and sophisticated informatics tools incorporating a large number of perturbagens, model systems, and assays. To date, more than 350 datasets have been generated including transcriptomics, proteomics, epigenomics, cell phenotype and competitive binding profiling assays. The large volume and variety of data necessitate rigorous data standards and effective data management including modular data processing pipelines and end-user interfaces to facilitate accurate and reliable data exchange, curation, validation, standardization, aggregation, integration, and end user access. Deep metadata annotations and the use of qualified data standards enable integration with many external resources. Here we describe the end-to-end data processing and management at the DCIC to generate a high-quality and persistent product. Our data management and stewardship solutions enable a functioning Consortium and make LINCS a valuable scientific resource that aligns with big data initiatives such as the BD2K NIH Program and concords with emerging data science best practices including the findable, accessible, interoperable, and reusable (FAIR) principles.
|
27
|
Mervin LH, Afzal AM, Brive L, Engkvist O, Bender A. Extending in Silico Protein Target Prediction Models to Include Functional Effects. Front Pharmacol 2018; 9:613. [PMID: 29942259 PMCID: PMC6004408 DOI: 10.3389/fphar.2018.00613] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/06/2018] [Accepted: 05/22/2018] [Indexed: 12/31/2022] Open
Abstract
In silico protein target deconvolution is frequently used for mechanism-of-action investigations; however, existing protocols usually do not predict compound functional effects, such as activation or inhibition, upon binding to their protein counterparts. This study is hence concerned with including functional effects in target prediction. To this end, we assimilated a bioactivity training set for 332 targets, comprising 817,239 active data points with unknown functional effect (binding data) and 20,761,260 inactive compounds, along with 226,045 activating and 1,032,439 inhibiting data points from functional screens. Chemical space analysis of the data first showed some separation between compound sets (binding and inhibiting compounds were more similar to each other than both binding and activating or activating and inhibiting compounds), providing a rationale for implementing functional prediction models. We employed three different architectures to predict functional response, ranging from simplistic random forest models ('Arch1') to cascaded models which use separate binding and functional effect classification steps ('Arch2' and 'Arch3'), differing in the way training sets were generated. Fivefold stratified cross-validation indicated that cascading predictions provide superior precision and recall based on an internal test set. We next prospectively validated the architectures using a temporal set of 153,467 in-house data points (after a 4-month interim from initial data extraction). Results indicated that Arch3 performed with the highest target class averaged precision and recall scores of 71% and 53%, which we attribute to the use of inactive background sets. Distance-based applicability domain (AD) analysis indicated that Arch3 provides superior extrapolation into novel areas of chemical space, and thus, based on the results presented here, we propose it as the most suitable architecture for the functional effect prediction of small molecules. Finally, we conclude that including functional effects could provide vital insight in future studies, to annotate cases of unanticipated functional changeover, as outlined by our CHRM1 case study.
Affiliation(s)
- Lewis H Mervin
- Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Cambridge, United Kingdom
| | - Avid M Afzal
- Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Cambridge, United Kingdom
| | | | - Ola Engkvist
- Hit Discovery, Discovery Sciences, IMED Biotech Unit, AstraZeneca, Gothenburg, Sweden
| | - Andreas Bender
- Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Cambridge, United Kingdom
| |
|
28
|
Giraldo O, Garcia A, Corcho O. A guideline for reporting experimental protocols in life sciences. PeerJ 2018; 6:e4795. [PMID: 29868256 PMCID: PMC5978404 DOI: 10.7717/peerj.4795] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/03/2017] [Accepted: 04/29/2018] [Indexed: 01/01/2023] Open
Abstract
Experimental protocols are key when planning, performing and publishing research in many disciplines, especially in relation to the reporting of materials and methods. However, they vary in their content, structure and associated data elements. This article presents a guideline for describing key content for reporting experimental protocols in the domain of life sciences, together with the methodology followed in order to develop such a guideline. As part of our work, we propose a checklist that contains 17 data elements that we consider fundamental to facilitate the execution of the protocol. These data elements are formally described in the SMART Protocols ontology. By providing guidance for the key content to be reported, we aim (1) to make it easier for authors to report experimental protocols with necessary and sufficient information that allows others to reproduce an experiment, (2) to promote consistency across laboratories by delivering an adaptable set of data elements, and (3) to make it easier for reviewers and editors to measure the quality of submitted manuscripts against established criteria. Our checklist focuses on the content, that is, what should be included. Rather than advocating a specific format for protocols in life sciences, the checklist includes a full description of the key data elements that facilitate the execution of the protocol.
Affiliation(s)
- Olga Giraldo
- Ontology Engineering Group, Campus de Montegancedo, Boadilla del Monte, Universidad Politécnica de Madrid, Madrid, Spain
| | - Alexander Garcia
- Ontology Engineering Group, Campus de Montegancedo, Boadilla del Monte, Universidad Politécnica de Madrid, Madrid, Spain
- Technische Universität Graz, Graz, Austria
| | - Oscar Corcho
- Ontology Engineering Group, Campus de Montegancedo, Boadilla del Monte, Universidad Politécnica de Madrid, Madrid, Spain
| |
|
29
|
He Y, Xiang Z, Zheng J, Lin Y, Overton JA, Ong E. The eXtensible ontology development (XOD) principles and tool implementation to support ontology interoperability. J Biomed Semantics 2018; 9:3. [PMID: 29329592 PMCID: PMC5765662 DOI: 10.1186/s13326-017-0169-2] [Citation(s) in RCA: 36] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/01/2017] [Accepted: 12/07/2017] [Indexed: 11/13/2022] Open
Abstract
Ontologies are critical to data/metadata and knowledge standardization, sharing, and analysis. With hundreds of biological and biomedical ontologies developed, it has become critical to ensure ontology interoperability and the usage of interoperable ontologies for standardized data representation and integration. The suite of web-based Ontoanimal tools (e.g., Ontofox, Ontorat, and Ontobee) support different aspects of extensible ontology development. By summarizing the common features of Ontoanimal and other similar tools, we identified and proposed an “eXtensible Ontology Development” (XOD) strategy and its associated four principles. These XOD principles reuse existing terms and semantic relations from reliable ontologies, develop and apply well-established ontology design patterns (ODPs), and involve community efforts to support new ontology development, promoting standardized and interoperable data and knowledge representation and integration. The adoption of the XOD strategy, together with robust XOD tool development, will greatly support ontology interoperability and robust ontology applications to support data to be Findable, Accessible, Interoperable and Reusable (i.e., FAIR).
Affiliation(s)
- Yongqun He
- Unit for Laboratory Animal Medicine, Department of Microbiology and Immunology, Center for Computational Medicine and Bioinformatics, University of Michigan Medical School, Ann Arbor, MI, USA.
| | - Zuoshuang Xiang
- Unit for Laboratory Animal Medicine, Department of Microbiology and Immunology, Center for Computational Medicine and Bioinformatics, University of Michigan Medical School, Ann Arbor, MI, USA
| | - Jie Zheng
- Department of Genetics, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, 19104, USA
| | - Yu Lin
- Center for Computational Science, University of Miami, Coral Gables, FL, USA
| | | | - Edison Ong
- Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, Ann Arbor, MI, USA
| |
|
30
|
Harding SD, Sharman JL, Faccenda E, Southan C, Pawson AJ, Ireland S, Gray AJG, Bruce L, Alexander SPH, Anderton S, Bryant C, Davenport AP, Doerig C, Fabbro D, Levi-Schaffer F, Spedding M, Davies JA. The IUPHAR/BPS Guide to PHARMACOLOGY in 2018: updates and expansion to encompass the new guide to IMMUNOPHARMACOLOGY. Nucleic Acids Res 2018; 46:D1091-D1106. [PMID: 29149325 PMCID: PMC5753190 DOI: 10.1093/nar/gkx1121] [Citation(s) in RCA: 1455] [Impact Index Per Article: 242.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/26/2017] [Accepted: 10/25/2017] [Indexed: 02/06/2023] Open
Abstract
The IUPHAR/BPS Guide to PHARMACOLOGY (GtoPdb, www.guidetopharmacology.org) and its precursor IUPHAR-DB, have captured expert-curated interactions between targets and ligands from selected papers in pharmacology and drug discovery since 2003. This resource continues to be developed in conjunction with the International Union of Basic and Clinical Pharmacology (IUPHAR) and the British Pharmacological Society (BPS). As previously described, our unique model of content selection and quality control is based on 96 target-class subcommittees comprising 512 scientists collaborating with in-house curators. This update describes content expansion, new features and interoperability improvements introduced in the 10 releases since August 2015. Our relationship matrix now describes ∼9000 ligands, ∼15 000 binding constants, ∼6000 papers and ∼1700 human proteins. As an important addition, we also introduce our newly funded project for the Guide to IMMUNOPHARMACOLOGY (GtoImmuPdb, www.guidetoimmunopharmacology.org). This has been 'forked' from the well-established GtoPdb data model and expanded into new types of data related to the immune system and inflammatory processes. This includes new ligands, targets, pathways, cell types and diseases for which we are recruiting new IUPHAR expert committees. Designed as an immunopharmacological gateway, it also has an emphasis on potential therapeutic interventions.
Affiliation(s)
- Simon D Harding
- Deanery of Biomedical Sciences, University of Edinburgh, Edinburgh EH8 9XD, UK
| | - Joanna L Sharman
- Deanery of Biomedical Sciences, University of Edinburgh, Edinburgh EH8 9XD, UK
| | - Elena Faccenda
- Deanery of Biomedical Sciences, University of Edinburgh, Edinburgh EH8 9XD, UK
| | - Chris Southan
- Deanery of Biomedical Sciences, University of Edinburgh, Edinburgh EH8 9XD, UK
| | - Adam J Pawson
- Deanery of Biomedical Sciences, University of Edinburgh, Edinburgh EH8 9XD, UK
| | - Sam Ireland
- Department of Structural & Molecular Biology, University College London, London WC1E 6BT, UK
| | - Alasdair J G Gray
- School of Mathematical and Computer Sciences, Heriot-Watt University, Edinburgh EH14 4AS, UK
| | - Liam Bruce
- School of Mathematical and Computer Sciences, Heriot-Watt University, Edinburgh EH14 4AS, UK
| | - Stephen P H Alexander
- School of Life Sciences, University of Nottingham Medical School, Nottingham NG7 2UH, UK
| | - Stephen Anderton
- MRC Centre for Inflammation Research, University of Edinburgh, Edinburgh EH16 4TJ, UK
| | - Clare Bryant
- Department of Veterinary Medicine, University of Cambridge, Cambridge CB3 0ES, UK
| | - Anthony P Davenport
- Experimental Medicine and Immunotherapeutics, University of Cambridge, Cambridge CB2 0QQ, UK
| | - Christian Doerig
- Department of Microbiology, Monash University, Clayton 3800, Australia
| | | | - Francesca Levi-Schaffer
- Pharmacology and Experimental Therapeutics Unit, School of Pharmacy, Institute for Drug Research, Hebrew University of Jerusalem, Jerusalem 9112102, Israel
| | | | - Jamie A Davies
- Deanery of Biomedical Sciences, University of Edinburgh, Edinburgh EH8 9XD, UK
- To whom correspondence should be addressed. Tel: +44 131 650 2999;
| | | |
|
31
|
Tanoli Z, Alam Z, Vähä-Koskela M, Ravikumar B, Malyutina A, Jaiswal A, Tang J, Wennerberg K, Aittokallio T. Drug Target Commons 2.0: a community platform for systematic analysis of drug-target interaction profiles. Database (Oxford) 2018; 2018:1-13. [PMID: 30219839 PMCID: PMC6146131 DOI: 10.1093/database/bay083] [Citation(s) in RCA: 27] [Impact Index Per Article: 4.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/20/2018] [Revised: 06/27/2018] [Accepted: 07/18/2018] [Indexed: 12/20/2022]
Abstract
Drug Target Commons (DTC) is a web platform (database with user interface) for community-driven bioactivity data integration and standardization for comprehensive mapping, reuse and analysis of compound-target interaction profiles. End users can search, upload, edit, annotate and export expert-curated bioactivity data for further analysis, using an application programmable interface, database dump or tab-delimited text download options. To guide chemical biology and drug-repurposing applications, DTC version 2.0 includes updated clinical development information for the compounds and target gene-disease associations, as well as cancer-type indications for mutant protein targets, which are critical for precision oncology developments.
Affiliation(s)
- ZiaurRehman Tanoli
- Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, Finland
| | - Zaid Alam
- Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, Finland
| | - Markus Vähä-Koskela
- Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, Finland
| | - Balaguru Ravikumar
- Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, Finland
| | - Alina Malyutina
- Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, Finland
| | - Alok Jaiswal
- Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, Finland
| | - Jing Tang
- Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, Finland
- Department of Mathematics and Statistics, University of Turku, Turku, Finland
| | - Krister Wennerberg
- Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, Finland
- Biotech Research & Innovation Centre (BRIC), University of Copenhagen, Copenhagen, Denmark
| | - Tero Aittokallio
- Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, Finland
- Department of Mathematics and Statistics, University of Turku, Turku, Finland
| |
|
32
|
Ong E, Xie J, Ni Z, Liu Q, Sarntivijai S, Lin Y, Cooper D, Terryn R, Stathias V, Chung C, Schürer S, He Y. Ontological representation, integration, and analysis of LINCS cell line cells and their cellular responses. BMC Bioinformatics 2017; 18:556. [PMID: 29322930 PMCID: PMC5763302 DOI: 10.1186/s12859-017-1981-5] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/13/2022] Open
Abstract
Background: Aiming to understand cellular responses to different perturbations, the NIH Common Fund Library of Integrated Network-based Cellular Signatures (LINCS) program involves many institutes and laboratories working on over a thousand cell lines. The community-based Cell Line Ontology (CLO) is selected as the default ontology for LINCS cell line representation and integration. Results: CLO has consistently represented all 1097 LINCS cell lines and included information extracted from the LINCS Data Portal and ChEMBL. Using MCF 10A cell line cells as an example, we demonstrated how to ontologically model LINCS cellular signatures such as their non-tumorigenic epithelial cell type, three-dimensional growth, latrunculin-A-induced actin depolymerization and apoptosis, and cell line transfection. A CLO subset view of LINCS cell lines, named LINCS-CLOview, was generated to support systematic LINCS cell line analysis and queries. In summary, LINCS cell lines are currently associated with 43 cell types, 131 tissues and organs, and 121 cancer types. The LINCS-CLO view information can be queried using SPARQL scripts. Conclusions: CLO was used to support ontological representation, integration, and analysis of over a thousand LINCS cell line cells and their cellular responses.
Affiliation(s)
- Edison Ong
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
| | - Jiangan Xie
- Unit of Laboratory Animal Medicine and Department of Microbiology and Immunology, University of Michigan, Ann Arbor, MI, USA
| | - Zhaohui Ni
- Unit of Laboratory Animal Medicine and Department of Microbiology and Immunology, University of Michigan, Ann Arbor, MI, USA
| | - Qingping Liu
- Unit of Laboratory Animal Medicine and Department of Micro biology and Immunology, University of Michigan, Ann Arbor, MI, USA
| | - Sirarat Sarntivijai
- Samples, Phenotypes and Ontologies Team, European Bioinformatics Institute (EMBL-EBI), European Molecular Biology Laboratory, Hinxton, Cambridge, UK
| | - Yu Lin
- Department of Molecular and Cellular Pharmacology, University of Miami, Miami, FL, USA
| | - Daniel Cooper
- Department of Molecular and Cellular Pharmacology, University of Miami, Miami, FL, USA.,BD2K LINCS Data Coordination and Integration Center, University of Miami, Miami, FL, USA
| | - Raymond Terryn
- Department of Molecular and Cellular Pharmacology, University of Miami, Miami, FL, USA.,BD2K LINCS Data Coordination and Integration Center, University of Miami, Miami, FL, USA
| | - Vasileios Stathias
- Department of Molecular and Cellular Pharmacology, University of Miami, Miami, FL, USA.,BD2K LINCS Data Coordination and Integration Center, University of Miami, Miami, FL, USA
| | - Caty Chung
- BD2K LINCS Data Coordination and Integration Center, University of Miami, Miami, FL, USA.,Center for Computational Science, University of Miami, Miami, FL, USA
| | - Stephan Schürer
- Department of Molecular and Cellular Pharmacology, University of Miami, Miami, FL, USA. .,BD2K LINCS Data Coordination and Integration Center, University of Miami, Miami, FL, USA. .,Center for Computational Science, University of Miami, Miami, FL, USA.
| | - Yongqun He
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA. .,Unit of Laboratory Animal Medicine and Department of Micro biology and Immunology, University of Michigan, Ann Arbor, MI, USA.
| |
|
33
|
Drug Target Commons: A Community Effort to Build a Consensus Knowledge Base for Drug-Target Interactions. Cell Chem Biol 2017; 25:224-229.e2. [PMID: 29276046 PMCID: PMC5814751 DOI: 10.1016/j.chembiol.2017.11.009] [Citation(s) in RCA: 85] [Impact Index Per Article: 12.1] [Received: 08/22/2017] [Revised: 10/03/2017] [Accepted: 11/20/2017] [Indexed: 11/23/2022]
Abstract
Knowledge of the full target space of bioactive substances, approved and investigational drugs as well as chemical probes, provides important insights into therapeutic potential and possible adverse effects. The existing compound-target bioactivity data resources are often incomparable due to non-standardized and heterogeneous assay types and variability in endpoint measurements. To extract higher value from the existing and future compound target-profiling data, we implemented an open-data web platform, named Drug Target Commons (DTC), which features tools for crowd-sourced compound-target bioactivity data annotation, standardization, curation, and intra-resource integration. We demonstrate the unique value of DTC with several examples related to both drug discovery and drug repurposing applications and invite researchers to join this community effort to increase the reuse and extension of compound bioactivity data. DTC is a crowd-sourcing-based web platform to annotate drug-target bioactivity data. The open environment improves data harmonization for drug repurposing applications. DTC offers a comprehensive, reproducible, and sustainable bioactivity knowledge base.
|
34
|
Giraldo O, García A, López F, Corcho O. Using semantics for representing experimental protocols. J Biomed Semantics 2017; 8:52. [PMID: 29132408 PMCID: PMC5683383 DOI: 10.1186/s13326-017-0160-y] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 06/02/2016] [Accepted: 10/15/2017] [Indexed: 02/19/2024] Open
Abstract
Background: An experimental protocol is a sequence of tasks and operations executed to perform experimental research in biological and biomedical areas, e.g. biology, genetics, immunology, neurosciences, virology. Protocols often include references to equipment, reagents, descriptions of critical steps, troubleshooting and tips, as well as any other information that researchers deem important for facilitating the reusability of the protocol. Although experimental protocols are central to reproducibility, the descriptions are often cursory. There is a need for a unified framework with respect to the syntactic structure and the semantics for representing experimental protocols. Results: In this paper we present the “SMART Protocols ontology”, an ontology for representing experimental protocols. Our ontology represents the protocol as a workflow with domain-specific knowledge embedded within a document. We also present the Sample Instrument Reagent Objective (SIRO) model, which represents the minimal common information shared across experimental protocols. SIRO was conceived in the same realm as the Patient Intervention Comparison Outcome (PICO) model that supports search, retrieval and classification purposes in evidence-based medicine. We evaluate our approach against a set of competency questions modeled as SPARQL queries and processed against a set of published and unpublished protocols modeled with the SP Ontology and the SIRO model. Our approach makes it possible to answer queries such as “Which protocols use tumor tissue as a sample?”. Conclusion: Improving reporting structures for experimental protocols requires collective efforts from authors, peer reviewers, editors and funding bodies. The SP Ontology is a contribution towards this goal. We build upon previous experience and bring together the views of researchers managing protocols in their laboratory work. Website: https://smartprotocols.github.io/.
Affiliation(s)
- Olga Giraldo
- Ontology Engineering Group, Madrid, Universidad Politécnica de Madrid, Madrid, 28660, Spain
- Alexander García
- Ontology Engineering Group, Madrid, Universidad Politécnica de Madrid, Madrid, 28660, Spain
- Oscar Corcho
- Ontology Engineering Group, Madrid, Universidad Politécnica de Madrid, Madrid, 28660, Spain
|
35
|
Lin Y, Mehta S, Küçük-McGinty H, Turner JP, Vidovic D, Forlin M, Koleti A, Nguyen DT, Jensen LJ, Guha R, Mathias SL, Ursu O, Stathias V, Duan J, Nabizadeh N, Chung C, Mader C, Visser U, Yang JJ, Bologa CG, Oprea TI, Schürer SC. Drug target ontology to classify and integrate drug discovery data. J Biomed Semantics 2017; 8:50. [PMID: 29122012 PMCID: PMC5679337 DOI: 10.1186/s13326-017-0161-x] [Citation(s) in RCA: 37] [Impact Index Per Article: 5.3] [Received: 03/23/2017] [Accepted: 10/17/2017] [Indexed: 11/12/2022] Open
Abstract
Background: One of the most successful approaches to develop new small molecule therapeutics has been to start from a validated druggable protein target. However, only a small subset of potentially druggable targets has attracted significant research and development resources. The Illuminating the Druggable Genome (IDG) project develops resources to catalyze the development of likely targetable, yet currently understudied prospective drug targets. A central component of the IDG program is a comprehensive knowledge resource of the druggable genome. Results: As part of that effort, we have developed a framework to integrate, navigate, and analyze drug discovery data based on formalized and standardized classifications and annotations of druggable protein targets, the Drug Target Ontology (DTO). DTO was constructed by extensive curation and consolidation of various resources. DTO classifies the four major drug target protein families, GPCRs, kinases, ion channels and nuclear receptors, based on phylogenecity, function, target development level, disease association, tissue expression, chemical ligand and substrate characteristics, and target-family specific characteristics. The formal ontology was built using a new software tool to auto-generate most axioms from a database while supporting manual knowledge acquisition. A modular, hierarchical implementation facilitates ontology development and maintenance and makes use of various external ontologies, thus integrating the DTO into the ecosystem of biomedical ontologies. As a formal OWL-DL ontology, DTO contains asserted and inferred axioms. Modeling data from the Library of Integrated Network-based Cellular Signatures (LINCS) program illustrates the potential of DTO for contextual data integration and nuanced definition of important drug target characteristics. DTO has been implemented in the IDG user interface portal, Pharos, and the TIN-X explorer of protein target disease relationships.
Conclusions: DTO was built based on the need for a formal semantic model for druggable targets including various related information such as protein, gene, protein domain, protein structure, binding site, small molecule drug, mechanism of action, protein tissue localization, disease association, and many other types of information. DTO will further facilitate the otherwise challenging integration and formal linking to biological assays, phenotypes, disease models, drug poly-pharmacology, binding kinetics and many other processes, functions and qualities that are at the core of drug discovery. The first version of DTO is publicly available via the website http://drugtargetontology.org/, GitHub (http://github.com/DrugTargetOntology/DTO), and the NCBO BioPortal (http://bioportal.bioontology.org/ontologies/DTO). The long-term goal of DTO is to provide such an integrative framework and to populate the ontology with this information as a community resource.
Affiliation(s)
- Yu Lin
- Center for Computational Science, University of Miami, Coral Gables, FL, USA
- Saurabh Mehta
- Center for Computational Science, University of Miami, Coral Gables, FL, USA; Department of Applied Chemistry, Delhi Technological University, Delhi, India
- Hande Küçük-McGinty
- Center for Computational Science, University of Miami, Coral Gables, FL, USA; Department of Computer Science, University of Miami, Coral Gables, FL, USA
- John Paul Turner
- Department of Molecular and Cellular Pharmacology, Miller School of Medicine, University of Miami, Miami, FL, USA
- Dusica Vidovic
- Center for Computational Science, University of Miami, Coral Gables, FL, USA; Department of Molecular and Cellular Pharmacology, Miller School of Medicine, University of Miami, Miami, FL, USA
- Michele Forlin
- Center for Computational Science, University of Miami, Coral Gables, FL, USA; Department of Molecular and Cellular Pharmacology, Miller School of Medicine, University of Miami, Miami, FL, USA
- Amar Koleti
- Center for Computational Science, University of Miami, Coral Gables, FL, USA
- Dac-Trung Nguyen
- National Center for Advancing Translational Science, Rockville, MD, USA
- Lars Juhl Jensen
- Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
- Rajarshi Guha
- National Center for Advancing Translational Science, Rockville, MD, USA
- Stephen L Mathias
- Department of Internal Medicine, Translational Informatics Division, University of New Mexico School of Medicine, Albuquerque, NM, USA
- Oleg Ursu
- Department of Internal Medicine, Translational Informatics Division, University of New Mexico School of Medicine, Albuquerque, NM, USA
- Vasileios Stathias
- Department of Molecular and Cellular Pharmacology, Miller School of Medicine, University of Miami, Miami, FL, USA
- Jianbin Duan
- Center for Computational Science, University of Miami, Coral Gables, FL, USA; Department of Computer Science, University of Miami, Coral Gables, FL, USA
- Nooshin Nabizadeh
- Center for Computational Science, University of Miami, Coral Gables, FL, USA
- Caty Chung
- Center for Computational Science, University of Miami, Coral Gables, FL, USA
- Christopher Mader
- Center for Computational Science, University of Miami, Coral Gables, FL, USA
- Ubbo Visser
- Department of Computer Science, University of Miami, Coral Gables, FL, USA
- Jeremy J Yang
- Department of Internal Medicine, Translational Informatics Division, University of New Mexico School of Medicine, Albuquerque, NM, USA
- Cristian G Bologa
- Department of Internal Medicine, Translational Informatics Division, University of New Mexico School of Medicine, Albuquerque, NM, USA
- Tudor I Oprea
- Department of Internal Medicine, Translational Informatics Division, University of New Mexico School of Medicine, Albuquerque, NM, USA
- Stephan C Schürer
- Center for Computational Science, University of Miami, Coral Gables, FL, USA; Department of Molecular and Cellular Pharmacology, Miller School of Medicine, University of Miami, Miami, FL, USA
|
36
|
Dragsted LO, Gao Q, Praticò G, Manach C, Wishart DS, Scalbert A, Feskens EJM. Dietary and health biomarkers-time for an update. GENES & NUTRITION 2017; 12:24. [PMID: 28974991 PMCID: PMC5622518 DOI: 10.1186/s12263-017-0578-y] [Citation(s) in RCA: 32] [Impact Index Per Article: 4.6] [Received: 06/16/2017] [Accepted: 09/19/2017] [Indexed: 01/11/2023]
Abstract
In the dietary and health research area, biomarkers are extensively used for multiple purposes. These include biomarkers of dietary intake and nutrient status, biomarkers used to measure the biological effects of specific dietary components, and biomarkers to assess the effects of diet on health. The implementation of biomarkers in nutritional research will be important to improve measurements of dietary intake, of exposure to specific dietary components, and of compliance with dietary interventions. Biomarkers could also help improve the characterization of nutritional status in study volunteers and provide mechanistic insight into the effects of food components and diets. Although hundreds of papers in nutrition are published annually, there is no current ontology for the area, no generally accepted classification terminology for biomarkers in nutrition and health, no systematic validation scheme for these biomarker classes, and no recent systematic review of all proposed biomarkers for food intake. While advanced databases exist for the human and food metabolomes, additional tools are needed to curate and evaluate current data on dietary and health biomarkers. The Food Biomarkers Alliance (FoodBAll), under the Joint Programming Initiative "A Healthy Diet for a Healthy Life" (JPI-HDHL), aims to meet some of these challenges by identifying new dietary biomarkers and producing new databases and review papers on biomarkers for nutritional research. This paper outlines the needs and serves as an introduction to this thematic issue of Genes & Nutrition on dietary and health biomarkers.
Affiliation(s)
- Lars O Dragsted
- Department of Nutrition, Exercise and Sports, University of Copenhagen, Copenhagen, Denmark
- Qian Gao
- Department of Nutrition, Exercise and Sports, University of Copenhagen, Copenhagen, Denmark
- Giulia Praticò
- Department of Nutrition, Exercise and Sports, University of Copenhagen, Copenhagen, Denmark
- Department of Food Science, University of Copenhagen, Copenhagen, Denmark
- Claudine Manach
- INRA, Human Nutrition Unit, Université Clermont Auvergne, F63000 Clermont-Ferrand, France
- David S Wishart
- Department of Biological Sciences, University of Alberta, Edmonton, Canada
- Augustin Scalbert
- Nutrition and Metabolism Section, Biomarkers Group, International Agency for Research on Cancer (IARC), Lyon, France
- Edith J M Feskens
- Division of Human Nutrition, Wageningen University & Research, Wageningen, The Netherlands
|
37
|
Burgoon LD. The AOPOntology: A Semantic Artificial Intelligence Tool for Predictive Toxicology. Appl In Vitro Toxicol 2017. [DOI: 10.1089/aivt.2017.0012] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Indexed: 11/13/2022]
Affiliation(s)
- Lyle D. Burgoon
- U.S. Army Engineer Research and Development Center, Research Triangle Park, North Carolina
|
38
|
Cruz-Monteagudo M, Schürer S, Tejera E, Pérez-Castillo Y, Medina-Franco JL, Sánchez-Rodríguez A, Borges F. Systemic QSAR and phenotypic virtual screening: chasing butterflies in drug discovery. Drug Discov Today 2017; 22:994-1007. [PMID: 28274840 PMCID: PMC5487293 DOI: 10.1016/j.drudis.2017.02.004] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.9] [Received: 11/16/2016] [Revised: 02/02/2017] [Accepted: 02/27/2017] [Indexed: 12/20/2022]
Abstract
Current advances in systems biology suggest a new change of paradigm reinforcing the holistic nature of the drug discovery process. According to the principles of systems biology, a simple drug perturbing a network of targets can trigger complex reactions. Therefore, it is possible to connect initial events with final outcomes and consequently prioritize those events, leading to a desired effect. Here, we introduce a new concept, 'Systemic Chemogenomics/Quantitative Structure-Activity Relationship (QSAR)'. To elaborate on the concept, relevant information surrounding it is addressed. The concept is challenged by implementing a systemic QSAR approach for phenotypic virtual screening (VS) of candidate ligands acting as neuroprotective agents in Parkinson's disease (PD). The results support the suitability of the approach for the phenotypic prioritization of drug candidates.
Affiliation(s)
- Maykel Cruz-Monteagudo
- CIQUP/Departamento de Química e Bioquímica, Faculdade de Ciências, Universidade do Porto, Porto 4169-007, Portugal
- Stephan Schürer
- Department of Pharmacology, Miller School of Medicine and Center for Computational Science, University of Miami, Miami, FL 33136, USA
- Eduardo Tejera
- Instituto de Investigaciones Biomédicas (IIB), Universidad de Las Américas, 170513 Quito, Ecuador
- Yunierkis Pérez-Castillo
- Sección Físico Química y Matemáticas, Departamento de Química, Universidad Técnica Particular de Loja, San Cayetano Alto S/N, EC1101608 Loja, Ecuador
- José L Medina-Franco
- Universidad Nacional Autónoma de México, Departamento de Farmacia, Facultad de Química, Avenida Universidad 3000, Mexico City, 04510, Mexico
- Aminael Sánchez-Rodríguez
- Departamento de Ciencias Naturales, Universidad Técnica Particular de Loja, Calle París S/N, EC1101608 Loja, Ecuador
- Fernanda Borges
- CIQUP/Departamento de Química e Bioquímica, Faculdade de Ciências, Universidade do Porto, Porto 4169-007, Portugal
|
39
|
Accelerating Adverse Outcome Pathway Development Using Publicly Available Data Sources. Curr Environ Health Rep 2016; 3:53-63. [PMID: 26809562 DOI: 10.1007/s40572-016-0079-y] [Citation(s) in RCA: 45] [Impact Index Per Article: 5.6] [Indexed: 10/22/2022]
Abstract
The adverse outcome pathway (AOP) concept links molecular perturbations with organism and population-level outcomes to support high-throughput toxicity (HTT) testing. International efforts are underway to define AOPs and store the information supporting these AOPs in a central knowledge base; however, this process is currently labor-intensive and time-consuming. Publicly available data sources provide a wealth of information that could be used to define computationally predicted AOPs (cpAOPs), which could serve as a basis for creating expert-derived AOPs in a much more efficient way. Computational tools for mining large datasets provide the means for extracting and organizing the information captured in these public data sources. Using cpAOPs as a starting point for expert-derived AOPs should accelerate AOP development. Coupling this with tools to coordinate and facilitate the expert development efforts will increase the number and quality of AOPs produced, which should play a key role in advancing the adoption of HTT testing, thereby reducing the use of animals in toxicity testing and greatly increasing the number of chemicals that can be tested.
|
40
|
Gaulton A, Hersey A, Nowotka M, Bento AP, Chambers J, Mendez D, Mutowo P, Atkinson F, Bellis LJ, Cibrián-Uhalte E, Davies M, Dedman N, Karlsson A, Magariños MP, Overington JP, Papadatos G, Smit I, Leach AR. The ChEMBL database in 2017. Nucleic Acids Res 2016; 45:D945-D954. [PMID: 27899562 PMCID: PMC5210557 DOI: 10.1093/nar/gkw1074] [Citation(s) in RCA: 1405] [Impact Index Per Article: 175.6] [Received: 09/23/2016] [Revised: 10/21/2016] [Accepted: 10/30/2016] [Indexed: 11/14/2022] Open
Abstract
ChEMBL is an open large-scale bioactivity database (https://www.ebi.ac.uk/chembl), previously described in the 2012 and 2014 Nucleic Acids Research Database Issues. Since then, alongside the continued extraction of data from the medicinal chemistry literature, new sources of bioactivity data have also been added to the database. These include: deposited data sets from neglected disease screening; crop protection data; drug metabolism and disposition data and bioactivity data from patents. A number of improvements and new features have also been incorporated. These include the annotation of assays and targets using ontologies, the inclusion of targets and indications for clinical candidates, addition of metabolic pathways for drugs and calculation of structural alerts. The ChEMBL data can be accessed via a web-interface, RDF distribution, data downloads and RESTful web-services.
Affiliation(s)
- Anna Gaulton
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Anne Hersey
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Michał Nowotka
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- A Patrícia Bento
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK; Open Targets, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Jon Chambers
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- David Mendez
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Prudence Mutowo
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Francis Atkinson
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Louisa J Bellis
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Elena Cibrián-Uhalte
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Mark Davies
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Nathan Dedman
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Anneli Karlsson
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- María Paula Magariños
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK; Open Targets, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- John P Overington
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- George Papadatos
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Ines Smit
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Andrew R Leach
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
|
41
|
Sharma R, Schürer SC, Muskal SM. High quality, small molecule-activity datasets for kinase research. F1000Res 2016; 5. [PMID: 27429748 DOI: 10.12688/f1000research.8950.1] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Accepted: 06/03/2016] [Indexed: 01/22/2023] Open
Abstract
Kinases regulate cell growth, movement, and death. Deregulated kinase activity is a frequent cause of disease. The therapeutic potential of kinase inhibitors has led to large amounts of published structure activity relationship (SAR) data. Bioactivity databases such as the Kinase Knowledgebase (KKB), WOMBAT, GOSTAR, and ChEMBL provide researchers with quantitative data characterizing the activity of compounds across many biological assays. The KKB, for example, contains over 1.8M kinase structure-activity data points reported in peer-reviewed journals and patents. In the spirit of fostering methods development and validation worldwide, we have extracted and have made available from the KKB 258K structure activity data points and 76K associated unique chemical structures across eight kinase targets. These data are freely available for download within this data note.
Affiliation(s)
- Rajan Sharma
- Eidogen-Sertanty, Inc., Oceanside, CA, 92056, USA
- Stephan C Schürer
- Department of Pharmacology, Miller School of Medicine and Center for Computational Science, University of Miami, Miami, FL, 33136, USA
|
42
|
Tetko IV, Engkvist O, Koch U, Reymond JL, Chen H. BIGCHEM: Challenges and Opportunities for Big Data Analysis in Chemistry. Mol Inform 2016; 35:615-621. [PMID: 27464907 PMCID: PMC5129546 DOI: 10.1002/minf.201600073] [Citation(s) in RCA: 68] [Impact Index Per Article: 8.5] [Received: 05/19/2016] [Accepted: 07/06/2016] [Indexed: 01/19/2023]
Abstract
The increasing volume of biomedical data in chemistry and life sciences requires the development of new methods and approaches for their handling. Here, we briefly discuss some challenges and opportunities of this fast-growing area of research with a focus on those to be addressed within the BIGCHEM project. The article starts with a brief description of some available resources for “Big Data” in chemistry and a discussion of the importance of data quality. We then discuss challenges with visualization of millions of compounds by combining chemical and biological data, the expectations from mining the “Big Data” using advanced machine-learning methods, and their applications in polypharmacology prediction and target de-convolution in phenotypic screening. We show that the efficient exploration of billions of molecules requires the development of smart strategies. We also address the issue of secure information sharing without disclosing chemical structures, which is critical to enable bi-party or multi-party data sharing. Data sharing is important in the context of the recent trend of “open innovation” in the pharmaceutical industry, which has led to not only more information sharing among academics and pharma industries but also the so-called “precompetitive” collaboration between pharma companies. At the end we highlight the importance of education in “Big Data” for further progress of this area.
Affiliation(s)
- Igor V Tetko
- Helmholtz Zentrum München - German Research Center for Environmental Health (GmbH), Institute of Structural Biology, Ingolstädter Landstraße 1, b. 60w, D-85764, Neuherberg, Germany; BIGCHEM GmbH, Ingolstädter Landstraße 1, b. 60w, D-85764, Neuherberg, Germany
- Ola Engkvist
- Discovery Sciences, AstraZeneca R&D Gothenburg, Pepparedsleden 1, Mölndal, SE-43183, Sweden
- Uwe Koch
- Lead Discovery Center GmbH, Otto-Hahn Strasse 15, Dortmund, 44227, Germany
- Jean-Louis Reymond
- Department of Chemistry and Biochemistry, University of Bern, Freiestrasse 3, 3012, Bern, Switzerland
- Hongming Chen
- Discovery Sciences, AstraZeneca R&D Gothenburg, Pepparedsleden 1, Mölndal, SE-43183, Sweden
|
43
|
Sharma R, Schürer SC, Muskal SM. High quality, small molecule-activity datasets for kinase research. F1000Res 2016; 5. [PMID: 27429748 DOI: 10.12688/f1000research.8950.2] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Accepted: 07/18/2016] [Indexed: 12/17/2022] Open
Abstract
Kinases regulate cell growth, movement, and death. Deregulated kinase activity is a frequent cause of disease. The therapeutic potential of kinase inhibitors has led to large amounts of published structure activity relationship (SAR) data. Bioactivity databases such as the Kinase Knowledgebase (KKB), WOMBAT, GOSTAR, and ChEMBL provide researchers with quantitative data characterizing the activity of compounds across many biological assays. The KKB, for example, contains over 1.8M kinase structure-activity data points reported in peer-reviewed journals and patents. In the spirit of fostering methods development and validation worldwide, we have extracted and have made available from the KKB 258K structure activity data points and 76K associated unique chemical structures across eight kinase targets. These data are freely available for download within this data note.
Affiliation(s)
- Rajan Sharma
- Eidogen-Sertanty, Inc., Oceanside, CA, 92056, USA
- Stephan C Schürer
- Department of Pharmacology, Miller School of Medicine and Center for Computational Science, University of Miami, Miami, FL, 33136, USA
|
45
|
Callahan A, Abeyruwan SW, Al-Ali H, Sakurai K, Ferguson AR, Popovich PG, Shah NH, Visser U, Bixby JL, Lemmon VP. RegenBase: a knowledge base of spinal cord injury biology for translational research. Database: The Journal of Biological Databases and Curation 2016; 2016:baw040. [PMID: 27055827 PMCID: PMC4823819 DOI: 10.1093/database/baw040] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Received: 09/03/2015] [Accepted: 03/03/2016] [Indexed: 12/20/2022]
Abstract
Spinal cord injury (SCI) research is a data-rich field that aims to identify the biological mechanisms resulting in loss of function and mobility after SCI, as well as develop therapies that promote recovery after injury. SCI experimental methods, data and domain knowledge are locked in the largely unstructured text of scientific publications, making large-scale integration with existing bioinformatics resources and subsequent analysis infeasible. The lack of standard reporting for experiment variables and results also makes experiment replicability a significant challenge. To address these challenges, we have developed RegenBase, a knowledge base of SCI biology. RegenBase integrates curated literature-sourced facts and experimental details, raw assay data profiling the effect of compounds on enzyme activity and cell growth, and structured SCI domain knowledge in the form of the first ontology for SCI, using Semantic Web representation languages and frameworks. RegenBase uses consistent identifier schemes and data representations that enable automated linking among RegenBase statements and also to other biological databases and electronic resources. By querying RegenBase, we have identified novel biological hypotheses linking the effects of perturbagens to observed behavioral outcomes after SCI. RegenBase is publicly available for browsing, querying and download. Database URL: http://regenbase.org
Affiliation(s)
- Alison Callahan
- Stanford Center for Biomedical Informatics Research, Stanford University, Stanford, CA 94305
- Hassan Al-Ali
- Miami Project to Cure Paralysis, University of Miami School of Medicine, Miami, FL 33136
- Kunie Sakurai
- Miami Project to Cure Paralysis, University of Miami School of Medicine, Miami, FL 33136
- Adam R Ferguson
- Brain and Spinal Injury Center (BASIC), Department of Neurological Surgery, University of California, San Francisco; San Francisco Veterans Affairs Medical Center, San Francisco, CA 94143
- Phillip G Popovich
- Center for Brain and Spinal Cord Repair and the Department of Neuroscience, The Ohio State University, Columbus, OH 43210
- Nigam H Shah
- Stanford Center for Biomedical Informatics Research, Stanford University, Stanford, CA 94305
- Ubbo Visser
- Department of Computer Science, University of Miami, Coral Gables, FL 33146
- John L Bixby
- Miami Project to Cure Paralysis, University of Miami School of Medicine, Miami, FL 33136
- Center for Computational Science, University of Miami, Coral Gables, FL 33146
- Department of Cellular and Molecular Pharmacology, University of Miami School of Medicine, Miami, FL 33136, USA
- Vance P Lemmon
- Miami Project to Cure Paralysis, University of Miami School of Medicine, Miami, FL 33136
- Center for Computational Science, University of Miami, Coral Gables, FL 33146
|
46
|
Fang Y. Compound annotation with real time cellular activity profiles to improve drug discovery. Expert Opin Drug Discov 2016; 11:269-80. [PMID: 26787137 DOI: 10.1517/17460441.2016.1143460] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 01/22/2023]
Abstract
INTRODUCTION: In the past decade, a range of innovative strategies have been developed to improve the productivity of pharmaceutical research and development. In particular, compound annotation, combined with informatics, has provided unprecedented opportunities for drug discovery. AREAS COVERED: In this review, a literature search from 2000 to 2015 was conducted to provide an overview of the compound annotation approaches currently used in drug discovery. Based on this, a framework related to a compound annotation approach using real-time cellular activity profiles for probe, drug, and biology discovery is proposed. EXPERT OPINION: Compound annotation with chemical structure, drug-like properties, bioactivities, genome-wide effects, clinical phenotypes, and textual abstracts has received significant attention in early drug discovery. However, these annotations are mostly associated with endpoint results. Advances in assay techniques have made it possible to obtain real-time cellular activity profiles of drug molecules under different phenotypes, so it is possible to generate compound annotation with real-time cellular activity profiles. Combining compound annotation with informatics, such as similarity analysis, presents a good opportunity to improve the rate of discovery of novel drugs and probes, and enhance our understanding of the underlying biology.
Affiliation(s)
- Ye Fang
- Biochemical Technologies, Science and Technology Division, Corning Incorporated, Corning, NY, USA
|
47
|
Hoehndorf R, Schofield PN, Gkoutos GV. The role of ontologies in biological and biomedical research: a functional perspective. Brief Bioinform 2015; 16:1069-80. [PMID: 25863278 PMCID: PMC4652617 DOI: 10.1093/bib/bbv011] [Citation(s) in RCA: 119] [Impact Index Per Article: 13.2] [Received: 11/26/2014] [Revised: 01/20/2015] [Indexed: 12/19/2022] Open
Abstract
Ontologies are widely used in biological and biomedical research. Their success lies in their combination of four main features present in almost all ontologies: provision of standard identifiers for classes and relations that represent the phenomena within a domain; provision of a vocabulary for a domain; provision of metadata that describes the intended meaning of the classes and relations in ontologies; and the provision of machine-readable axioms and definitions that enable computational access to some aspects of the meaning of classes and relations. While each of these features enables applications that facilitate data integration, data access and analysis, a great potential lies in the possibility of combining these four features to support integrative analysis and interpretation of multimodal data. Here, we provide a functional perspective on ontologies in biology and biomedicine, focusing on what ontologies can do and describing how they can be used in support of integrative research. We also outline perspectives for using ontologies in data-driven science, in particular their application in structured data mining and machine learning applications.
|
48
|
Affiliation(s)
- O. Joseph Trask
- Cellular Imaging Core, The Hamner Institutes for Health Sciences, Research Triangle Park, North Carolina
- Paul A. Johnston
- Department of Pharmaceutical Sciences, School of Pharmacy, University of Pittsburgh, Pittsburgh, Pennsylvania
|
49
|
Jeliazkova N, Chomenidis C, Doganis P, Fadeel B, Grafström R, Hardy B, Hastings J, Hegi M, Jeliazkov V, Kochev N, Kohonen P, Munteanu CR, Sarimveis H, Smeets B, Sopasakis P, Tsiliki G, Vorgrimmler D, Willighagen E. The eNanoMapper database for nanomaterial safety information. Beilstein Journal of Nanotechnology 2015; 6:1609-34. [PMID: 26425413 PMCID: PMC4578352 DOI: 10.3762/bjnano.6.165] [Citation(s) in RCA: 63] [Impact Index Per Article: 7.0] [Received: 03/31/2015] [Accepted: 07/03/2015] [Indexed: 05/20/2023]
Abstract
BACKGROUND: The NanoSafety Cluster, a cluster of projects funded by the European Commission, identified the need for a computational infrastructure for toxicological data management of engineered nanomaterials (ENMs). Ontologies, open standards, and interoperable designs were envisioned to empower a harmonized approach to European research in nanotechnology. This setting provides a number of opportunities and challenges in the representation of nanomaterials data and the integration of ENM information originating from diverse systems. Within this cluster, eNanoMapper works towards supporting the collaborative safety assessment for ENMs by creating a modular and extensible infrastructure for data sharing, data analysis, and building computational toxicology models for ENMs. RESULTS: The eNanoMapper database solution builds on the previous experience of the consortium partners in supporting diverse data through flexible data storage, open source components and web services. We have recently described the design of the eNanoMapper prototype database along with a summary of challenges in the representation of ENM data and an extensive review of existing nano-related data models, databases, and nanomaterials-related entries in chemical and toxicogenomic databases. This paper continues with a focus on the database functionality exposed through its application programming interface (API), and its use in visualisation and modelling. Considering the preferred community practice of using spreadsheet templates, we developed a configurable spreadsheet parser facilitating user-friendly data preparation and data upload. We further present a web application able to retrieve the experimental data via the API and analyze it with multiple data preprocessing and machine learning algorithms. CONCLUSION: We demonstrate how the eNanoMapper database is used to import and publish online ENM and assay data from several data sources, how the "representational state transfer" (REST) API enables building user-friendly interfaces and graphical summaries of the data, and how these resources facilitate the modelling of reproducible quantitative structure-activity relationships for nanomaterials (NanoQSAR).
Affiliation(s)
- Philip Doganis
- National Technical University of Athens, School of Chemical Engineering, Athens, Greece
- Barry Hardy
- Douglas Connect GmbH, Zeiningen, Switzerland
- Janna Hastings
- European Molecular Biology Laboratory – European Bioinformatics Institute (EMBL-EBI), Hinxton, United Kingdom
- Markus Hegi
- Douglas Connect GmbH, Zeiningen, Switzerland
- Nikolay Kochev
- Ideaconsult Ltd., Sofia, Bulgaria
- Department of Analytical Chemistry and Computer Chemistry, University of Plovdiv, Plovdiv, Bulgaria
- Cristian R Munteanu
- Department of Bioinformatics, NUTRIM, Maastricht University, Maastricht, The Netherlands
- Computer Science Faculty, University of A Coruña, A Coruña, Spain
- Haralambos Sarimveis
- National Technical University of Athens, School of Chemical Engineering, Athens, Greece
- Bart Smeets
- Department of Bioinformatics, NUTRIM, Maastricht University, Maastricht, The Netherlands
- Pantelis Sopasakis
- National Technical University of Athens, School of Chemical Engineering, Athens, Greece
- IMT Institute for Advanced Studies Lucca, Lucca, Italy
- Georgia Tsiliki
- National Technical University of Athens, School of Chemical Engineering, Athens, Greece
- Egon Willighagen
- Department of Bioinformatics, NUTRIM, Maastricht University, Maastricht, The Netherlands
|
50
|
Abstract
The emergence of a number of publicly available bioactivity databases, such as ChEMBL, PubChem BioAssay and BindingDB, has raised awareness about the topics of data curation, quality and integrity. Here we provide an overview and discussion of the current and future approaches to activity, assay and target data curation of the ChEMBL database. This curation process involves several manual and automated steps and aims to: (1) maximise data accessibility and comparability; (2) improve data integrity and flag outliers, ambiguities and potential errors; and (3) add further curated annotations and mappings thus increasing the usefulness and accuracy of the ChEMBL data for all users and modellers in particular. Issues related to activity, assay and target data curation and integrity along with their potential impact for users of the data are discussed, alongside robust selection and filter strategies in order to avoid or minimise these, depending on the desired application.
|