1
Ontology mapping for semantically enabled applications. Drug Discov Today 2019; 24:2068-2075. [PMID: 31158512 DOI: 10.1016/j.drudis.2019.05.020] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Received: 11/22/2018] [Revised: 04/12/2019] [Accepted: 05/28/2019] [Indexed: 12/14/2022]
Abstract
In this review, we provide a summary of recent progress in ontology mapping (OM) at a crucial time when biomedical research is under a deluge of an increasing amount and variety of data. This is particularly important for realising the full potential of semantically enabled or enriched applications and for meaningful insights, such as drug discovery, using machine-learning technologies. We discuss challenges and solutions for better ontology mappings, as well as how to select ontologies before their application. In addition, we describe tools and algorithms for ontology mapping, including evaluation of tool capability and quality of mappings. Finally, we outline the requirements for an ontology mapping service (OMS) and the progress being made towards implementation of such sustainable services.
2
Brown AP, Drew P, Knight B, Marc P, Troth S, Wuersch K, Zandee J. Graphical display of histopathology data from toxicology studies for drug discovery and development: An industry perspective. Regul Toxicol Pharmacol 2016; 82:167-172. [PMID: 27769829 DOI: 10.1016/j.yrtph.2016.10.009] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 10/14/2016] [Accepted: 10/17/2016] [Indexed: 11/18/2022]
Abstract
Histopathology data comprise a critical component of pharmaceutical toxicology studies and are typically presented as finding incidence counts and severity scores per organ, and tabulated on multiple pages which can be challenging for review and aggregation of results. However, the SEND (Standard for Exchange of Nonclinical Data) standard provides a means for collecting and managing histopathology data in a uniform fashion which can allow informatics systems to archive, display and analyze data in novel ways. Various software applications have become available to convert histopathology data into graphical displays for analyses. A subgroup of the FDA-PhUSE Nonclinical Working Group conducted intra-industry surveys regarding the use of graphical displays of histopathology data. Visual cues, use-cases, the value of cross-domain and cross-study visualizations, and limitations were topics for discussion in the context of the surveys. The subgroup came to the following conclusions. Graphical displays appear advantageous as a communication tool to both pathologists and non-pathologists, and provide an efficient means for communicating pathology findings to project teams. Graphics can support hypothesis generation, which could include cross-domain interactive visualizations and/or aggregating large datasets from multiple studies to observe and/or display patterns and trends. Incorporation of the SEND standard will provide a platform by which visualization tools will be able to aggregate, select and display information from complex and disparate datasets.
Affiliation(s)
- Alan P Brown
- Novartis Institutes for Biomedical Research, 100 Technology Square, Cambridge, MA 02139, USA.
- Philip Drew
- PDS Consultants, Innovation Centre, 49 Oxford Street, Leicester, LE1 5XY England, UK
- Brian Knight
- Boehringer-Ingelheim, 900 Ridgebury Road, Ridgefield, CT 06877, USA
- Philippe Marc
- Novartis Institutes for Biomedical Research, Basel CH-4200, Switzerland
- Sean Troth
- Merck & Co., Inc., WP81-404, Sumneytown Pike, West Point, PA 19486, USA
- Kuno Wuersch
- Novartis Institutes for Biomedical Research, Basel CH-4200, Switzerland
- Joyce Zandee
- Integrated Nonclinical Development Solutions (INDS) Incorporated, 6111 Jackson Road, Suite 100, Ann Arbor, MI 48103, USA
3
Finding the right approach to big data-driven medicinal chemistry. Future Med Chem 2015; 7:1213-6. [DOI: 10.4155/fmc.15.58] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Indexed: 01/09/2023]
4
The eTOX data-sharing project to advance in silico drug-induced toxicity prediction. Int J Mol Sci 2014; 15:21136-54. [PMID: 25405742 PMCID: PMC4264217 DOI: 10.3390/ijms151121136] [Citation(s) in RCA: 48] [Impact Index Per Article: 4.8] [Received: 09/25/2014] [Accepted: 10/20/2014] [Indexed: 11/16/2022]
Abstract
The high-quality in vivo preclinical safety data produced by the pharmaceutical industry during drug development, which follows numerous strict guidelines, are mostly not available in the public domain. These safety data are sometimes published as a condensed summary for the few compounds that reach the market, but the majority of studies are never made public and are often difficult to access in an automated way, even sometimes within the owning company itself. It is evident from many academic and industrial examples that useful data mining and model development require large and representative data sets and careful curation of the collected data. In 2010, under the auspices of the Innovative Medicines Initiative, the eTOX project started with the objective of extracting and sharing preclinical study data from paper or PDF archives of toxicology departments of the 13 participating pharmaceutical companies and using such data for establishing a detailed, well-curated database, which could then serve as source for read-across approaches (early assessment of the potential toxicity of a drug candidate by comparison of similar structure and/or effects) and training of predictive models. The paper describes the efforts undertaken to allow effective data sharing (intellectual property (IP) protection and set-up of adequate controlled vocabularies) and to establish the database (currently with over 4000 studies contributed by the pharma companies corresponding to more than 1400 compounds). In addition, the status of predictive model building and some specific features of the eTOX predictive system (eTOXsys) are presented as decision support knowledge-based tools for the drug development process at an early stage.
5
Vempati UD, Chung C, Mader C, Koleti A, Datar N, Vidović D, Wrobel D, Erickson S, Muhlich JL, Berriz G, Benes CH, Subramanian A, Pillai A, Shamu CE, Schürer SC. Metadata Standard and Data Exchange Specifications to Describe, Model, and Integrate Complex and Diverse High-Throughput Screening Data from the Library of Integrated Network-based Cellular Signatures (LINCS). J Biomol Screen 2014; 19:803-16. [PMID: 24518066 PMCID: PMC7723305 DOI: 10.1177/1087057114522514] [Citation(s) in RCA: 60] [Impact Index Per Article: 6.0] [Received: 09/09/2013] [Accepted: 01/13/2014] [Indexed: 01/15/2023]
Abstract
The National Institutes of Health Library of Integrated Network-based Cellular Signatures (LINCS) program is generating extensive multidimensional data sets, including biochemical, genome-wide transcriptional, and phenotypic cellular response signatures to a variety of small-molecule and genetic perturbations with the goal of creating a sustainable, widely applicable, and readily accessible systems biology knowledge resource. Integration and analysis of diverse LINCS data sets depend on the availability of sufficient metadata to describe the assays and screening results and on their syntactic, structural, and semantic consistency. Here we report metadata specifications for the most important molecular and cellular components and recommend them for adoption beyond the LINCS project. We focus on the minimum required information to model LINCS assays and results based on a number of use cases, and we recommend controlled terminologies and ontologies to annotate assays with syntactic consistency and semantic integrity. We also report specifications for a simple annotation format (SAF) to describe assays and screening results based on our metadata specifications with explicit controlled vocabularies. SAF specifically serves to programmatically access and exchange LINCS data as a prerequisite for a distributed information management infrastructure. We applied the metadata specifications to annotate large numbers of LINCS cell lines, proteins, and small molecules. The resources generated and presented here are freely available.
Affiliation(s)
- Uma D Vempati
- Center for Computational Science, University of Miami, Miami, FL, USA
- Caty Chung
- Center for Computational Science, University of Miami, Miami, FL, USA
- Chris Mader
- Center for Computational Science, University of Miami, Miami, FL, USA
- Amar Koleti
- Center for Computational Science, University of Miami, Miami, FL, USA
- Nakul Datar
- Center for Computational Science, University of Miami, Miami, FL, USA
- Dušica Vidović
- Center for Computational Science, University of Miami, Miami, FL, USA
- David Wrobel
- ICCB-Longwood Screening Facility, Harvard Medical School, Boston, MA, USA
- Sean Erickson
- ICCB-Longwood Screening Facility, Harvard Medical School, Boston, MA, USA
- Jeremy L Muhlich
- Department of Systems Biology, Harvard Medical School, Boston, Massachusetts, USA
- Gabriel Berriz
- Department of Systems Biology, Harvard Medical School, Boston, Massachusetts, USA
- Cyril H Benes
- Center for Molecular Therapeutics, Massachusetts General Hospital, Boston, Massachusetts, USA
- Ajay Pillai
- National Human Genome Research Institute, National Institutes of Health, Rockville, Maryland, USA
- Caroline E Shamu
- ICCB-Longwood Screening Facility, Harvard Medical School, Boston, MA, USA
- Department of Systems Biology, Harvard Medical School, Boston, Massachusetts, USA
- Stephan C Schürer
- Center for Computational Science, University of Miami, Miami, FL, USA
- Department of Molecular and Cellular Pharmacology, University of Miami, Miami, Florida, USA
6
Kell DB, Goodacre R. Metabolomics and systems pharmacology: why and how to model the human metabolic network for drug discovery. Drug Discov Today 2014; 19:171-82. [PMID: 23892182 PMCID: PMC3989035 DOI: 10.1016/j.drudis.2013.07.014] [Citation(s) in RCA: 118] [Impact Index Per Article: 11.8] [Received: 03/25/2013] [Revised: 07/03/2013] [Accepted: 07/16/2013] [Indexed: 02/06/2023]
Abstract
Metabolism represents the 'sharp end' of systems biology, because changes in metabolite concentrations are necessarily amplified relative to changes in the transcriptome, proteome and enzyme activities, which can be modulated by drugs. To understand such behaviour, we therefore need (and increasingly have) reliable consensus (community) models of the human metabolic network that include the important transporters. Small molecule 'drug' transporters are in fact metabolite transporters, because drugs bear structural similarities to metabolites known from the network reconstructions and from measurements of the metabolome. Recon2 represents the present state-of-the-art human metabolic network reconstruction; it can predict inter alia: (i) the effects of inborn errors of metabolism; (ii) which metabolites are exometabolites, and (iii) how metabolism varies between tissues and cellular compartments. However, even these qualitative network models are not yet complete. As our understanding improves so do we recognise more clearly the need for a systems (poly)pharmacology.
Affiliation(s)
- Douglas B Kell
- School of Chemistry and Manchester Institute of Biotechnology, The University of Manchester, 131 Princess Street, Manchester M1 7DN, UK.
- Royston Goodacre
- School of Chemistry and Manchester Institute of Biotechnology, The University of Manchester, 131 Princess Street, Manchester M1 7DN, UK
7
Boeker M, Jansen L, Grewe N, Röhl J, Schober D, Seddig-Raufie D, Schulz S. Effects of guideline-based training on the quality of formal ontologies: a randomized controlled trial. PLoS One 2013; 8:e61425. [PMID: 23667440 PMCID: PMC3646875 DOI: 10.1371/journal.pone.0061425] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 12/14/2012] [Accepted: 03/09/2013] [Indexed: 11/24/2022]
Abstract
BACKGROUND The importance of ontologies in the biomedical domain is generally recognized. However, their quality is often too poor for large-scale use in critical applications, at least partially due to insufficient training of ontology developers. OBJECTIVE To show the efficacy of guideline-based ontology development training on the performance of ontology developers. The hypothesis was that students who received training on top-level ontologies and design patterns perform better than those who only received training in the basic principles of formal ontology engineering. METHODS A curriculum was implemented based on a guideline for ontology design. A randomized controlled trial on the efficacy of this curriculum was performed with 24 students from bioinformatics and related fields. After joint training on the fundamentals of ontology development, the students were randomly allocated to two groups. During the intervention, each group received training on different topics in ontology development. In the assessment phase, all students were asked to solve modeling problems on topics taught differentially in the intervention phase. The primary outcome was the similarity of the students' ontology artefacts compared with gold standard ontologies developed by the authors before the experiment; the secondary outcome was the intra-group similarity of group members' ontologies. RESULTS The experiment showed no significant effect of the guideline-based training on the performance of ontology developers: (a) the ontologies developed after specific training were only slightly, but not significantly, closer to the gold standard ontologies than the ontologies developed without prior specific training; (b) although significant differences for certain ontologies were detected, the intra-group similarity was not consistently influenced in one direction by the differential training.
CONCLUSION Methodologically limited, this study cannot be interpreted as a general failure of a guideline-based approach to ontology development. Further research is needed to increase insight into whether specific development guidelines and practices in ontology design are effective.
Affiliation(s)
- Martin Boeker
- Institute of Medical Biometry and Medical Informatics, Albert-Ludwigs University Freiburg, Freiburg, Germany.
8
Harrow I, Filsell W, Woollard P, Dix I, Braxenthaler M, Gedye R, Hoole D, Kidd R, Wilson J, Rebholz-Schuhmann D. Towards Virtual Knowledge Broker services for semantic integration of life science literature and data sources. Drug Discov Today 2013; 18:428-34. [DOI: 10.1016/j.drudis.2012.11.012] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 07/18/2012] [Revised: 11/09/2012] [Accepted: 11/22/2012] [Indexed: 10/27/2022]
9
Eriksson M, Nilsson I, Kogej T, Southan C, Johansson M, Tyrchan C, Muresan S, Blomberg N, Bjäreland M. SARConnect: A Tool to Interrogate the Connectivity Between Proteins, Chemical Structures and Activity Data. Mol Inform 2012; 31:555-568. [PMID: 23308082 PMCID: PMC3535785 DOI: 10.1002/minf.201200030] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Received: 03/29/2012] [Accepted: 04/14/2012] [Indexed: 11/21/2022]
Abstract
The access and use of large-scale structure-activity relationships (SAR) is increasing as the range of targets and availability of bioactive compound-to-protein mappings expands. However, effective exploitation requires merging and normalisation of activity data, mappings to target classifications as well as visual display of chemical structure relationships. This work describes the development of the application "SARConnect" to address these issues. We discuss options for delivery and analysis of large-scale SAR data together with a set of use-cases to illustrate the design choices and utility. The main activity sources, ChEMBL, GOSTAR and AstraZeneca's internal system IBIS, had already been integrated in Chemistry Connect. For target relationships we selected human UniProtKB/Swiss-Prot as our primary source of a heuristic target classification. Similarly, to explore chemical relationships we combined several methods for framework and scaffold analysis into a unified, hierarchical classification where ease of navigation was the primary goal. An application was built on TIBCO Spotfire to retrieve data for visual display. Consequently, users can explore relationships between target, activity and structure across internal, external and commercial sources that encompass approximately 3 million compounds, 2000 human proteins and 10 million activity values. Examples showing the utility of the application are given.
Affiliation(s)
- Mats Eriksson
- Discovery Sciences, Computational Sciences, AstraZeneca R&D Mölndal, S-431 83 Mölndal, Sweden
- Thierry Kogej
- Discovery Sciences, Computational Sciences, AstraZeneca R&D Mölndal, S-431 83 Mölndal, Sweden
- Sorel Muresan
- Discovery Sciences, Computational Sciences, AstraZeneca R&D Mölndal, S-431 83 Mölndal, Sweden
10
Sansone SA, Rocca-Serra P. On the evolving portfolio of community-standards and data sharing policies: turning challenges into new opportunities. Gigascience 2012; 1:10. [PMID: 23587326 PMCID: PMC3626509 DOI: 10.1186/2047-217x-1-10] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Received: 05/31/2012] [Accepted: 07/12/2012] [Indexed: 12/04/2022]
Abstract
There are thousands of biology databases with hundreds of terminologies, reporting guidelines, representation models, and exchange formats to help annotate, report, and share bioscience investigations. It is evident, however, that researchers and bioinformaticians struggle to navigate the various standards and to find the appropriate database to collect, manage, and share data. Further, policy makers, funders, and publishers lack sufficient information to formulate their guidelines. In this paper, we highlight a number of key issues that can be used to turn these challenges into new opportunities. It is time for all stakeholders to work together to reconcile cause and effect and make the data-sharing culture functional and efficient.
11
Williams AJ, Harland L, Groth P, Pettifer S, Chichester C, Willighagen EL, Evelo CT, Blomberg N, Ecker G, Goble C, Mons B. Open PHACTS: semantic interoperability for drug discovery. Drug Discov Today 2012; 17:1188-98. [PMID: 22683805 DOI: 10.1016/j.drudis.2012.05.016] [Citation(s) in RCA: 172] [Impact Index Per Article: 14.3] [Received: 03/15/2012] [Revised: 05/18/2012] [Accepted: 05/31/2012] [Indexed: 01/22/2023]
Abstract
Open PHACTS (OPS) is a public-private partnership between academia, publishers, small and medium-sized enterprises and pharmaceutical companies. The goal of the project is to deliver and sustain an 'open pharmacological space' using and enhancing state-of-the-art semantic web standards and technologies. It is focused on practical and robust applications to solve specific questions in drug discovery research. OPS is intended to facilitate improvements in drug discovery in academia and industry and to support open innovation and in-house non-public drug discovery research. This paper lays out the challenges and how the Open PHACTS project is hoping to address these challenges technically and socially.
Affiliation(s)
- Antony J Williams
- Royal Society of Chemistry, ChemSpider, US Office, Wake Forest, NC 27587, USA.
12
Hastings J, Magka D, Batchelor C, Duan L, Stevens R, Ennis M, Steinbeck C. Structure-based classification and ontology in chemistry. J Cheminform 2012; 4:8. [PMID: 22480202 PMCID: PMC3361486 DOI: 10.1186/1758-2946-4-8] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.0] [Received: 01/16/2012] [Accepted: 04/05/2012] [Indexed: 02/04/2023]
Abstract
BACKGROUND Recent years have seen an explosion in the availability of data in the chemistry domain. With this information explosion, however, retrieving relevant results from the available information, and organising those results, become even harder problems. Computational processing is essential to filter and organise the available resources so as to better facilitate the work of scientists. Ontologies encode expert domain knowledge in a hierarchically organised machine-processable format. One such ontology for the chemical domain is ChEBI. ChEBI provides a classification of chemicals based on their structural features and a role or activity-based classification. An example of a structure-based class is 'pentacyclic compound' (compounds containing five-ring structures), while an example of a role-based class is 'analgesic', since many different chemicals can act as analgesics without sharing structural features. Structure-based classification in chemistry exploits elegant regularities and symmetries in the underlying chemical domain. As yet, there has been neither a systematic analysis of the types of structural classification in use in chemistry nor a comparison to the capabilities of available technologies. RESULTS We analyze the different categories of structural classes in chemistry, presenting a list of patterns for features found in class definitions. We compare these patterns of class definition to tools which allow for automation of hierarchy construction within cheminformatics and within logic-based ontology technology, going into detail in the latter case with respect to the expressive capabilities of the Web Ontology Language and recent extensions for modelling structured objects. Finally we discuss the relationships and interactions between cheminformatics approaches and logic-based approaches. 
CONCLUSION Systems that perform intelligent reasoning tasks on chemistry data require a diverse set of underlying computational utilities including algorithmic, statistical and logic-based tools. For the task of automatic structure-based classification of chemical entities, essential to managing the vast swathes of chemical data being brought online, systems which are capable of hybrid reasoning combining several different approaches are crucial. We provide a thorough review of the available tools and methodologies, and identify areas of open research.
Affiliation(s)
- Janna Hastings
- Cheminformatics and Metabolism, European Bioinformatics Institute, Hinxton, UK
- Swiss Center for Affective Sciences, University of Geneva, Geneva, Switzerland
- Despoina Magka
- Department of Computer Science, University of Oxford, Oxford, UK
- Lian Duan
- Cheminformatics and Metabolism, European Bioinformatics Institute, Hinxton, UK
- ETH, Zürich, Switzerland
- Robert Stevens
- School of Computer Science, University of Manchester, Manchester, UK
- Marcus Ennis
- Cheminformatics and Metabolism, European Bioinformatics Institute, Hinxton, UK
- Christoph Steinbeck
- Cheminformatics and Metabolism, European Bioinformatics Institute, Hinxton, UK
13
14
Sansone SA, Rocca-Serra P, Field D, Maguire E, Taylor C, Hofmann O, Fang H, Neumann S, Tong W, Amaral-Zettler L, Begley K, Booth T, Bougueleret L, Burns G, Chapman B, Clark T, Coleman LA, Copeland J, Das S, de Daruvar A, de Matos P, Dix I, Edmunds S, Evelo CT, Forster MJ, Gaudet P, Gilbert J, Goble C, Griffin JL, Jacob D, Kleinjans J, Harland L, Haug K, Hermjakob H, Ho Sui SJ, Laederach A, Liang S, Marshall S, McGrath A, Merrill E, Reilly D, Roux M, Shamu CE, Shang CA, Steinbeck C, Trefethen A, Williams-Jones B, Wolstencroft K, Xenarios I, Hide W. Toward interoperable bioscience data. Nat Genet 2012; 44:121-6. [PMID: 22281772 PMCID: PMC3428019 DOI: 10.1038/ng.1054] [Citation(s) in RCA: 251] [Impact Index Per Article: 20.9] [Indexed: 12/22/2022]
Abstract
To make full use of research data, the bioscience community needs to adopt technologies and reward mechanisms that support interoperability and promote the growth of an open 'data commoning' culture. Here we describe the prerequisites for data commoning and present an established and growing ecosystem of solutions using the shared 'Investigation-Study-Assay' framework to support that vision.