2
Aranguren ME, Fernández-Breis JT, Mungall C, Antezana E, González AR, Wilkinson MD. OPPL-Galaxy, a Galaxy tool for enhancing ontology exploitation as part of bioinformatics workflows. J Biomed Semantics 2013; 4:2. [PMID: 23286517 PMCID: PMC3643862 DOI: 10.1186/2041-1480-4-2] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 04/02/2012] [Accepted: 12/27/2012] [Indexed: 11/10/2022]
Abstract
BACKGROUND: Biomedical ontologies are key elements for building up the Life Sciences Semantic Web. Reusing and building biomedical ontologies requires flexible and versatile tools to manipulate them efficiently, in particular for enriching their axiomatic content. The Ontology Pre Processor Language (OPPL) is an OWL-based language for automating the changes to be performed in an ontology. OPPL augments the ontologists' toolbox by providing a more efficient, and less error-prone, mechanism for enriching a biomedical ontology than that obtained by a manual treatment.
RESULTS: We present OPPL-Galaxy, a wrapper for using OPPL within Galaxy. The functionality delivered by OPPL (i.e. automated ontology manipulation) can be combined with the tools and workflows devised within the Galaxy framework, resulting in an enhancement of OPPL. Use cases are provided in order to demonstrate OPPL-Galaxy's capability for enriching, modifying and querying biomedical ontologies.
CONCLUSIONS: Coupling OPPL-Galaxy with other bioinformatics tools of the Galaxy framework results in a system that is more than the sum of its parts. OPPL-Galaxy opens a new dimension of analyses and exploitation of biomedical ontologies, including automated reasoning, paving the way towards advanced biological data analyses.
Affiliation(s)
- Mikel Egaña Aranguren
  - Ontology Engineering Group, School of Computer Science, Technical University of Madrid (UPM), Boadilla del Monte, 28660, Spain
  - Biological Informatics Group, Centre for Plant Biotechnology and Genomics (CBGP), Technical University of Madrid (UPM), Pozuelo de Alarcón, 28223, Spain
- Chris Mungall
  - Genomics Division, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, US
- Erick Antezana
  - Department of Biology, Norwegian University of Science and Technology (NTNU), Høgskoleringen 5, Trondheim, N-7491, Norway
- Alejandro Rodríguez González
  - Biological Informatics Group, Centre for Plant Biotechnology and Genomics (CBGP), Technical University of Madrid (UPM), Pozuelo de Alarcón, 28223, Spain
- Mark D Wilkinson
  - Biological Informatics Group, Centre for Plant Biotechnology and Genomics (CBGP), Technical University of Madrid (UPM), Pozuelo de Alarcón, 28223, Spain
3
Gosal G, Kochut KJ, Kannan N. ProKinO: an ontology for integrative analysis of protein kinases in cancer. PLoS One 2011; 6:e28782. [PMID: 22194913 PMCID: PMC3237543 DOI: 10.1371/journal.pone.0028782] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.8] [Received: 07/27/2011] [Accepted: 11/15/2011] [Indexed: 11/18/2022]
Abstract
BACKGROUND: Protein kinases are a large and diverse family of enzymes that are genomically altered in many human cancers. Targeted cancer genome sequencing efforts have unveiled the mutational profiles of protein kinase genes from many different cancer types. While mutational data on protein kinases is currently catalogued in various databases, integration of mutation data with other forms of data on protein kinases such as sequence, structure, function and pathway is necessary to identify and characterize key cancer causing mutations. Integrative analysis of protein kinase data, however, is a challenge because of the disparate nature of protein kinase data sources and data formats.
RESULTS: Here, we describe ProKinO, a protein kinase-specific ontology, which provides a controlled vocabulary of terms, their hierarchy, and relationships unifying sequence, structure, function, mutation and pathway information on protein kinases. The conceptual representation of such diverse forms of information in one place not only allows rapid discovery of significant information related to a specific protein kinase, but also enables large-scale integrative analysis of protein kinase data in ways not possible through other kinase-specific resources. We have performed several integrative analyses of ProKinO data and, as an example, found that a large number of somatic mutations (∼288 distinct mutations) associated with the haematopoietic neoplasm cancer type map to only 8 kinases in the human kinome. This is in contrast to glioma, where the mutations are spread over 82 distinct kinases. We also provide examples of how ontology-based data analysis can be used to generate testable hypotheses regarding cancer mutations.
CONCLUSION: We present an integrated framework for large-scale integrative analysis of protein kinase data. Navigation and analysis of ontology data can be performed using the ontology browser available at: http://vulcan.cs.uga.edu/prokino.
Affiliation(s)
- Gurinder Gosal
  - Department of Biochemistry and Molecular Biology, University of Georgia, Athens, United States of America
- Krys J. Kochut
  - Department of Computer Science, University of Georgia, Athens, United States of America
  - Institute of Bioinformatics, University of Georgia, Athens, United States of America
  - * E-mail: (NK); (KK)
- Natarajan Kannan
  - Department of Biochemistry and Molecular Biology, University of Georgia, Athens, United States of America
  - Institute of Bioinformatics, University of Georgia, Athens, United States of America
  - * E-mail: (NK); (KK)
4
Nakai T, Bagarinao E, Tanaka Y, Matsuo K, Racoceanu D. Ontology for FMRI as a biomedical informatics method. Magn Reson Med Sci 2008; 7:141-55. [PMID: 18827457 DOI: 10.2463/mrms.7.141] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 11/09/2022]
Abstract
Ontological engineering is one of the most challenging topics in biomedical informatics because of its key role in integrating the heterogeneous database used by biomedical information services. Ontology can translate concepts and their real-world relationships into expressions that can be processed by computer programs or web services, providing a unique taxonomic frame to describe a pathway for extracting, processing, storing, and retrieving information. In developing clinical functional neuroimaging, which requires the integration of heterogeneous information derived from multimodal measurement of the brain, these features will be indispensable. Neuroimaging ontology is remarkable in that it requires detailed description of the hypothesis, the paradigm employed, and a scheme for data generation. Neuroimaging modalities, such as functional magnetic resonance imaging (fMRI), magnetoencephalography (MEG), electroencephalography (EEG), and near infrared spectroscopy (NIRS), share similar application purposes, imaging protocol, analyzing methods, and data structure; semantic gaps that remain among the modalities will be bridged as ontology develops. High-performance, global resource information database (GRID) computing and the applications organized as service-oriented computing (SOC) will support the heavy processing to integrate the heterogeneous neuroimaging system. We have been developing such a distributed intelligent neuroimaging system for real-time fMRI analysis, called BAXGRID, and a neuroimaging database. The fMRI ontology of this system will be integrated with established medical ontologies, such as the Unified Medical Language System (UMLS).
Affiliation(s)
- Toshiharu Nakai
- Functional Brain Imaging Lab, Department of Gerontechnology, National Center for Geriatrics and Gerontology, Aichi, Japan.
5
Abstract
The past twenty years have witnessed an explosion of biological data in diverse database formats governed by heterogeneous infrastructures. Not only are semantics (attribute terms) different in meaning across databases, but their organization varies widely. Ontologies are a concept imported from computing science to describe different conceptual frameworks that guide the collection, organization and publication of biological data. An ontology is similar to a paradigm but has very strict implications for formatting and meaning in a computational context. The use of ontologies is a means of communicating and resolving semantic and organizational differences between biological databases in order to enhance their integration. The purpose of interoperability (or sharing between divergent storage and semantic protocols) is to allow scientists from around the world to share and communicate with each other. This paper describes the rapid accumulation of biological data, its various organizational structures, and the role that ontologies play in interoperability.
Affiliation(s)
- Nadine Schuurman
- Department of Geography, Simon Fraser University RCB 7123, 8888 University Drive, Burnaby, British Columbia, Canada.
6
Mabee PM, Arratia G, Coburn M, Haendel M, Hilton EJ, Lundberg JG, Mayden RL, Rios N, Westerfield M. Connecting evolutionary morphology to genomics using ontologies: a case study from Cypriniformes including zebrafish. Journal of Experimental Zoology Part B: Molecular and Developmental Evolution 2007; 308:655-68. [PMID: 17599725 DOI: 10.1002/jez.b.21181] [Citation(s) in RCA: 43] [Impact Index Per Article: 2.5] [Indexed: 12/22/2022]
Abstract
One focus of developmental biology is to understand how genes regulate development, and therefore examining the phenotypic effects of gene mutation is a major emphasis in studies of zebrafish and other model organisms. Genetic change underlies alterations in evolutionary characters, or phenotype, and morphological phylogenies inferred by comparison of these characters. We will utilize both existing and new ontologies to connect the evolutionary anatomy and image database that is being developed in the Cypriniformes Tree of Life project to the Zebrafish Information Network (zfin.org) database. Ontologies are controlled vocabularies that formally represent hierarchical relationships among defined biological concepts. If used to recode the free-form text descriptors of anatomical characters, evolutionary character data can become more easily computed, explored, and mined. A shared ontology for homologous modules of the phenotype must be referenced to connect the growing databases in each area in a way that evolutionary questions can be addressed. We present examples that demonstrate the broad utility of this approach.
Affiliation(s)
- Paula M Mabee
- Department of Biology, University of South Dakota, Vermillion, South Dakota 57069, USA.
7
Abstract
Classification of proteins into families of homologous sequences constitutes the basis of functional analysis or of evolutionary studies. Here we present INVertebrate HOmologous GENes (INVHOGEN), a database combining the available invertebrate protein genes from UniProt (consisting of Swiss-Prot and TrEMBL) into gene families. For each family INVHOGEN provides a multiple protein alignment, a maximum likelihood based phylogenetic tree and taxonomic information about the sequences. It is possible to download the corresponding GenBank flatfiles, the alignment and the tree in Newick format. Sequences and related information have been structured in an ACNUC database under a client/server architecture. Thus, complex selections can be performed. An external graphical tool (FamFetch) allows access to the data to evaluate homology relationships between genes and distinguish orthologous from paralogous sequences. Thus, INVHOGEN complements the well-known HOVERGEN database. The databank is available at .
Affiliation(s)
- Ingo Paulsen
- Department of Bioinformatics, Institute for Computer Sciences, Heinrich-Heine-University Duesseldorf, Universitaetsstrasse 1, 40225 Duesseldorf, Germany.
9
Thompson JD, Holbrook SR, Katoh K, Koehl P, Moras D, Westhof E, Poch O. MAO: a Multiple Alignment Ontology for nucleic acid and protein sequences. Nucleic Acids Res 2005; 33:4164-71. [PMID: 16043635 PMCID: PMC1180671 DOI: 10.1093/nar/gki735] [Citation(s) in RCA: 16] [Impact Index Per Article: 0.8] [Indexed: 11/14/2022]
Abstract
The application of high-throughput techniques such as genomics, proteomics or transcriptomics means that vast amounts of heterogeneous data are now available in the public databases. Bioinformatics is responding to the challenge with new integrated management systems for data collection, validation and analysis. Multiple alignments of genomic and protein sequences provide an ideal environment for the integration of this mass of information. In the context of the sequence family, structural and functional data can be evaluated and propagated from known to unknown sequences. However, effective integration is being hindered by syntactic and semantic differences between the different data resources and the alignment techniques employed. One solution to this problem is the development of an ontology that systematically defines the terms used in a specific domain. Ontologies are used to share data from different resources, to automatically analyse information and to represent domain knowledge for non-experts. Here, we present MAO, a new ontology for multiple alignments of nucleic and protein sequences. MAO is designed to improve interoperation and data sharing between different alignment protocols for the construction of a high quality, reliable multiple alignment in order to facilitate knowledge extraction and the presentation of the most pertinent information to the biologist.
Affiliation(s)
- Julie D Thompson
- Institut de Génétique et de Biologie Moléculaire et Cellulaire, 1 rue Laurent Fries, B.P. 10142, 67404 Illkirch Cedex, France.