1
Alghamdi SM, Hoehndorf R. Improving the classification of cardinality phenotypes using collections. J Biomed Semantics 2023; 14:9. [PMID: 37550716 PMCID: PMC10405428 DOI: 10.1186/s13326-023-00290-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/31/2023] [Accepted: 07/07/2023] [Indexed: 08/09/2023]
Abstract
MOTIVATION Phenotypes are observable characteristics of an organism, and they can be highly variable. Information about phenotypes is collected in a clinical context to characterize disease. It is also collected in model organisms and stored in model organism databases, where the data are used to understand gene functions. Phenotype data are also used in computational data analysis and machine learning methods to provide novel insights into disease mechanisms and to support personalized diagnosis of disease. For mammalian organisms and in a clinical context, ontologies such as the Human Phenotype Ontology and the Mammalian Phenotype Ontology are widely used to formally and precisely describe phenotypes. We specifically analyze axioms pertaining to phenotypes of collections of entities within a body, and we find that some of the axioms in phenotype ontologies lead to inferences that may not accurately reflect the underlying biological phenomena. RESULTS We reformulate the phenotypes of collections of entities using an ontological theory of collections. By reformulating phenotypes of collections in phenotype ontologies, we avoid potentially incorrect inferences pertaining to the cardinality of these collections. We apply our method to two phenotype ontologies and show that the reformulation not only removes some problematic inferences but also quantitatively improves biological data analysis.
Affiliation(s)
- Sarah M Alghamdi
- Computational Bioscience Research Center (CBRC), Computer, Electrical, and Mathematical Sciences & Engineering Division, King Abdullah University of Science and Technology, 4700 KAUST, 23955, Thuwal, Saudi Arabia.
- King Abdul-Aziz University, Faculty of Computing and Information Technology, 25732, Rabigh, Saudi Arabia.
- Robert Hoehndorf
- Computational Bioscience Research Center (CBRC), Computer, Electrical, and Mathematical Sciences & Engineering Division, King Abdullah University of Science and Technology, 4700 KAUST, 23955, Thuwal, Saudi Arabia.

2
Hu ZL, Park CA, Reecy JM. A combinatorial approach implementing new database structures to facilitate practical data curation management of QTL, association, correlation and heritability data on trait variants. Database (Oxford) 2023; 2023:7135870. [PMID: 37084387 PMCID: PMC10121204 DOI: 10.1093/database/baad024] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/12/2022] [Revised: 02/28/2023] [Accepted: 03/27/2023] [Indexed: 04/23/2023]
Abstract
A precise description of traits is essential in genetics and genomics studies to facilitate comparative genetics and meta-analyses. It is an ongoing challenge in research and production environments to unambiguously and consistently compare traits of interest from data collected under various conditions. Despite previous efforts to standardize trait nomenclature, it remains a challenge to fully and accurately capture trait nomenclature granularity in a way that ensures long-term data sustainability in terms of the data curation processes, data management logistics and the ability to make meaningful comparisons across studies. In the Animal Quantitative Trait Loci Database and the Animal Trait Correlation Database, we have recently introduced a new method to extend livestock trait ontologies by using trait modifiers and qualifiers to define traits that differ slightly in how they are measured, examined or combined with other traits or factors. Here, we describe the implementation of a system in which the extended trait data, with modifiers, are managed at the experiment level as 'trait variants'. This has helped us to streamline the management and curation of such trait information in our database environment. Database URL https://www.animalgenome.org/PGNET/.
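The trait-variant idea can be sketched as a small data structure. A variant is a base ontology trait plus a set of modifiers or qualifiers, and experiment-level records are grouped by that variant. This is a minimal Python sketch under stated assumptions. The class name `TraitVariant`, the function `group_by_variant` and the record layout are all hypothetical, and they do not reflect the actual Animal QTLdb schema.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class TraitVariant:
    """Hypothetical model: a base trait term plus the modifiers that distinguish it."""

    trait: str  # base trait term, e.g. a livestock trait ontology label
    modifiers: frozenset = frozenset()  # how the trait was measured, examined or combined

    def label(self) -> str:
        """Human-readable label; modifiers are sorted so equal variants print identically."""
        if not self.modifiers:
            return self.trait
        return f"{self.trait} [{', '.join(sorted(self.modifiers))}]"


def group_by_variant(records):
    """Group (experiment_id, trait, modifiers) records by their trait variant.

    TraitVariant is frozen and therefore hashable, so experiments that record the
    same trait with the same modifiers land in one group, whatever the modifier order.
    """
    groups = defaultdict(list)
    for experiment_id, trait, modifiers in records:
        groups[TraitVariant(trait, frozenset(modifiers))].append(experiment_id)
    return dict(groups)
```

With this layout, two experiments recording "body weight" with the modifier "at birth" share one group. That group stays separate from "body weight" at weaning and from unmodified "body weight".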
Affiliation(s)
- Zhi-Liang Hu
- Department of Animal Science, Iowa State University, 2255 Kildee Hall, 806 Stange Road, Ames, IA 50011-3150, USA
- Carissa A Park
- Department of Animal Science, Iowa State University, 2255 Kildee Hall, 806 Stange Road, Ames, IA 50011-3150, USA
- James M Reecy
- Department of Animal Science, Iowa State University, 2255 Kildee Hall, 806 Stange Road, Ames, IA 50011-3150, USA

3
Slater LT, Gkoutos GV, Hoehndorf R. Towards semantic interoperability: finding and repairing hidden contradictions in biomedical ontologies. BMC Med Inform Decis Mak 2020; 20:311. [PMID: 33319712 PMCID: PMC7736131 DOI: 10.1186/s12911-020-01336-2] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 11/11/2020] [Accepted: 11/16/2020] [Indexed: 12/25/2022]
Abstract
Background Ontologies are widely used throughout the biomedical domain. These ontologies formally represent the classes and relations assumed to exist within a domain. As scientific domains are deeply interlinked, so too are their representations. While individual ontologies can be tested for consistency and coherency using automated reasoning methods, systematically combining ontologies of multiple domains may reveal previously hidden contradictions. Methods We developed a method that tests for hidden unsatisfiabilities that arise in an ontology when it is combined with other ontologies. For this purpose, we combined sets of ontologies and used automated reasoning to determine whether unsatisfiable classes are present. In addition, we designed and implemented a novel algorithm that can determine justifications for contradictions across extremely large and complicated ontologies. We then used these justifications to semi-automatically repair ontologies by identifying a small set of axioms that, when removed, result in a consistent and coherent set of ontologies.
Results We tested the mutual consistency of the OBO Foundry and the OBO ontologies. The combined OBO Foundry gives rise to at least 636 unsatisfiable classes, while the OBO ontologies give rise to more than 300,000 unsatisfiable classes. We also applied our semi-automatic repair algorithm to each combination of OBO ontologies that resulted in unsatisfiable classes. Removing only 117 axioms was sufficient to account for all cases of unsatisfiability across all OBO ontologies. Conclusions We identified a large set of hidden unsatisfiable classes across a broad range of biomedical ontologies, and we find that this large set is the result of a relatively small number of axiomatic disagreements. Our results show that hidden unsatisfiability is a serious problem in ontology interoperability. However, our results also provide a way towards more consistent ontologies by addressing the issues we identified.
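One generic way to turn justifications into a small repair set is a greedy hitting-set approximation. Each justification is a set of axioms that jointly entail an unsatisfiable class, so removing at least one axiom from each breaks every known justification. The Python sketch below illustrates only that general idea and is not the authors' algorithm. The axiom identifiers are hypothetical. In practice the reasoner would have to be re-run after removal, because other justifications that were never computed may still exist.

```python
def greedy_repair(justifications):
    """Return a small set of axioms that intersects every justification.

    justifications: iterable of sets of axiom identifiers; each set is assumed to be a
    (minimal) justification for some unsatisfiable class. Greedy set-cover style
    approximation: repeatedly remove the axiom that occurs in the most justifications
    that are not yet broken.
    """
    remaining = [set(j) for j in justifications if j]  # empty sets cannot be repaired
    removed = set()
    while remaining:
        counts = {}
        for justification in remaining:
            for axiom in justification:
                counts[axiom] = counts.get(axiom, 0) + 1
        # max() keeps the first maximum it meets, so iterating in sorted order
        # makes tie-breaking deterministic
        best = max(sorted(counts), key=counts.get)
        removed.add(best)
        remaining = [j for j in remaining if best not in j]
    return removed
```

Greedy selection does not guarantee the smallest possible repair set, which is NP-hard to find in general. Its advantage is that a single shared axiom behind many contradictions is picked first, which matches the observation above that many unsatisfiable classes trace back to few axioms.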
Affiliation(s)
- Luke T Slater
- College of Medical and Dental Sciences, Institute of Cancer and Genomic Sciences, University of Birmingham, Birmingham, B15 2TT, UK.
- Institute of Translational Medicine, University Hospitals Birmingham, NHS Foundation Trust, Birmingham, B15 2TT, UK.
- Georgios V Gkoutos
- College of Medical and Dental Sciences, Institute of Cancer and Genomic Sciences, University of Birmingham, Birmingham, B15 2TT, UK.
- Institute of Translational Medicine, University Hospitals Birmingham, NHS Foundation Trust, Birmingham, B15 2TT, UK.
- NIHR Experimental Cancer Medicine Centre, Birmingham, B15 2TT, UK.
- NIHR Surgical Reconstruction and Microbiology Research Centre, Birmingham, B15 2TT, UK.
- NIHR Biomedical Research Centre, Birmingham, B15 2TT, UK.
- MRC Health Data Research UK (HDR UK Midlands), Birmingham, B15 2TT, UK
- Robert Hoehndorf
- Computer, Electrical and Mathematical Sciences and Engineering Division, Computational Bioscience Research Center, King Abdullah University of Science and Technology, Thuwal, 23955, Saudi Arabia

4
Fernando PC, Mabee PM, Zeng E. Integration of anatomy ontology data with protein-protein interaction networks improves the candidate gene prediction accuracy for anatomical entities. BMC Bioinformatics 2020; 21:442. [PMID: 33028186 PMCID: PMC7542696 DOI: 10.1186/s12859-020-03773-2] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 03/22/2020] [Accepted: 09/22/2020] [Indexed: 01/04/2023]
Abstract
Background Identification of genes responsible for anatomical entities is a major requirement in many fields, including developmental biology, medicine, and agriculture. Current wet lab techniques used for this purpose, such as gene knockout, are resource- and time-intensive. Protein–protein interaction (PPI) networks are frequently used to predict disease genes for humans and gene candidates for molecular functions, but they are rarely used to predict genes for anatomical entities. Moreover, PPI networks suffer from network quality issues, which can limit their usefulness for predicting candidate genes. Therefore, we developed an integrative framework to improve the candidate gene prediction accuracy for anatomical entities by combining existing experimental knowledge about gene-anatomical entity relationships with PPI networks using anatomy ontology annotations. We hypothesized that this integration improves the quality of the PPI networks by reducing the number of false positive and false negative interactions, and that the integrated networks are better optimized to predict candidate genes for anatomical entities. We used existing Uberon anatomical entity annotations for zebrafish and mouse genes to construct gene networks by calculating semantic similarity between the genes. These anatomy-based gene networks were semantic networks, as they were constructed from anatomy ontology annotations obtained from experimental data in the literature. We integrated these anatomy-based gene networks with mouse and zebrafish PPI networks retrieved from the STRING database and compared the performance of their network-based candidate gene predictions.
Results We evaluated candidate gene prediction performance under four different semantic similarity calculation methods (Lin, Resnik, Schlicker, and Wang). Under these evaluations, the integrated networks (semantically improved PPI networks) performed better than the PPI networks for both zebrafish and mouse. They achieved higher area under the curve values for both receiver operating characteristic and precision-recall curves. Conclusion Integrating existing experimental knowledge about gene-anatomical entity relationships with PPI networks via an anatomy ontology improved candidate gene prediction accuracy and optimized the networks for predicting candidate genes for anatomical entities.
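The Resnik and Lin measures named above are both built on information content (IC). A common formulation is IC(c) = -log p(c), where p(c) is the fraction of annotated genes that are annotated to c or one of its descendants. Resnik similarity is the IC of the most informative common ancestor (MICA) of two terms. Lin similarity normalizes it as 2 * IC(MICA) / (IC(a) + IC(b)). The Python sketch below computes both on a toy is-a hierarchy, and the terms and gene annotations in it are hypothetical. Building gene-to-gene similarity, as used for the anatomy-based networks, would additionally require aggregating term similarities over each gene's annotation set; that step is not shown.

```python
import math


def ancestors(term, parents):
    """Return `term` and all of its ancestors in an is-a DAG given as {child: [parents]}."""
    seen, stack = set(), [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


def information_content(parents, annotations):
    """IC(c) = -log p(c), with p(c) the fraction of genes annotated to c or a descendant.

    annotations: {gene: set of directly annotated terms}. Annotations are propagated to
    all ancestors (true-path rule) and each term is counted at most once per gene.
    """
    counts = {}
    for terms in annotations.values():
        closure = set().union(*(ancestors(t, parents) for t in terms))
        for t in closure:
            counts[t] = counts.get(t, 0) + 1
    total = len(annotations)
    return {t: -math.log(n / total) for t, n in counts.items()}


def resnik(a, b, parents, ic):
    """IC of the most informative common ancestor of annotated terms a and b."""
    common = ancestors(a, parents) & ancestors(b, parents)
    return max((ic[t] for t in common), default=0.0)


def lin(a, b, parents, ic):
    """Resnik similarity normalized by the ICs of the two terms (range 0..1)."""
    denominator = ic[a] + ic[b]
    if denominator == 0:
        return 0.0  # both terms are maximally general; treat as uninformative
    return 2 * resnik(a, b, parents, ic) / denominator
```

Suppose four hypothetical genes are annotated to "forelimb", "hindlimb", "fin" and the root term respectively. Then "forelimb" and "hindlimb" share the MICA "limb", with Resnik similarity log 2 and Lin similarity 0.5.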
Affiliation(s)
- Pasan C Fernando
- Department of Biology, University of South Dakota, Vermillion, SD, USA.
- Paula M Mabee
- Department of Biology, University of South Dakota, Vermillion, SD, USA.
- National Ecological Observatory Network, Battelle Memorial Institute, 1685 38th St., Suite 100, Boulder, CO, 80301, USA
- Erliang Zeng
- Division of Biostatistics and Computational Biology, College of Dentistry, University of Iowa, Iowa City, IA, USA.
- Department of Preventive and Community Dentistry, College of Dentistry, University of Iowa, Iowa City, IA, USA.
- Department of Biostatistics, College of Public Health, University of Iowa, Iowa City, IA, USA.
- Department of Biomedical Engineering, College of Engineering, University of Iowa, Iowa City, IA, USA.

5
Endara L, Thessen AE, Cole HA, Walls R, Gkoutos G, Cao Y, Chong SS, Cui H. Modifier Ontologies for frequency, certainty, degree, and coverage phenotype modifier. Biodivers Data J 2018; 6:e29232. [PMID: 30532623 PMCID: PMC6281706 DOI: 10.3897/bdj.6.e29232] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/21/2018] [Accepted: 11/20/2018] [Indexed: 11/21/2022]
Abstract
Background: When phenotypic characters are described in the literature, they may be constrained or clarified with additional information, such as the location or degree of expression; these terms are called "modifiers". With efforts underway to convert narrative character descriptions to computable data, ontologies for such modifiers are needed. Such ontologies can also be used to guide term usage in future publications. Spatial and method modifiers are the subjects of ontologies that have already been developed or are under development. In this work, frequency (e.g., rarely, usually), certainty (e.g., probably, definitely), degree (e.g., slightly, extremely), and coverage modifiers (e.g., sparsely, entirely) are collected, reviewed, and used to create two modifier ontologies with different design considerations. The basic goal is to express the sequential relationships within a type of modifier (for example, usually is more frequent than rarely), so that data annotated with ontology terms can be classified accordingly. Method: Two designs are proposed for the ontology, both using the list pattern: a closed ordered list (i.e., five-bin design) and an open ordered list design. The five-bin design puts the modifier terms into a set of 5 fixed bins with interval object properties, for example, one_level_more/less_frequently_than, where new terms can only be added as synonyms to existing classes. The open list approach starts with 5 bins but supports the extensibility of the list via ordinal properties, for example, more/less_frequently_than, allowing new terms to be inserted as a new class anywhere in the list. The consequences of the different design decisions are discussed in the paper. CharaParser was used to extract modifiers from plant, ant, and other taxonomic descriptions. After a manual screening, 130 modifier words were selected as the candidate terms for the modifier ontologies.
Four curators/experts (three biologists and one information scientist specialized in biosemantics) reviewed and categorized the terms into 20 bins using the Ontology Term Organizer (OTO) (http://biosemantics.arizona.edu/OTO). Inter-curator variations were reviewed and expressed in the final ontologies. Results: Frequency, certainty, degree, and coverage terms with complete agreement among all curators were used as class labels or exact synonyms. Terms with different interpretations were either excluded or included using "broader synonym" or "not recommended" annotation properties. These annotations explicitly make users aware of the semantic ambiguity associated with the terms and of whether the terms should be used with caution or avoided. Expert categorization results showed that 16 out of 20 bins contained terms with full agreement. This suggests that differentiating the modifiers into 5 levels/bins balances the need to differentiate modifiers against the need for the ontology to reflect user consensus. Two ontologies, developed using the Protege ontology editor, are made available as OWL files and can be downloaded from https://github.com/biosemantics/ontologies. Contribution: We built the first two modifier ontologies following a consensus-based approach with terms commonly used in taxonomic literature. The five-bin ontology has been used in the Explorer of Taxon Concepts web toolkit to compute the similarity between characters extracted from literature to facilitate taxon concept alignments. The two ontologies will also be used in an ontology-informed authoring tool for taxonomists to facilitate consistency in modifier term usage.
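A small Python sketch can illustrate how the five-bin design's interval properties differ from the open-list design's ordinal properties. The bin assignments below are hypothetical and are not the curated assignments in the published ontologies. The function names only loosely mirror the object properties mentioned above (one_level_more_frequently_than, more_frequently_than).

```python
# Hypothetical frequency scale, least to most frequent. Each entry is one class (bin);
# the words in a bin are treated as synonyms. NOT the published curation results.
FIVE_BINS = [
    {"rarely", "seldom"},
    {"occasionally", "sometimes"},
    {"often", "frequently"},
    {"usually", "mostly"},
    {"always", "invariably"},
]


def level_of(term, scale):
    """Position of the class containing `term` in an ordered scale."""
    for level, synonyms in enumerate(scale):
        if term in synonyms:
            return level
    raise KeyError(f"unknown modifier: {term}")


def more_frequently_than(a, b, scale):
    """Ordinal comparison; meaningful for both the five-bin and the open-list design."""
    return level_of(a, scale) > level_of(b, scale)


def one_level_more_frequently_than(a, b, scale):
    """Interval comparison; only stable while the number of bins is fixed (five-bin design)."""
    return level_of(a, scale) == level_of(b, scale) + 1


def add_as_synonym(term, existing, scale):
    """Five-bin design: a new word can only join an existing class as a synonym."""
    scale[level_of(existing, scale)].add(term)


def insert_after(term, existing, scale):
    """Open-list design: a new word becomes a new class directly above `existing`."""
    scale.insert(level_of(existing, scale) + 1, {term})
```

Inserting a class into the open list shifts every higher level up by one. After inserting a hypothetical "commonly" above "often", "usually" is no longer one level above "often", although it is still more frequent. This is why interval relations fit the closed list, while ordinal relations survive insertion.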
Affiliation(s)
- Lorena Endara
- University of Florida, Gainesville, United States of America
- Anne E Thessen
- The Ronin Institute for Independent Scholarship, Montclair, NJ, United States of America
- Heather A Cole
- Science and Technology Branch, Agriculture and Agri-Food Canada, Government of Canada, Ottawa, Canada
- Ramona Walls
- CyVerse, Tucson, United States of America
- Georgios Gkoutos
- College of Medical and Dental Sciences, Institute of Cancer and Genomic Sciences, Centre for Computational Biology, University of Birmingham, Birmingham, United Kingdom
- Institute of Translational Medicine, University Hospitals Birmingham NHS Foundation Trust, B15 2TT, Birmingham, United Kingdom
- Yujie Cao
- Center for Studies of Information Resources, Wuhan University, Wuhan, China
- Steven S. Chong
- National Center for Ecological Analysis and Synthesis, University of California, Santa Barbara, Santa Barbara, United States of America
- University of Arizona, Tucson, United States of America
- Hong Cui
- University of Arizona, Tucson, United States of America

6
Gkoutos GV, Schofield PN, Hoehndorf R. The anatomy of phenotype ontologies: principles, properties and applications. Brief Bioinform 2018; 19:1008-1021. [PMID: 28387809 PMCID: PMC6169674 DOI: 10.1093/bib/bbx035] [Citation(s) in RCA: 48] [Impact Index Per Article: 6.9] [Received: 01/01/2017] [Revised: 02/05/2017] [Indexed: 12/14/2022]
Abstract
The past decade has seen an explosion in the collection of genotype data in domains as diverse as medicine, ecology, livestock and plant breeding. Along with this comes the challenge of dealing with the related phenotype data, which is not only large but also highly multidimensional. Computational analysis of phenotypes has therefore become critical for our ability to understand the biological meaning of genomic data in the biological sciences. At the heart of computational phenotype analysis are the phenotype ontologies. A large number of these ontologies have been developed across many domains, and we are now at a point where the knowledge captured in the structure of these ontologies can be used for the integration and analysis of large interrelated data sets. The Phenotype And Trait Ontology framework provides a method for formal definitions of phenotypes and associated data sets and has proved to be key to our ability to develop methods for the integration and analysis of phenotype data. Here, we describe the development and products of the ontological approach to phenotype capture, the formal content of phenotype ontologies and how their content can be used computationally.
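Formal phenotype definitions in this framework are commonly built from an entity (for example, an anatomical structure) and a PATO quality, the entity-quality (EQ) pattern. The Python sketch below shows, in heavily simplified form, why such definitions support computation. If the entity and the quality of one phenotype are each subsumed by those of another, the first phenotype is subsumed by the second. This is an illustration under stated assumptions only. It follows is-a links alone, whereas real phenotype ontologies express EQ definitions as OWL class expressions and rely on a reasoner and on further relations such as part-of. The small hierarchies are hypothetical.

```python
def is_a_closure(term, parents):
    """`term` plus all of its is-a ancestors, given {child: [parents]}."""
    seen, stack = set(), [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


def eq_subsumes(general, specific, entity_parents, quality_parents):
    """True if EQ phenotype `general` = (E, Q) subsumes `specific` = (E', Q').

    Simplified rule: E' is-a E and Q' is-a Q (each term counts as its own ancestor).
    """
    (entity, quality), (entity_s, quality_s) = general, specific
    return (entity in is_a_closure(entity_s, entity_parents)
            and quality in is_a_closure(quality_s, quality_parents))


# Hypothetical toy hierarchies (is-a links only).
ENTITY_PARENTS = {"femur": ["long bone"], "long bone": ["bone"]}
QUALITY_PARENTS = {"decreased length": ["length"], "increased length": ["length"]}
```

Under these toy hierarchies, the phenotype (femur, decreased length) is classified under (bone, length). It is not classified under (femur, increased length).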
Affiliation(s)
- Robert Hoehndorf
- Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology, Thuwal

7
Tierrafría VH, Mejía-Almonte C, Camacho-Zaragoza JM, Salgado H, Alquicira K, Ishida C, Gama-Castro S, Collado-Vides J. MCO: towards an ontology and unified vocabulary for a framework-based annotation of microbial growth conditions. Bioinformatics 2018; 35:856-864. [PMID: 30137210 PMCID: PMC7963087 DOI: 10.1093/bioinformatics/bty689] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 01/12/2018] [Revised: 06/22/2018] [Accepted: 08/16/2018] [Indexed: 01/31/2023]
Abstract
MOTIVATION A major component in increasing our understanding of the biology of an organism is the mapping of its genotypic potential into its phenotypic expression profiles. This mapping is executed by the machinery of gene regulation, which is studied essentially through changes in growth conditions. Many efforts have been made to systematize the annotation of experimental conditions in microbiology. However, the available annotations are not based on a consistent and controlled vocabulary, which makes it difficult to identify biologically meaningful comparisons of knowledge derived from different experiments or laboratories. RESULTS We curated terms related to experimental conditions that affect gene expression in Escherichia coli K-12. Since this is the best-studied microorganism, the collected terms are the seed for the Microbial Conditions Ontology (MCO), a controlled and structured vocabulary that can be expanded to annotate microbial conditions in general. Moreover, we developed an annotation framework to describe experimental conditions, providing the foundation to identify regulatory networks that operate under particular conditions. AVAILABILITY AND IMPLEMENTATION As far as we know, MCO is the first ontology for growth conditions of any bacterial organism, and it is available at http://regulondb.ccg.unam.mx and https://github.com/microbial-conditions-ontology. Furthermore, we will disseminate MCO throughout the Open Biological and Biomedical Ontology (OBO) Foundry in order to set a standard for the annotation of gene expression data. This will enable comparison of data from diverse data sources. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- J M Camacho-Zaragoza
- Programa de Genómica Computacional, Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Cuernavaca, Morelos, México
- H Salgado
- Programa de Genómica Computacional, Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Cuernavaca, Morelos, México
- K Alquicira
- Programa de Genómica Computacional, Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Cuernavaca, Morelos, México
- C Ishida
- Programa de Genómica Computacional, Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Cuernavaca, Morelos, México

8
Taboada M, Rodriguez H, Gudivada RC, Martinez D. A new synonym-substitution method to enrich the human phenotype ontology. BMC Bioinformatics 2017; 18:446. [PMID: 29017443 PMCID: PMC5635572 DOI: 10.1186/s12859-017-1858-7] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 06/23/2017] [Accepted: 10/02/2017] [Indexed: 12/29/2022]
Abstract
Background Named entity recognition is critical for biomedical text mining, where it is not unusual to find entities labeled by a wide range of different terms. Nowadays, ontologies are one of the crucial enabling technologies in bioinformatics, providing resources for improved natural language processing tasks. However, biomedical ontology-based named entity recognition continues to be a major research problem. Results This paper presents an automated synonym-substitution method to enrich the Human Phenotype Ontology (HPO) with new synonyms. The approach is mainly based on both the lexical properties of the terms and the hierarchical structure of the ontology. By scanning the lexical difference between a term and its descendant terms, the method can learn new names and modifiers in order to generate synonyms for the descendant terms. By searching for the exact phrases in MEDLINE, the method can automatically rule out illogical candidate synonyms. In total, 745 new terms were identified. These terms were indirectly evaluated through the concept annotations on a gold standard corpus and also by document retrieval on a collection of abstracts on hereditary diseases. A moderate improvement in the F-measure performance on the gold standard corpus was observed. Additionally, 6% more abstracts on hereditary diseases were retrieved, and this percentage was 33% higher if only the highly informative concepts were considered. Conclusions A synonym-substitution procedure that leverages the HPO hierarchical structure works well for a reliable and automatic extension of the terminology. The results show that the generated synonyms have a positive impact on concept recognition, mainly those synonyms corresponding to highly informative HPO terms. Electronic supplementary material The online version of this article (10.1186/s12859-017-1858-7) contains supplementary material, which is available to authorized users.
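The core substitution step described above can be sketched in three stages. First, if a descendant's label is its parent's label plus extra words (a modifier), those words are learned. Second, the same words are attached to each of the parent's synonyms to propose synonyms for the descendant. Third, candidates are kept only if they occur as exact phrases in a corpus. This is a simplified Python sketch under stated assumptions. It handles only a modifier placed before or after the whole parent label, and the example labels are illustrative. A plain set of phrases stands in for the MEDLINE exact-phrase search.

```python
def learn_modifier(parent_label, child_label):
    """Return ("prefix" | "suffix", words) if the child label extends the parent label."""
    parent, child = parent_label.lower().split(), child_label.lower().split()
    n = len(parent)
    if len(child) <= n:
        return None
    if child[-n:] == parent:  # e.g. "submucous cleft palate" vs "cleft palate"
        return "prefix", child[:-n]
    if child[:n] == parent:  # modifier appended after the parent label
        return "suffix", child[n:]
    return None


def candidate_synonyms(parent_label, parent_synonyms, child_label):
    """Apply the learned modifier to each parent synonym to propose child synonyms."""
    learned = learn_modifier(parent_label, child_label)
    if learned is None:
        return []
    position, words = learned
    candidates = []
    for synonym in parent_synonyms:
        tokens = synonym.lower().split()
        combined = words + tokens if position == "prefix" else tokens + words
        candidates.append(" ".join(combined))
    return candidates


def filter_by_corpus(candidates, corpus_phrases):
    """Keep only candidates attested verbatim; stands in for an exact-phrase literature search."""
    return [c for c in candidates if c in corpus_phrases]
```

For example, given the parent "Cleft palate" with synonyms "Palatoschisis" and "Uranoschisis" and the child "Submucous cleft palate", the sketch proposes "submucous palatoschisis" and "submucous uranoschisis". The corpus filter then discards whichever candidate is not attested.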
Affiliation(s)
- Maria Taboada
- Department of Electronics & Computer Science, University of Santiago de Compostela, Campus Vida, Santiago de Compostela, 15705, Spain.
- Hadriana Rodriguez
- Department of Electronics & Computer Science, University of Santiago de Compostela, Campus Vida, Santiago de Compostela, 15705, Spain
- Diego Martinez
- Department of Applied Physics, University of Santiago de Compostela, 15705, Santiago de Compostela, Campus Vida, Spain

9
Boudellioua I, Mahamad Razali RB, Kulmanov M, Hashish Y, Bajic VB, Goncalves-Serra E, Schoenmakers N, Gkoutos GV, Schofield PN, Hoehndorf R. Semantic prioritization of novel causative genomic variants. PLoS Comput Biol 2017; 13:e1005500. [PMID: 28414800 PMCID: PMC5411092 DOI: 10.1371/journal.pcbi.1005500] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.4] [Received: 11/08/2016] [Revised: 05/01/2017] [Accepted: 04/04/2017] [Indexed: 12/14/2022]
Abstract
Discriminating the causative disease variant(s) for individuals with inherited or de novo mutations is one of the main challenges facing the clinical genetics community today. Computational approaches for variant prioritization include machine learning methods utilizing a large number of features, including molecular information, interaction networks, or phenotypes. Here, we demonstrate the PhenomeNET Variant Predictor (PVP) system, which exploits semantic technologies and automated reasoning over genotype-phenotype relations to filter and prioritize variants in whole exome and whole genome sequencing datasets. We demonstrate the performance of PVP in identifying causative variants on a large number of synthetic whole exome and whole genome sequences, covering a wide range of diseases and syndromes. In a retrospective study, we further illustrate the application of PVP for the interpretation of whole exome sequencing data in patients suffering from congenital hypothyroidism. We find that PVP accurately identifies causative variants in whole exome and whole genome sequencing datasets and provides a powerful resource for the discovery of causal variants.
We address the problem of how to distinguish which of the many thousands of DNA sequence variants carried by an individual with a rare disease is responsible for the disease phenotypes. This can help clinicians arrive at a diagnosis, and can also be instrumental in improving our understanding of the pathobiology of the disease. Many methods are currently available to help determine the causative variant, using information about evolutionary conservation and prediction of the functional consequences of the sequence variant. We have developed a novel algorithm (PVP) that augments existing strategies by using the similarity of the patient's phenotype to known phenotype-genotype data in human and model organism databases to further rank potential candidate genes.
In a retrospective study, we apply PVP to the interpretation of whole exome sequencing data in patients suffering from congenital hypothyroidism, and find that PVP accurately identifies causative variants in whole exome and whole genome sequencing datasets and provides a powerful resource for the discovery of causal variants.
Affiliation(s)
- Imane Boudellioua
- King Abdullah University of Science and Technology, Computer, Electrical & Mathematical Sciences and Engineering Division, Computational Bioscience Research Center, Thuwal, Saudi Arabia
- Rozaimi B. Mahamad Razali
- King Abdullah University of Science and Technology, Computer, Electrical & Mathematical Sciences and Engineering Division, Computational Bioscience Research Center, Thuwal, Saudi Arabia
- Maxat Kulmanov
- King Abdullah University of Science and Technology, Computer, Electrical & Mathematical Sciences and Engineering Division, Computational Bioscience Research Center, Thuwal, Saudi Arabia
- Yasmeen Hashish
- King Abdullah University of Science and Technology, Computer, Electrical & Mathematical Sciences and Engineering Division, Computational Bioscience Research Center, Thuwal, Saudi Arabia
- Vladimir B. Bajic
- King Abdullah University of Science and Technology, Computer, Electrical & Mathematical Sciences and Engineering Division, Computational Bioscience Research Center, Thuwal, Saudi Arabia
- Eva Goncalves-Serra
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, United Kingdom
- Nadia Schoenmakers
- University of Cambridge Metabolic Research Laboratories, Wellcome Trust-Medical Research Council, Institute of Metabolic Science, Addenbrooke’s Hospital, Cambridge, United Kingdom
- Georgios V. Gkoutos
- College of Medical and Dental Sciences, Institute of Cancer and Genomic Sciences, Centre for Computational Biology, University of Birmingham, Birmingham, United Kingdom
- Institute of Translational Medicine, University Hospitals Birmingham, NHS Foundation Trust, Birmingham, United Kingdom
- Institute of Biological, Environmental and Rural Sciences, Aberystwyth University, Aberystwyth, United Kingdom
- Paul N. Schofield
- Department of Physiology, Development & Neuroscience, University of Cambridge, Cambridge, United Kingdom
- Robert Hoehndorf
- King Abdullah University of Science and Technology, Computer, Electrical & Mathematical Sciences and Engineering Division, Computational Bioscience Research Center, Thuwal, Saudi Arabia

10
Hochheiser H, Castine M, Harris D, Savova G, Jacobson RS. An information model for computable cancer phenotypes. BMC Med Inform Decis Mak 2016; 16:121. [PMID: 27629872 PMCID: PMC5024416 DOI: 10.1186/s12911-016-0358-4] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.9] [Received: 06/06/2016] [Accepted: 09/01/2016] [Indexed: 12/27/2022]
Abstract
BACKGROUND Standards, methods, and tools supporting the integration of clinical data and genomic information are an area of significant need and rapid growth in biomedical informatics. Integration of cancer clinical data and cancer genomic information poses unique challenges because of the high volume and complexity of clinical data, as well as the heterogeneity and instability of cancer genome data compared with germline data. Current information models of clinical and genomic data are not sufficiently expressive to represent individual observations and to aggregate those observations into longitudinal summaries over the course of cancer care. Such models are acutely needed to support the development of systems and tools for generating the so-called clinical "deep phenotype" of individual cancer patients, a process that remains almost entirely manual in cancer research and precision medicine. METHODS Reviews of existing ontologies and interviews with cancer researchers were used to inform iterative development of a cancer phenotype information model. We translated a subset of the Fast Healthcare Interoperability Resources (FHIR) models into the OWL 2 Description Logic (DL) representation, and added extensions as needed for modeling cancer phenotypes with terms derived from the NCI Thesaurus. Models were validated with domain experts and evaluated against competency questions. RESULTS The DeepPhe Information model represents cancer phenotype data at increasing levels of abstraction, from mention level in clinical documents to summaries of key events and findings. We describe the model using breast cancer as an example, depicting methods to represent phenotypic features of cancers, tumors, treatment regimens, and specific biologic behaviors that span the entire course of a patient's disease.
CONCLUSIONS We present a multi-scale information model for representing individual document mentions, document level classifications, episodes along a disease course, and phenotype summarization, linking individual observations to high-level summaries in support of subsequent integration and analysis.
Affiliation(s)
- Harry Hochheiser
- Department of Biomedical Informatics, University of Pittsburgh School of Medicine, 5607 Baum Boulevard, Rm 523, Pittsburgh, 15206-3701, PA, USA; Intelligent Systems Program, University of Pittsburgh, Pittsburgh, PA, USA
- Melissa Castine
- Department of Biomedical Informatics, University of Pittsburgh School of Medicine, 5607 Baum Boulevard, Rm 523, Pittsburgh, 15206-3701, PA, USA
- David Harris
- Boston Children's Hospital and Harvard Medical School, Boston, MA, USA
- Guergana Savova
- Boston Children's Hospital and Harvard Medical School, Boston, MA, USA
- Rebecca S Jacobson
- Department of Biomedical Informatics, University of Pittsburgh School of Medicine, 5607 Baum Boulevard, Rm 523, Pittsburgh, 15206-3701, PA, USA; Intelligent Systems Program, University of Pittsburgh, Pittsburgh, PA, USA; University of Pittsburgh Cancer Institute, Pittsburgh, PA, USA
11
Diehl AD, Meehan TF, Bradford YM, Brush MH, Dahdul WM, Dougall DS, He Y, Osumi-Sutherland D, Ruttenberg A, Sarntivijai S, Van Slyke CE, Vasilevsky NA, Haendel MA, Blake JA, Mungall CJ. The Cell Ontology 2016: enhanced content, modularization, and ontology interoperability. J Biomed Semantics 2016; 7:44. [PMID: 27377652 PMCID: PMC4932724 DOI: 10.1186/s13326-016-0088-7] [Citation(s) in RCA: 145] [Impact Index Per Article: 16.1] [Received: 04/15/2016] [Accepted: 06/23/2016] [Indexed: 12/04/2022]
Abstract
BACKGROUND The Cell Ontology (CL) is an OBO Foundry candidate ontology covering the domain of canonical, natural biological cell types. Since its inception in 2005, the CL has undergone multiple rounds of revision and expansion, most notably in its representation of hematopoietic cells. For in vivo cells, the CL focuses on vertebrates but provides general classes that can be used for other metazoans, which can be subtyped in species-specific ontologies. CONSTRUCTION AND CONTENT Recent work on the CL has focused on extending the representation of various cell types, and developing new modules in the CL itself, and in related ontologies in coordination with the CL. For example, the Kidney and Urinary Pathway Ontology was used as a template to populate the CL with additional cell types. In addition, subtypes of the class 'cell in vitro' have received improved definitions and labels to provide for modularity with the representation of cells in the Cell Line Ontology and Reagent Ontology. Recent changes in the ontology development methodology for CL include a switch from OBO to OWL for the primary encoding of the ontology, and an increasing reliance on logical definitions for improved reasoning. UTILITY AND DISCUSSION The CL is now mandated as a metadata standard for large functional genomics and transcriptomics projects, and is used extensively for annotation, querying, and analyses of cell-type-specific data in sequencing consortia such as FANTOM5 and ENCODE, as well as for the NIAID ImmPort database and the Cell Image Library. The CL is also a vital component used in the modular construction of other biomedical ontologies; for example, the Gene Ontology and the cross-species anatomy ontology, Uberon, use CL to support the consistent representation of cell types across different levels of anatomical granularity, such as tissues and organs. 
CONCLUSIONS The ongoing improvements to the CL make it a valuable resource to both the OBO Foundry community and the wider scientific community, and we continue to experience increased interest in the CL both among developers and within the user community.
Affiliation(s)
- Alexander D. Diehl
- Department of Neurology, University at Buffalo School of Medicine and Biomedical Sciences, Buffalo, NY 14203 USA
- Terrence F. Meehan
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, Cambridge, CB10 1SD UK
- Yvonne M. Bradford
- ZFIN, the Zebrafish Model Organism Database, 5291 University of Oregon, Eugene, OR 97403 USA
- Matthew H. Brush
- Ontology Development Group, Library, Oregon Health and Science University, Portland, Oregon 97239 USA
- Wasila M. Dahdul
- Department of Biology, University of South Dakota, Vermillion, SD 57069 USA
- National Evolutionary Synthesis Center, Durham, NC 27705 USA
- David S. Dougall
- Southwestern Medical Center, University of Texas, Dallas, TX 75235 USA
- Yongqun He
- Unit for Laboratory Animal Medicine, University of Michigan Medical School, Ann Arbor, MI 48109 USA
- David Osumi-Sutherland
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, Cambridge, CB10 1SD UK
- Alan Ruttenberg
- Oral Diagnostics Sciences, University at Buffalo School of Dental Medicine, Buffalo, NY 14210 USA
- Sirarat Sarntivijai
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, Cambridge, CB10 1SD UK
- Ceri E. Van Slyke
- ZFIN, the Zebrafish Model Organism Database, 5291 University of Oregon, Eugene, OR 97403 USA
- Nicole A. Vasilevsky
- Ontology Development Group, Library, Oregon Health and Science University, Portland, Oregon 97239 USA
- Melissa A. Haendel
- Ontology Development Group, Library, Oregon Health and Science University, Portland, Oregon 97239 USA
12
Sarntivijai S, Zhang S, Jagannathan DG, Zaman S, Burkhart KK, Omenn GS, He Y, Athey BD, Abernethy DR. Linking MedDRA(®)-Coded Clinical Phenotypes to Biological Mechanisms by the Ontology of Adverse Events: A Pilot Study on Tyrosine Kinase Inhibitors. Drug Saf 2016; 39:697-707. [PMID: 27003817 PMCID: PMC4933310 DOI: 10.1007/s40264-016-0414-0] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.9] [Indexed: 10/22/2022]
Abstract
INTRODUCTION A translational bioinformatics challenge exists in connecting population and individual clinical phenotypes in various formats to biological mechanisms. The Medical Dictionary for Regulatory Activities (MedDRA(®)) is the default dictionary for adverse event (AE) reporting in the US Food and Drug Administration Adverse Event Reporting System (FAERS). The ontology of adverse events (OAE) represents AEs as pathological processes occurring after drug exposures. OBJECTIVES The aim of this work was to establish a semantic framework to link biological mechanisms to phenotypes of AEs by combining OAE with MedDRA(®) in FAERS data analysis. We investigated the AEs associated with tyrosine kinase inhibitors (TKIs) and monoclonal antibodies (mAbs) targeting tyrosine kinases. The five selected TKIs/mAbs (i.e., dasatinib, imatinib, lapatinib, cetuximab, and trastuzumab) are known to induce impaired ventricular function (non-QT) cardiotoxicity. RESULTS Statistical analysis of FAERS data identified 1053 distinct MedDRA(®) terms significantly associated with TKIs/mAbs, where 884 did not have corresponding OAE terms. We manually annotated these terms, added them to OAE by the standard OAE development strategy, and mapped them to MedDRA(®). The data integration to provide insights into molecular mechanisms of drug-associated AEs was performed by including linkages in OAE for all related AE terms to MedDRA(®) and the existing ontologies, including the human phenotype ontology (HP), Uber anatomy ontology (UBERON), and gene ontology (GO). Sixteen AEs were shared by all five TKIs/mAbs, and each of 17 cardiotoxicity AEs was associated with at least one TKI/mAb. As an example, we analyzed "cardiac failure" using the relations established in OAE with other ontologies and demonstrated that one of the biological processes associated with cardiac failure maps to the genes associated with heart contraction. 
CONCLUSION By expanding the existing OAE ontological design, our TKI use case demonstrated that the combination of OAE and MedDRA(®) provides a semantic framework to link clinical phenotypes of adverse drug events to biological mechanisms.
Affiliation(s)
- Sirarat Sarntivijai
- Office of Clinical Pharmacology, Office of Translational Science, Center for Drug Evaluation and Research, Food and Drug Administration, Silver Spring, MD, USA.
- Center for Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, 48109, USA.
- Shelley Zhang
- College of Literature, Science, and the Arts, University of Michigan, Ann Arbor, MI, USA
- Shadia Zaman
- Office of Clinical Pharmacology, Office of Translational Science, Center for Drug Evaluation and Research, Food and Drug Administration, Silver Spring, MD, USA
- Keith K Burkhart
- Office of Clinical Pharmacology, Office of Translational Science, Center for Drug Evaluation and Research, Food and Drug Administration, Silver Spring, MD, USA
- Gilbert S Omenn
- Department of Internal Medicine and Human Genetics and School of Public Health, University of Michigan, Ann Arbor, MI, USA
- Center for Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, 48109, USA
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Yongqun He
- Unit of Laboratory Animal Medicine, University of Michigan, Ann Arbor, MI, USA
- Center for Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, 48109, USA
- Department of Microbiology and Immunology, University of Michigan, Ann Arbor, MI, USA
- Brian D Athey
- Center for Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, 48109, USA
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Psychiatry Department, University of Michigan, Ann Arbor, MI, USA
- Darrell R Abernethy
- Office of Clinical Pharmacology, Office of Translational Science, Center for Drug Evaluation and Research, Food and Drug Administration, Silver Spring, MD, USA
13
Robinson PN, Mungall CJ, Haendel M. Capturing phenotypes for precision medicine. Cold Spring Harb Mol Case Stud 2016; 1:a000372. [PMID: 27148566 PMCID: PMC4850887 DOI: 10.1101/mcs.a000372] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.0] [Indexed: 12/15/2022]
Abstract
Deep phenotyping followed by integrated computational analysis of genotype and phenotype is becoming ever more important for many areas of genomic diagnostics and translational research. The overwhelming majority of clinical descriptions in the medical literature are available only as natural language text, meaning that searching, analysis, and integration of medically relevant information in databases such as PubMed is challenging. The new journal Cold Spring Harbor Molecular Case Studies will require authors to select Human Phenotype Ontology terms for research papers that will be displayed alongside the manuscript, thereby providing a foundation for ontology-based indexing and searching of articles that contain descriptions of phenotypic abnormalities, an important step toward improving the ability of researchers and clinicians to get biomedical information that is critical for clinical care or translational research.
Affiliation(s)
- Peter N Robinson
- Institute for Medical Genetics and Human Genetics, Charité-Universitätsmedizin Berlin, 10117 Berlin, Germany; Max Planck Institute for Molecular Genetics, 14195 Berlin, Germany; Berlin Brandenburg Center for Regenerative Therapies (BCRT), Charité-Universitätsmedizin Berlin, 13353 Berlin, Germany; Institute for Bioinformatics, Department of Mathematics and Computer Science, Freie Universität Berlin, 14195 Berlin, Germany
- Melissa Haendel
- Oregon Health and Science University, Portland, Oregon 97239, USA
14
Oellrich A, Meehan TF, Parkinson H, Sarntivijai S, White JK, Karp NA. Reporting phenotypes in mouse models when considering body size as a potential confounder. J Biomed Semantics 2016; 7:2. [PMID: 26865945 PMCID: PMC4748495 DOI: 10.1186/s13326-016-0050-8] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Received: 10/21/2015] [Accepted: 02/02/2016] [Indexed: 01/09/2023]
Abstract
Genotype-phenotype studies aim to identify causative relationships between genes and phenotypes. The International Mouse Phenotyping Consortium is a high throughput phenotyping program whose goal is to collect phenotype data for a knockout mouse strain of every protein coding gene. The scale of the project requires an automatic analysis pipeline to detect abnormal phenotypes, and disseminate the resulting gene-phenotype annotation data into public resources. A body weight phenotype is a common result of knockout studies. As body weight correlates with many other biological traits, this challenges the interpretation of related gene-phenotype associations. Co-correlation can lead to gene-phenotype associations that are potentially misleading. Here we use statistical modelling to account for body weight as a potential confounder to assess the impact. We find that there is a considerable impact on previously established gene-phenotype associations due to an increase in sensitivity as well as the confounding effect. We investigated the existing ontologies to represent this phenotypic information and we explored ways to ontologically represent the results of the influence of confounders on gene-phenotype associations. With the scale of data being disseminated within the high throughput programs and the range of downstream studies that utilise these data, it is critical to consider how we improve the quality of the disseminated data and provide a robust ontological representation.
Affiliation(s)
- Anika Oellrich
- Mouse Informatics Group, Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire UK
- Social Genetic & Developmental Psychiatry, King’s College London, London, UK
- Terrence F. Meehan
- Samples, Phenotypes and Ontologies, European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, Cambridge UK
- Helen Parkinson
- Samples, Phenotypes and Ontologies, European Bioinformatics Institute (EMBL-EBI), European Molecular Biology Laboratory, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD UK
- Sirarat Sarntivijai
- Samples, Phenotypes and Ontologies, European Bioinformatics Institute (EMBL-EBI), European Molecular Biology Laboratory, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD UK
- The Centre for Therapeutic Target Validation, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD UK
- Jacqueline K. White
- Mouse Genetics Project, Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire UK
- Natasha A. Karp
- Mouse Informatics Group, Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire UK
15
Ceusters W, Nasri-Heir C, Alnaas D, Cairns BE, Michelotti A, Ohrbach R. Perspectives on next steps in classification of oro-facial pain - Part 3: biomarkers of chronic oro-facial pain - from research to clinic. J Oral Rehabil 2015; 42:956-66. [PMID: 26200973 PMCID: PMC4715524 DOI: 10.1111/joor.12324] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Accepted: 05/31/2015] [Indexed: 11/28/2022]
Abstract
The purpose of this study was to review the current status of biomarkers used in oro-facial pain conditions. Specifically, we critically appraise their relative strengths and weaknesses for assessing mechanisms associated with the oro-facial pain conditions and interpret that information in the light of their current value for use in diagnosis. In the third section, we explore biomarkers through the perspective of ontological realism. We discuss ontological problems of biomarkers as currently widely conceptualised and implemented. This leads to recommendations for research practice aimed at a better understanding of the potential contribution that biomarkers might make to oro-facial pain diagnosis and thereby fulfil our goal for an expanded multidimensional framework for oro-facial pain conditions that would include a third axis.
Affiliation(s)
- Werner Ceusters
- Department of Biomedical Informatics, University at Buffalo, NY, USA
- Brian E Cairns
- Faculty of Pharmaceutical Sciences, University of British Columbia, Vancouver, Canada
- Ambra Michelotti
- Section of Orthodontics, School of Dentistry, University of Naples Federico II, Naples, Italy
- Richard Ohrbach
- Department of Oral Diagnostic Sciences, University at Buffalo, NY, USA
16
Smedley D, Jacobsen JOB, Jäger M, Köhler S, Holtgrewe M, Schubach M, Siragusa E, Zemojtel T, Buske OJ, Washington NL, Bone WP, Haendel MA, Robinson PN. Next-generation diagnostics and disease-gene discovery with the Exomiser. Nat Protoc 2015; 10:2004-15. [PMID: 26562621 DOI: 10.1038/nprot.2015.124] [Citation(s) in RCA: 247] [Impact Index Per Article: 24.7] [Indexed: 12/23/2022]
Abstract
Exomiser is an application that prioritizes genes and variants in next-generation sequencing (NGS) projects for novel disease-gene discovery or differential diagnostics of Mendelian disease. Exomiser comprises a suite of algorithms for prioritizing exome sequences using random-walk analysis of protein interaction networks, clinical relevance and cross-species phenotype comparisons, as well as a wide range of other computational filters for variant frequency, predicted pathogenicity and pedigree analysis. In this protocol, we provide a detailed explanation of how to install Exomiser and use it to prioritize exome sequences in a number of scenarios. Exomiser requires ∼3 GB of RAM and roughly 15-90 s of computing time on a standard desktop computer to analyze a variant call format (VCF) file. Exomiser is freely available for academic use from http://www.sanger.ac.uk/science/tools/exomiser.
Affiliation(s)
- Damian Smedley
- Skarnes Faculty Group, Wellcome Trust Sanger Institute, Hinxton, UK
- Marten Jäger
- Institute for Medical and Human Genetics, Charité-Universitätsmedizin Berlin, Berlin, Germany; Berlin Brandenburg Center for Regenerative Therapies (BCRT), Charité-Universitätsmedizin Berlin, Berlin, Germany
- Sebastian Köhler
- Institute for Medical and Human Genetics, Charité-Universitätsmedizin Berlin, Berlin, Germany
- Manuel Holtgrewe
- Institute for Medical and Human Genetics, Charité-Universitätsmedizin Berlin, Berlin, Germany; Berlin Institute for Health, Berlin, Germany
- Max Schubach
- Institute for Medical and Human Genetics, Charité-Universitätsmedizin Berlin, Berlin, Germany
- Enrico Siragusa
- Institute for Medical and Human Genetics, Charité-Universitätsmedizin Berlin, Berlin, Germany; Berlin Institute for Health, Berlin, Germany; Max Planck Institute for Molecular Genetics, Berlin, Germany
- Tomasz Zemojtel
- Institute for Medical and Human Genetics, Charité-Universitätsmedizin Berlin, Berlin, Germany; Institute of Bioorganic Chemistry, Polish Academy of Sciences, Poznan, Poland; Labor Berlin - Charité Vivantes, Humangenetik, Berlin, Germany
- Orion J Buske
- Department of Computer Science, University of Toronto, Toronto, Ontario, Canada; Genetics and Genome Biology, The Hospital for Sick Children, Toronto, Ontario, Canada
- Nicole L Washington
- Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, California, USA
- William P Bone
- The National Institutes of Health (NIH) Undiagnosed Diseases Program, Common Fund, Office of the Director, NIH, Bethesda, Maryland, USA
- Melissa A Haendel
- Department of Medical Informatics and Clinical Epidemiology, Oregon Health & Science University, Portland, Oregon, USA
- Peter N Robinson
- Institute for Medical and Human Genetics, Charité-Universitätsmedizin Berlin, Berlin, Germany; Berlin Brandenburg Center for Regenerative Therapies (BCRT), Charité-Universitätsmedizin Berlin, Berlin, Germany; Max Planck Institute for Molecular Genetics, Berlin, Germany; Department of Mathematics and Computer Science, Institute for Bioinformatics, Freie Universität Berlin, Berlin, Germany
17
Collier N, Groza T, Smedley D, Robinson PN, Oellrich A, Rebholz-Schuhmann D. PhenoMiner: from text to a database of phenotypes associated with OMIM diseases. Database (Oxford) 2015; 2015:bav104. [PMID: 26507285 PMCID: PMC4622021 DOI: 10.1093/database/bav104] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.6] [Received: 02/10/2015] [Revised: 07/24/2015] [Accepted: 08/27/2015] [Indexed: 11/27/2022]
Abstract
Analysis of scientific and clinical phenotypes reported in the experimental literature has been curated manually to build high-quality databases such as the Online Mendelian Inheritance in Man (OMIM). However, the identification and harmonization of phenotype descriptions struggles with the diversity of human expressivity. We introduce a novel automated extraction approach called PhenoMiner that exploits full parsing and conceptual analysis. Apriori association mining is then used to identify relationships to human diseases. We applied PhenoMiner to the BMC open access collection and identified 13,636 phenotype candidates. We identified 28,155 phenotype-disorder hypotheses covering 4898 phenotypes and 1659 Mendelian disorders. Analysis showed: (i) the semantic distribution of the extracted terms against linked ontologies; (ii) a comparison of term overlap with the Human Phenotype Ontology (HP); (iii) moderate support for phenotype-disorder pairs in both OMIM and the literature; (iv) strong associations of phenotype-disorder pairs to known disease-genes pairs using PhenoDigm. The full list of PhenoMiner phenotypes (S1), phenotype-disorder associations (S2), association-filtered linked data (S3) and user database documentation (S5) is available as supplementary data and can be downloaded at http://github.com/nhcollier/PhenoMiner under a Creative Commons Attribution 4.0 license. Database URL: phenominer.mml.cam.ac.uk.
Affiliation(s)
- Nigel Collier
- The University of Cambridge, Cambridge, CB3 9DB, UK; European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, UK
- Tudor Groza
- Garvan Institute of Medical Research, Darlinghurst, Sydney, NSW 2010, Australia; School of ITEE, The University of Queensland, St. Lucia, QLD 4072, Australia
- Damian Smedley
- Mouse Informatics Group, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton CB10 1SA, UK
- Peter N Robinson
- Institute for Medical Genetics and Human Genetics, Charité-Universitätsmedizin Berlin, 13353 Berlin, Germany
- Anika Oellrich
- Mouse Informatics Group, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton CB10 1SA, UK
18
Oellrich A, Collier N, Groza T, Rebholz-Schuhmann D, Shah N, Bodenreider O, Boland MR, Georgiev I, Liu H, Livingston K, Luna A, Mallon AM, Manda P, Robinson PN, Rustici G, Simon M, Wang L, Winnenburg R, Dumontier M. The digital revolution in phenotyping. Brief Bioinform 2015; 17:819-30. [PMID: 26420780 PMCID: PMC5036847 DOI: 10.1093/bib/bbv083] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.0] [Received: 06/03/2015] [Indexed: 12/22/2022]
Abstract
Phenotypes have gained increased notoriety in the clinical and biological domain owing to their application in numerous areas such as the discovery of disease genes and drug targets, phylogenetics and pharmacogenomics. Phenotypes, defined as observable characteristics of organisms, can be seen as one of the bridges that lead to a translation of experimental findings into clinical applications and thereby support 'bench to bedside' efforts. However, to build this translational bridge, a common and universal understanding of phenotypes is required that goes beyond domain-specific definitions. To achieve this ambitious goal, a digital revolution is ongoing that enables the encoding of data in computer-readable formats and the data storage in specialized repositories, ready for integration, enabling translational research. While phenome research is an ongoing endeavor, the true potential hidden in the currently available data still needs to be unlocked, offering exciting opportunities for the forthcoming years. Here, we provide insights into the state-of-the-art in digital phenotyping, by means of representing, acquiring and analyzing phenotype data. In addition, we provide visions of this field for future research work that could enable better applications of phenotype data.
19
Collier N, Oellrich A, Groza T. Toward knowledge support for analysis and interpretation of complex traits. Genome Biol 2015; 14:214. [PMID: 24079802 PMCID: PMC4053827 DOI: 10.1186/gb-2013-14-9-214] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Indexed: 12/02/2022]
Abstract
The systematic description of complex traits, from the organism to the cellular level, is important for hypothesis generation about underlying disease mechanisms. We discuss how intelligent algorithms might provide support, leading to faster throughput.
20
Deans AR, Lewis SE, Huala E, Anzaldo SS, Ashburner M, Balhoff JP, Blackburn DC, Blake JA, Burleigh JG, Chanet B, Cooper LD, Courtot M, Csösz S, Cui H, Dahdul W, Das S, Dececchi TA, Dettai A, Diogo R, Druzinsky RE, Dumontier M, Franz NM, Friedrich F, Gkoutos GV, Haendel M, Harmon LJ, Hayamizu TF, He Y, Hines HM, Ibrahim N, Jackson LM, Jaiswal P, James-Zorn C, Köhler S, Lecointre G, Lapp H, Lawrence CJ, Le Novère N, Lundberg JG, Macklin J, Mast AR, Midford PE, Mikó I, Mungall CJ, Oellrich A, Osumi-Sutherland D, Parkinson H, Ramírez MJ, Richter S, Robinson PN, Ruttenberg A, Schulz KS, Segerdell E, Seltmann KC, Sharkey MJ, Smith AD, Smith B, Specht CD, Squires RB, Thacker RW, Thessen A, Fernandez-Triana J, Vihinen M, Vize PD, Vogt L, Wall CE, Walls RL, Westerfeld M, Wharton RA, Wirkner CS, Woolley JB, Yoder MJ, Zorn AM, Mabee P. Finding our way through phenotypes. PLoS Biol 2015; 13:e1002033. [PMID: 25562316 PMCID: PMC4285398 DOI: 10.1371/journal.pbio.1002033] [Citation(s) in RCA: 124] [Impact Index Per Article: 12.4] [Indexed: 01/04/2023]
Abstract
Despite a large and multifaceted effort to understand the vast landscape of phenotypic data, their current form inhibits productive data analysis. The lack of a community-wide, consensus-based, human- and machine-interpretable language for describing phenotypes and their genomic and environmental contexts is perhaps the most pressing scientific bottleneck to integration across many key fields in biology, including genomics, systems biology, development, medicine, evolution, ecology, and systematics. Here we survey the current phenomics landscape, including data resources and handling, and the progress that has been made to accurately capture relevant data descriptions for phenotypes. We present an example of the kind of integration across domains that computable phenotypes would enable, and we call upon the broader biology community, publishers, and relevant funding agencies to support efforts to surmount today's data barriers and facilitate analytical reproducibility.
Affiliation(s)
- Andrew R. Deans
- Department of Entomology, Pennsylvania State University, University Park, Pennsylvania, United States of America
| | - Suzanna E. Lewis
- Genome Division, Lawrence Berkeley National Lab, Berkeley, California, United States of America
| | - Eva Huala
- Department of Plant Biology, Carnegie Institution for Science, Stanford, California, United States of America
- Phoenix Bioinformatics, Palo Alto, California, United States of America
| | - Salvatore S. Anzaldo
- School of Life Sciences, Arizona State University, Tempe, Arizona, United States of America
| | - Michael Ashburner
- Department of Genetics, University of Cambridge, Cambridge, United Kingdom
| | - James P. Balhoff
- National Evolutionary Synthesis Center, Durham, North Carolina, United States of America
| | - David C. Blackburn
- Department of Vertebrate Zoology and Anthropology, California Academy of Sciences, San Francisco, California, United States of America
| | - Judith A. Blake
- The Jackson Laboratory, Bar Harbor, Maine, United States of America
| | - J. Gordon Burleigh
- Department of Biology, University of Florida, Gainesville, Florida, United States of America
- Bruno Chanet
- Muséum national d'Histoire naturelle, Département Systématique et Evolution, Paris, France
- Laurel D. Cooper
- Department of Botany and Plant Pathology, Oregon State University, Corvallis, Oregon, United States of America
- Mélanie Courtot
- Molecular Biology and Biochemistry Department, Simon Fraser University, Burnaby, British Columbia, Canada
- Sándor Csösz
- MTA-ELTE-MTM, Ecology Research Group, Pázmány Péter sétány 1C, Budapest, Hungary
- Hong Cui
- School of Information Resources and Library Science, University of Arizona, Tucson, Arizona, United States of America
- Wasila Dahdul
- Department of Biology, University of South Dakota, Vermillion, South Dakota, United States of America
- Sandip Das
- Department of Botany, University of Delhi, Delhi, India
- T. Alexander Dececchi
- Department of Biology, University of South Dakota, Vermillion, South Dakota, United States of America
- Agnes Dettai
- Muséum national d'Histoire naturelle, Département Systématique et Evolution, Paris, France
- Rui Diogo
- Department of Anatomy, Howard University College of Medicine, Washington D.C., United States of America
- Robert E. Druzinsky
- Department of Oral Biology, College of Dentistry, University of Illinois, Chicago, Illinois, United States of America
- Michel Dumontier
- Stanford Center for Biomedical Informatics Research, Stanford, California, United States of America
- Nico M. Franz
- School of Life Sciences, Arizona State University, Tempe, Arizona, United States of America
- Frank Friedrich
- Biocenter Grindel and Zoological Museum, Hamburg University, Hamburg, Germany
- George V. Gkoutos
- Department of Computer Science, Aberystwyth University, Aberystwyth, Ceredigion, United Kingdom
- Melissa Haendel
- Department of Medical Informatics & Epidemiology, Oregon Health & Science University, Portland, Oregon, United States of America
- Luke J. Harmon
- Department of Biological Sciences, University of Idaho, Moscow, Idaho, United States of America
- Terry F. Hayamizu
- Mouse Genome Informatics, The Jackson Laboratory, Bar Harbor, Maine, United States of America
- Yongqun He
- Unit for Laboratory Animal Medicine, Department of Microbiology and Immunology, Center for Computational Medicine and Bioinformatics, and Comprehensive Cancer Center, University of Michigan Medical School, Ann Arbor, Michigan, United States of America
- Heather M. Hines
- Department of Entomology, Pennsylvania State University, University Park, Pennsylvania, United States of America
- Nizar Ibrahim
- Department of Organismal Biology and Anatomy, University of Chicago, Chicago, Illinois, United States of America
- Laura M. Jackson
- Department of Biology, University of South Dakota, Vermillion, South Dakota, United States of America
- Pankaj Jaiswal
- Department of Botany and Plant Pathology, Oregon State University, Corvallis, Oregon, United States of America
- Christina James-Zorn
- Cincinnati Children's Hospital, Division of Developmental Biology, Cincinnati, Ohio, United States of America
- Sebastian Köhler
- Institute for Medical Genetics and Human Genetics, Charité-Universitätsmedizin Berlin, Berlin, Germany
- Guillaume Lecointre
- Muséum national d'Histoire naturelle, Département Systématique et Evolution, Paris, France
- Hilmar Lapp
- National Evolutionary Synthesis Center, Durham, North Carolina, United States of America
- Carolyn J. Lawrence
- Department of Genetics, Development and Cell Biology and Department of Agronomy, Iowa State University, Ames, Iowa, United States of America
- John G. Lundberg
- Department of Ichthyology, The Academy of Natural Sciences, Philadelphia, Pennsylvania, United States of America
- James Macklin
- Eastern Cereal and Oilseed Research Centre, Ottawa, Ontario, Canada
- Austin R. Mast
- Department of Biological Science, Florida State University, Tallahassee, Florida, United States of America
- István Mikó
- Department of Entomology, Pennsylvania State University, University Park, Pennsylvania, United States of America
- Christopher J. Mungall
- Genome Division, Lawrence Berkeley National Lab, Berkeley, California, United States of America
- Anika Oellrich
- European Molecular Biology Laboratory - European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, United Kingdom
- David Osumi-Sutherland
- European Molecular Biology Laboratory - European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, United Kingdom
- Helen Parkinson
- European Molecular Biology Laboratory - European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, United Kingdom
- Martín J. Ramírez
- Division of Arachnology, Museo Argentino de Ciencias Naturales - CONICET, Buenos Aires, Argentina
- Stefan Richter
- Allgemeine & Spezielle Zoologie, Institut für Biowissenschaften, Universität Rostock, Universitätsplatz 2, Rostock, Germany
- Peter N. Robinson
- Institut für Medizinische Genetik und Humangenetik Charité – Universitätsmedizin Berlin, Berlin, Germany
- Alan Ruttenberg
- School of Dental Medicine, University at Buffalo, Buffalo, New York, United States of America
- Katja S. Schulz
- Smithsonian Institution, National Museum of Natural History, Washington, D.C., United States of America
- Erik Segerdell
- Knight Cancer Institute, Oregon Health & Science University, Portland, Oregon, United States of America
- Katja C. Seltmann
- Division of Invertebrate Zoology, American Museum of Natural History, New York, New York, United States of America
- Michael J. Sharkey
- Department of Entomology, University of Kentucky, Lexington, Kentucky, United States of America
- Aaron D. Smith
- Department of Biological Sciences, Northern Arizona University, Flagstaff, Arizona, United States of America
- Barry Smith
- Department of Philosophy, University at Buffalo, Buffalo, New York, United States of America
- Chelsea D. Specht
- Department of Plant and Microbial Biology, Integrative Biology, and the University and Jepson Herbaria, University of California, Berkeley, California, United States of America
- R. Burke Squires
- Bioinformatics and Computational Biosciences Branch, Office of Cyber Infrastructure and Computational Biology, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, Maryland, United States of America
- Robert W. Thacker
- Department of Biology, University of Alabama at Birmingham, Birmingham, Alabama, United States of America
- Anne Thessen
- The Data Detektiv, 1412 Stearns Hill Road, Waltham, Massachusetts, United States of America
- Mauno Vihinen
- Department of Experimental Medical Science, Lund University, Lund, Sweden
- Peter D. Vize
- Department of Biological Sciences, University of Calgary, Calgary, Alberta, Canada
- Lars Vogt
- Universität Bonn, Institut für Evolutionsbiologie und Ökologie, Bonn, Germany
- Christine E. Wall
- Department of Evolutionary Anthropology, Duke University, Durham, North Carolina, United States of America
- Ramona L. Walls
- iPlant Collaborative, University of Arizona, Thomas J. Keating Bioresearch Building, Tucson, Arizona, United States of America
- Monte Westerfield
- Institute of Neuroscience, University of Oregon, Eugene, Oregon, United States of America
- Robert A. Wharton
- Department of Entomology, Texas A&M University, College Station, Texas, United States of America
- Christian S. Wirkner
- Allgemeine & Spezielle Zoologie, Institut für Biowissenschaften, Universität Rostock, Universitätsplatz 2, Rostock, Germany
- James B. Woolley
- Department of Entomology, Texas A&M University, College Station, Texas, United States of America
- Matthew J. Yoder
- Illinois Natural History Survey, University of Illinois, Champaign, Illinois, United States of America
- Aaron M. Zorn
- Cincinnati Children's Hospital, Division of Developmental Biology, Cincinnati, Ohio, United States of America
- Paula Mabee
- Department of Biology, University of South Dakota, Vermillion, South Dakota, United States of America
21
Hancock JM. Commentary on Shimoyama et al. (2012): three ontologies to define phenotype measurement data. Front Genet 2014; 5:93. [PMID: 24795755 PMCID: PMC4006037 DOI: 10.3389/fgene.2014.00093] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 03/28/2014] [Accepted: 04/03/2014] [Indexed: 01/17/2023]
Affiliation(s)
- John M Hancock
- Department of Physiology, Development and Neuroscience, University of Cambridge, Cambridge, UK
22
Abstract
The use of model organisms as tools for the investigation of human genetic variation has significantly and rapidly advanced our understanding of the aetiologies underlying hereditary traits. However, while equivalences in the DNA sequence of two species may be readily inferred through evolutionary models, the identification of equivalence in the phenotypic consequences resulting from comparable genetic variation is far from straightforward, limiting the value of the modelling paradigm. In this review, we provide an overview of the emerging statistical and computational approaches to objectively identify phenotypic equivalence between human and model organisms with examples from the vertebrate models, mouse and zebrafish. Firstly, we discuss enrichment approaches, which deem the most frequent phenotype among the orthologues of a set of genes associated with a common human phenotype as the orthologous phenotype, or phenolog, in the model species. Secondly, we introduce and discuss computational reasoning approaches to identify phenotypic equivalences made possible through the development of intra- and interspecies ontologies. Finally, we consider the particular challenges involved in modelling neuropsychiatric disorders, which illustrate many of the remaining difficulties in developing comprehensive and unequivocal interspecies phenotype mappings.
Affiliation(s)
- Peter N. Robinson
- Institute for Medical Genetics and Human Genetics, Charité-Universitätsmedizin Berlin, Berlin, Germany
- Berlin Brandenburg Center for Regenerative Therapies (BCRT), Charité-Universitätsmedizin Berlin, Berlin, Germany
- Max Planck Institute for Molecular Genetics, Berlin, Germany
- Institute for Bioinformatics, Department of Mathematics and Computer Science, Freie Universität Berlin, Berlin, Germany
- Caleb Webber
- MRC Functional Genomics Unit, Department of Physiology, Anatomy and Genetics, University of Oxford, Oxford, United Kingdom
23
InterMOD: integrated data and tools for the unification of model organism research. Sci Rep 2014; 3:1802. [PMID: 23652793 PMCID: PMC3647165 DOI: 10.1038/srep01802] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.2] [Received: 11/14/2012] [Accepted: 04/05/2013] [Indexed: 11/26/2022]
Abstract
Model organisms are widely used for understanding basic biology, and have significantly contributed to the study of human disease. In recent years, genomic analysis has provided extensive evidence of widespread conservation of gene sequence and function amongst eukaryotes, allowing insights from model organisms to help decipher gene function in a wider range of species. The InterMOD consortium is developing an infrastructure based around the InterMine data warehouse system to integrate genomic and functional data from a number of key model organisms, leading the way to improved cross-species research. So far including budding yeast, nematode worm, fruit fly, zebrafish, rat and mouse, the project has set up data warehouses, synchronized data models, and created analysis tools and links between data from different species. The project unites a number of major model organism databases, improving both the consistency and accessibility of comparative research, to the benefit of the wider scientific community.
24
Cook DL, Neal ML, Bookstein FL, Gennari JH. Ontology of physics for biology: representing physical dependencies as a basis for biological processes. J Biomed Semantics 2013; 4:41. [PMID: 24295137 PMCID: PMC3904761 DOI: 10.1186/2041-1480-4-41] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.8] [Received: 09/06/2013] [Accepted: 11/19/2013] [Indexed: 12/24/2022]
Abstract
BACKGROUND In prior work, we presented the Ontology of Physics for Biology (OPB) as a computational ontology for use in the annotation and representation of biophysical knowledge encoded in repositories of physics-based biosimulation models. We introduced OPB:Physical entity and OPB:Physical property classes that extend available spatiotemporal representations of physical entities and processes to explicitly represent the thermodynamics and dynamics of physiological processes. Our utilitarian, long-term aim is to develop computational tools for creating and querying formalized physiological knowledge for use by multiscale "physiome" projects such as the EU's Virtual Physiological Human (VPH) and NIH's Virtual Physiological Rat (VPR). RESULTS Here we describe the OPB:Physical dependency taxonomy of classes that represent the laws of classical physics that are the "rules" by which physical properties of physical entities change during occurrences of physical processes. For example, the fluid analog of Ohm's law (as for electric currents) is used to describe how a blood flow rate depends on a blood pressure gradient. Hooke's law (as in elastic deformations of springs) is used to describe how an increase in vascular volume increases blood pressure. We classify such dependencies according to the flow, transformation, and storage of thermodynamic energy that occurs during processes governed by the dependencies. CONCLUSIONS We have developed the OPB and annotation methods to represent the meaning (the biophysical semantics) of the mathematical statements of physiological analysis and the biophysical content of models and datasets. Here we describe and discuss our approach to an ontological representation of physical laws (as dependencies) and properties as encoded for the mathematical analysis of biophysical processes.
Affiliation(s)
- Daniel L Cook
- Department of Physiology & Biophysics, University of Washington, Seattle 98195, USA.
25
Köhler S, Doelken SC, Mungall CJ, Bauer S, Firth HV, Bailleul-Forestier I, Black GCM, Brown DL, Brudno M, Campbell J, FitzPatrick DR, Eppig JT, Jackson AP, Freson K, Girdea M, Helbig I, Hurst JA, Jähn J, Jackson LG, Kelly AM, Ledbetter DH, Mansour S, Martin CL, Moss C, Mumford A, Ouwehand WH, Park SM, Riggs ER, Scott RH, Sisodiya S, Van Vooren S, Wapner RJ, Wilkie AOM, Wright CF, Vulto-van Silfhout AT, de Leeuw N, de Vries BBA, Washington NL, Smith CL, Westerfield M, Schofield P, Ruef BJ, Gkoutos GV, Haendel M, Smedley D, Lewis SE, Robinson PN. The Human Phenotype Ontology project: linking molecular biology and disease through phenotype data. Nucleic Acids Res 2013; 42:D966-74. [PMID: 24217912 PMCID: PMC3965098 DOI: 10.1093/nar/gkt1026] [Citation(s) in RCA: 531] [Impact Index Per Article: 44.3] [Indexed: 12/21/2022]
Abstract
The Human Phenotype Ontology (HPO) project, available at http://www.human-phenotype-ontology.org, provides a structured, comprehensive and well-defined set of 10,088 classes (terms) describing human phenotypic abnormalities and 13,326 subclass relations between the HPO classes. In addition we have developed logical definitions for 46% of all HPO classes using terms from ontologies for anatomy, cell types, function, embryology, pathology and other domains. This allows interoperability with several resources, especially those containing phenotype information on model organisms such as mouse and zebrafish. Here we describe the updated HPO database, which provides annotations of 7,278 human hereditary syndromes listed in OMIM, Orphanet and DECIPHER to classes of the HPO. Various meta-attributes such as frequency, references and negations are associated with each annotation. Several large-scale projects worldwide utilize the HPO for describing phenotype information in their datasets. We have therefore generated equivalence mappings to other phenotype vocabularies such as LDDB, Orphanet, MedDRA, UMLS and phenoDB, allowing integration of existing datasets and interoperability with multiple biomedical resources. We have created various ways to access the HPO database content using flat files, a MySQL database, and Web-based tools. All data and documentation on the HPO project can be found online.
Affiliation(s)
- Sebastian Köhler
- Institute for Medical Genetics and Human Genetics, Charité-Universitätsmedizin Berlin, Augustenburger Platz 1, 13353 Berlin, Germany, Berlin-Brandenburg Center for Regenerative Therapies, Charité-Universitätsmedizin Berlin, Augustenburger Platz 1, 13353 Berlin, Germany, Lawrence Berkeley National Laboratory, Mail Stop 84R0171, Berkeley, CA 94720, USA, The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK, Department of Medical Genetics, Cambridge University Addenbrooke's Hospital, Cambridge CB2 2QQ, UK, Université Paul Sabatier, Faculté de Chirurgie Dentaire, CHU Toulouse, France, Centre for Genomic Medicine, Central Manchester University Hospitals NHS Foundation Trust, Manchester Academic Health Sciences Centre (MAHSC), Manchester, UK, Centre for Genomic Medicine, Institute of Human Development, Faculty of Medical and Human Sciences, University of Manchester, MAHSC, Manchester M13 9WL, UK, Institute of Genetic Medicine, Newcastle University, Central Parkway, Newcastle upon Tyne, NE1 3BZ, UK, Department of Computer Science, University of Toronto, Ontario, Canada, Centre for Computational Medicine, Hospital for Sick Children, Toronto, Ontario, Canada, Department of Clinical Genetics, Leeds Teaching Hospitals NHS Trust, Leeds LS2 9NS, UK, MRC Human Genetics Unit, MRC Institute of Genetic and Molecular Medicine, University of Edinburgh, Edinburgh EH4 2XU, UK, The Jackson Laboratory, Bar Harbor, ME 04609, USA, Center for Molecular and Vascular Biology, University of Leuven, Belgium, Department of Neuropediatrics, University Medical Center Schleswig-Holstein, Kiel Campus, 24105 Kiel, Germany, NE Thames Genetics Service, Great Ormond Street Hospital, London WC1N 3JH, UK, Drexel University College of Medicine, Philadelphia, PA 19102, USA, Department of Haematology, University of Cambridge and NHS Blood and Transplant Cambridge, CB2 0PT Cambridge, UK, Autism and Developmental Medicine Institute, Geisinger Health System
26
Costa M, Reeve S, Grumbling G, Osumi-Sutherland D. The Drosophila anatomy ontology. J Biomed Semantics 2013; 4:32. [PMID: 24139062 PMCID: PMC4015547 DOI: 10.1186/2041-1480-4-32] [Citation(s) in RCA: 51] [Impact Index Per Article: 4.3] [Received: 07/01/2013] [Accepted: 10/11/2013] [Indexed: 11/28/2022]
Abstract
BACKGROUND Anatomy ontologies are queryable classifications of anatomical structures. They provide a widely-used means for standardising the annotation of phenotypes and expression in both human-readable and programmatically accessible forms. They are also frequently used to group annotations in biologically meaningful ways. Accurate annotation requires clear textual definitions for terms, ideally accompanied by images. Accurate grouping and fruitful programmatic usage require high-quality formal definitions that can be used to automate classification and check for errors. The Drosophila anatomy ontology (DAO) consists of over 8000 classes with broad coverage of Drosophila anatomy. It has been used extensively for annotation by a range of resources, but until recently it was poorly formalised and had few textual definitions. RESULTS We have transformed the DAO into an ontology rich in formal and textual definitions in which the majority of classifications are automated and extensive error checking ensures quality. Here we present an overview of the content of the DAO, the patterns used in its formalisation, and the various uses it has been put to. CONCLUSIONS As a result of the work described here, the DAO provides a high-quality, queryable reference for the wild-type anatomy of Drosophila melanogaster and a set of terms to annotate data related to that anatomy. Extensive, well referenced textual definitions make it both a reliable and useful reference and ensure accurate use in annotation. Wide use of formal axioms allows a large proportion of classification to be automated and the use of consistency checking to eliminate errors. This increased formalisation has resulted in significant improvements to the completeness and accuracy of classification. The broad use of both formal and informal definitions makes further development of the ontology sustainable and scalable. 
The patterns of formalisation used in the DAO are likely to be useful to developers of other anatomy ontologies.
Affiliation(s)
- Marta Costa
- FlyBase, Department of Genetics, University of Cambridge, Downing Street, Cambridge, UK
- Simon Reeve
- FlyBase, Department of Genetics, University of Cambridge, Downing Street, Cambridge, UK
- Gary Grumbling
- FlyBase, Department of Biology, Indiana University, 1001 E 3rd Street, Bloomington, IN, 47405-7005, USA
27
Osumi-Sutherland D, Marygold SJ, Millburn GH, McQuilton PA, Ponting L, Stefancsik R, Falls K, Brown NH, Gkoutos GV. The Drosophila phenotype ontology. J Biomed Semantics 2013; 4:30. [PMID: 24138933 PMCID: PMC3816596 DOI: 10.1186/2041-1480-4-30] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.7] [Received: 07/01/2013] [Accepted: 10/11/2013] [Indexed: 01/22/2023]
Abstract
BACKGROUND Phenotype ontologies are queryable classifications of phenotypes. They provide a widely-used means for annotating phenotypes in a form that is human-readable, programmatically accessible and that can be used to group annotations in biologically meaningful ways. Accurate manual annotation requires clear textual definitions for terms. Accurate grouping and fruitful programmatic usage require high-quality formal definitions that can be used to automate classification. The Drosophila phenotype ontology (DPO) has been used to annotate over 159,000 phenotypes in FlyBase to date, but until recently lacked textual or formal definitions. RESULTS We have composed textual definitions for all DPO terms and formal definitions for 77% of them. Formal definitions reference terms from a range of widely-used ontologies including the Phenotype and Trait Ontology (PATO), the Gene Ontology (GO) and the Cell Ontology (CL). We also describe a generally applicable system, devised for the DPO, for recording and reasoning about the timing of death in populations. As a result of the new formalisations, 85% of classifications in the DPO are now inferred rather than asserted, with much of this classification leveraging the structure of the GO. This work has significantly improved the accuracy and completeness of classification and made further development of the DPO more sustainable. CONCLUSIONS The DPO provides a set of well-defined terms for annotating Drosophila phenotypes and for grouping and querying the resulting annotation sets in biologically meaningful ways. Such queries have already resulted in successful function predictions from phenotype annotation. Moreover, such formalisations make extended queries possible, including cross-species queries via the external ontologies used in formal definitions. The DPO is openly available under an open source license in both OBO and OWL formats. 
There is good potential for it to be used more broadly by the Drosophila community, which may ultimately result in its extension to cover a broader range of phenotypes.
Affiliation(s)
- Steven J Marygold
- FlyBase, Department of Genetics, University of Cambridge, Downing Street, Cambridge, UK
- Gillian H Millburn
- FlyBase, Department of Genetics, University of Cambridge, Downing Street, Cambridge, UK
- Peter A McQuilton
- FlyBase, Department of Genetics, University of Cambridge, Downing Street, Cambridge, UK
- Laura Ponting
- FlyBase, Department of Genetics, University of Cambridge, Downing Street, Cambridge, UK
- Raymund Stefancsik
- FlyBase, Department of Genetics, University of Cambridge, Downing Street, Cambridge, UK
- Kathleen Falls
- The Biological Laboratories, Harvard University, 16 Divinity Avenue, Cambridge, MA, USA
- Nicholas H Brown
- FlyBase, Department of Genetics, University of Cambridge, Downing Street, Cambridge, UK
- Gurdon Institute & Department of Physiology, Development and Neuroscience, University of Cambridge, Tennis Court Road, Cambridge, UK
- Georgios V Gkoutos
- Department of Genetics, University of Cambridge, Downing Street, Cambridge, UK
28
Schofield PN, Sundberg JP, Sundberg BA, McKerlie C, Gkoutos GV. The mouse pathology ontology, MPATH; structure and applications. J Biomed Semantics 2013; 4:18. [PMID: 24033988 PMCID: PMC3851164 DOI: 10.1186/2041-1480-4-18] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.5] [Received: 02/04/2013] [Accepted: 08/19/2013] [Indexed: 12/18/2022]
Abstract
BACKGROUND The capture and use of disease-related anatomic pathology data for both model organism phenotyping and human clinical practice requires a relatively simple nomenclature and coding system that can be integrated into data collection platforms (such as computerized medical record-keeping systems) to enable the pathologist to rapidly screen and accurately record observations. The MPATH ontology was originally constructed in 2000 by a committee of pathologists for the annotation of rodent histopathology images, but is now widely used for coding and analysis of disease and phenotype data for rodents, humans and zebrafish. CONSTRUCTION AND CONTENT MPATH is divided into two main branches describing pathological processes and structures based on traditional histopathological principles. It does not aim to include definitive diagnoses, which would generally be regarded as disease concepts. It contains 888 core pathology terms in an almost exclusively is_a hierarchy nine layers deep. Currently, 86% of the terms have textual definitions and contain relationships as well as logical axioms to other ontologies such as the Gene Ontology. APPLICATION AND UTILITY MPATH was originally devised for the annotation of histopathological images from mice but is now being used much more widely in the recording of diagnostic and phenotypic data from both mice and humans, and in the construction of logical definitions for phenotype and disease ontologies. We discuss the use of MPATH to generate cross-products with qualifiers derived from a subset of the Phenotype and Trait Ontology (PATO) and its application to large-scale high-throughput phenotyping studies. MPATH provides a largely species-agnostic ontology for the descriptions of anatomic pathology, which can be applied to most amniotes and is now finding extensive use in species other than mice. 
It enables investigators to interrogate large datasets at a variety of depths, use semantic analysis to identify the relations between diseases in different species and integrate pathology data with other data types, such as pharmacogenomics.
Affiliation(s)
- Paul N Schofield
- Department of Physiology, Development and Neuroscience, University of Cambridge, Downing Street, CB2 3EG, Cambridge, UK.
29
Pujar A, Menda N, Bombarely A, Edwards JD, Strickler SR, Mueller LA. From manual curation to visualization of gene families and networks across Solanaceae plant species. Database: The Journal of Biological Databases and Curation 2013; 2013:bat028. [PMID: 23681907 PMCID: PMC3655285 DOI: 10.1093/database/bat028] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 12/25/2022]
Abstract
High-quality manual annotation methods and practices need to be scaled to the increased rate of genomic data production. Curation based on gene families and gene networks is one approach that can significantly increase both curation efficiency and quality. The Sol Genomics Network (SGN; http://solgenomics.net) is a comparative genomics platform, with genetic, genomic and phenotypic information of the Solanaceae family and its closely related species that incorporates a community-based gene and phenotype curation system. In this article, we describe a manual curation system for gene families aimed at facilitating curation, querying and visualization of gene interaction patterns underlying complex biological processes, including an interface for efficiently capturing information from experiments with large data sets reported in the literature. Well-annotated multigene families are useful for further exploration of genome organization and gene evolution across species. As an example, we illustrate the system with the multigene transcription factor families, WRKY and Small Auxin Up-regulated RNA (SAUR), which both play important roles in responding to abiotic stresses in plants. Database URL: http://solgenomics.net/
Affiliation(s)
- Anuradha Pujar
- Boyce Thompson Institute for Plant Research, 533 Tower Road, Ithaca, NY 14853, USA
30
Maynard SM, Mungall CJ, Lewis SE, Imam FT, Martone ME. A knowledge based approach to matching human neurodegenerative disease and animal models. Front Neuroinform 2013; 7:7. [PMID: 23717278 PMCID: PMC3653101 DOI: 10.3389/fninf.2013.00007] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 12/22/2012] [Accepted: 04/09/2013] [Indexed: 12/19/2022]
Abstract
Neurodegenerative diseases present a wide and complex range of biological and clinical features. Animal models are key to translational research, yet typically only exhibit a subset of disease features rather than being precise replicas of the disease. Consequently, connecting animal to human conditions using direct data-mining strategies has proven challenging, particularly for diseases of the nervous system, with its complicated anatomy and physiology. To address this challenge we have explored the use of ontologies to create formal descriptions of structural phenotypes across scales that are machine processable and amenable to logical inference. As proof of concept, we built a Neurodegenerative Disease Phenotype Ontology (NDPO) and an associated Phenotype Knowledge Base (PKB) using an entity-quality model that incorporates descriptions for both human disease phenotypes and those of animal models. Entities are drawn from community ontologies made available through the Neuroscience Information Framework (NIF) and qualities are drawn from the Phenotype and Trait Ontology (PATO). We generated ~1200 structured phenotype statements describing structural alterations at the subcellular, cellular and gross anatomical levels observed in 11 human neurodegenerative conditions and associated animal models. PhenoSim, an open source tool for comparing phenotypes, was used to issue a series of competency questions to compare individual phenotypes among organisms and to determine which animal models recapitulate phenotypic aspects of the human disease in aggregate. Overall, the system was able to use relationships within the ontology to bridge phenotypes across scales, returning non-trivial matches based on common subsumers that were meaningful to a neuroscientist with an advanced knowledge of neuroanatomy. The system can be used both to compare individual phenotypes and also phenotypes in aggregate. 
This proof of concept suggests that expressing complex phenotypes using formal ontologies provides considerable benefit for comparing phenotypes across scales and species.
Affiliation(s)
- Sarah M Maynard
- Department of Neurosciences, Center for Research in Biological Systems, University of California San Diego, San Diego, CA, USA
31
Abstract
Motivation: To provide consistent computable descriptions of phenotype data, PomBase is developing a formal ontology of phenotypes observed in fission yeast. Results: The fission yeast phenotype ontology (FYPO) is a modular ontology that uses several existing ontologies from the open biological and biomedical ontologies (OBO) collection as building blocks, including the phenotypic quality ontology PATO, the Gene Ontology and Chemical Entities of Biological Interest. Modular ontology development facilitates partially automated effective organization of detailed phenotype descriptions with complex relationships to each other and to underlying biological phenomena. As a result, FYPO supports sophisticated querying, computational analysis and comparison between different experiments and even between species. Availability: FYPO releases are available from the Subversion repository at the PomBase SourceForge project page (https://sourceforge.net/p/pombase/code/HEAD/tree/phenotype_ontology/). The current version of FYPO is also available on the OBO Foundry Web site (http://obofoundry.org/). Contact: mah79@cam.ac.uk or vw253@cam.ac.uk
Affiliation(s)
- Midori A Harris
- Cambridge Systems Biology Centre and Department of Biochemistry, University of Cambridge, Sanger Building, 80 Tennis Court Road, Cambridge CB2 1GA, UK.
32
Groza T, Hunter J, Zankl A. Mining skeletal phenotype descriptions from scientific literature. PLoS One 2013; 8:e55656. [PMID: 23409017 PMCID: PMC3568099 DOI: 10.1371/journal.pone.0055656] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Received: 10/01/2012] [Accepted: 12/28/2012] [Indexed: 12/02/2022]
Abstract
Phenotype descriptions are important for our understanding of genetics, as they enable the computation and analysis of a varied range of issues related to the genetic and developmental bases of correlated characters. The literature contains a wealth of such phenotype descriptions, usually reported as free-text entries, similar to typical clinical summaries. In this paper, we focus on creating and making available an annotated corpus of skeletal phenotype descriptions. In addition, we present and evaluate a hybrid Machine Learning approach for mining phenotype descriptions from free text. Our hybrid approach uses an ensemble of four classifiers and experiments with several aggregation techniques. The best scoring technique achieves an F-1 score of 71.52%, which is close to the state-of-the-art in other domains, where training data exists in abundance. Finally, we discuss the influence of the features chosen for the model on the overall performance of the method.
Affiliation(s)
- Tudor Groza
- School of ITEE, The University of Queensland, Australia.
33
Groza T, Hunter J, Zankl A. Decomposing phenotype descriptions for the human skeletal phenome. Biomed Inform Insights 2013; 6:1-14. [PMID: 23440304 PMCID: PMC3572876 DOI: 10.4137/bii.s10729] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 01/17/2023]
Abstract
Over the course of the last few years there has been a significant amount of research performed on ontology-based formalization of phenotype descriptions. The intrinsic value and knowledge captured within such descriptions can only be expressed by taking advantage of their inner structure, which implicitly combines qualities and anatomical entities. We present a meta-model (the Phenotype Fragment Ontology) and a processing pipeline that together enable the automatic decomposition and conceptualization of phenotype descriptions for the human skeletal phenome. We use this approach to showcase the usefulness of the generic concept of phenotype decomposition by performing an experimental study on all skeletal phenotype concepts defined in the Human Phenotype Ontology.
Affiliation(s)
- Tudor Groza
- School of ITEE, The University of Queensland, Australia
34
Köhler S, Doelken SC, Ruef BJ, Bauer S, Washington N, Westerfield M, Gkoutos G, Schofield P, Smedley D, Lewis SE, Robinson PN, Mungall CJ. Construction and accessibility of a cross-species phenotype ontology along with gene annotations for biomedical research. F1000Res 2013; 2:30. [PMID: 24358873 DOI: 10.12688/f1000research.2-30.v1] [Citation(s) in RCA: 58] [Impact Index Per Article: 4.8] [Accepted: 01/22/2013] [Indexed: 12/30/2022]
Abstract
Phenotype analyses, e.g. investigating metabolic processes, tissue formation, or organism behavior, are an important element of most biological and medical research activities. Biomedical researchers are making increased use of ontological standards and methods to capture the results of such analyses, with one focus being the comparison and analysis of phenotype information between species. We have generated a cross-species phenotype ontology for human, mouse and zebrafish that contains classes from the Human Phenotype Ontology, Mammalian Phenotype Ontology, and generated classes for zebrafish phenotypes. We also provide up-to-date annotation data connecting human genes to phenotype classes from the generated ontology. We have included the data generation pipeline into our continuous integration system ensuring stable and up-to-date releases. This article describes the data generation process and is intended to help interested researchers access both the phenotype annotation data and the associated cross-species phenotype ontology. The resource described here can be used in sophisticated semantic similarity and gene set enrichment analyses for phenotype data across species. The stable releases of this resource can be obtained from http://purl.obolibrary.org/obo/hp/uberpheno/.
Affiliation(s)
- Sebastian Köhler
- Institute for Medical and Human Genetics, Charité-Universitätsmedizin Berlin, Berlin, 13353, Germany; Berlin-Brandenburg Center for Regenerative Therapies (BCRT), Charité-Universitätsmedizin Berlin, Berlin, 13353, Germany
- Sandra C Doelken
- Institute for Medical and Human Genetics, Charité-Universitätsmedizin Berlin, Berlin, 13353, Germany
- Barbara J Ruef
- ZFIN, Institute of Neuroscience, University of Oregon, Eugene OR, 97403-5291, USA
- Sebastian Bauer
- Institute for Medical and Human Genetics, Charité-Universitätsmedizin Berlin, Berlin, 13353, Germany
- Monte Westerfield
- ZFIN, Institute of Neuroscience, University of Oregon, Eugene OR, 97403-5291, USA
- George Gkoutos
- Department of Computer Science, University of Aberystwyth, Aberystwyth, SY23 2AX, UK
- Paul Schofield
- Department of Genetics, University of Cambridge, Cambridge, CB2 3EH, UK
- Damian Smedley
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge, CB10 1SA, UK
- Suzanna E Lewis
- Lawrence Berkeley National Laboratory, Berkeley CA, 94720, USA
- Peter N Robinson
- Institute for Medical and Human Genetics, Charité-Universitätsmedizin Berlin, Berlin, 13353, Germany; Berlin-Brandenburg Center for Regenerative Therapies (BCRT), Charité-Universitätsmedizin Berlin, Berlin, 13353, Germany; Max Planck Institute for Molecular Genetics, Berlin, 14195, Germany
35
Köhler S, Doelken SC, Ruef BJ, Bauer S, Washington N, Westerfield M, Gkoutos G, Schofield P, Smedley D, Lewis SE, Robinson PN, Mungall CJ. Construction and accessibility of a cross-species phenotype ontology along with gene annotations for biomedical research. F1000Res 2013; 2:30. [PMID: 24358873 PMCID: PMC3799545 DOI: 10.12688/f1000research.2-30.v2] [Citation(s) in RCA: 55] [Impact Index Per Article: 4.6] [Accepted: 01/20/2014] [Indexed: 12/11/2022]
Abstract
Phenotype analyses, e.g. investigating metabolic processes, tissue formation, or organism behavior, are an important element of most biological and medical research activities. Biomedical researchers are making increased use of ontological standards and methods to capture the results of such analyses, with one focus being the comparison and analysis of phenotype information between species. We have generated a cross-species phenotype ontology for human, mouse and zebrafish that contains classes from the Human Phenotype Ontology, Mammalian Phenotype Ontology, and generated classes for zebrafish phenotypes. We also provide up-to-date annotation data connecting human genes to phenotype classes from the generated ontology. We have included the data generation pipeline into our continuous integration system ensuring stable and up-to-date releases. This article describes the data generation process and is intended to help interested researchers access both the phenotype annotation data and the associated cross-species phenotype ontology. The resource described here can be used in sophisticated semantic similarity and gene set enrichment analyses for phenotype data across species. The stable releases of this resource can be obtained from http://purl.obolibrary.org/obo/hp/uberpheno/.
Affiliation(s)
- Sebastian Köhler
- Institute for Medical and Human Genetics, Charité-Universitätsmedizin Berlin, Berlin, 13353, Germany; Berlin-Brandenburg Center for Regenerative Therapies (BCRT), Charité-Universitätsmedizin Berlin, Berlin, 13353, Germany
- Sandra C Doelken
- Institute for Medical and Human Genetics, Charité-Universitätsmedizin Berlin, Berlin, 13353, Germany
- Barbara J Ruef
- ZFIN, Institute of Neuroscience, University of Oregon, Eugene OR, 97403-5291, USA
- Sebastian Bauer
- Institute for Medical and Human Genetics, Charité-Universitätsmedizin Berlin, Berlin, 13353, Germany
- Monte Westerfield
- ZFIN, Institute of Neuroscience, University of Oregon, Eugene OR, 97403-5291, USA
- George Gkoutos
- Department of Computer Science, University of Aberystwyth, Aberystwyth, SY23 2AX, UK
- Paul Schofield
- Department of Genetics, University of Cambridge, Cambridge, CB2 3EH, UK
- Damian Smedley
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge, CB10 1SA, UK
- Suzanna E Lewis
- Lawrence Berkeley National Laboratory, Berkeley CA, 94720, USA
- Peter N Robinson
- Institute for Medical and Human Genetics, Charité-Universitätsmedizin Berlin, Berlin, 13353, Germany; Berlin-Brandenburg Center for Regenerative Therapies (BCRT), Charité-Universitätsmedizin Berlin, Berlin, 13353, Germany; Max Planck Institute for Molecular Genetics, Berlin, 14195, Germany
36
Beck T, Free RC, Thorisson GA, Brookes AJ. Semantically enabling a genome-wide association study database. J Biomed Semantics 2012; 3:9. [PMID: 23244533 PMCID: PMC3579732 DOI: 10.1186/2041-1480-3-9] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 04/03/2012] [Accepted: 08/22/2012] [Indexed: 01/03/2023]
Abstract
Background The amount of data generated from genome-wide association studies (GWAS) has grown rapidly, but considerations for GWAS phenotype data reuse and interchange have not kept pace. This affects the work of GWAS Central, a free and open access resource for the advanced querying and comparison of summary-level genetic association data. The benefits of employing ontologies for standardising and structuring data are widely accepted. The complex spectrum of observed human phenotypes (and traits), and the requirement for cross-species phenotype comparisons, call for reflection on the most appropriate solution for the organisation of human phenotype data. The Semantic Web provides standards for the possibility of further integration of GWAS data and the ability to contribute to the web of Linked Data. Results A pragmatic consideration when applying phenotype ontologies to GWAS data is the ability to retrieve all data, at the most granular level possible, from querying a single ontology graph. We found the Medical Subject Headings (MeSH) terminology suitable for describing all traits (diseases and medical signs and symptoms) at various levels of granularity and the Human Phenotype Ontology (HPO) most suitable for describing phenotypic abnormalities (medical signs and symptoms) at the most granular level. Diseases within MeSH are mapped to HPO to infer the phenotypic abnormalities associated with diseases. Building on the rich semantic phenotype annotation layer, we are able to make cross-species phenotype comparisons and publish a core subset of GWAS data as RDF nanopublications. Conclusions We present a methodology for applying phenotype annotations to a comprehensive genome-wide association dataset and for ensuring compatibility with the Semantic Web. The annotations are used to assist with cross-species genotype and phenotype comparisons. However, further processing and deconstruction of terms may be required to facilitate automatic phenotype comparisons.
The provision of GWAS nanopublications enables a new dimension for exploring GWAS data, by way of intrinsic links to related data resources within the Linked Data web. The value of such annotation and integration will grow as more biomedical resources adopt the standards of the Semantic Web.
Affiliation(s)
- Tim Beck
- Department of Genetics, University of Leicester, University Road, Leicester, UK.
37
Li D, Berardini TZ, Muller RJ, Huala E. Building an efficient curation workflow for the Arabidopsis literature corpus. Database (Oxford) 2012; 2012:bas047. [PMID: 23221298 PMCID: PMC3515862 DOI: 10.1093/database/bas047] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.2] [Indexed: 12/05/2022]
Abstract
TAIR (The Arabidopsis Information Resource) is the model organism database (MOD) for Arabidopsis thaliana, a model plant with a literature corpus of about 39 000 articles in PubMed, with over 4300 new articles added in 2011. We have developed a literature curation workflow incorporating both automated and manual elements to cope with this flood of new research articles. The current workflow can be divided into two phases: article selection and curation. Structured controlled vocabularies, such as the Gene Ontology and Plant Ontology, are used to capture free text information in the literature as succinct ontology-based annotations suitable for the application of computational analysis methods. We also describe our curation platform and the use of text mining tools in our workflow. Database URL: www.arabidopsis.org
Affiliation(s)
- Donghui Li
- Department of Plant Biology, The Arabidopsis Information Resource, Carnegie Institution for Science, Stanford, CA 94305, USA
38
Groza T, Hunter J, Zankl A. Supervised segmentation of phenotype descriptions for the human skeletal phenome using hybrid methods. BMC Bioinformatics 2012; 13:265. [PMID: 23061930 PMCID: PMC3495645 DOI: 10.1186/1471-2105-13-265] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 07/07/2012] [Accepted: 10/09/2012] [Indexed: 11/23/2022]
Abstract
Background Over the course of the last few years there has been a significant amount of research performed on ontology-based formalization of phenotype descriptions. In order to fully capture the intrinsic value and knowledge expressed within them, we need to take advantage of their inner structure, which implicitly combines qualities and anatomical entities. The first step in this process is the segmentation of the phenotype descriptions into their atomic elements. Results We present a two-phase hybrid segmentation method that combines a series of individual classifiers using different aggregation schemes (set operations and simple majority voting). The approach is tested on a corpus composed of skeletal phenotype descriptions drawn from the Human Phenotype Ontology. Experimental results show that the best hybrid method achieves an F-Score of 97.05% in the first phase and F-Scores of 97.16% / 94.50% in the second phase. Conclusions The performance of the initial segmentation of anatomical entities and qualities (phase I) is not affected by the presence / absence of external resources, such as domain dictionaries. From a generic perspective, hybrid methods may not always improve the segmentation accuracy, as they are heavily dependent on the goal and data characteristics.
Affiliation(s)
- Tudor Groza
- School of ITEE, The University of Queensland, Brisbane, Australia.
39
Bard J. A new ontology (structured hierarchy) of human developmental anatomy for the first 7 weeks (Carnegie stages 1-20). J Anat 2012; 221:406-16. [PMID: 22973865 DOI: 10.1111/j.1469-7580.2012.01566.x] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.2] [Accepted: 08/15/2012] [Indexed: 12/01/2022]
Abstract
This paper describes a new ontology of human developmental anatomy covering the first 49 days [Carnegie stages (CS)1-20], primarily structured around the parts of organ systems and their development. The ontology includes more than 2000 anatomical entities (AEs) that range from the whole embryo, through organ systems and organ parts down to simple or leaf tissues (groups of cells with the same morphological phenotype), as well as features such as cavities. Each AE has assigned to it a set of facts of the form <AE><relationship><parent>, with the relationships including starts_at and ends_at (CSs), part_of (there can be several parents) and is_a (this gives the type of tissue, from an organ system down to one of ~ 80 simple tissues predominantly composed of a single cell kind, which is also specified). Leaf tissues also have a develops_from link to their parent tissue. The ontology includes ~14 000 such facts, which are mainly from the literature and an earlier ontology of human developmental anatomy (EHDAA, now withdrawn). The relationships enable these facts to be integrated into a single, complex hierarchy (or mathematical graph) that was made and can be viewed in the OBO-Edit browser (oboedit.org). Each AE has an EHDAA2 ID that may be useful in an informatics context, while the ontology as a whole can be used for organizing databases of human development. It is also a knowledge resource: a user can trace the lineage of any tissue back to the egg, study the changes in cell phenotype that occur as a tissue develops, and use the structure to add further (e.g. molecular) information. The ontology may be downloaded from www.obofoundry.org. Queries and corrections should be sent to j.bard@ed.ac.uk.
Affiliation(s)
- Jonathan Bard
- Department of Physiology, Anatomy and Genetics, University of Oxford, Oxford, UK.
40
ThesauForm—Traits: A web based collaborative tool to develop a thesaurus for plant functional diversity research. Ecol Inform 2012. [DOI: 10.1016/j.ecoinf.2012.04.004] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.6] [Indexed: 11/17/2022]
41
Abstract
In medical contexts, the word "phenotype" is used to refer to some deviation from normal morphology, physiology, or behavior. The analysis of phenotype plays a key role in clinical practice and medical research, and yet phenotypic descriptions in clinical notes and medical publications are often imprecise. Deep phenotyping can be defined as the precise and comprehensive analysis of phenotypic abnormalities in which the individual components of the phenotype are observed and described. The emerging field of precision medicine aims to provide the best available care for each patient based on stratification into disease subclasses with a common biological basis of disease. The comprehensive discovery of such subclasses, as well as the translation of this knowledge into clinical care, will depend critically upon computational resources to capture, store, and exchange phenotypic data, and upon sophisticated algorithms to integrate it with genomic variation, omics profiles, and other clinical information. This special issue of Human Mutation offers a number of articles describing computational solutions for current challenges in deep phenotyping, including semantic and technical standards for phenotype and disease data, digital imaging for facial phenotype analysis, model organism phenotypes, and databases for correlating phenotypes with genomic variation.
Affiliation(s)
- Peter N Robinson
- Institut für Medizinische Genetik und Humangenetik, Charité-Universitätsmedizin Berlin, Berlin, Germany.
42
Improving disease gene prioritization by comparing the semantic similarity of phenotypes in mice with those of human diseases. PLoS One 2012; 7:e38937. [PMID: 22719993 PMCID: PMC3375301 DOI: 10.1371/journal.pone.0038937] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.5] [Received: 12/08/2011] [Accepted: 05/16/2012] [Indexed: 12/14/2022]
Abstract
Despite considerable progress in understanding the molecular origins of hereditary human diseases, the molecular basis of several thousand genetic diseases still remains unknown. High-throughput phenotype studies are underway to systematically assess the phenotype outcome of targeted mutations in model organisms. Thus, comparing the similarity between experimentally identified phenotypes and the phenotypes associated with human diseases can be used to suggest causal genes underlying a disease. In this manuscript, we present a method for disease gene prioritization based on comparing phenotypes of mouse models with those of human diseases. For this purpose, either human disease phenotypes are “translated” into a mouse-based representation (using the Mammalian Phenotype Ontology), or mouse phenotypes are “translated” into a human-based representation (using the Human Phenotype Ontology). We apply a measure of semantic similarity and rank experimentally identified phenotypes in mice with respect to their phenotypic similarity to human diseases. Our method is evaluated on manually curated and experimentally verified gene–disease associations for human and for mouse. We evaluate our approach using a Receiver Operating Characteristic (ROC) analysis and obtain an area under the ROC curve of up to . Furthermore, we are able to confirm previous results that the Vax1 gene is involved in Septo-Optic Dysplasia and suggest Gdf6 and Marcks as further potential candidates. Our method significantly outperforms previous phenotype-based approaches of prioritizing gene–disease associations. To enable the adaptation of our method to the analysis of other phenotype data, our software and prioritization results are freely available under a BSD licence at http://code.google.com/p/phenomeblast/wiki/CAMP. Furthermore, our method has been integrated in PhenomeNET and the results can be explored using the PhenomeBrowser at http://phenomebrowser.net.
43
Wain KE, Riggs E, Hanson K, Savage M, Riethmaier D, Muirhead A, Mitchell E, Packard BS, Faucett WA. The laboratory-clinician team: a professional call to action to improve communication and collaboration for optimal patient care in chromosomal microarray testing. J Genet Couns 2012; 21:631-7. [PMID: 22610653 DOI: 10.1007/s10897-012-9507-9] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Received: 10/24/2011] [Accepted: 04/24/2012] [Indexed: 11/25/2022]
Abstract
The International Standards for Cytogenomic Arrays (ISCA) Consortium is a worldwide collaborative effort dedicated to optimizing patient care by improving the quality of chromosomal microarray testing. The primary effort of the ISCA Consortium has been the development of a database of copy number variants (CNVs) identified during the course of clinical microarray testing. This database is a powerful resource for clinicians, laboratories, and researchers, and can be utilized for a variety of applications, such as facilitating standardized interpretations of certain CNVs across laboratories or providing phenotypic information for counseling purposes when published data is sparse. A recognized limitation to the clinical utility of this database, however, is the quality of clinical information available for each patient. Clinical genetic counselors are uniquely suited to facilitate the communication of this information to the laboratory by virtue of their existing clinical responsibilities, case management skills, and appreciation of the evolving nature of scientific knowledge. We intend to highlight the critical role that genetic counselors play in ensuring optimal patient care through contributing to the clinical utility of the ISCA Consortium's database, as well as the quality of individual patient microarray reports provided by contributing laboratories. Current tools, paper and electronic forms, created to maximize this collaboration are shared. In addition to making a professional commitment to providing complete clinical information, genetic counselors are invited to become ISCA members and to become involved in the discussions and initiatives within the Consortium.
44
Chen CK, Mungall CJ, Gkoutos GV, Doelken SC, Köhler S, Ruef BJ, Smith C, Westerfield M, Robinson PN, Lewis SE, Schofield PN, Smedley D. MouseFinder: Candidate disease genes from mouse phenotype data. Hum Mutat 2012; 33:858-66. [PMID: 22331800 PMCID: PMC3327758 DOI: 10.1002/humu.22051] [Citation(s) in RCA: 50] [Impact Index Per Article: 3.8] [Received: 11/07/2011] [Accepted: 01/20/2012] [Indexed: 12/23/2022]
Abstract
Mouse phenotype data represents a valuable resource for the identification of disease-associated genes, especially where the molecular basis is unknown and there is no clue to the candidate gene's function, pathway involvement or expression pattern. However, until recently these data have not been systematically used due to difficulties in mapping between clinical features observed in humans and mouse phenotype annotations. Here, we describe a semantic approach to solve this problem and demonstrate highly significant recall of known disease-gene associations and orthology relationships. A Web application (MouseFinder; www.mousemodels.org) has been developed to allow users to search the results of our whole-phenome comparison of human and mouse. We demonstrate its use in identifying ARTN as a strong candidate gene within the 1p34.1-p32 mapped locus for a hereditary form of ptosis.
Affiliation(s)
- Chao-Kung Chen
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Georgios V Gkoutos
- Department of Genetics, University of Cambridge, Downing Street, Cambridge CB2 3EG, UK
- Sandra C Doelken
- Institute for Medical and Human Genetics, Charité-Universitätsmedizin Berlin, Augustenburger Platz 1, 13353 Berlin, Germany
- Max Planck Institute for Molecular Genetics, Ihnestr. 63-73, 14195 Berlin, Germany
- Sebastian Köhler
- Institute for Medical and Human Genetics, Charité-Universitätsmedizin Berlin, Augustenburger Platz 1, 13353 Berlin, Germany
- Berlin Center for Regenerative Therapies (BCRT), Charité-Universitätsmedizin Berlin, Augustenburger Platz 1, 13353 Berlin, Germany
- Cynthia Smith
- The Jackson Laboratory, 600 Main Street, Bar Harbor, ME 04609-1500, USA
- Peter N Robinson
- Institute for Medical and Human Genetics, Charité-Universitätsmedizin Berlin, Augustenburger Platz 1, 13353 Berlin, Germany
- Max Planck Institute for Molecular Genetics, Ihnestr. 63-73, 14195 Berlin, Germany
- Berlin Center for Regenerative Therapies (BCRT), Charité-Universitätsmedizin Berlin, Augustenburger Platz 1, 13353 Berlin, Germany
- Suzanna E Lewis
- Genomics Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Paul N Schofield
- The Jackson Laboratory, 600 Main Street, Bar Harbor, ME 04609-1500, USA
- Department of Physiology, Development and Neuroscience, University of Cambridge, Downing Street, Cambridge CB2 3EG, UK
- Damian Smedley
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
45
Hoehndorf R, Harris MA, Herre H, Rustici G, Gkoutos GV. Semantic integration of physiology phenotypes with an application to the Cellular Phenotype Ontology. Bioinformatics 2012; 28:1783-9. [PMID: 22539675 DOI: 10.1093/bioinformatics/bts250] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.7] [Indexed: 11/15/2022]
Abstract
MOTIVATION The systematic observation of phenotypes has become a crucial tool of functional genomics, and several large international projects are currently underway to identify and characterize the phenotypes that are associated with genotypes in several species. To integrate phenotype descriptions within and across species, phenotype ontologies have been developed. Applying ontologies to unify phenotype descriptions in the domain of physiology has been a particular challenge due to the high complexity of the underlying domain. RESULTS In this study, we present the outline of a theory and its implementation for an ontology of physiology-related phenotypes. We provide a formal description of process attributes and relate them to the attributes of their temporal parts and participants. We apply our theory to create the Cellular Phenotype Ontology (CPO). The CPO is an ontology of morphological and physiological phenotypic characteristics of cells, cell components and cellular processes. Its prime application is to provide terms and uniform definition patterns for the annotation of cellular phenotypes. The CPO can be used for the annotation of observed abnormalities in domains, such as systems microscopy, in which cellular abnormalities are observed and for which no phenotype ontology has been created. AVAILABILITY AND IMPLEMENTATION The CPO and the source code we generated to create the CPO are freely available on http://cell-phenotype.googlecode.com.
Affiliation(s)
- Robert Hoehndorf
- Department of Genetics, University of Cambridge, Downing Street, Cambridge CB2 3EH, UK.
46
Groza T, Hunter J, Zankl A. The Bone Dysplasia Ontology: integrating genotype and phenotype information in the skeletal dysplasia domain. BMC Bioinformatics 2012; 13:50. [PMID: 22449239 PMCID: PMC3338382 DOI: 10.1186/1471-2105-13-50] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.6] [Received: 10/03/2011] [Accepted: 03/26/2012] [Indexed: 11/19/2022]
Abstract
BACKGROUND Skeletal dysplasias are a rare and heterogeneous group of genetic disorders affecting skeletal development. Patients with skeletal dysplasias suffer from many complex medical issues, including degenerative joint disease and neurological complications. Because the data and expertise associated with this field are both sparse and disparate, significant benefits will potentially accrue from the availability of an ontology that provides a shared conceptualisation of the domain knowledge and enables data integration, cross-referencing and advanced reasoning across the relevant but distributed data sources. RESULTS We introduce the design considerations and implementation details of the Bone Dysplasia Ontology. We also describe the different components of the ontology, including a comprehensive and formal representation of the skeletal dysplasia domain as well as the related genotypes and phenotypes. We then briefly describe SKELETOME, a community-driven knowledge curation platform that is underpinned by the Bone Dysplasia Ontology. SKELETOME enables domain experts to use, refine, extend and apply the ontology, without any prior ontology engineering experience, to advance the body of knowledge in the skeletal dysplasia field. CONCLUSIONS The Bone Dysplasia Ontology represents the most comprehensive structured knowledge source for the skeletal dysplasias domain. It provides the means for integrating and annotating clinical and research data, not only at the generic domain knowledge level, but also at the level of individual patient case studies. It enables links between individual cases and publicly available genotype and phenotype resources based on a community-driven curation process that ensures a shared conceptualisation of the domain knowledge and its continuous incremental evolution.
Affiliation(s)
- Tudor Groza
- School of ITEE, The University of Queensland, St. Lucia, Australia
- Jane Hunter
- School of ITEE, The University of Queensland, St. Lucia, Australia
- Andreas Zankl
- Bone Dysplasia Research Group, UQ Centre for Clinical Research (UQCCR), The University of Queensland, Herston, Australia
- Genetic Health Queensland, Royal Brisbane and Women's Hospital, Herston, Australia
47
Riggs ER, Jackson L, Miller DT, Van Vooren S. Phenotypic information in genomic variant databases enhances clinical care and research: the International Standards for Cytogenomic Arrays Consortium experience. Hum Mutat 2012; 33:787-96. [PMID: 22331816 DOI: 10.1002/humu.22052] [Received: 11/04/2011] [Accepted: 01/22/2012] [Indexed: 11/06/2022]
Abstract
Whole-genome analysis, now including whole-genome sequencing, is moving rapidly into the clinical setting, leading to detection of human variation on a broader scale than ever before. Interpreting this information will depend on the availability of thorough and accurate phenotype information, and the ability to curate, store, and access data on genotype-phenotype relationships. This idea has already been demonstrated within the context of chromosomal microarray (CMA) testing. The International Standards for Cytogenomic Arrays (ISCA) Consortium promotes standardization of variant interpretation for this technology through its initiatives, including the formation of a publicly available database housing clinical CMA data. Recognizing that phenotypic data are essential for the interpretation of genomic variants, the ISCA Consortium has developed tools to facilitate the collection of these data and their deposition in a standardized structured format within the ISCA Consortium database. This rich source of phenotypic data can also be used within broader applications such as developing phenotypic profiles of emerging genomic disorders, identification of candidate regions for particular phenotypes, or creation of tools for use in clinical practice. We summarize the ISCA experience as a model for ongoing efforts incorporating phenotype data with genotype data to improve the quality of research and clinical care in human genetics.
Affiliation(s)
- Erin Rooney Riggs
- Department of Human Genetics, Emory University School of Medicine, Atlanta, Georgia, USA.
48
Hoehndorf R, Dumontier M, Gennari JH, Wimalaratne S, de Bono B, Cook DL, Gkoutos GV. Integrating systems biology models and biomedical ontologies. BMC Systems Biology 2011; 5:124. [PMID: 21835028 PMCID: PMC3170340 DOI: 10.1186/1752-0509-5-124] [Received: 04/06/2011] [Accepted: 08/11/2011] [Indexed: 01/30/2023]
Abstract
BACKGROUND Systems biology is an approach to biology that emphasizes the structure and dynamic behavior of biological systems and the interactions that occur within them. To succeed, systems biology crucially depends on the accessibility and integration of data across domains and levels of granularity. Biomedical ontologies were developed to facilitate such an integration of data and are often used to annotate biosimulation models in systems biology. RESULTS We provide a framework to integrate representations of in silico systems biology with those of in vivo biology as described by biomedical ontologies and demonstrate this framework using the Systems Biology Markup Language. We developed the SBML Harvester software that automatically converts annotated SBML models into OWL and we apply our software to those biosimulation models that are contained in the BioModels Database. We utilize the resulting knowledge base for complex biological queries that can bridge levels of granularity, verify models based on the biological phenomenon they represent and provide a means to establish a basic qualitative layer on which to express the semantics of biosimulation models. CONCLUSIONS We establish an information flow between biomedical ontologies and biosimulation models and we demonstrate that the integration of annotated biosimulation models and biomedical ontologies enables the verification of models as well as expressive queries. Establishing a bi-directional information flow between systems biology and biomedical ontologies has the potential to enable large-scale analyses of biological systems that span levels of granularity from molecules to organisms.
Affiliation(s)
- Robert Hoehndorf
- Department of Genetics, University of Cambridge, Downing Street, Cambridge, CB2 3EH, UK
- Michel Dumontier
- Department of Biology, Carleton University, 1125 Colonel By Drive, Ottawa, K1S 5B6, Canada
- School of Computer Science, Carleton University, 1125 Colonel By Drive, Ottawa, K1S 5B6, Canada
- John H Gennari
- Biomedical & Health Informatics, Department of Medical Education and Biomedical Informatics, University of Washington, 1959 NE Pacific Street, Box 357420, Seattle, Washington 98195, USA
- Sarala Wimalaratne
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Bernard de Bono
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Daniel L Cook
- Department of Physiology & Biophysics, University of Washington, 1705 NE Pacific Street, Box 357290, Seattle, Washington 98195, USA
- Department of Biological Structure, University of Washington, 1959 NE Pacific Street, Box 357420, Seattle, Washington 98195, USA
- Georgios V Gkoutos
- Department of Genetics, University of Cambridge, Downing Street, Cambridge, CB2 3EH, UK
49
Hoehndorf R, Dumontier M, Oellrich A, Rebholz-Schuhmann D, Schofield PN, Gkoutos GV. Interoperability between biomedical ontologies through relation expansion, upper-level ontologies and automatic reasoning. PLoS One 2011; 6:e22006. [PMID: 21789201 PMCID: PMC3138764 DOI: 10.1371/journal.pone.0022006] [Received: 02/21/2011] [Accepted: 06/12/2011] [Indexed: 11/18/2022]
Abstract
Researchers design ontologies as a means to accurately annotate and integrate experimental data across heterogeneous and disparate data- and knowledge bases. Formal ontologies make the semantics of terms and relations explicit such that automated reasoning can be used to verify the consistency of knowledge. However, many biomedical ontologies do not sufficiently formalize the semantics of their relations and are therefore limited with respect to automated reasoning for large scale data integration and knowledge discovery. We describe a method to improve automated reasoning over biomedical ontologies and identify several thousand contradictory class definitions. Our approach aligns terms in biomedical ontologies with foundational classes in a top-level ontology and formalizes composite relations as class expressions. We describe the semi-automated repair of contradictions and demonstrate expressive queries over interoperable ontologies. Our work forms an important cornerstone for data integration, automatic inference and knowledge discovery based on formal representations of knowledge. Our results and analysis software are available at http://bioonto.de/pmwiki.php/Main/ReasonableOntologies.
Affiliation(s)
- Robert Hoehndorf
- Department of Genetics, University of Cambridge, Cambridge, United Kingdom
- Michel Dumontier
- Department of Biology, Institute of Biochemistry and School of Computer Science, Carleton University, Ottawa, Ontario, Canada
- Anika Oellrich
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom
- Paul N. Schofield
- Department of Physiology, Development and Neuroscience, University of Cambridge, Cambridge, United Kingdom
- The Jackson Laboratory, Bar Harbor, Maine, United States of America
50
Meehan TF, Masci AM, Abdulla A, Cowell LG, Blake JA, Mungall CJ, Diehl AD. Logical development of the cell ontology. BMC Bioinformatics 2011; 12:6. [PMID: 21208450 PMCID: PMC3024222 DOI: 10.1186/1471-2105-12-6] [Received: 09/16/2010] [Accepted: 01/05/2011] [Indexed: 12/03/2022]
Abstract
BACKGROUND The Cell Ontology (CL) is an ontology for the representation of in vivo cell types. As biological ontologies such as the CL grow in complexity, they become increasingly difficult to use and maintain. By making the information in the ontology computable, we can use automated reasoners to detect errors and assist with classification. Here we report on the generation of computable definitions for the hematopoietic cell types in the CL. RESULTS Computable definitions for over 340 CL classes have been created using a genus-differentia approach. These define cell types according to multiple axes of classification such as the protein complexes found on the surface of a cell type, the biological processes participated in by a cell type, or the phenotypic characteristics associated with a cell type. We employed automated reasoners to verify the ontology and to reveal mistakes in manual curation. The implementation of this process exposed areas in the ontology where new cell type classes were needed to accommodate species-specific expression of cellular markers. Our use of reasoners also inferred new relationships within the CL, and between the CL and the contributing ontologies. This restructured ontology can be used to identify immune cells by flow cytometry, supports sophisticated biological queries involving cells, and helps generate new hypotheses about cell function based on similarities to other cell types. CONCLUSION Use of computable definitions enhances the development of the CL and supports the interoperability of OBO ontologies.
Affiliation(s)
- Terrence F Meehan
- Mouse Genome Informatics, The Jackson Laboratory, Bar Harbor, ME, USA.