1
Ruberte J, Schofield PN, Sundberg JP, Rodriguez-Baeza A, Carretero A, McKerlie C. Bridging mouse and human anatomies; a knowledge-based approach to comparative anatomy for disease model phenotyping. Mamm Genome 2023:10.1007/s00335-023-10005-4. [PMID: 37421464 PMCID: PMC10382392 DOI: 10.1007/s00335-023-10005-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/16/2023] [Accepted: 06/13/2023] [Indexed: 07/10/2023]
Abstract
The laboratory mouse is the foremost mammalian model used for studying human diseases and is closely anatomically related to humans. Whilst knowledge about human anatomy has been collected throughout the history of mankind, the first comprehensive study of the mouse anatomy was published less than 60 years ago. This has been followed by the more recent publication of several books and resources on mouse anatomy. Nevertheless, to date, our understanding and knowledge of mouse anatomy is far from being at the same level as that of humans. In addition, the alignment between current mouse and human anatomy nomenclatures is far from being as developed as those existing between other species, such as domestic animals and humans. To close this gap, more in-depth mouse anatomical research is needed and it will be necessary to extend and refine the current vocabulary of mouse anatomical terms.
Affiliation(s)
- Jesús Ruberte
  - Center for Animal Biotechnology and Gene Therapy, Universitat Autònoma de Barcelona, Barcelona, Spain
  - Department of Animal Health and Anatomy, Universitat Autònoma de Barcelona, Barcelona, Spain
- Paul N Schofield
  - The Jackson Laboratory, Bar Harbor, ME, USA
  - Department of Physiology, Development and Neuroscience, University of Cambridge, Cambridge, UK
- John P Sundberg
  - The Jackson Laboratory, Bar Harbor, ME, USA
  - Department of Dermatology, Vanderbilt University Medical Center, Nashville, TN, USA
- Ana Carretero
  - Center for Animal Biotechnology and Gene Therapy, Universitat Autònoma de Barcelona, Barcelona, Spain
  - Department of Animal Health and Anatomy, Universitat Autònoma de Barcelona, Barcelona, Spain
- Colin McKerlie
  - The Hospital for Sick Children, Toronto, Canada
  - Department of Lab Medicine and Pathobiology, Faculty of Medicine, University of Toronto, Toronto, Canada
2
Garrett L, Da Silva-Buttkus P, Rathkolb B, Gerlini R, Becker L, Sanz-Moreno A, Seisenberger C, Zimprich A, Aguilar-Pimentel A, Amarie OV, Cho YL, Kraiger M, Spielmann N, Calzada-Wack J, Marschall S, Busch D, Schmitt-Weber C, Wolf E, Wurst W, Fuchs H, Gailus-Durner V, Hölter SM, de Angelis MH. Post-synaptic scaffold protein TANC2 in psychiatric and somatic disease risk. Dis Model Mech 2021; 15:273891. [PMID: 34964047 PMCID: PMC8906171 DOI: 10.1242/dmm.049205] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 07/14/2021] [Accepted: 12/17/2021] [Indexed: 11/20/2022]
Abstract
Understanding the shared genetic aetiology of psychiatric and medical comorbidity in neurodevelopmental disorders (NDDs) could improve patient diagnosis, stratification and treatment options. Rare tetratricopeptide repeat, ankyrin repeat and coiled-coil containing 2 (TANC2)-disrupting variants were disease causing in NDD patients. The post-synaptic scaffold protein TANC2 is essential for dendrite formation in synaptic plasticity and plays an unclarified but critical role in development. We here report a novel homozygous-viable Tanc2-disrupted function model in which mutant mice were hyperactive and had impaired sensorimotor gating consistent with NDD patient psychiatric endophenotypes. Yet, a multi-systemic analysis revealed the pleiotropic effects of Tanc2 outside the brain, such as growth failure and hepatocellular damage. This was associated with aberrant liver function including altered hepatocellular metabolism. Integrative analysis indicates that these disrupted Tanc2 systemic effects relate to interaction with Hippo developmental signalling pathway proteins and will increase the risk for comorbid somatic disease. This highlights how NDD gene pleiotropy can augment medical comorbidity susceptibility, underscoring the benefit of holistic NDD patient diagnosis and treatment for which large-scale preclinical functional genomics can provide complementary pleiotropic gene function information. Summary: Disruption of mouse Tanc2 causes brain and liver abnormality, increasing psychiatric and somatic disease risk long term, highlighting the benefit of holistic diagnosis and treatment approaches for human neurodevelopmental disorder.
Affiliation(s)
- Lillian Garrett
  - Institute of Experimental Genetics and German Mouse Clinic, Helmholtz Zentrum München, German Research Center for Environmental Health, Neuherberg, Germany
  - Institute of Developmental Genetics, Helmholtz Zentrum München, German Research Center for Environmental Health, Neuherberg, Germany
- Patricia Da Silva-Buttkus
  - Institute of Experimental Genetics and German Mouse Clinic, Helmholtz Zentrum München, German Research Center for Environmental Health, Neuherberg, Germany
- Birgit Rathkolb
  - Institute of Experimental Genetics and German Mouse Clinic, Helmholtz Zentrum München, German Research Center for Environmental Health, Neuherberg, Germany
  - German Center for Diabetes Research (DZD), Ingolstädter Landstr. 1, 85764 Neuherberg, Germany
  - Institute of Molecular Animal Breeding and Biotechnology, Gene Center, Ludwig-Maximilians University Munich, Munich, Germany
- Raffaele Gerlini
  - Institute of Experimental Genetics and German Mouse Clinic, Helmholtz Zentrum München, German Research Center for Environmental Health, Neuherberg, Germany
  - German Center for Diabetes Research (DZD), Ingolstädter Landstr. 1, 85764 Neuherberg, Germany
- Lore Becker
  - Institute of Experimental Genetics and German Mouse Clinic, Helmholtz Zentrum München, German Research Center for Environmental Health, Neuherberg, Germany
- Adrian Sanz-Moreno
  - Institute of Experimental Genetics and German Mouse Clinic, Helmholtz Zentrum München, German Research Center for Environmental Health, Neuherberg, Germany
- Claudia Seisenberger
  - Institute of Developmental Genetics, Helmholtz Zentrum München, German Research Center for Environmental Health, Neuherberg, Germany
- Annemarie Zimprich
  - Institute of Experimental Genetics and German Mouse Clinic, Helmholtz Zentrum München, German Research Center for Environmental Health, Neuherberg, Germany
  - Institute of Developmental Genetics, Helmholtz Zentrum München, German Research Center for Environmental Health, Neuherberg, Germany
  - Technische Universität München, Freising-Weihenstephan, Germany
- Antonio Aguilar-Pimentel
  - Institute of Experimental Genetics and German Mouse Clinic, Helmholtz Zentrum München, German Research Center for Environmental Health, Neuherberg, Germany
- Oana V Amarie
  - Institute of Experimental Genetics and German Mouse Clinic, Helmholtz Zentrum München, German Research Center for Environmental Health, Neuherberg, Germany
- Yi-Li Cho
  - Institute of Experimental Genetics and German Mouse Clinic, Helmholtz Zentrum München, German Research Center for Environmental Health, Neuherberg, Germany
- Markus Kraiger
  - Institute of Experimental Genetics and German Mouse Clinic, Helmholtz Zentrum München, German Research Center for Environmental Health, Neuherberg, Germany
- Nadine Spielmann
  - Institute of Experimental Genetics and German Mouse Clinic, Helmholtz Zentrum München, German Research Center for Environmental Health, Neuherberg, Germany
- Julia Calzada-Wack
  - Institute of Experimental Genetics and German Mouse Clinic, Helmholtz Zentrum München, German Research Center for Environmental Health, Neuherberg, Germany
- Susan Marschall
  - Institute of Experimental Genetics and German Mouse Clinic, Helmholtz Zentrum München, German Research Center for Environmental Health, Neuherberg, Germany
- Dirk Busch
  - Institute for Medical Microbiology, Immunology and Hygiene, Technische Universität München, Trogerstrasse 30, 81675 Munich, Germany
- Carsten Schmitt-Weber
  - Center of Allergy & Environment (ZAUM), Technische Universität München, and Helmholtz Zentrum München, 85764 Neuherberg, Germany
- Eckhard Wolf
  - Institute of Molecular Animal Breeding and Biotechnology, Gene Center, Ludwig-Maximilians University Munich, Munich, Germany
- Wolfgang Wurst
  - Institute of Developmental Genetics, Helmholtz Zentrum München, German Research Center for Environmental Health, Neuherberg, Germany
  - Chair of Developmental Genetics, TUM School of Life Sciences, Technische Universität München, Freising-Weihenstephan, Germany
  - Deutsches Institut für Neurodegenerative Erkrankungen (DZNE) Site Munich, Feodor-Lynen-Str. 17, 81377 Munich, Germany
  - Munich Cluster for Systems Neurology (SyNergy), Adolf-Butenandt-Institut, Ludwig-Maximilians-Universität München, Feodor-Lynen-Str. 17, 81377 Munich, Germany
- Helmut Fuchs
  - Institute of Experimental Genetics and German Mouse Clinic, Helmholtz Zentrum München, German Research Center for Environmental Health, Neuherberg, Germany
- Valerie Gailus-Durner
  - Institute of Experimental Genetics and German Mouse Clinic, Helmholtz Zentrum München, German Research Center for Environmental Health, Neuherberg, Germany
- Sabine M Hölter
  - Institute of Experimental Genetics and German Mouse Clinic, Helmholtz Zentrum München, German Research Center for Environmental Health, Neuherberg, Germany
  - Institute of Developmental Genetics, Helmholtz Zentrum München, German Research Center for Environmental Health, Neuherberg, Germany
  - Technische Universität München, Freising-Weihenstephan, Germany
- Martin Hrabě de Angelis
  - Institute of Experimental Genetics and German Mouse Clinic, Helmholtz Zentrum München, German Research Center for Environmental Health, Neuherberg, Germany
  - Chair of Experimental Genetics, TUM School of Life Sciences, Technische Universität München, Alte Akademie 8, 85354 Freising, Germany
  - German Center for Diabetes Research (DZD), Ingolstädter Landstr. 1, 85764 Neuherberg, Germany
3
Kafkas Ş, Althubaiti S, Gkoutos GV, Hoehndorf R, Schofield PN. Linking common human diseases to their phenotypes; development of a resource for human phenomics. J Biomed Semantics 2021; 12:17. [PMID: 34425897 PMCID: PMC8383460 DOI: 10.1186/s13326-021-00249-x] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 05/18/2021] [Accepted: 07/30/2021] [Indexed: 11/11/2022]
Abstract
Background: In recent years a large volume of clinical genomics data has become available due to rapid advances in sequencing technologies. Efficient exploitation of this genomics data requires linkage to patient phenotype profiles. Current resources providing disease-phenotype associations are not comprehensive, and they often do not have broad coverage of the disease terminologies, particularly ICD-10, which is still the primary terminology used in clinical settings.
Methods: We developed two approaches to gather disease-phenotype associations. First, we used a text mining method that utilizes semantic relations in phenotype ontologies, and applies statistical methods to extract associations between diseases in ICD-10 and phenotype ontology classes from the literature. Second, we developed a semi-automatic way to collect ICD-10–phenotype associations from existing resources containing known relationships.
Results: We generated four datasets. Two of them are independent datasets linking diseases to their phenotypes based on text mining and semi-automatic strategies. The remaining two datasets are generated from these datasets and cover a subset of ICD-10 classes of common diseases contained in UK Biobank. We extensively validated our text mined and semi-automatically curated datasets by: comparing them against an expert-curated validation dataset containing disease–phenotype associations, measuring their similarity to disease–phenotype associations found in public databases, and assessing how well they could be used to recover gene–disease associations using phenotype similarity.
Conclusion: We find that our text mining method can produce phenotype annotations of diseases that are correct but often too general to have significant information content, or too specific to accurately reflect the typical manifestations of the sporadic disease. On the other hand, the datasets generated from integrating multiple knowledgebases are more complete (i.e., cover more of the required phenotype annotations for a given disease). We make all data freely available at 10.5281/zenodo.4726713.
Supplementary Information: The online version contains supplementary material available at 10.1186/s13326-021-00249-x.
Affiliation(s)
- Şenay Kafkas
  - Computational Bioscience Research Center (CBRC), Computer, Electrical, and Mathematical Sciences & Engineering Division, King Abdullah University of Science and Technology, 4700 KAUST, Thuwal, 23955, Saudi Arabia
- Sara Althubaiti
  - Computational Bioscience Research Center (CBRC), Computer, Electrical, and Mathematical Sciences & Engineering Division, King Abdullah University of Science and Technology, 4700 KAUST, Thuwal, 23955, Saudi Arabia
- Georgios V Gkoutos
  - Health Data Research UK, Midlands site, Edgbaston, Birmingham, B15 2TT, United Kingdom
  - Institute of Cancer and Genomic Sciences, University of Birmingham, Edgbaston, Birmingham, B15 2TT, United Kingdom
- Robert Hoehndorf
  - Computational Bioscience Research Center (CBRC), Computer, Electrical, and Mathematical Sciences & Engineering Division, King Abdullah University of Science and Technology, 4700 KAUST, Thuwal, 23955, Saudi Arabia
- Paul N Schofield
  - Department of Physiology, Development & Neuroscience, University of Cambridge, Downing Street, Cambridge, CB2 3EG, United Kingdom
4
Gkoutos GV, Schofield PN, Hoehndorf R. The anatomy of phenotype ontologies: principles, properties and applications. Brief Bioinform 2018; 19:1008-1021. [PMID: 28387809 PMCID: PMC6169674 DOI: 10.1093/bib/bbx035] [Citation(s) in RCA: 48] [Impact Index Per Article: 8.0] [Received: 01/01/2017] [Revised: 02/05/2017] [Indexed: 12/14/2022]
Abstract
The past decade has seen an explosion in the collection of genotype data in domains as diverse as medicine, ecology, livestock and plant breeding. Along with this comes the challenge of dealing with the related phenotype data, which is not only large but also highly multidimensional. Computational analysis of phenotypes has therefore become critical for our ability to understand the biological meaning of genomic data in the biological sciences. At the heart of computational phenotype analysis are the phenotype ontologies. A large number of these ontologies have been developed across many domains, and we are now at a point where the knowledge captured in the structure of these ontologies can be used for the integration and analysis of large interrelated data sets. The Phenotype And Trait Ontology framework provides a method for formal definitions of phenotypes and associated data sets and has proved to be key to our ability to develop methods for the integration and analysis of phenotype data. Here, we describe the development and products of the ontological approach to phenotype capture, the formal content of phenotype ontologies and how their content can be used computationally.
Affiliation(s)
- Robert Hoehndorf
  - Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology, Thuwal
5
Sasaki S, Watanabe T, Nishimura S, Sugimoto Y. Genome-wide identification of copy number variation using high-density single-nucleotide polymorphism array in Japanese Black cattle. BMC Genet 2016; 17:26. [PMID: 26809925 PMCID: PMC4727303 DOI: 10.1186/s12863-016-0335-z] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.0] [Received: 06/20/2015] [Accepted: 01/14/2016] [Indexed: 12/12/2022]
Abstract
Background: Copy number variation (CNV) is an important source of genetic variability associated with phenotypic variation and disease susceptibility. Comprehensive genome-wide CNV maps provide valuable information for genetic and functional studies. To identify CNV in Japanese Black cattle, we performed a genome-wide autosomal screen using genomic data from 1,481 animals analyzed with the Illumina Bovine High-Density (HD) BeadChip Array (735,293 single-nucleotide polymorphisms (SNPs) with an average marker interval of 3.4 kb on the autosomes).
Results: We identified a total of 861 CNV regions (CNVRs) across all autosomes, which covered 43.65 Mb of the UMD3.1 genome assembly and corresponded to 1.74% of the 29 bovine autosomes. Overall, 35% of the CNVRs were present at a frequency of >1% in 1,481 animals. The estimated lengths of CNVRs ranged from 1.1 kb to 1.4 Mb, with an average of 50.7 kb. The average number of CNVR events per animal was 35. Comparisons with previously reported cattle CNV showed that 72% of the CNVR calls detected in this study were within or overlapped with known CNVRs. Experimentally, three CNVRs were validated using quantitative PCR, and one CNVR was validated using PCR with flanking primers for the deleted region. Out of the 861 CNVRs, 390 contained 717 Ensembl-annotated genes significantly enriched for stimulus response, cellular defense response, and immune response in the Gene Ontology (GO) database. To associate genes contained in CNVRs with phenotypes, we converted 560 bovine Ensembl gene IDs to their 438 orthologous associated mouse gene IDs, and 195 of these mouse orthologous genes were categorized into 1,627 phenotypes in the Mouse Genome Informatics (MGI) database.
Conclusions: We identified 861 CNVRs in 1,481 Japanese Black cattle using the Illumina BovineHD BeadChip Array. The genes contained in CNVRs were characterized using GO analysis and the mouse orthologous genes were characterized using the MGI database. The comprehensive genome-wide CNVR map will facilitate identification of genetic variation and disease-susceptibility alleles in Japanese Black cattle.
Electronic supplementary material: The online version of this article (doi:10.1186/s12863-016-0335-z) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Shinji Sasaki
  - National Livestock Breeding Center, Odakura, Nishigo, Fukushima, 961-8511, Japan
- Toshio Watanabe
  - National Livestock Breeding Center, Odakura, Nishigo, Fukushima, 961-8511, Japan
- Shota Nishimura
  - Shirakawa Institute of Animal Genetics, Japan Livestock Technology Association, Odakura, Nishigo, Fukushima, 961-8061, Japan
- Yoshikazu Sugimoto
  - Shirakawa Institute of Animal Genetics, Japan Livestock Technology Association, Odakura, Nishigo, Fukushima, 961-8061, Japan
6
Berndt A, Ackert-Bicknell C, Silva KA, Kennedy VE, Sundberg BA, Cates JM, Schofield PN, Sundberg JP. Genetic determinants of fibro-osseous lesions in aged inbred mice. Exp Mol Pathol 2015; 100:92-100. [PMID: 26589134 DOI: 10.1016/j.yexmp.2015.11.018] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Received: 11/12/2015] [Accepted: 11/12/2015] [Indexed: 12/12/2022]
Abstract
Fibro-osseous lesions in mice are progressive aging changes in which the bone marrow is replaced to various degrees by fibrovascular stroma and bony trabeculae in a wide variety of bones. The frequency and severity varied greatly among 28 different inbred mouse strains, predominantly affecting females, ranging from 0% for 10 strains to 100% for KK/HlJ and NZW/LacJ female mice. Few lesions were observed in male mice and for 23 of the strains, no lesions were observed in males for any of the cohorts. There were no significant correlations between strain-specific severities of fibro-osseous lesions and ovarian (r=0.11; P=0.57) or endometrial (r=0.03; P=0.89) cyst formation frequency or abnormalities in parathyroid glands. Frequency of fibro-osseous lesions was most strongly associated (P<10⁻⁶) with genome variations on chromosome (Chr) 8 at 90.6 and 90.8 Mb (rs33108071, rs33500669; P=5.0×10⁻¹⁰, 1.3×10⁻⁶), Chr 15 at 23.6 and 23.8 Mb (rs32087871, rs45770368; P=7.3×10⁻⁷, 2.7×10⁻⁶), and Chr 19 at 33.2, 33.4, and 33.6 Mb (rs311004232, rs30524929, rs30448815; P=2.8×10⁻⁶, 2.8×10⁻⁶, 2.8×10⁻⁶) in genome-wide association studies (GWAS). The relatively large number of candidate genes identified in the GWAS analyses suggests that this may be an extremely complex polygenic disease. These results indicate that fibro-osseous lesions are surprisingly common in many inbred strains of laboratory mice as they age. While this presents little problem in most studies that utilize young animals, it may complicate aging studies, particularly those focused on bone.
Affiliation(s)
- Annerose Berndt
  - Department of Medicine, University of Pittsburgh, Pittsburgh, PA, United States
- Justin M Cates
  - Department of Pathology, Microbiology and Immunology, Vanderbilt University School of Medicine, Nashville, TN, United States
- Paul N Schofield
  - The Jackson Laboratory, Bar Harbor, ME, United States
  - Department of Physiology, Development and Neuroscience, University of Cambridge, Cambridge, United Kingdom
7
Schriml LM, Mitraka E. The Disease Ontology: fostering interoperability between biological and clinical human disease-related data. Mamm Genome 2015; 26:584-9. [PMID: 26093607 PMCID: PMC4602048 DOI: 10.1007/s00335-015-9576-9] [Citation(s) in RCA: 50] [Impact Index Per Article: 5.6] [Received: 02/19/2015] [Accepted: 06/08/2015] [Indexed: 12/15/2022]
Abstract
The Disease Ontology (DO) enables cross-domain data integration through a common standard of human disease terms and their etiological descriptions. Standardized disease descriptors that are integrated across mammalian genomic resources provide a human-readable, machine-interpretable, community-driven disease corpus that unifies the representation of human common and rare diseases. The DO is populated by consensus-driven disease data descriptors that incorporate disease terms utilized by genomic and genetic projects and resources engaged in studies to understand the genetics of human disease through the study of model organisms. The DO project serves multiple roles for the model organism community by providing: (1) a structured "backbone" of disease concepts represented among the model organism databases; (2) authoritative disease curation services to researchers and resource providers; and (3) development of subsets of the DO representative of human diseases annotated to animal models curated within the model organism databases.
Affiliation(s)
- Lynn M Schriml
  - Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD, 21201, USA
- Elvira Mitraka
  - Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD, 21201, USA
8
Collier N, Oellrich A, Groza T. Toward knowledge support for analysis and interpretation of complex traits. Genome Biol 2015; 14:214. [PMID: 24079802 PMCID: PMC4053827 DOI: 10.1186/gb-2013-14-9-214] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Indexed: 12/02/2022]
Abstract
The systematic description of complex traits, from the organism to the cellular level, is important for hypothesis generation about underlying disease mechanisms. We discuss how intelligent algorithms might provide support, leading to faster throughput.
9
Blair DR, Wang K, Nestorov S, Evans JA, Rzhetsky A. Quantifying the impact and extent of undocumented biomedical synonymy. PLoS Comput Biol 2014; 10:e1003799. [PMID: 25255227 PMCID: PMC4177665 DOI: 10.1371/journal.pcbi.1003799] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Received: 12/18/2013] [Accepted: 06/26/2014] [Indexed: 12/14/2022]
Abstract
Synonymous relationships among biomedical terms are extensively annotated within specialized terminologies, implying that synonymy is important for practical computational applications within this field. It remains unclear, however, whether text mining actually benefits from documented synonymy and whether existing biomedical thesauri provide adequate coverage of these linguistic relationships. In this study, we examine the impact and extent of undocumented synonymy within a very large compendium of biomedical thesauri. First, we demonstrate that missing synonymy has a significant negative impact on named entity normalization, an important problem within the field of biomedical text mining. To estimate the amount of synonymy currently missing from thesauri, we develop a probabilistic model for the construction of synonym terminologies that is capable of handling a wide range of potential biases, and we evaluate its performance using the broader domain of near-synonymy among general English words. Our model predicts that over 90% of these relationships are currently undocumented, a result that we support experimentally through “crowd-sourcing.” Finally, we apply our model to biomedical terminologies and predict that they are missing the vast majority (>90%) of the synonymous relationships they intend to document. Overall, our results expose the dramatic incompleteness of current biomedical thesauri and suggest the need for “next-generation,” high-coverage lexical terminologies.
Automated systems that extract and integrate information from the research literature have become common in biomedicine. As the same meaning can be expressed in many distinct but synonymous ways, access to comprehensive thesauri may enable such systems to maximize their performance. Here, we establish the importance of synonymy for a specific text-mining task (named-entity normalization), and we suggest that current thesauri may be woefully inadequate in their documentation of this linguistic phenomenon. To test this claim, we develop a model for estimating the amount of missing synonymy. We apply our model to both biomedical terminologies and general-English thesauri, predicting massive amounts of missing synonymy for both lexicons. Furthermore, we verify some of our predictions for the latter domain through “crowd-sourcing.” Overall, our work highlights the dramatic incompleteness of current biomedical thesauri, and to mitigate this issue, we propose the creation of “living” terminologies, which would automatically harvest undocumented synonymy and help smart machines enrich biomedicine.
Affiliation(s)
- David R. Blair
  - Institute for Genomics and Systems Biology, University of Chicago, Chicago, Illinois, United States of America
  - Committee on Genetics, Genomics, and Systems Biology, University of Chicago, Chicago, Illinois, United States of America
- Kanix Wang
  - Institute for Genomics and Systems Biology, University of Chicago, Chicago, Illinois, United States of America
  - Committee on Genetics, Genomics, and Systems Biology, University of Chicago, Chicago, Illinois, United States of America
- Svetlozar Nestorov
  - Computation Institute, University of Chicago, Chicago, Illinois, United States of America
- James A. Evans
  - Computation Institute, University of Chicago, Chicago, Illinois, United States of America
  - Department of Sociology, University of Chicago, Chicago, Illinois, United States of America
- Andrey Rzhetsky
  - Institute for Genomics and Systems Biology, University of Chicago, Chicago, Illinois, United States of America
  - Committee on Genetics, Genomics, and Systems Biology, University of Chicago, Chicago, Illinois, United States of America
  - Computation Institute, University of Chicago, Chicago, Illinois, United States of America
  - Departments of Medicine and Human Genetics, University of Chicago, Chicago, Illinois, United States of America
10
Gibaud B, Forestier G, Benoit-Cattin H, Cervenansky F, Clarysse P, Friboulet D, Gaignard A, Hugonnard P, Lartizien C, Liebgott H, Montagnat J, Tabary J, Glatard T. OntoVIP: an ontology for the annotation of object models used for medical image simulation. J Biomed Inform 2014; 52:279-92. [PMID: 25038553 DOI: 10.1016/j.jbi.2014.07.008] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Received: 12/06/2013] [Revised: 05/16/2014] [Accepted: 07/09/2014] [Indexed: 11/15/2022]
Abstract
This paper describes the creation of a comprehensive conceptualization of object models used in medical image simulation, suitable for major imaging modalities and simulators. The goal is to create an application ontology that can be used to annotate the models in a repository integrated in the Virtual Imaging Platform (VIP), to facilitate their sharing and reuse. Annotations make the anatomical, physiological and pathophysiological content of the object models explicit. In such an interdisciplinary context we chose to rely on a common integration framework provided by a foundational ontology, which facilitates the consistent integration of the various modules extracted from several existing ontologies, i.e. FMA, PATO, MPATH, RadLex and ChEBI. Emphasis is put on the methodology for achieving this extraction and integration. The most salient aspects of the ontology are presented, especially the organization in model layers, as well as its use to browse and query the model repository.
Affiliation(s)
- Bernard Gibaud
  - LTSI - Laboratoire Traitement du Signal et de l'Image, INSERM U1099 - Université de Rennes 1, Faculté de médecine, 2 av. Pr Léon Bernard, 35043 Rennes Cedex, France
- Germain Forestier
  - MIPS - Modélisation, Intelligence, Processus et Systèmes - MIPS EA2332 - Université de Haute-Alsace, 12, Rue des frères Lumière, 68093 Mulhouse, France
- Hugues Benoit-Cattin
  - CREATIS - Centre de Recherche et d'Applications en Traitement de l'Image et du Signal, CNRS UMR 5220 - Inserm U1044 - INSA-Lyon - Univ. Lyon 1, Bâtiment Blaise Pascal, 7 av. Jean Capelle, 69621 Villeurbanne Cedex, France
- Frédéric Cervenansky
  - CREATIS - Centre de Recherche et d'Applications en Traitement de l'Image et du Signal, CNRS UMR 5220 - Inserm U1044 - INSA-Lyon - Univ. Lyon 1, Bâtiment Blaise Pascal, 7 av. Jean Capelle, 69621 Villeurbanne Cedex, France
- Patrick Clarysse
  - CREATIS - Centre de Recherche et d'Applications en Traitement de l'Image et du Signal, CNRS UMR 5220 - Inserm U1044 - INSA-Lyon - Univ. Lyon 1, Bâtiment Blaise Pascal, 7 av. Jean Capelle, 69621 Villeurbanne Cedex, France
- Denis Friboulet
  - CREATIS - Centre de Recherche et d'Applications en Traitement de l'Image et du Signal, CNRS UMR 5220 - Inserm U1044 - INSA-Lyon - Univ. Lyon 1, Bâtiment Blaise Pascal, 7 av. Jean Capelle, 69621 Villeurbanne Cedex, France
- Alban Gaignard
  - I3S - Laboratoire d'Informatique, Signaux et Systèmes de Sophia Antipolis, CNRS UMR 7271/Université Nice Sophia Antipolis, 2000, Route des Lucioles, Les Algorithmes - bât. Algorithm B, 06903 Sophia Antipolis Cedex, France
- Patrick Hugonnard
  - CEA-LETI-MINATEC, Recherche technologique, 17, Rue des Martyrs, 38054 Grenoble Cedex 09, France
- Carole Lartizien
  - CREATIS - Centre de Recherche et d'Applications en Traitement de l'Image et du Signal, CNRS UMR 5220 - Inserm U1044 - INSA-Lyon - Univ. Lyon 1, Bâtiment Blaise Pascal, 7 av. Jean Capelle, 69621 Villeurbanne Cedex, France
- Hervé Liebgott
  - CREATIS - Centre de Recherche et d'Applications en Traitement de l'Image et du Signal, CNRS UMR 5220 - Inserm U1044 - INSA-Lyon - Univ. Lyon 1, Bâtiment Blaise Pascal, 7 av. Jean Capelle, 69621 Villeurbanne Cedex, France
- Johan Montagnat
  - I3S - Laboratoire d'Informatique, Signaux et Systèmes de Sophia Antipolis, CNRS UMR 7271/Université Nice Sophia Antipolis, 2000, Route des Lucioles, Les Algorithmes - bât. Algorithm B, 06903 Sophia Antipolis Cedex, France
- Joachim Tabary
  - CEA-LETI-MINATEC, Recherche technologique, 17, Rue des Martyrs, 38054 Grenoble Cedex 09, France
- Tristan Glatard
  - CREATIS - Centre de Recherche et d'Applications en Traitement de l'Image et du Signal, CNRS UMR 5220 - Inserm U1044 - INSA-Lyon - Univ. Lyon 1, Bâtiment Blaise Pascal, 7 av. Jean Capelle, 69621 Villeurbanne Cedex, France
11
|
Cardiff RD, Miller CH, Munn RJ. Analysis of mouse model pathology: a primer for studying the anatomic pathology of genetically engineered mice. Cold Spring Harb Protoc 2014; 2014:561-80. [PMID: 24890215 DOI: 10.1101/pdb.top069922] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Indexed: 11/24/2022]
Abstract
This primer of pathology is intended to introduce investigators to the structure (morphology) of cancer with an emphasis on genetically engineered mouse (GEM) models (GEMMs). We emphasize the necessity of using the entire biological context for the interpretation of anatomic pathology. Because the primary investigator is responsible for almost all of the information and procedures leading up to microscopic examination, they should also be responsible for documentation of experiments so that the microscopic interpretation can be rendered in context of the biology. The steps involved in this process are outlined, discussed, and illustrated. Because GEMMs are unique experimental subjects, some of the more common pitfalls are discussed. Many of these errors can be avoided with attention to detail and continuous quality assurance.
Affiliation(s)
- Robert D Cardiff
- Center for Comparative Medicine and Center for Genomic Pathology, University of California, Davis, Davis, California 95616
- Claramae H Miller
- Center for Comparative Medicine and Center for Genomic Pathology, University of California, Davis, Davis, California 95616
- Robert J Munn
- Center for Comparative Medicine and Center for Genomic Pathology, University of California, Davis, Davis, California 95616
|
12
|
Hancock JM. Commentary on Shimoyama et al. (2012): three ontologies to define phenotype measurement data. Front Genet 2014; 5:93. [PMID: 24795755 PMCID: PMC4006037 DOI: 10.3389/fgene.2014.00093] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 03/28/2014] [Accepted: 04/03/2014] [Indexed: 01/17/2023] Open
Affiliation(s)
- John M Hancock
- Department of Physiology, Development and Neuroscience, University of Cambridge, Cambridge, UK
|
13
|
InterMOD: integrated data and tools for the unification of model organism research. Sci Rep 2013; 3:1802. [PMID: 23652793 PMCID: PMC3647165 DOI: 10.1038/srep01802] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.4] [Received: 11/14/2012] [Accepted: 04/05/2013] [Indexed: 11/26/2022] Open
Abstract
Model organisms are widely used for understanding basic biology, and have significantly contributed to the study of human disease. In recent years, genomic analysis has provided extensive evidence of widespread conservation of gene sequence and function amongst eukaryotes, allowing insights from model organisms to help decipher gene function in a wider range of species. The InterMOD consortium is developing an infrastructure based around the InterMine data warehouse system to integrate genomic and functional data from a number of key model organisms, leading the way to improved cross-species research. So far including budding yeast, nematode worm, fruit fly, zebrafish, rat and mouse, the project has set up data warehouses, synchronized data models, and created analysis tools and links between data from different species. The project unites a number of major model organism databases, improving both the consistency and accessibility of comparative research, to the benefit of the wider scientific community.
|
14
|
Hancock JM. Editorial: biological ontologies and semantic biology. Front Genet 2014; 5:18. [PMID: 24550936 PMCID: PMC3912459 DOI: 10.3389/fgene.2014.00018] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 01/09/2014] [Accepted: 01/21/2014] [Indexed: 01/22/2023] Open
Affiliation(s)
- John M Hancock
- Department of Physiology, Development and Neuroscience, University of Cambridge, Cambridge, UK
|
15
|
Cardiff RD, Miller CH, Munn RJ, Galvez JJ. Structured reporting in anatomic pathology for coclinical trials: the caELMIR model. Cold Spring Harb Protoc 2014; 2014:32-43. [PMID: 24173313 DOI: 10.1101/pdb.top078790] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 01/20/2023]
Abstract
Electronic media, with their tremendous potential for storing, retrieving, and integrating data, are an essential part of modern collaborative multidisciplinary science. Structured reporting is a fundamental aspect of keeping accurate, searchable electronic records. This discussion on structured reporting in anatomic pathology for pre- and coclinical trials in animal models provides background information for scientists who are not familiar with structured reporting. Practical examples are provided using a working database system for preclinical research, caELMIR (Cancer Electronic Laboratory Management Information and Retrieval), developed by the U.S. National Cancer Institute's (NCI's) Mouse Models of Human Cancers Consortium (MMHCC).
Affiliation(s)
- Robert D Cardiff
- Center for Comparative Medicine and Center for Genomic Pathology, University of California, Davis, Davis, California 95616
|
16
|
Rebholz-Schuhmann D, Kim JH, Yan Y, Dixit A, Friteyre C, Hoehndorf R, Backofen R, Lewin I. Evaluation and cross-comparison of lexical entities of biological interest (LexEBI). PLoS One 2013; 8:e75185. [PMID: 24124474 PMCID: PMC3790750 DOI: 10.1371/journal.pone.0075185] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 02/20/2012] [Accepted: 08/14/2013] [Indexed: 01/12/2023] Open
Abstract
Motivation: Biomedical entities, their identifiers and names, are essential in the representation of biomedical facts and knowledge. In the same way, the complete set of biomedical and chemical terms, i.e. the biomedical "term space" (the "Lexeome"), forms a key resource to achieve the full integration of the scientific literature with biomedical data resources: any identified named entity can immediately be normalized to the correct database entry. This goal not only requires that we are aware of all existing terms, but would also profit from knowing all their senses and their semantic interpretation (ambiguities, nestedness). Result: This study compiles a resource for lexical terms of biomedical interest in a standard format (called "LexEBI"), determines the overall number of terms, their reuse in different resources and the nestedness of terms. LexEBI comprises references for protein and gene entries and their term variants and chemical entities amongst other terms. In addition, disease terms have been identified from Medline and PubmedCentral and added to LexEBI. Our analysis demonstrates that the baseforms of terms from the different semantic types show little polysemous use. Nonetheless, the term variants of protein and gene names (PGNs) frequently contain species mentions, which should have been avoided according to protein annotation guidelines. Furthermore, the protein and gene entities and the chemical entities both comprise enzymes, leading to hierarchical polysemy, and a large portion of PGNs make reference to a chemical entity. Altogether, according to our analysis based on the Medline distribution, 401,869 unique PGNs in the documents contain a reference to 25,022 chemical entities, 3,125 disease terms or 1,576 species mentions. Conclusion: LexEBI delivers the complete biomedical and chemical Lexeome in a standardized representation (http://www.ebi.ac.uk/Rebholz-srv/LexEBI/). The resource provides the disease terms as open source content, and fully interlinks terms across resources.
Affiliation(s)
- Dietrich Rebholz-Schuhmann
- Department of Computational Linguistics, University of Zürich, Zürich, Switzerland
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom
- Jee-Hyub Kim
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom
- Ying Yan
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom
- Abhishek Dixit
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom
- Caroline Friteyre
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom
- Robert Hoehndorf
- Department of Genetics, University of Cambridge, Downing Street, Cambridge, United Kingdom
- Rolf Backofen
- Albert-Ludwigs-University Freiburg, Fahnenbergplatz, Freiburg, Germany
- Ian Lewin
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom
|
17
|
Makita Y, Kobayashi N, Yoshida Y, Doi K, Mochizuki Y, Nishikata K, Matsushima A, Takahashi S, Ishii M, Takatsuki T, Bhatia R, Khadbaatar Z, Watabe H, Masuya H, Toyoda T. PosMed: Ranking genes and bioresources based on Semantic Web Association Study. Nucleic Acids Res 2013; 41:W109-14. [PMID: 23761449 PMCID: PMC3692089 DOI: 10.1093/nar/gkt474] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.3] [Indexed: 12/11/2022] Open
Abstract
Positional MEDLINE (PosMed; http://biolod.org/PosMed) is a powerful Semantic Web Association Study engine that ranks biomedical resources such as genes, metabolites, diseases and drugs, based on the statistical significance of associations between user-specified phenotypic keywords and resources connected directly or inferentially through a Semantic Web of biological databases such as MEDLINE, OMIM, pathways, co-expressions, molecular interactions and ontology terms. Since 2005, PosMed has been used for in silico positional cloning studies to infer candidate disease-responsible genes within chromosomal intervals. PosMed has been redesigned as a workbench to discover possible functional interpretations for the numerous genetic variants found by exome sequencing of human disease samples. We also show that the association search engine enhances the value of mouse bioresources because most knockout mouse resources have no phenotypic annotation, but can be associated inferentially with phenotypes via genes and biomedical documents. For this purpose, we established text-mining rules for the biomedical documents through careful human curation, and created a large number of correct links between genes and documents. As of May 2013, PosMed associates any phenotypic keyword with mouse resources using 20 public databases and four original data sets.
Affiliation(s)
- Yuko Makita
- Bioinformatics and Systems Engineering Division, RIKEN, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan
|
18
|
Spence AJ, Nicholson-Thomas G, Lampe R. Closing the loop in legged neuromechanics: An open-source computer vision controlled treadmill. J Neurosci Methods 2013; 215:164-9. [DOI: 10.1016/j.jneumeth.2013.03.009] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.0] [Received: 12/04/2012] [Revised: 03/11/2013] [Accepted: 03/12/2013] [Indexed: 01/19/2023]
|
19
|
Glatard T, Lartizien C, Gibaud B, da Silva RF, Forestier G, Cervenansky F, Alessandrini M, Benoit-Cattin H, Bernard O, Camarasu-Pop S, Cerezo N, Clarysse P, Gaignard A, Hugonnard P, Liebgott H, Marache S, Marion A, Montagnat J, Tabary J, Friboulet D. A virtual imaging platform for multi-modality medical image simulation. IEEE Trans Med Imaging 2013; 32:110-118. [PMID: 23014715 DOI: 10.1109/tmi.2012.2220154] [Citation(s) in RCA: 44] [Impact Index Per Article: 4.0] [Indexed: 06/01/2023]
Abstract
This paper presents the Virtual Imaging Platform (VIP), a platform accessible at http://vip.creatis.insa-lyon.fr to facilitate the sharing of object models and medical image simulators, and to provide access to distributed computing and storage resources. A complete overview is presented, describing the ontologies designed to share models in a common repository, the workflow template used to integrate simulators, and the tools and strategies used to exploit computing and storage resources. Simulation results obtained in four image modalities and with different models show that VIP is versatile and robust enough to support large simulations. The platform currently has 200 registered users who consumed 33 years of CPU time in 2011.
Affiliation(s)
- Tristan Glatard
- Université de Lyon, CREATIS, CNRS UMR5220, INSERM U1044, Villeurbanne, France
|
20
|
Beck T, Free RC, Thorisson GA, Brookes AJ. Semantically enabling a genome-wide association study database. J Biomed Semantics 2012; 3:9. [PMID: 23244533 PMCID: PMC3579732 DOI: 10.1186/2041-1480-3-9] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Received: 04/03/2012] [Accepted: 08/22/2012] [Indexed: 01/03/2023] Open
Abstract
Background: The amount of data generated from genome-wide association studies (GWAS) has grown rapidly, but considerations for GWAS phenotype data reuse and interchange have not kept pace. This impacts on the work of GWAS Central – a free and open access resource for the advanced querying and comparison of summary-level genetic association data. The benefits of employing ontologies for standardising and structuring data are widely accepted. The complex spectrum of observed human phenotypes (and traits), and the requirement for cross-species phenotype comparisons, call for reflection on the most appropriate solution for the organisation of human phenotype data. The Semantic Web provides standards for the possibility of further integration of GWAS data and the ability to contribute to the web of Linked Data. Results: A pragmatic consideration when applying phenotype ontologies to GWAS data is the ability to retrieve all data, at the most granular level possible, from querying a single ontology graph. We found the Medical Subject Headings (MeSH) terminology suitable for describing all traits (diseases and medical signs and symptoms) at various levels of granularity and the Human Phenotype Ontology (HPO) most suitable for describing phenotypic abnormalities (medical signs and symptoms) at the most granular level. Diseases within MeSH are mapped to HPO to infer the phenotypic abnormalities associated with diseases. Building on the rich semantic phenotype annotation layer, we are able to make cross-species phenotype comparisons and publish a core subset of GWAS data as RDF nanopublications. Conclusions: We present a methodology for applying phenotype annotations to a comprehensive genome-wide association dataset and for ensuring compatibility with the Semantic Web. The annotations are used to assist with cross-species genotype and phenotype comparisons. However, further processing and deconstructions of terms may be required to facilitate automatic phenotype comparisons. The provision of GWAS nanopublications enables a new dimension for exploring GWAS data, by way of intrinsic links to related data resources within the Linked Data web. The value of such annotation and integration will grow as more biomedical resources adopt the standards of the Semantic Web.
Affiliation(s)
- Tim Beck
- Department of Genetics, University of Leicester, University Road, Leicester, UK.
|
21
|
Doelken SC, Köhler S, Mungall CJ, Gkoutos GV, Ruef BJ, Smith C, Smedley D, Bauer S, Klopocki E, Schofield PN, Westerfield M, Robinson PN, Lewis SE. Phenotypic overlap in the contribution of individual genes to CNV pathogenicity revealed by cross-species computational analysis of single-gene mutations in humans, mice and zebrafish. Dis Model Mech 2012; 6:358-72. [PMID: 23104991 PMCID: PMC3597018 DOI: 10.1242/dmm.010322] [Citation(s) in RCA: 40] [Impact Index Per Article: 3.3] [Indexed: 12/16/2022] Open
Abstract
Numerous disease syndromes are associated with regions of copy number variation (CNV) in the human genome and, in most cases, the pathogenicity of the CNV is thought to be related to altered dosage of the genes contained within the affected segment. However, establishing the contribution of individual genes to the overall pathogenicity of CNV syndromes is difficult and often relies on the identification of potential candidates through manual searches of the literature and online resources. We describe here the development of a computational framework to comprehensively search phenotypic information from model organisms and single-gene human hereditary disorders, and thus speed the interpretation of the complex phenotypes of CNV disorders. There are currently more than 5000 human genes about which nothing is known phenotypically but for which detailed phenotypic information for the mouse and/or zebrafish orthologs is available. Here, we present an ontology-based approach to identify similarities between human disease manifestations and the mutational phenotypes in characterized model organism genes; this approach can therefore be used even in cases where there is little or no information about the function of the human genes. We applied this algorithm to detect candidate genes for 27 recurrent CNV disorders and identified 802 gene-phenotype associations, approximately half of which involved genes that were previously reported to be associated with individual phenotypic features and half of which were novel candidates. A total of 431 associations were made solely on the basis of model organism phenotype data. Additionally, we observed a striking, statistically significant tendency for individual disease phenotypes to be associated with multiple genes located within a single CNV region, a phenomenon that we denote as pheno-clustering. Many of the clusters also display statistically significant similarities in protein function or vicinity within the protein-protein interaction network. Our results provide a basis for understanding previously un-interpretable genotype-phenotype correlations in pathogenic CNVs and for mobilizing the large amount of model organism phenotype data to provide insights into human genetic disorders.
Affiliation(s)
- Sandra C Doelken
- Institute for Medical and Human Genetics, Charité-Universitätsmedizin Berlin, Augustenburger Platz 1, 13353 Berlin, Germany
|
22
|
Oellrich A, Gkoutos GV, Hoehndorf R, Rebholz-Schuhmann D. Quantitative comparison of mapping methods between Human and Mammalian Phenotype Ontology. J Biomed Semantics 2012; 3 Suppl 2:S1. [PMID: 23046555 PMCID: PMC3448526 DOI: 10.1186/2041-1480-3-s2-s1] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 12/17/2022] Open
Abstract
Researchers use animal studies to better understand human diseases. In recent years, large-scale phenotype studies such as Phenoscape and EuroPhenome have been initiated to identify genetic causes of a species' phenome. Species-specific phenotype ontologies are required to capture and report all findings and to automatically infer results relevant to human diseases. The integration of the different phenotype ontologies into a coherent framework is necessary to achieve interoperability for cross-species research. Here, we investigate the quality and completeness of two different methods to align the Human Phenotype Ontology and the Mammalian Phenotype Ontology. The first method combines lexical matching with inference over the ontologies' taxonomic structures, while the second method uses a mapping algorithm based on the formal definitions of the ontologies. Neither method could map all concepts. Although the formal definitions method provides mappings for more concepts than the lexical matching method, it does not outperform lexical matching in a biological use case. Our results suggest that combining both approaches will yield better mappings in terms of completeness, specificity and application purposes.
Affiliation(s)
- Anika Oellrich
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, CB10 1SD, UK.
|
23
|
Ramírez-Solis R, Ryder E, Houghton R, White JK, Bottomley J. Large-scale mouse knockouts and phenotypes. Wiley Interdiscip Rev Syst Biol Med 2012; 4:547-63. [PMID: 22899600 DOI: 10.1002/wsbm.1183] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 11/11/2022]
Abstract
Standardized phenotypic analysis of mutant forms of every gene in the mouse genome will provide fundamental insights into mammalian gene function and advance human and animal health. The availability of the human and mouse genome sequences, the development of embryonic stem cell mutagenesis technology, the standardization of phenotypic analysis pipelines, and the paradigm-shifting industrialization of these processes have made this a realistic and achievable goal. The size of this enterprise will require global coordination to ensure economies of scale in both the generation and primary phenotypic analysis of the mutant strains, and to minimize unnecessary duplication of effort. To provide more depth to the functional annotation of the genome, effective mechanisms will also need to be developed to disseminate the information and resources produced to the wider community. Better models of disease, potential new drug targets with novel mechanisms of action, and completely unsuspected genotype-phenotype relationships covering broad aspects of biology will become apparent. To reach these goals, solutions to challenges in mouse production and distribution, as well as development of novel, ever more powerful phenotypic analysis modalities will be necessary. It is a challenging and exciting time to work in mouse genetics.
|
24
|
Schofield PN, Hoehndorf R, Gkoutos GV. Mouse genetic and phenotypic resources for human genetics. Hum Mutat 2012; 33:826-36. [PMID: 22422677 DOI: 10.1002/humu.22077] [Citation(s) in RCA: 58] [Impact Index Per Article: 4.8] [Indexed: 12/15/2022]
Abstract
The use of model organisms to provide information on gene function has proved to be a powerful approach to our understanding of both human disease and fundamental mammalian biology. Large-scale community projects using mice, based on forward and reverse genetics, and now the pan-genomic phenotyping efforts of the International Mouse Phenotyping Consortium, are generating resources on an unprecedented scale, which will be extremely valuable to human genetics and medicine. We discuss the nature and availability of data, mice and embryonic stem cells from these large-scale programmes, the use of these resources to help prioritize and validate candidate genes in human genetic association studies, and how they can improve our understanding of the underlying pathobiology of human disease.
Affiliation(s)
- Paul N Schofield
- Department of Physiology, Development and Neuroscience, University of Cambridge, Cambridge, United Kingdom.
|
25
|
Chen CK, Mungall CJ, Gkoutos GV, Doelken SC, Köhler S, Ruef BJ, Smith C, Westerfield M, Robinson PN, Lewis SE, Schofield PN, Smedley D. MouseFinder: Candidate disease genes from mouse phenotype data. Hum Mutat 2012; 33:858-66. [PMID: 22331800 PMCID: PMC3327758 DOI: 10.1002/humu.22051] [Citation(s) in RCA: 50] [Impact Index Per Article: 4.2] [Received: 11/07/2011] [Accepted: 01/20/2012] [Indexed: 12/23/2022]
Abstract
Mouse phenotype data represents a valuable resource for the identification of disease-associated genes, especially where the molecular basis is unknown and there is no clue to the candidate gene's function, pathway involvement or expression pattern. However, until recently these data have not been systematically used due to difficulties in mapping between clinical features observed in humans and mouse phenotype annotations. Here, we describe a semantic approach to solve this problem and demonstrate highly significant recall of known disease-gene associations and orthology relationships. A Web application (MouseFinder; www.mousemodels.org) has been developed to allow users to search the results of our whole-phenome comparison of human and mouse. We demonstrate its use in identifying ARTN as a strong candidate gene within the 1p34.1-p32 mapped locus for a hereditary form of ptosis.
Affiliation(s)
- Chao-Kung Chen
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Georgios V Gkoutos
- Department of Genetics, University of Cambridge, Downing Street, Cambridge CB2 3EG, UK
- Sandra C Doelken
- Institute for Medical and Human Genetics, Charité-Universitätsmedizin Berlin, Augustenburger Platz 1, 13353 Berlin, Germany
- Max Planck Institute for Molecular Genetics, Ihnestr. 63-73, 14195 Berlin, Germany
- Sebastian Köhler
- Institute for Medical and Human Genetics, Charité-Universitätsmedizin Berlin, Augustenburger Platz 1, 13353 Berlin, Germany
- Berlin Center for Regenerative Therapies (BCRT), Charité-Universitätsmedizin Berlin, Augustenburger Platz 1, 13353 Berlin, Germany
- Cynthia Smith
- The Jackson Laboratory, 600, Main Street, Bar Harbor, ME 04609-1500, USA
- Peter N Robinson
- Institute for Medical and Human Genetics, Charité-Universitätsmedizin Berlin, Augustenburger Platz 1, 13353 Berlin, Germany
- Max Planck Institute for Molecular Genetics, Ihnestr. 63-73, 14195 Berlin, Germany
- Berlin Center for Regenerative Therapies (BCRT), Charité-Universitätsmedizin Berlin, Augustenburger Platz 1, 13353 Berlin, Germany
- Suzanna E Lewis
- Genomics Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Paul N Schofield
- The Jackson Laboratory, 600, Main Street, Bar Harbor, ME 04609-1500, USA
- Department of Physiology, Development and Neuroscience, University of Cambridge, Downing Street, Cambridge CB2 3EG, UK
- Damian Smedley
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
|
26
|
Schofield PN, Hancock JM. Integration of global resources for human genetic variation and disease. Hum Mutat 2012; 33:813-6. [DOI: 10.1002/humu.22079] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Received: 02/27/2012] [Accepted: 03/02/2012] [Indexed: 01/22/2023]
|
27
|
Schofield PN, Vogel P, Gkoutos GV, Sundberg JP. Exploring the elephant: histopathology in high-throughput phenotyping of mutant mice. Dis Model Mech 2012; 5:19-25. [PMID: 22028326 PMCID: PMC3255539 DOI: 10.1242/dmm.008334] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.6] [Indexed: 01/24/2023] Open
Abstract
Recent advances in gene knockout techniques and the in vivo analysis of mutant mice, together with the advent of large-scale projects for systematic mouse mutagenesis and genome-wide phenotyping, have allowed the creation of platforms for the most complete and systematic analysis of gene function ever undertaken in a vertebrate. The development of high-throughput phenotyping pipelines for these and other large-scale projects allows investigators to search and integrate large amounts of directly comparable phenotype data from many mutants, on a genomic scale, to help develop and test new hypotheses about the origins of disease and the normal functions of genes in the organism. Histopathology has a venerable history in the understanding of the pathobiology of human and animal disease, and presents complementary advantages and challenges to in vivo phenotyping. In this review, we present evidence for the unique contribution that histopathology can make to a large-scale phenotyping effort, using examples from past and current programmes at Lexicon Pharmaceuticals and The Jackson Laboratory, and critically assess the role of histopathology analysis in high-throughput phenotyping pipelines.
Affiliation(s)
- Paul N Schofield
- Department of Physiology, Development and Neuroscience, University of Cambridge, Downing Street, Cambridge, CB2 3EG, UK.
|
28
|
Affiliation(s)
- J P Sundberg
- The Jackson Laboratory, 600 Main Street, Bar Harbor, ME 04609-1500, USA.
|
29
|
Schofield PN, Sundberg JP, Hoehndorf R, Gkoutos GV. New approaches to the representation and analysis of phenotype knowledge in human diseases and their animal models. Brief Funct Genomics 2011; 10:258-65. [PMID: 21987712 PMCID: PMC3189694 DOI: 10.1093/bfgp/elr031] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.2] [Indexed: 12/13/2022] Open
Abstract
The systematic investigation of the phenotypes associated with genotypes in model organisms holds the promise of revealing genotype-phenotype relations directly and without additional, intermediate inferences. Large-scale projects are now underway to catalog the complete phenome of a species, notably the mouse. With the increasing amount of phenotype information becoming available, a major challenge that biology faces today is the systematic analysis of this information and the translation of research results across species and into an improved understanding of human disease. The challenge is to integrate and combine phenotype descriptions within a species and to systematically relate them to phenotype descriptions in other species, in order to form a comprehensive understanding of the relations between those phenotypes and the genotypes involved in human disease. We distinguish between two major approaches for comparative phenotype analyses: the first relies on evolutionary relations to bridge the species gap, while the other approach compares phenotypes directly. In particular, the direct comparison of phenotypes relies heavily on the quality and coherence of phenotype and disease databases. We discuss major achievements and future challenges for these databases in light of their potential to contribute to the understanding of the molecular mechanisms underlying human disease. In particular, we discuss how the use of ontologies and automated reasoning can significantly contribute to the analysis of phenotypes and demonstrate their potential for enabling translational research.
|
30
|
Yanicostas C, Soussi-Yanicostas N, El-Khoury R, Bénit P, Rustin P. Developmental aspects of respiratory chain from fetus to infancy. Semin Fetal Neonatal Med 2011; 16:175-80. [PMID: 21640674 DOI: 10.1016/j.siny.2011.05.005] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Indexed: 10/18/2022]
Abstract
Reviewing the recent literature on the role of mitochondria during fetal development paradoxically reveals two features: the importance of mitochondria in these early developmental phases, and the scarcity of information available for humans. Indeed, most of the available information on the role of mitochondria during development comes from studies of animal models that do not necessarily strictly apply to humans. In this paper, we attempted to collect information existing on humans, together with data from animal studies essentially presented as corroboration. This makes clear that a complex interacting network of energetic, genetic and epigenetic factors governs the impact of mitochondrial function on early development in humans. This complexity presumably also accounts for our poor understanding of the consequences of impaired mitochondrial function on prenatal development, or conversely, of the impact of development on the expression of such deficiencies.
|
31
|
Hoehndorf R, Schofield PN, Gkoutos GV. PhenomeNET: a whole-phenome approach to disease gene discovery. Nucleic Acids Res 2011; 39:e119. [PMID: 21737429 PMCID: PMC3185433 DOI: 10.1093/nar/gkr538] [Citation(s) in RCA: 154] [Impact Index Per Article: 11.8] [Indexed: 12/25/2022] Open
Abstract
Phenotypes are investigated in model organisms to understand and reveal the molecular mechanisms underlying disease. Phenotype ontologies were developed to capture and compare phenotypes within the context of a single species. Recently, these ontologies were augmented with formal class definitions that may be utilized to integrate phenotypic data and enable the direct comparison of phenotypes between different species. We have developed a method to transform phenotype ontologies into a formal representation, combine phenotype ontologies with anatomy ontologies, and apply a measure of semantic similarity to construct the PhenomeNET cross-species phenotype network. We demonstrate that PhenomeNET can identify orthologous genes, genes involved in the same pathway and gene–disease associations through the comparison of mutant phenotypes. We provide evidence that the Adam19 and Fgf15 genes in mice are involved in the tetralogy of Fallot, and, using zebrafish phenotypes, propose the hypothesis that the mammalian homologs of Cx36.7 and Nkx2.5 lie in a pathway controlling cardiac morphogenesis and electrical conductivity which, when defective, cause the tetralogy of Fallot phenotype. Our method implements a whole-phenome approach toward disease gene discovery and can be applied to prioritize genes for rare and orphan diseases for which the molecular basis is unknown.
Affiliation(s)
- Robert Hoehndorf
- Department of Genetics, University of Cambridge, Downing Street, Cambridge CB2 3EG, UK.
|
32
|
Sundberg JP, Berndt A, Sundberg BA, Silva KA, Kennedy V, Bronson R, Yuan R, Paigen B, Harrison D, Schofield PN. The mouse as a model for understanding chronic diseases of aging: the histopathologic basis of aging in inbred mice. Pathobiology of Aging & Age Related Diseases 2011; 1:PBA-1-7179. [PMID: 22953031 PMCID: PMC3417678 DOI: 10.3402/pba.v1i0.7179] [Citation(s) in RCA: 58] [Impact Index Per Article: 4.5] [Received: 04/07/2011] [Revised: 04/28/2011] [Accepted: 04/29/2011] [Indexed: 11/30/2022]
Abstract
Inbred mice provide a unique tool to study aging populations because of the genetic homogeneity within an inbred strain, their short life span, and the tools for analysis which are available. A large-scale longitudinal and cross-sectional aging study was conducted on 30 inbred strains to determine, using histopathology, the type and diversity of diseases mice develop as they age. These data provide tools that, when linked with modern in silico genetic mapping tools, can begin to unravel the complex genetics of many of the common chronic diseases associated with aging in humans and other mammals. In addition, novel disease models were discovered, ranging from those found in some strains, such as rhabdomyosarcoma in old A/J mice, to diseases affecting many but not all strains, including pseudoxanthoma elasticum, pulmonary adenoma, alopecia areata, and many others. This extensive data set is now available online and provides a useful tool to help better understand strain-specific background diseases that can complicate interpretation of genetically engineered mice and other manipulatable mouse studies that utilize these strains.
|
33
|
A gene-phenotype network for the laboratory mouse and its implications for systematic phenotyping. PLoS One 2011; 6:e19693. [PMID: 21625554 PMCID: PMC3098258 DOI: 10.1371/journal.pone.0019693] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Received: 01/14/2011] [Accepted: 04/11/2011] [Indexed: 01/22/2023] Open
Abstract
The laboratory mouse is the pre-eminent model organism for the dissection of human disease pathways. With the advent of a comprehensive panel of gene knockouts, projects to characterise the phenotypes of all knockout lines are being initiated. The range of genotype-phenotype associations can be represented using the Mammalian Phenotype ontology. Using publicly available data annotated with this ontology, we have constructed gene and phenotype networks representing these associations. These networks show a scale-free, hierarchical and modular character and community structure. They also exhibit enrichment for gene coexpression, protein-protein interactions and Gene Ontology annotation similarity. Close association between gene communities and some high-level ontology terms suggests that systematic phenotyping can provide a direct insight into underlying pathways. However, some phenotypes are distributed more diffusely across gene networks, likely reflecting the pleiotropic roles of many genes. Phenotype communities show a many-to-many relationship to human disease communities, but stronger overlap at more granular levels of description. This may suggest that systematic phenotyping projects should aim for high granularity annotations to maximise their relevance to human disease.
|
34
|
Masuya H, Makita Y, Kobayashi N, Nishikata K, Yoshida Y, Mochizuki Y, Doi K, Takatsuki T, Waki K, Tanaka N, Ishii M, Matsushima A, Takahashi S, Hijikata A, Kozaki K, Furuichi T, Kawaji H, Wakana S, Nakamura Y, Yoshiki A, Murata T, Fukami-Kobayashi K, Mohan S, Ohara O, Hayashizaki Y, Mizoguchi R, Obata Y, Toyoda T. The RIKEN integrated database of mammals. Nucleic Acids Res 2010; 39:D861-70. [PMID: 21076152 PMCID: PMC3013680 DOI: 10.1093/nar/gkq1078] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Indexed: 12/15/2022] Open
Abstract
The RIKEN integrated database of mammals (http://scinets.org/db/mammal) is the official undertaking to integrate its mammalian databases produced from multiple large-scale programs that have been promoted by the institute. The database integrates not only RIKEN's original databases, such as FANTOM, the ENU mutagenesis program, the RIKEN Cerebellar Development Transcriptome Database and the Bioresource Database, but also imported data from public databases, such as Ensembl, MGI and biomedical ontologies. Our integrated database has been implemented on the infrastructure of publication medium for databases, termed SciNetS/SciNeS, or the Scientists' Networking System, where the data and metadata are structured as a semantic web and are downloadable in various standardized formats. The top-level ontology-based implementation of mammal-related data directly integrates the representative knowledge and individual data records in existing databases to ensure advanced cross-database searches and reduced unevenness of the data management operations. Through the development of this database, we propose a novel methodology for the development of standardized comprehensive management of heterogeneous data sets in multiple databases to improve the sustainability, accessibility, utility and publicity of the data of biomedical information.
|