1
Bradshaw MS, Gibbs C, Martin S, Firman T, Gaskell A, Fosdick B, Layer R. Hypothesis generation for rare and undiagnosed diseases through clustering and classifying time-versioned biological ontologies. PLoS One 2024; 19:e0309205. [PMID: 39724242 DOI: 10.1371/journal.pone.0309205] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/13/2023] [Accepted: 08/06/2024] [Indexed: 12/28/2024] Open
Abstract
Rare diseases affect 1-in-10 people in the United States, and despite increased genetic testing, up to half never receive a diagnosis. Even when using advanced genome sequencing platforms to discover variants, if there is no connection between the variants found in the patient's genome and their phenotypes in the literature, then the patient will remain undiagnosed. When a direct variant-phenotype connection is not known, putting a patient's information in the larger context of phenotype relationships and protein-protein interactions may provide an opportunity to find an indirect explanation. Databases such as STRING contain millions of protein-protein interactions, and the Human Phenotype Ontology (HPO) contains the relations of thousands of phenotypes. By integrating these networks and clustering the entities within, we can potentially discover latent gene-to-phenotype connections. The historical records for STRING and HPO provide a unique opportunity to create a network time series for evaluating the cluster significance. Most excitingly, working with Children's Hospital Colorado, we have provided promising hypotheses about latent gene-to-phenotype connections for 38 patients. We also provide potential answers for 14 patients listed on MyGene2. Clusters our tool finds significant harbor 2.35 to 8.72 times as many gene-to-phenotype edges inferred from known drug interactions as clusters found to be insignificant. Our tool, BOCC, is available as a web app and command line tool.
Affiliation(s)
- Michael S Bradshaw
- Department of Computer Science, University of Colorado Boulder, Boulder, CO, United States of America
- Connor Gibbs
- Department of Statistics, Colorado State University, Fort Collins, CO, United States of America
- Skylar Martin
- Department of Computer Science, University of Colorado Boulder, Boulder, CO, United States of America
- Taylor Firman
- Precision Medicine Institute, Children's Hospital Colorado, Aurora, CO, United States of America
- Alisa Gaskell
- Precision Medicine Institute, Children's Hospital Colorado, Aurora, CO, United States of America
- Bailey Fosdick
- Department of Biostatistics & Informatics, Colorado School of Public Health, Aurora, CO, United States of America
- Ryan Layer
- Department of Computer Science, University of Colorado Boulder, Boulder, CO, United States of America
2
Hyung D, Mallon AM, Kyung DS, Cho SY, Seong JK. TarGo: network based target gene selection system for human disease related mouse models. Lab Anim Res 2019; 35:23. [PMID: 32257911 PMCID: PMC7081697 DOI: 10.1186/s42826-019-0023-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/16/2019] [Accepted: 10/21/2019] [Indexed: 11/25/2022] Open
Abstract
Genetically engineered mouse models are used in high-throughput phenotyping screens to understand genotype-phenotype associations and their relevance to human diseases. However, not all mutant mouse lines with detectable phenotypes are associated with human diseases. Here, we propose the “Target gene selection system for Genetically engineered mouse models” (TarGo). Using a combination of human disease descriptions, network topology, and genotype-phenotype correlations, novel genes that are potentially related to human diseases are suggested. We constructed a gene interaction network using protein-protein interactions, molecular pathways, and co-expression data. Several repositories for human disease signatures were used to obtain information on human disease-related genes. We calculated disease- or phenotype-specific gene ranks using network topology and disease signatures. In conclusion, TarGo provides many novel features for gene function prediction.
Affiliation(s)
- Daejin Hyung
- National Cancer Center, 323 Ilsan-ro, Goyang-si, Kyeonggi-do 10408 Republic of Korea
- Ann-Marie Mallon
- MRC Harwell Institute, Mammalian Genetics Unit, Oxfordshire, OX11 0RD UK
- Dong Soo Kyung
- Laboratory of Developmental Biology and Genomics, Research Institute for Veterinary Science, and BK21 Plus Program for Creative Veterinary Science, College of Veterinary Medicine, Seoul National University, Seoul, 08826 Republic of Korea; Korea Mouse Phenotyping Center (KMPC), Seoul National University, Seoul, 08826 Republic of Korea; Interdisciplinary Program for Bioinformatics, Program for Cancer Biology and BIO-MAX institute, Seoul National University, Seoul, 08826 Republic of Korea
- Soo Young Cho
- National Cancer Center, 323 Ilsan-ro, Goyang-si, Kyeonggi-do 10408 Republic of Korea; Korea Mouse Phenotyping Center (KMPC), Seoul National University, Seoul, 08826 Republic of Korea
- Je Kyung Seong
- Laboratory of Developmental Biology and Genomics, Research Institute for Veterinary Science, and BK21 Plus Program for Creative Veterinary Science, College of Veterinary Medicine, Seoul National University, Seoul, 08826 Republic of Korea; Korea Mouse Phenotyping Center (KMPC), Seoul National University, Seoul, 08826 Republic of Korea; Interdisciplinary Program for Bioinformatics, Program for Cancer Biology and BIO-MAX institute, Seoul National University, Seoul, 08826 Republic of Korea
3
Shen F, Peng S, Fan Y, Wen A, Liu S, Wang Y, Wang L, Liu H. HPO2Vec+: Leveraging heterogeneous knowledge resources to enrich node embeddings for the Human Phenotype Ontology. J Biomed Inform 2019; 96:103246. [PMID: 31255713 DOI: 10.1016/j.jbi.2019.103246] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Received: 03/01/2019] [Revised: 06/25/2019] [Accepted: 06/26/2019] [Indexed: 11/25/2022]
Abstract
BACKGROUND In precision medicine, deep phenotyping is defined as the precise and comprehensive analysis of phenotypic abnormalities, aiming to acquire a better understanding of the natural history of a disease and its genotype-phenotype associations. Detecting phenotypic relevance is an important task when translating precision medicine into clinical practice, especially for patient stratification tasks based on deep phenotyping. In our previous work, we developed node embeddings for the Human Phenotype Ontology (HPO) to assist in phenotypic relevance measurement incorporating distributed semantic representations. However, the derived HPO embeddings hold only distributed representations for IS-A relationships among nodes, hampering the ability to fully explore the graph. METHODS In this study, we developed a framework, HPO2Vec+, to enrich the produced HPO embeddings with heterogeneous knowledge resources (i.e., DECIPHER, OMIM, and Orphanet) for detecting phenotypic relevance. Specifically, we parsed disease-phenotype associations contained in these three resources to enrich non-inheritance relationships among phenotypic nodes in the HPO. To generate node embeddings for the HPO, node2vec was applied to perform node sampling on the enriched HPO graphs based on random walk followed by feature learning over the sampled nodes to generate enriched node embeddings. Four HPO embeddings were generated based on different graph structures, which we hereafter label as HPOEmb-Original, HPOEmb-DECIPHER, HPOEmb-OMIM, and HPOEmb-Orphanet. We evaluated the derived embeddings quantitatively through an HPO link prediction task with four edge embeddings operations and six machine learning algorithms. The resulting best embeddings were then evaluated for patient stratification of 10 rare diseases using electronic health records (EHR) collected at Mayo Clinic. 
We assessed our framework qualitatively by visualizing phenotypic clusters and conducting a use case study on primary hyperoxaluria (PH), a rare disease, on the task of inferring relevant phenotypes given 22 annotated PH related phenotypes. RESULTS The quantitative link prediction task shows that HPOEmb-Orphanet achieved an optimal AUROC of 0.92 and an average precision of 0.94. In addition, HPOEmb-Orphanet achieved an optimal F1 score of 0.86. The quantitative patient similarity measurement task indicates that HPOEmb-Orphanet achieved the highest average detection rate for similar patients over 10 rare diseases and performed better than other similarity measures implemented by an existing tool, HPOSim, especially for pairwise patients with fewer shared common phenotypes. The qualitative evaluation shows that the enriched HPO embeddings are generally able to detect relationships among nodes with fine granularity and HPOEmb-Orphanet is particularly good at associating phenotypes across different disease systems. For the use case of detecting relevant phenotypic characterizations for given PH related phenotypes, HPOEmb-Orphanet outperformed the other three HPO embeddings by achieving the highest average P@5 of 0.81 and the highest P@10 of 0.79. Compared to seven conventional similarity measurements provided by HPOSim, HPOEmb-Orphanet is able to detect more relevant phenotypic pairs, especially for pairs not in inheritance relationships. CONCLUSION We drew the following conclusions based on the evaluation results. First, with additional non-inheritance edges, enriched HPO embeddings can detect more associations between fine granularity phenotypic nodes regardless of their topological structures in the HPO graph. 
Second, HPOEmb-Orphanet not only can achieve the optimal performance through link prediction and patient stratification based on phenotypic similarity, but is also able to detect relevant phenotypes closer to domain experts' judgments than other embeddings and conventional similarity measurements. Third, incorporating heterogeneous knowledge resources does not necessarily result in better performance for detecting relevant phenotypes. From a clinical perspective, in our use case study, clinical-oriented knowledge resources (e.g., Orphanet) can achieve better performance in detecting relevant phenotypic characterizations compared to biomedical-oriented knowledge resources (e.g., DECIPHER and OMIM).
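The link prediction step in this abstract turns a pair of node embeddings into one edge feature vector before a classifier is trained. The abstract does not name its four edge embedding operations, so the sketch below assumes the four binary operators commonly paired with node2vec (average, Hadamard, weighted-L1, weighted-L2). The embedding values and the node labels `HP:A` and `HP:B` are hypothetical placeholders, not real HPO data.

```python
# Minimal sketch of edge feature construction for link prediction.
# Assumption: the four operators below are the ones commonly used with
# node2vec; the abstract itself does not list them.

def average(u, v):
    """Element-wise mean of two node embeddings."""
    return [(a + b) / 2 for a, b in zip(u, v)]

def hadamard(u, v):
    """Element-wise product of two node embeddings."""
    return [a * b for a, b in zip(u, v)]

def weighted_l1(u, v):
    """Element-wise absolute difference."""
    return [abs(a - b) for a, b in zip(u, v)]

def weighted_l2(u, v):
    """Element-wise squared difference."""
    return [(a - b) ** 2 for a, b in zip(u, v)]

EDGE_OPERATORS = {
    "average": average,
    "hadamard": hadamard,
    "weighted_l1": weighted_l1,
    "weighted_l2": weighted_l2,
}

def edge_features(embeddings, source, target, operator="hadamard"):
    """Build the feature vector for a candidate edge (source, target)."""
    return EDGE_OPERATORS[operator](embeddings[source], embeddings[target])

# Hypothetical two-dimensional embeddings for two phenotype nodes.
toy_embeddings = {"HP:A": [1.0, 2.0], "HP:B": [3.0, 4.0]}
features = edge_features(toy_embeddings, "HP:A", "HP:B", operator="hadamard")
```

Each resulting vector would then be labeled as an existing or non-existing edge and passed to whichever classifier is being compared.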
Affiliation(s)
- Feichen Shen
- Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA
- Suyuan Peng
- Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA; The Second Clinical College Guangzhou University of Chinese Medicine, China
- Yadan Fan
- Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA; Institute for Health Informatics, University of Minnesota, Minneapolis, MN, USA
- Andrew Wen
- Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA
- Sijia Liu
- Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA
- Yanshan Wang
- Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA
- Liwei Wang
- Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA
- Hongfang Liu
- Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA
4
Rodríguez-García MÁ, Gkoutos GV, Schofield PN, Hoehndorf R. Integrating phenotype ontologies with PhenomeNET. J Biomed Semantics 2017; 8:58. [PMID: 29258588 PMCID: PMC5735523 DOI: 10.1186/s13326-017-0167-4] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.4] [Received: 03/17/2017] [Accepted: 11/22/2017] [Indexed: 01/05/2023] Open
Abstract
BACKGROUND Integration and analysis of phenotype data from humans and model organisms is a key challenge in building our understanding of normal biology and pathophysiology. However, the range of phenotypes and anatomical details being captured in clinical and model organism databases presents complex problems when attempting to match classes across species and across phenotypes as diverse as behaviour and neoplasia. We have previously developed PhenomeNET, a system for disease gene prioritization that includes as one of its components an ontology designed to integrate phenotype ontologies. While not applicable to matching arbitrary ontologies, PhenomeNET can be used to identify related phenotypes in different species, including human, mouse, zebrafish, nematode worm, fruit fly, and yeast. RESULTS Here, we apply the PhenomeNET to identify related classes from two phenotype and two disease ontologies using automated reasoning. We demonstrate that we can identify a large number of mappings, some of which require automated reasoning and cannot easily be identified through lexical approaches alone. Combining automated reasoning with lexical matching further improves results in aligning ontologies. CONCLUSIONS PhenomeNET can be used to align and integrate phenotype ontologies. The results can be utilized for biomedical analyses in which phenomena observed in model organisms are used to identify causative genes and mutations underlying human disease.
Affiliation(s)
- Miguel Ángel Rodríguez-García
- Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology, 4700 KAUST, Thuwal, 23955-6900, Saudi Arabia; Computer, Electrical and Mathematical Sciences & Engineering Division (CEMSE), King Abdullah University of Science and Technology, 4700 KAUST, PO Box 2882, Thuwal, 23955-6900, Saudi Arabia
- Georgios V Gkoutos
- College of Medical and Dental Sciences, Institute of Cancer and Genomic Sciences, Centre for Computational Biology, University of Birmingham, Birmingham, B15 2TT, UK; Institute of Translational Medicine, University Hospitals Birmingham, NHS Foundation Trust, Birmingham, B15 2TT, UK; Institute of Biological, Environmental and Rural Sciences, Aberystwyth University, Aberystwyth, SY23 2AX, UK
- Paul N Schofield
- Department of Physiology, Development & Neuroscience, University of Cambridge, Downing Street, Cambridge, CB2 3EG, UK
- Robert Hoehndorf
- Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology, 4700 KAUST, Thuwal, 23955-6900, Saudi Arabia; Computer, Electrical and Mathematical Sciences & Engineering Division (CEMSE), King Abdullah University of Science and Technology, 4700 KAUST, PO Box 2882, Thuwal, 23955-6900, Saudi Arabia
5
Collier N, Oellrich A, Groza T. Toward knowledge support for analysis and interpretation of complex traits. Genome Biol 2015; 14:214. [PMID: 24079802 PMCID: PMC4053827 DOI: 10.1186/gb-2013-14-9-214] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Indexed: 12/02/2022] Open
Abstract
The systematic description of complex traits, from the organism to the cellular level, is important for hypothesis generation about underlying disease mechanisms. We discuss how intelligent algorithms might provide support, leading to faster throughput.
6
Mannil D, Vogt I, Prinz J, Campillos M. Organ system heterogeneity DB: a database for the visualization of phenotypes at the organ system level. Nucleic Acids Res 2014; 43:D900-6. [PMID: 25313158 PMCID: PMC4384019 DOI: 10.1093/nar/gku948] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Indexed: 11/26/2022] Open
Abstract
Perturbations of mammalian organisms including diseases, drug treatments and gene perturbations in mice affect organ systems differently. Some perturbations impair relatively few organ systems while others lead to highly heterogeneous or systemic effects. Organ System Heterogeneity DB (http://mips.helmholtz-muenchen.de/Organ_System_Heterogeneity/) provides information on the phenotypic effects of 4865 human diseases, 1667 drugs and 5361 genetically modified mouse models on 26 different organ systems. Disease symptoms, drug side effects and mouse phenotypes are mapped to the System Organ Class (SOC) level of the Medical Dictionary of Regulatory Activities (MedDRA). Then, the organ system heterogeneity value, a measurement of the systemic impact of a perturbation, is calculated from the relative frequency of phenotypic features across all SOCs. For perturbations of interest, the database displays the distribution of phenotypic effects across organ systems along with the heterogeneity value and the distance between organ system distributions. In this way, it allows, in an easy and comprehensible fashion, the comparison of the phenotypic organ system distributions of diseases, drugs and their corresponding genetically modified mouse models of associated disease genes and drug targets. The Organ System Heterogeneity DB is thus a platform for the visualization and comparison of organ system level phenotypic effects of drugs, diseases and genes.
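The abstract says the heterogeneity value is calculated from the relative frequency of phenotypic features across all SOCs, but it does not give the formula. The sketch below is therefore only an illustration: it assumes normalized Shannon entropy as a stand-in for the heterogeneity value and total variation distance as a stand-in for the distance between organ system distributions. The function names are hypothetical, and the default of 26 comes from the 26 organ systems mentioned in the abstract.

```python
import math
from collections import Counter

def soc_distribution(soc_labels):
    """Relative frequency of phenotypic features per System Organ Class."""
    counts = Counter(soc_labels)
    total = sum(counts.values())
    return {soc: n / total for soc, n in counts.items()}

def heterogeneity(distribution, n_socs=26):
    """Assumed stand-in: Shannon entropy normalized to the range [0, 1].

    0 means every feature falls in one organ system; 1 means features
    are spread evenly over all n_socs organ systems.
    """
    entropy = -sum(p * math.log(p) for p in distribution.values() if p > 0)
    return entropy / math.log(n_socs)

def distribution_distance(p, q):
    """Assumed stand-in: total variation distance between two distributions."""
    socs = set(p) | set(q)
    return 0.5 * sum(abs(p.get(s, 0.0) - q.get(s, 0.0)) for s in socs)

# Hypothetical feature-to-SOC mappings for a disease and a drug.
disease = soc_distribution(["cardiac", "cardiac", "renal", "nervous"])
drug = soc_distribution(["cardiac", "renal", "renal", "hepatic"])
disease_value = heterogeneity(disease)
disease_drug_distance = distribution_distance(disease, drug)
```

Under these assumptions, a perturbation that touches only one organ system scores 0, and two perturbations with no organ system in common are at distance 1.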
Affiliation(s)
- Deepthi Mannil
- German Center for Diabetes Research, Neuherberg 85764, Germany; Institute of Bioinformatics and Systems Biology, Helmholtz Zentrum München, Neuherberg 85764, Germany
- Ingo Vogt
- German Center for Diabetes Research, Neuherberg 85764, Germany; Institute of Bioinformatics and Systems Biology, Helmholtz Zentrum München, Neuherberg 85764, Germany
- Jeanette Prinz
- German Center for Diabetes Research, Neuherberg 85764, Germany; Institute of Bioinformatics and Systems Biology, Helmholtz Zentrum München, Neuherberg 85764, Germany
- Monica Campillos
- German Center for Diabetes Research, Neuherberg 85764, Germany; Institute of Bioinformatics and Systems Biology, Helmholtz Zentrum München, Neuherberg 85764, Germany
7
Smedley D, Oellrich A, Köhler S, Ruef B, Westerfield M, Robinson P, Lewis S, Mungall C. PhenoDigm: analyzing curated annotations to associate animal models with human diseases. Database: The Journal of Biological Databases and Curation 2013; 2013:bat025. [PMID: 23660285 PMCID: PMC3649640 DOI: 10.1093/database/bat025] [Citation(s) in RCA: 96] [Impact Index Per Article: 8.0] [Indexed: 11/12/2022]
Abstract
The ultimate goal of studying model organisms is to translate what is learned into useful knowledge about normal human biology and disease to facilitate treatment and early screening for diseases. Recent advances in genomic technologies allow for rapid generation of models with a range of targeted genotypes as well as their characterization by high-throughput phenotyping. As an abundance of phenotype data become available, only systematic analysis will facilitate valid conclusions to be drawn from these data and transferred to human diseases. Owing to the volume of data, automated methods are preferable, allowing for a reliable analysis of the data and providing evidence about possible gene-disease associations. Here, we propose Phenotype comparisons for DIsease Genes and Models (PhenoDigm), as an automated method to provide evidence about gene-disease associations by analysing phenotype information. PhenoDigm integrates data from a variety of model organisms and, at the same time, uses several intermediate scoring methods to identify only strongly data-supported gene candidates for human genetic diseases. We show results of an automated evaluation as well as selected manually assessed examples that support the validity of PhenoDigm. Furthermore, we provide guidance on how to browse the data with PhenoDigm's web interface and illustrate its usefulness in supporting research. Database URL: http://www.sanger.ac.uk/resources/databases/phenodigm
Affiliation(s)
- Damian Smedley
- Mouse Informatics Group, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK.
8
Bai JP, Abernethy DR. Systems Pharmacology to Predict Drug Toxicity: Integration Across Levels of Biological Organization. Annu Rev Pharmacol Toxicol 2013; 53:451-73. [DOI: 10.1146/annurev-pharmtox-011112-140248] [Citation(s) in RCA: 102] [Impact Index Per Article: 8.5] [Indexed: 11/09/2022]
Affiliation(s)
- Jane P.F. Bai
- Office of Clinical Pharmacology, Office of Translational Science, Center for Drug Evaluation and Research, US Food and Drug Administration, Silver Spring, Maryland 20993
- Darrell R. Abernethy
- Office of Clinical Pharmacology, Office of Translational Science, Center for Drug Evaluation and Research, US Food and Drug Administration, Silver Spring, Maryland 20993
9
Beck T, Free RC, Thorisson GA, Brookes AJ. Semantically enabling a genome-wide association study database. J Biomed Semantics 2012; 3:9. [PMID: 23244533 PMCID: PMC3579732 DOI: 10.1186/2041-1480-3-9] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 04/03/2012] [Accepted: 08/22/2012] [Indexed: 01/03/2023] Open
Abstract
Background The amount of data generated from genome-wide association studies (GWAS) has grown rapidly, but considerations for GWAS phenotype data reuse and interchange have not kept pace. This impacts on the work of GWAS Central – a free and open access resource for the advanced querying and comparison of summary-level genetic association data. The benefits of employing ontologies for standardising and structuring data are widely accepted. The complex spectrum of observed human phenotypes (and traits), and the requirement for cross-species phenotype comparisons, calls for reflection on the most appropriate solution for the organisation of human phenotype data. The Semantic Web provides standards for the possibility of further integration of GWAS data and the ability to contribute to the web of Linked Data. Results A pragmatic consideration when applying phenotype ontologies to GWAS data is the ability to retrieve all data, at the most granular level possible, from querying a single ontology graph. We found the Medical Subject Headings (MeSH) terminology suitable for describing all traits (diseases and medical signs and symptoms) at various levels of granularity and the Human Phenotype Ontology (HPO) most suitable for describing phenotypic abnormalities (medical signs and symptoms) at the most granular level. Diseases within MeSH are mapped to HPO to infer the phenotypic abnormalities associated with diseases. Building on the rich semantic phenotype annotation layer, we are able to make cross-species phenotype comparisons and publish a core subset of GWAS data as RDF nanopublications. Conclusions We present a methodology for applying phenotype annotations to a comprehensive genome-wide association dataset and for ensuring compatibility with the Semantic Web. The annotations are used to assist with cross-species genotype and phenotype comparisons. However, further processing and deconstructions of terms may be required to facilitate automatic phenotype comparisons. 
The provision of GWAS nanopublications enables a new dimension for exploring GWAS data, by way of intrinsic links to related data resources within the Linked Data web. The value of such annotation and integration will grow as more biomedical resources adopt the standards of the Semantic Web.
Affiliation(s)
- Tim Beck
- Department of Genetics, University of Leicester, University Road, Leicester, UK.
10
Smith CL, Eppig JT. The Mammalian Phenotype Ontology as a unifying standard for experimental and high-throughput phenotyping data. Mamm Genome 2012; 23:653-68. [PMID: 22961259 PMCID: PMC3463787 DOI: 10.1007/s00335-012-9421-3] [Citation(s) in RCA: 126] [Impact Index Per Article: 9.7] [Received: 05/24/2012] [Accepted: 07/24/2012] [Indexed: 01/16/2023]
Abstract
The Mammalian Phenotype Ontology (MP) is a structured vocabulary for describing mammalian phenotypes and serves as a critical tool for efficient annotation and comprehensive retrieval of phenotype data. Importantly, the ontology contains broad and specific terms, facilitating annotation of data from initial observations or screens and detailed data from subsequent experimental research. Using the ontology structure, data are retrieved inclusively, i.e., data annotated to chosen terms and to terms subordinate in the hierarchy. Thus, searching for "abnormal craniofacial morphology" also returns annotations to "megacephaly" and "microcephaly," more specific terms in the hierarchy path. The development and refinement of the MP is ongoing, with new terms and modifications to its organization undergoing continuous assessment as users and expert reviewers propose expansions and revisions. A wealth of phenotype data on mouse mutations and variants annotated to the MP already exists in the Mouse Genome Informatics database. These data, along with data curated to the MP by many mouse mutagenesis programs and mouse repositories, provide a platform for comparative analyses and correlative discoveries. The MP provides a standard underpinning to mouse phenotype descriptions for existing and future experimental and large-scale phenotyping projects. In this review we describe the MP as it presently exists, its application to phenotype annotations, the relationship of the MP to other ontologies, and the integration of the MP within large-scale phenotyping projects. Finally we discuss future application of the MP in providing standard descriptors of the phenotype pipeline test results from the International Mouse Phenotype Consortium projects.
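The inclusive retrieval described in this abstract (a query term also returns annotations made to its more specific terms) can be sketched as a descendant traversal over the is-a hierarchy. This is a minimal illustration, not the Mouse Genome Informatics implementation: the hierarchy is a toy fragment that places both specific terms directly under the query term, and the `allele-*` identifiers are hypothetical.

```python
from collections import defaultdict

# Toy is-a fragment (child -> parents); a simplification of the real MP
# hierarchy, used only to illustrate the retrieval idea.
PARENTS = {
    "megacephaly": ["abnormal craniofacial morphology"],
    "microcephaly": ["abnormal craniofacial morphology"],
    "abnormal craniofacial morphology": ["craniofacial phenotype"],
}

# Hypothetical annotations: term -> annotated genotype identifiers.
ANNOTATIONS = {
    "craniofacial phenotype": ["allele-D"],
    "abnormal craniofacial morphology": ["allele-A"],
    "megacephaly": ["allele-B"],
    "microcephaly": ["allele-C"],
}

def build_children(parents):
    """Invert the child -> parents map into parent -> children."""
    children = defaultdict(set)
    for child, parent_terms in parents.items():
        for parent in parent_terms:
            children[parent].add(child)
    return children

def descendants(term, children):
    """Collect every term below `term` using an iterative depth-first walk."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for child in children.get(current, ()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen

def inclusive_search(term, parents, annotations):
    """Return annotations to `term` and to all of its descendants."""
    terms = {term} | descendants(term, build_children(parents))
    hits = set()
    for t in terms:
        hits.update(annotations.get(t, []))
    return sorted(hits)

results = inclusive_search("abnormal craniofacial morphology", PARENTS, ANNOTATIONS)
```

In this toy data the query returns the annotations for the term itself and for both specific terms, while the annotation made to the broader parent term is left out.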
11
Schofield PN, Hoehndorf R, Gkoutos GV. Mouse genetic and phenotypic resources for human genetics. Hum Mutat 2012; 33:826-36. [PMID: 22422677 DOI: 10.1002/humu.22077] [Citation(s) in RCA: 58] [Impact Index Per Article: 4.5] [Indexed: 12/15/2022]
Abstract
The use of model organisms to provide information on gene function has proved to be a powerful approach to our understanding of both human disease and fundamental mammalian biology. Large-scale community projects using mice, based on forward and reverse genetics, and now the pan-genomic phenotyping efforts of the International Mouse Phenotyping Consortium, are generating resources on an unprecedented scale, which will be extremely valuable to human genetics and medicine. We discuss the nature and availability of data, mice and embryonic stem cells from these large-scale programmes, the use of these resources to help prioritize and validate candidate genes in human genetic association studies, and how they can improve our understanding of the underlying pathobiology of human disease.
Affiliation(s)
- Paul N Schofield
- Department of Physiology, Development and Neuroscience, University of Cambridge, Cambridge, United Kingdom.
12
Gkoutos GV, Schofield PN, Hoehndorf R. Computational tools for comparative phenomics: the role and promise of ontologies. Mamm Genome 2012; 23:669-79. [PMID: 22814867 DOI: 10.1007/s00335-012-9404-4] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.3] [Received: 03/06/2012] [Accepted: 05/21/2012] [Indexed: 11/28/2022]
Abstract
A major aim of the biological sciences is to gain an understanding of human physiology and disease. One important step towards such a goal is the discovery of the function of genes that will lead to a better understanding of the physiology and pathophysiology of organisms, which will ultimately lead to better diagnosis and therapy. Our increasing ability to phenotypically characterise genetic variants of model organisms coupled with systematic and hypothesis-driven mutagenesis is resulting in a wealth of information that could potentially provide insight into the functions of all genes in an organism. The challenge we are now facing is to develop computational methods that can integrate and analyse such data. The introduction of formal ontologies that make their semantics explicit and accessible to automated reasoning provides the tantalizing possibility of standardizing biomedical knowledge allowing for novel, powerful queries that bridge multiple domains, disciplines, species, and levels of granularity. We review recent computational approaches that facilitate the integration of experimental data from model organisms with clinical observations in humans. These methods foster novel cross-species analysis approaches, thereby enabling comparative phenomics and leading to the potential of translating basic discoveries from the model systems into diagnostic and therapeutic advances at the clinical level.
Affiliation(s)
- Georgios V Gkoutos
- Department of Genetics, University of Cambridge, Downing Street, Cambridge CB2 3EH, UK.
13
Improving disease gene prioritization by comparing the semantic similarity of phenotypes in mice with those of human diseases. PLoS One 2012; 7:e38937. [PMID: 22719993 PMCID: PMC3375301 DOI: 10.1371/journal.pone.0038937] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.5] [Received: 12/08/2011] [Accepted: 05/16/2012] [Indexed: 12/14/2022] Open
Abstract
Despite considerable progress in understanding the molecular origins of hereditary human diseases, the molecular basis of several thousand genetic diseases still remains unknown. High-throughput phenotype studies are underway to systematically assess the phenotype outcome of targeted mutations in model organisms. Thus, comparing the similarity between experimentally identified phenotypes and the phenotypes associated with human diseases can be used to suggest causal genes underlying a disease. In this manuscript, we present a method for disease gene prioritization based on comparing phenotypes of mouse models with those of human diseases. For this purpose, either human disease phenotypes are “translated” into a mouse-based representation (using the Mammalian Phenotype Ontology), or mouse phenotypes are “translated” into a human-based representation (using the Human Phenotype Ontology). We apply a measure of semantic similarity and rank experimentally identified phenotypes in mice with respect to their phenotypic similarity to human diseases. Our method is evaluated on manually curated and experimentally verified gene–disease associations for human and for mouse. We evaluate our approach using a Receiver Operating Characteristic (ROC) analysis and obtain an area under the ROC curve of up to . Furthermore, we are able to confirm previous results that the Vax1 gene is involved in Septo-Optic Dysplasia and suggest Gdf6 and Marcks as further potential candidates. Our method significantly outperforms previous phenotype-based approaches of prioritizing gene–disease associations. To enable the adaption of our method to the analysis of other phenotype data, our software and prioritization results are freely available under a BSD licence at http://code.google.com/p/phenomeblast/wiki/CAMP. Furthermore, our method has been integrated in PhenomeNET and the results can be explored using the PhenomeBrowser at http://phenomebrowser.net.
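The prioritization idea in this abstract (score each mouse model by the semantic similarity of its phenotypes to a disease's phenotypes, then rank) can be illustrated with a short sketch. The abstract does not state which similarity measure is used, so the code assumes a plain Jaccard index over ancestor-closed term sets as a stand-in. The term names, gene labels, and hierarchy are hypothetical toy data in a single shared vocabulary, which skips the translation step between the two ontologies.

```python
def ancestor_closure(terms, parents):
    """Return the given terms together with all of their ancestors."""
    closed = set()
    stack = list(terms)
    while stack:
        term = stack.pop()
        if term not in closed:
            closed.add(term)
            stack.extend(parents.get(term, ()))
    return closed

def jaccard(a, b):
    """Assumed stand-in similarity: shared terms divided by all terms."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def rank_models(disease_terms, models, parents):
    """Rank model genes by decreasing similarity to the disease profile."""
    disease_closed = ancestor_closure(disease_terms, parents)
    scored = [
        (jaccard(disease_closed, ancestor_closure(terms, parents)), gene)
        for gene, terms in models.items()
    ]
    # Sort by score (highest first), breaking ties alphabetically by gene.
    return [gene for score, gene in sorted(scored, key=lambda x: (-x[0], x[1]))]

# Hypothetical toy hierarchy (child -> parents) and phenotype profiles.
TOY_PARENTS = {
    "optic nerve hypoplasia": ["eye phenotype"],
    "small eye": ["eye phenotype"],
    "eye phenotype": ["root"],
    "short tail": ["tail phenotype"],
    "tail phenotype": ["root"],
}
TOY_MODELS = {
    "gene_a": {"small eye"},
    "gene_b": {"short tail"},
    "gene_c": {"optic nerve hypoplasia"},
}
ranking = rank_models({"optic nerve hypoplasia"}, TOY_MODELS, TOY_PARENTS)
```

Closing each profile under ancestors lets two different but related terms (here, two eye phenotypes) still contribute to the score through their shared parent terms.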
14
Chen CK, Mungall CJ, Gkoutos GV, Doelken SC, Köhler S, Ruef BJ, Smith C, Westerfield M, Robinson PN, Lewis SE, Schofield PN, Smedley D. MouseFinder: Candidate disease genes from mouse phenotype data. Hum Mutat 2012; 33:858-66. [PMID: 22331800 PMCID: PMC3327758 DOI: 10.1002/humu.22051] [Citation(s) in RCA: 50] [Impact Index Per Article: 3.8] [Received: 11/07/2011] [Accepted: 01/20/2012] [Indexed: 12/23/2022]
Abstract
Mouse phenotype data represents a valuable resource for the identification of disease-associated genes, especially where the molecular basis is unknown and there is no clue to the candidate gene's function, pathway involvement or expression pattern. However, until recently these data have not been systematically used due to difficulties in mapping between clinical features observed in humans and mouse phenotype annotations. Here, we describe a semantic approach to solve this problem and demonstrate highly significant recall of known disease-gene associations and orthology relationships. A Web application (MouseFinder; www.mousemodels.org) has been developed to allow users to search the results of our whole-phenome comparison of human and mouse. We demonstrate its use in identifying ARTN as a strong candidate gene within the 1p34.1-p32 mapped locus for a hereditary form of ptosis.
Affiliation(s)
- Chao-Kung Chen
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Georgios V Gkoutos
- Department of Genetics, University of Cambridge, Downing Street, Cambridge CB2 3EG, UK
- Sandra C Doelken
- Institute for Medical and Human Genetics, Charité-Universitätsmedizin Berlin, Augustenburger Platz 1, 13353 Berlin, Germany
- Max Planck Institute for Molecular Genetics, Ihnestr. 63-73, 14195 Berlin, Germany
- Sebastian Köhler
- Institute for Medical and Human Genetics, Charité-Universitätsmedizin Berlin, Augustenburger Platz 1, 13353 Berlin, Germany
- Berlin Center for Regenerative Therapies (BCRT), Charité-Universitätsmedizin Berlin, Augustenburger Platz 1, 13353 Berlin, Germany
- Cynthia Smith
- The Jackson Laboratory, 600, Main Street, Bar Harbor, ME 04609-1500, USA
- Peter N Robinson
- Institute for Medical and Human Genetics, Charité-Universitätsmedizin Berlin, Augustenburger Platz 1, 13353 Berlin, Germany
- Max Planck Institute for Molecular Genetics, Ihnestr. 63-73, 14195 Berlin, Germany
- Berlin Center for Regenerative Therapies (BCRT), Charité-Universitätsmedizin Berlin, Augustenburger Platz 1, 13353 Berlin, Germany
- Suzanna E Lewis
- Genomics Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Paul N Schofield
- The Jackson Laboratory, 600, Main Street, Bar Harbor, ME 04609-1500, USA
- Department of Physiology, Development and Neuroscience, University of Cambridge, Downing Street, Cambridge CB2 3EG, UK
- Damian Smedley
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
15
Schofield PN, Sundberg JP, Hoehndorf R, Gkoutos GV. New approaches to the representation and analysis of phenotype knowledge in human diseases and their animal models. Brief Funct Genomics 2011; 10:258-65. [PMID: 21987712 PMCID: PMC3189694 DOI: 10.1093/bfgp/elr031] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.1] [Indexed: 12/13/2022] Open
Abstract
The systematic investigation of the phenotypes associated with genotypes in model organisms holds the promise of revealing genotype-phenotype relations directly and without additional, intermediate inferences. Large-scale projects are now underway to catalog the complete phenome of a species, notably the mouse. With the increasing amount of phenotype information becoming available, a major challenge that biology faces today is the systematic analysis of this information and the translation of research results across species and into an improved understanding of human disease. The challenge is to integrate and combine phenotype descriptions within a species and to systematically relate them to phenotype descriptions in other species, in order to form a comprehensive understanding of the relations between those phenotypes and the genotypes involved in human disease. We distinguish between two major approaches for comparative phenotype analyses: the first relies on evolutionary relations to bridge the species gap, while the other approach compares phenotypes directly. In particular, the direct comparison of phenotypes relies heavily on the quality and coherence of phenotype and disease databases. We discuss major achievements and future challenges for these databases in light of their potential to contribute to the understanding of the molecular mechanisms underlying human disease. In particular, we discuss how the use of ontologies and automated reasoning can significantly contribute to the analysis of phenotypes and demonstrate their potential for enabling translational research.
16
Hoehndorf R, Schofield PN, Gkoutos GV. PhenomeNET: a whole-phenome approach to disease gene discovery. Nucleic Acids Res 2011; 39:e119. [PMID: 21737429 PMCID: PMC3185433 DOI: 10.1093/nar/gkr538] [Citation(s) in RCA: 154] [Impact Index Per Article: 11.0] [Indexed: 12/25/2022] Open
Abstract
Phenotypes are investigated in model organisms to understand and reveal the molecular mechanisms underlying disease. Phenotype ontologies were developed to capture and compare phenotypes within the context of a single species. Recently, these ontologies were augmented with formal class definitions that may be utilized to integrate phenotypic data and enable the direct comparison of phenotypes between different species. We have developed a method to transform phenotype ontologies into a formal representation, combine phenotype ontologies with anatomy ontologies, and apply a measure of semantic similarity to construct the PhenomeNET cross-species phenotype network. We demonstrate that PhenomeNET can identify orthologous genes, genes involved in the same pathway and gene–disease associations through the comparison of mutant phenotypes. We provide evidence that the Adam19 and Fgf15 genes in mice are involved in the tetralogy of Fallot, and, using zebrafish phenotypes, propose the hypothesis that the mammalian homologs of Cx36.7 and Nkx2.5 lie in a pathway controlling cardiac morphogenesis and electrical conductivity which, when defective, cause the tetralogy of Fallot phenotype. Our method implements a whole-phenome approach toward disease gene discovery and can be applied to prioritize genes for rare and orphan diseases for which the molecular basis is unknown.
Affiliation(s)
- Robert Hoehndorf
- Department of Genetics, University of Cambridge, Downing Street, Cambridge CB2 3EG, UK.
17
Congdon E, Poldrack RA, Freimer NB. Neurocognitive phenotypes and genetic dissection of disorders of brain and behavior. Neuron 2010; 68:218-30. [PMID: 20955930 DOI: 10.1016/j.neuron.2010.10.007] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Accepted: 10/05/2010] [Indexed: 01/10/2023]
Abstract
Elucidating the molecular mechanisms underlying quantitative neurocognitive phenotypes will further our understanding of the brain's structural and functional architecture and advance the diagnosis and treatment of the psychiatric disorders that these traits underlie. Although many neurocognitive traits are highly heritable, little progress has been made in identifying genetic variants unequivocally associated with these phenotypes. A major obstacle to such progress is the difficulty in identifying heritable neurocognitive measures that are precisely defined and systematically assessed and represent unambiguous mental constructs, yet are also amenable to the high-throughput phenotyping necessary to obtain adequate power for genetic association studies. In this perspective we compare the current status of genetic investigations of neurocognitive phenotypes to that of other categories of biomedically relevant traits and suggest strategies for genetically dissecting traits that may underlie disorders of brain and behavior.
Affiliation(s)
- Eliza Congdon
- Center for Neurobehavioral Genetics, Semel Institute for Neuroscience and Human Behavior, University of California, Los Angeles, Los Angeles, CA 90095, USA
18
Davis AP, King BL, Mockus S, Murphy CG, Saraceni-Richards C, Rosenstein M, Wiegers T, Mattingly CJ. The Comparative Toxicogenomics Database: update 2011. Nucleic Acids Res 2010; 39:D1067-72. [PMID: 20864448 PMCID: PMC3013756 DOI: 10.1093/nar/gkq813] [Citation(s) in RCA: 183] [Impact Index Per Article: 12.2] [Indexed: 12/16/2022] Open
Abstract
The Comparative Toxicogenomics Database (CTD) is a public resource that promotes understanding about the interaction of environmental chemicals with gene products, and their effects on human health. Biocurators at CTD manually curate a triad of chemical–gene, chemical–disease and gene–disease relationships from the literature. These core data are then integrated to construct chemical–gene–disease networks and to predict many novel relationships using different types of associated data. Since 2009, we dramatically increased the content of CTD to 1.4 million chemical–gene–disease data points and added many features, statistical analyses and analytical tools, including GeneComps and ChemComps (to find comparable genes and chemicals that share toxicogenomic profiles), enriched Gene Ontology terms associated with chemicals, statistically ranked chemical–disease inferences, Venn diagram tools to discover overlapping and unique attributes of any set of chemicals, genes or disease, and enhanced gene pathway data content, among other features. Together, this wealth of expanded chemical–gene–disease data continues to help users generate testable hypotheses about the molecular mechanisms of environmental diseases. CTD is freely available at http://ctd.mdibl.org.
Affiliation(s)
- Allan Peter Davis
- Department of Bioinformatics, The Mount Desert Island Biological Laboratory, Salisbury Cove, ME 04672, USA