1
Althagafi A, Zhapa-Camacho F, Hoehndorf R. Prioritizing genomic variants through neuro-symbolic, knowledge-enhanced learning. Bioinformatics 2024; 40:btae301. [PMID: 38696757 PMCID: PMC11132820 DOI: 10.1093/bioinformatics/btae301] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/24/2023] [Revised: 04/05/2024] [Accepted: 04/30/2024] [Indexed: 05/04/2024]
Abstract
MOTIVATION Whole-exome and genome sequencing have become common tools in diagnosing patients with rare diseases. Despite their success, these approaches leave many patients undiagnosed. A common argument is that more disease variants still await discovery, or that the novelty of disease phenotypes results from a combination of variants in multiple disease-related genes. Interpreting the phenotypic consequences of genomic variants relies on information about gene functions, gene expression, physiology, and other genomic features. Phenotype-based methods to identify variants involved in genetic diseases combine molecular features with prior knowledge about the phenotypic consequences of altering gene functions. While phenotype-based methods have been successfully applied to prioritizing variants, such methods rely on known gene-disease or gene-phenotype associations as training data and are applicable only to genes that have associated phenotypes, which limits their scope. In addition, different clinicians do not assign phenotypes uniformly, and phenotype-based methods need to account for this variability. RESULTS We developed an Embedding-based Phenotype Variant Predictor (EmbedPVP), a computational method to prioritize variants involved in genetic diseases by combining genomic information and clinical phenotypes. EmbedPVP leverages a large amount of background knowledge from humans and model organisms about molecular mechanisms through which abnormal phenotypes may arise. Specifically, EmbedPVP incorporates phenotypes linked to genes, functions of gene products, and the anatomical site of gene expression, and systematically relates them to their phenotypic effects through neuro-symbolic, knowledge-enhanced machine learning. We demonstrate EmbedPVP's efficacy on a large set of synthetic genomes and genomes matched with clinical information.
AVAILABILITY AND IMPLEMENTATION EmbedPVP and all evaluation experiments are freely available at https://github.com/bio-ontology-research-group/EmbedPVP.
Affiliation(s)
- Azza Althagafi
- Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), 4700 KAUST, Thuwal 23955, Saudi Arabia
- Computer Science Program, Computer, Electrical and Mathematical Sciences & Engineering Division (CEMSE), King Abdullah University of Science and Technology (KAUST), 4700 KAUST, Thuwal 23955, Saudi Arabia
- Computer Science Department, College of Computers and Information Technology, Taif University, Taif 26571, Saudi Arabia
- Fernando Zhapa-Camacho
- Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), 4700 KAUST, Thuwal 23955, Saudi Arabia
- Computer Science Program, Computer, Electrical and Mathematical Sciences & Engineering Division (CEMSE), King Abdullah University of Science and Technology (KAUST), 4700 KAUST, Thuwal 23955, Saudi Arabia
- Robert Hoehndorf
- Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), 4700 KAUST, Thuwal 23955, Saudi Arabia
- Computer Science Program, Computer, Electrical and Mathematical Sciences & Engineering Division (CEMSE), King Abdullah University of Science and Technology (KAUST), 4700 KAUST, Thuwal 23955, Saudi Arabia
- SDAIA-KAUST Center of Excellence in Data Science and Artificial Intelligence, King Abdullah University of Science and Technology (KAUST), 4700 KAUST, Thuwal 23955, Saudi Arabia
2
Alghamdi SM, Hoehndorf R. Improving the classification of cardinality phenotypes using collections. J Biomed Semantics 2023; 14:9. [PMID: 37550716 PMCID: PMC10405428 DOI: 10.1186/s13326-023-00290-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/31/2023] [Accepted: 07/07/2023] [Indexed: 08/09/2023]
Abstract
MOTIVATION Phenotypes are observable characteristics of an organism, and they can be highly variable. Information about phenotypes is collected in a clinical context to characterize disease; it is also collected for model organisms and stored in model organism databases, where it is used to understand gene functions. Phenotype data are also used in computational data analysis and machine learning methods to provide novel insights into disease mechanisms and to support personalized diagnosis of disease. For mammalian organisms and in a clinical context, ontologies such as the Human Phenotype Ontology and the Mammalian Phenotype Ontology are widely used to describe phenotypes formally and precisely. We specifically analyze axioms pertaining to phenotypes of collections of entities within a body, and we find that some of the axioms in phenotype ontologies lead to inferences that may not accurately reflect the underlying biological phenomena. RESULTS We reformulate the phenotypes of collections of entities using an ontological theory of collections. By reformulating phenotypes of collections in phenotype ontologies, we avoid potentially incorrect inferences pertaining to the cardinality of these collections. We apply our method to two phenotype ontologies and show that the reformulation not only removes some problematic inferences but also quantitatively improves biological data analysis.
Affiliation(s)
- Sarah M Alghamdi
- Computational Bioscience Research Center (CBRC), Computer, Electrical, and Mathematical Sciences & Engineering Division, King Abdullah University of Science and Technology, 4700 KAUST, 23955, Thuwal, Saudi Arabia.
- King Abdul-Aziz University, Faculty of Computing and Information Technology, 25732, Rabigh, Saudi Arabia.
- Robert Hoehndorf
- Computational Bioscience Research Center (CBRC), Computer, Electrical, and Mathematical Sciences & Engineering Division, King Abdullah University of Science and Technology, 4700 KAUST, 23955, Thuwal, Saudi Arabia.
3
Fisher ME, Segerdell E, Matentzoglu N, Nenni MJ, Fortriede JD, Chu S, Pells TJ, Osumi-Sutherland D, Chaturvedi P, James-Zorn C, Sundararaj N, Lotay VS, Ponferrada V, Wang DZ, Kim E, Agalakov S, Arshinoff BI, Karimi K, Vize PD, Zorn AM. The Xenopus phenotype ontology: bridging model organism phenotype data to human health and development. BMC Bioinformatics 2022; 23:99. [PMID: 35317743 PMCID: PMC8939077 DOI: 10.1186/s12859-022-04636-8] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 11/12/2021] [Accepted: 03/08/2022] [Indexed: 11/10/2022]
Abstract
Background Ontologies of precisely defined, controlled vocabularies are essential to curate the results of biological experiments such that the data are machine searchable, can be computationally analyzed, and are interoperable across the biomedical research continuum. There is also an increasing need for methods to relate phenotypic data from experiments in animal models easily and accurately to human development and disease. Results Here we present the Xenopus phenotype ontology (XPO) to annotate phenotypic data from experiments in Xenopus, one of the major vertebrate model organisms used to study gene function in development and disease. The XPO implements design patterns from the Unified Phenotype Ontology (uPheno) and the principles outlined by the Open Biological and Biomedical Ontologies (OBO) Foundry to maximize interoperability with other species and facilitate ongoing ontology management. Constructed in the Web Ontology Language (OWL), the XPO combines the existing uPheno library of ontology design patterns with additional terms from the Xenopus Anatomy Ontology (XAO), the Phenotype and Trait Ontology (PATO) and the Gene Ontology (GO). The integration of these different ontologies into the XPO enables rich phenotypic curation, whilst the uPheno bridging axioms allow phenotypic data from Xenopus experiments to be related to phenotype data from other model organisms and human disease. Moreover, the simple post-composed uPheno design patterns facilitate ongoing XPO development, as the generation of new terms and classes of terms can be substantially automated. Conclusions The XPO serves as an example of current best practices to help overcome many of the inherent challenges in harmonizing phenotype data between different species.
The XPO currently consists of approximately 22,000 terms and is being used to curate phenotypes by Xenbase, the Xenopus Model Organism Knowledgebase, forming a standardized corpus of genotype–phenotype data that can be directly related to other uPheno compliant resources. Supplementary Information The online version contains supplementary material available at 10.1186/s12859-022-04636-8.
Affiliation(s)
- Malcolm E Fisher
- Division of Developmental Biology, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, USA
- Erik Segerdell
- Division of Developmental Biology, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, USA
- Nicolas Matentzoglu
- Monarch Initiative, London, UK
- Semanticly Ltd, London, UK
- European Bioinformatics Institute (EMBL-EBI), Hinxton, UK
- Mardi J Nenni
- Division of Developmental Biology, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, USA
- Joshua D Fortriede
- Division of Developmental Biology, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, USA
- Stanley Chu
- Department of Biological Science, University of Calgary, Calgary, AB, Canada
- Troy J Pells
- Department of Biological Science, University of Calgary, Calgary, AB, Canada
- Praneet Chaturvedi
- Division of Developmental Biology, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, USA
- Christina James-Zorn
- Division of Developmental Biology, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, USA
- Nivitha Sundararaj
- Division of Developmental Biology, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, USA
- Vaneet S Lotay
- Department of Biological Science, University of Calgary, Calgary, AB, Canada
- Virgilio Ponferrada
- Division of Developmental Biology, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, USA
- Dong Zhuo Wang
- Department of Biological Science, University of Calgary, Calgary, AB, Canada
- Eugene Kim
- Department of Biological Science, University of Calgary, Calgary, AB, Canada
- Sergei Agalakov
- Department of Biological Science, University of Calgary, Calgary, AB, Canada
- Bradley I Arshinoff
- Department of Biological Science, University of Calgary, Calgary, AB, Canada
- Kamran Karimi
- Department of Biological Science, University of Calgary, Calgary, AB, Canada
- Peter D Vize
- Department of Biological Science, University of Calgary, Calgary, AB, Canada
- Aaron M Zorn
- Division of Developmental Biology, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, USA.
4
Masuya H, Usuda D, Nakata H, Yuhara N, Kurihara K, Namiki Y, Iwase S, Takada T, Tanaka N, Suzuki K, Yamagata Y, Kobayashi N, Yoshiki A, Kushida T. Establishment and application of information resource of mutant mice in RIKEN BioResource Research Center. Lab Anim Res 2021; 37:6. [PMID: 33455583 PMCID: PMC7811887 DOI: 10.1186/s42826-020-00068-8] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 07/10/2020] [Accepted: 09/21/2020] [Indexed: 12/12/2022]
Abstract
Online databases are crucial infrastructure for facilitating the wide, effective, and efficient use of mouse mutant resources in the life sciences. The number and types of mouse resources have been growing rapidly owing to the development of genetic modification technology, together with associated information on genomic sequences and phenotypes. Therefore, data integration technologies that improve the findability, accessibility, interoperability, and reusability of mouse strain data have become essential for mouse strain repositories. In 2020, the RIKEN BioResource Research Center released an integrated database of bioresources, including experimental mouse strains, Arabidopsis thaliana as a laboratory plant, cell lines, microorganisms, and genetic materials, built using Resource Description Framework-related technologies. The integrated database offers several advanced features for the dissemination of bioresource information. The current version of our online catalog of mouse strains, which functions as part of the integrated database of bioresources, is available from the search bars on the websites of the Center (https://brc.riken.jp) and the Experimental Animal Division (https://mus.brc.riken.jp/). The BioResource Research Center also released MoG+ (https://molossinus.brc.riken.jp/mogplus/), a genomic variation database of mouse strains established in Japan and Western Europe, and a database of phenotype-phenotype associations across the mouse phenome using data from the International Mouse Phenotyping Platform. In this review, we describe the features of the current versions of the databases related to mouse strain resources at the RIKEN BioResource Research Center and discuss future perspectives.
Affiliation(s)
- Hiroshi Masuya
- Integrated Bioresource Information Division, RIKEN BioResource Research Center, 3-1-1 Koyadai, Tsukuba-shi, Ibaraki, 305-0074, Japan.
- Daiki Usuda
- Integrated Bioresource Information Division, RIKEN BioResource Research Center, 3-1-1 Koyadai, Tsukuba-shi, Ibaraki, 305-0074, Japan
- Hatsumi Nakata
- Experimental Animal Division, BioResource Research Center, RIKEN, Tsukuba, Japan
- Naomi Yuhara
- Integrated Bioresource Information Division, RIKEN BioResource Research Center, 3-1-1 Koyadai, Tsukuba-shi, Ibaraki, 305-0074, Japan
- Keiko Kurihara
- Integrated Bioresource Information Division, RIKEN BioResource Research Center, 3-1-1 Koyadai, Tsukuba-shi, Ibaraki, 305-0074, Japan
- Yuri Namiki
- Integrated Bioresource Information Division, RIKEN BioResource Research Center, 3-1-1 Koyadai, Tsukuba-shi, Ibaraki, 305-0074, Japan
- Shigeru Iwase
- Integrated Bioresource Information Division, RIKEN BioResource Research Center, 3-1-1 Koyadai, Tsukuba-shi, Ibaraki, 305-0074, Japan
- Toyoyuki Takada
- Integrated Bioresource Information Division, RIKEN BioResource Research Center, 3-1-1 Koyadai, Tsukuba-shi, Ibaraki, 305-0074, Japan
- Nobuhiko Tanaka
- Integrated Bioresource Information Division, RIKEN BioResource Research Center, 3-1-1 Koyadai, Tsukuba-shi, Ibaraki, 305-0074, Japan
- Kenta Suzuki
- Integrated Bioresource Information Division, RIKEN BioResource Research Center, 3-1-1 Koyadai, Tsukuba-shi, Ibaraki, 305-0074, Japan
- Yuki Yamagata
- Laboratory for Developmental Dynamics, Center for Biosystems Dynamics Research, RIKEN, Kobe, Japan
- Norio Kobayashi
- Integrated Bioresource Information Division, RIKEN BioResource Research Center, 3-1-1 Koyadai, Tsukuba-shi, Ibaraki, 305-0074, Japan
- Data Knowledge Organization Unit, Head Office for Information Systems and Cybersecurity, RIKEN, Wako, Japan
- Atsushi Yoshiki
- Experimental Animal Division, BioResource Research Center, RIKEN, Tsukuba, Japan
- Tatsuya Kushida
- Integrated Bioresource Information Division, RIKEN BioResource Research Center, 3-1-1 Koyadai, Tsukuba-shi, Ibaraki, 305-0074, Japan
5
Slater LT, Gkoutos GV, Hoehndorf R. Towards semantic interoperability: finding and repairing hidden contradictions in biomedical ontologies. BMC Med Inform Decis Mak 2020; 20:311. [PMID: 33319712 PMCID: PMC7736131 DOI: 10.1186/s12911-020-01336-2] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 11/11/2020] [Accepted: 11/16/2020] [Indexed: 12/25/2022]
Abstract
Background Ontologies are widely used throughout the biomedical domain. These ontologies formally represent the classes and relations assumed to exist within a domain. As scientific domains are deeply interlinked, so too are their representations. While individual ontologies can be tested for consistency and coherency using automated reasoning methods, systematically combining ontologies of multiple domains may reveal previously hidden contradictions. Methods We developed a method that tests for hidden unsatisfiabilities that arise in an ontology when it is combined with other ontologies. For this purpose, we combined sets of ontologies and used automated reasoning to determine whether unsatisfiable classes are present. In addition, we designed and implemented a novel algorithm that can determine justifications for contradictions across extremely large and complicated ontologies, and we used these justifications to semi-automatically repair ontologies by identifying a small set of axioms that, when removed, result in a consistent and coherent set of ontologies.
Results We tested the mutual consistency of the OBO Foundry and the OBO ontologies and found that the combined OBO Foundry gives rise to at least 636 unsatisfiable classes, while the OBO ontologies give rise to more than 300,000 unsatisfiable classes. We also applied our semi-automatic repair algorithm to each combination of OBO ontologies that resulted in unsatisfiable classes, finding that all cases of unsatisfiability across all OBO ontologies could be accounted for by removing only 117 axioms. Conclusions We identified a large amount of hidden unsatisfiability across a broad range of biomedical ontologies, and we found that this large set of unsatisfiable classes is the result of a relatively small number of axiomatic disagreements. Our results show that hidden unsatisfiability is a serious problem for ontology interoperability; however, they also provide a path towards more consistent ontologies by addressing the issues we identified.
Affiliation(s)
- Luke T Slater
- College of Medical and Dental Sciences, Institute of Cancer and Genomic Sciences, University of Birmingham, Birmingham, B15 2TT, UK
- Institute of Translational Medicine, University Hospitals Birmingham, NHS Foundation Trust, Birmingham, B15 2TT, UK
- Georgios V Gkoutos
- College of Medical and Dental Sciences, Institute of Cancer and Genomic Sciences, University of Birmingham, Birmingham, B15 2TT, UK
- Institute of Translational Medicine, University Hospitals Birmingham, NHS Foundation Trust, Birmingham, B15 2TT, UK
- NIHR Experimental Cancer Medicine Centre, Birmingham, B15 2TT, UK
- NIHR Surgical Reconstruction and Microbiology Research Centre, Birmingham, B15 2TT, UK
- NIHR Biomedical Research Centre, Birmingham, B15 2TT, UK
- MRC Health Data Research UK (HDR UK) Midlands, Birmingham, B15 2TT, UK
- Robert Hoehndorf
- Computer, Electrical and Mathematical Sciences and Engineering Division, Computational Bioscience Research Center, King Abdullah University of Science and Technology, Thuwal, 23955, Saudi Arabia
6
Thessen AE, Walls RL, Vogt L, Singer J, Warren R, Buttigieg PL, Balhoff JP, Mungall CJ, McGuinness DL, Stucky BJ, Yoder MJ, Haendel MA. Transforming the study of organisms: Phenomic data models and knowledge bases. PLoS Comput Biol 2020; 16:e1008376. [PMID: 33232313 PMCID: PMC7685442 DOI: 10.1371/journal.pcbi.1008376] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Indexed: 01/07/2023]
Abstract
The rapidly decreasing cost of gene sequencing has resulted in a deluge of genomic data from across the tree of life; however, outside a few model organism databases, genomic data are limited in their scientific impact because they are not accompanied by computable phenomic data. The majority of phenomic data are contained in countless small, heterogeneous phenotypic data sets that are very difficult or impossible to integrate at scale because of variable formats, lack of digitization, and linguistic problems. One powerful solution is to represent phenotypic data using data models with precise, computable semantics, but adoption of semantic standards for representing phenotypic data has been slow, especially in biodiversity and ecology. Some phenotypic and trait data are available in a semantic language from knowledge bases, but these are often not interoperable. In this review, we compare and contrast existing ontologies and data models, focusing on nonhuman phenotypes and traits. We discuss barriers to the integration of phenotypic data and make recommendations for developing an operationally useful, semantically interoperable phenotypic data ecosystem.
Affiliation(s)
- Anne E. Thessen
- Environmental and Molecular Toxicology, Oregon State University, Corvallis, Oregon, United States of America
- Ronin Institute for Independent Scholarship, Montclair, New Jersey, United States of America
- Ramona L. Walls
- Bio5 Institute, University of Arizona, Tucson, Arizona, United States of America
- Lars Vogt
- TIB Leibniz Information Centre for Science and Technology, Hannover, Germany
- Pier Luigi Buttigieg
- Alfred-Wegener-Institut, Helmholtz-Zentrum für Polar- und Meeresforschung, Bremerhaven, Germany
- James P. Balhoff
- Renaissance Computing Institute, University of North Carolina, Chapel Hill, North Carolina, United States of America
- Christopher J. Mungall
- Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, California, United States of America
- Brian J. Stucky
- Florida Museum of Natural History, University of Florida, Gainesville, Florida, United States of America
- Matthew J. Yoder
- Illinois Natural History Survey, Champaign, Illinois, United States of America
- Melissa A. Haendel
- Environmental and Molecular Toxicology, Oregon State University, Corvallis, Oregon, United States of America
7
Smaili FZ, Gao X, Hoehndorf R. Formal axioms in biomedical ontologies improve analysis and interpretation of associated data. Bioinformatics 2020; 36:2229-2236. [PMID: 31821406 PMCID: PMC7141863 DOI: 10.1093/bioinformatics/btz920] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 04/15/2019] [Revised: 10/16/2019] [Accepted: 12/06/2019] [Indexed: 12/30/2022]
Abstract
Motivation Over the past years, significant resources have been invested in formalizing biomedical ontologies. Formal axioms in ontologies have been developed and used to detect and ensure ontology consistency, find unsatisfiable classes, improve interoperability, guide ontology extension through the application of axiom-based design patterns, and encode domain background knowledge. The domain knowledge of biomedical ontologies may also have the potential to provide background knowledge for machine learning and predictive modelling. Results We use ontology-based machine learning methods to evaluate the contribution of formal axioms and ontology meta-data to the prediction of protein–protein interactions and gene–disease associations. We find that the background knowledge provided by the Gene Ontology and other ontologies significantly improves the performance of ontology-based prediction models through provision of domain-specific background knowledge. Furthermore, we find that the labels, synonyms and definitions in ontologies can also provide background knowledge that may be exploited for prediction. The axioms and meta-data of different ontologies contribute to improving data analysis in a context-specific manner. Our results have implications for the further development of formal knowledge bases and ontologies in the life sciences, in particular as machine learning methods are being applied more frequently. Our findings motivate the need for further development, and the systematic, application-driven evaluation and improvement, of formal axioms in ontologies. Availability and implementation https://github.com/bio-ontology-research-group/tsoe. Supplementary information Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Fatima Zohra Smaili
- Computer, Electrical & Mathematical Sciences and Engineering (CEMSE) Division, Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
- Xin Gao
- Computer, Electrical & Mathematical Sciences and Engineering (CEMSE) Division, Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
- Robert Hoehndorf
- Computer, Electrical & Mathematical Sciences and Engineering (CEMSE) Division, Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
8
Shefchek KA, Harris NL, Gargano M, Matentzoglu N, Unni D, Brush M, Keith D, Conlin T, Vasilevsky N, Zhang XA, Balhoff JP, Babb L, Bello SM, Blau H, Bradford Y, Carbon S, Carmody L, Chan LE, Cipriani V, Cuzick A, Della Rocca M, Dunn N, Essaid S, Fey P, Grove C, Gourdine JP, Hamosh A, Harris M, Helbig I, Hoatlin M, Joachimiak M, Jupp S, Lett KB, Lewis SE, McNamara C, Pendlington ZM, Pilgrim C, Putman T, Ravanmehr V, Reese J, Riggs E, Robb S, Roncaglia P, Seager J, Segerdell E, Similuk M, Storm AL, Thaxon C, Thessen A, Jacobsen JOB, McMurry JA, Groza T, Köhler S, Smedley D, Robinson PN, Mungall CJ, Haendel MA, Munoz-Torres MC, Osumi-Sutherland D. The Monarch Initiative in 2019: an integrative data and analytic platform connecting phenotypes to genotypes across species. Nucleic Acids Res 2020; 48:D704-D715. [PMID: 31701156 PMCID: PMC7056945 DOI: 10.1093/nar/gkz997] [Citation(s) in RCA: 152] [Impact Index Per Article: 30.4] [Received: 09/20/2019] [Revised: 10/09/2019] [Accepted: 10/14/2019] [Indexed: 12/14/2022]
Abstract
In biology and biomedicine, relating phenotypic outcomes with genetic variation and environmental factors remains a challenge: patient phenotypes may not match known diseases, candidate variants may be in genes that haven’t been characterized, research organisms may not recapitulate human or veterinary diseases, environmental factors affecting disease outcomes are unknown or undocumented, and many resources must be queried to find potentially significant phenotypic associations. The Monarch Initiative (https://monarchinitiative.org) integrates information on genes, variants, genotypes, phenotypes and diseases in a variety of species, and allows powerful ontology-based search. We develop many widely adopted ontologies that together enable sophisticated computational analysis, mechanistic discovery and diagnostics of Mendelian diseases. Our algorithms and tools are widely used to identify animal models of human disease through phenotypic similarity, for differential diagnostics and to facilitate translational research. Launched in 2015, Monarch has grown with regard to data (new organisms, more sources, better modeling); new APIs and standards; ontologies (the new Mondo unified disease ontology, improvements to ontologies such as HPO and uPheno); user interface (a redesigned website); and community development. Monarch data, algorithms and tools are being used and extended by resources such as GA4GH and NCATS Translator, among others, to aid mechanistic discovery and diagnostics.
Affiliation(s)
- Kent A Shefchek
- Center for Genome Research and Biocomputing, Environmental and Molecular Toxicology, Oregon State University, Corvallis, OR 97331, USA
- Nomi L Harris
- Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94710, USA
- Michael Gargano
- The Jackson Laboratory For Genomic Medicine, Farmington, CT 06032, USA
- Nicolas Matentzoglu
- European Bioinformatics Institute (EMBL-EBI), European Molecular Biology Laboratory, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Deepak Unni
- Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94710, USA
- Matthew Brush
- Oregon Clinical and Translational Research Institute, Oregon Health & Science University, Portland, OR 97239, USA
- Daniel Keith
- Center for Genome Research and Biocomputing, Environmental and Molecular Toxicology, Oregon State University, Corvallis, OR 97331, USA
- Tom Conlin
- Center for Genome Research and Biocomputing, Environmental and Molecular Toxicology, Oregon State University, Corvallis, OR 97331, USA
- Nicole Vasilevsky
- Oregon Clinical and Translational Research Institute, Oregon Health & Science University, Portland, OR 97239, USA
- James P Balhoff
- Renaissance Computing Institute at UNC, Chapel Hill, NC 27517, USA
- Larry Babb
- Broad Institute, Cambridge, MA 02142, USA
- Hannah Blau
- The Jackson Laboratory For Genomic Medicine, Farmington, CT 06032, USA
- Yvonne Bradford
- Institute of Neuroscience, University of Oregon, Eugene, OR 97401, USA
- Seth Carbon
- Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94710, USA
- Leigh Carmody
- The Jackson Laboratory For Genomic Medicine, Farmington, CT 06032, USA
- Lauren E Chan
- College of Public Health and Human Sciences, Oregon State University, Corvallis, OR 97331, USA
- Valentina Cipriani
- William Harvey Research Institute, Barts & The London School of Medicine & Dentistry, Queen Mary University of London, Charterhouse Square, London EC1M 6BQ, UK
- Maria Della Rocca
- Office of Rare Diseases Research (ORDR), National Center for Advancing Translational Sciences (NCATS), National Institutes of Health (NIH), Bethesda, MD 20892, USA
- Nathan Dunn
- Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94710, USA
- Shahim Essaid
- Oregon Clinical and Translational Research Institute, Oregon Health & Science University, Portland, OR 97239, USA
- Petra Fey
- dictyBase, Center for Genetic Medicine, Northwestern University, Chicago, IL 60611, USA
- Chris Grove
- California Institute of Technology, Pasadena, CA 91125, USA
- Jean-Phillipe Gourdine
- Oregon Clinical and Translational Research Institute, Oregon Health & Science University, Portland, OR 97239, USA
- Ada Hamosh
- McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University, Baltimore, MD 21205, USA
- Ingo Helbig
- Division of Neurology, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Department of Biomedical and Health Informatics, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Department of Neuropediatrics, Christian-Albrechts-University of Kiel, 24105 Kiel, Germany
- Department of Neurology, University of Pennsylvania, Perelman School of Medicine, Philadelphia, PA 19104, USA
| | - Maureen Hoatlin
- Department of Biochemistry and Molecular Biology, Oregon Health & Science University, Portland, OR 97239, USA
| | - Marcin Joachimiak
- Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94710, USA
| | - Simon Jupp
- European Bioinformatics Institute (EMBL-EBI), European Molecular Biology Laboratory, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Kenneth B Lett
- Center for Genome Research and Biocomputing, Environmental and Molecular Toxicology, Oregon State University, Corvallis, OR 97331, USA
| | - Suzanna E Lewis
- Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94710, USA
| | | | - Zoë M Pendlington
- European Bioinformatics Institute (EMBL-EBI), European Molecular Biology Laboratory, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | | | - Tim Putman
- Center for Genome Research and Biocomputing, Environmental and Molecular Toxicology, Oregon State University, Corvallis, OR 97331, USA
| | - Vida Ravanmehr
- The Jackson Laboratory For Genomic Medicine, Farmington, CT 06032, USA
| | - Justin Reese
- Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94710, USA
| | - Erin Riggs
- Autism & Developmental Medicine Institute, Geisinger, Danville, PA 17837, USA
| | - Sofia Robb
- Stowers Institute for Medical Research, Kansas City, MO 64110, USA
| | - Paola Roncaglia
- European Bioinformatics Institute (EMBL-EBI), European Molecular Biology Laboratory, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | | | - Erik Segerdell
- Xenbase, Cincinnati Children's Hospital, Cincinnati, OH 45229, USA
| | - Morgan Similuk
- National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD 20892, USA
| | - Andrea L Storm
- Office of Rare Diseases Research (ORDR), National Center for Advancing Translational Sciences (NCATS), National Institutes of Health (NIH), Bethesda, MD 20892, USA
| | - Courtney Thaxon
- University of North Carolina Medical School, University of North Carolina at Chapel Hill, Chapel Hill, NC 27516, USA
| | - Anne Thessen
- Center for Genome Research and Biocomputing, Environmental and Molecular Toxicology, Oregon State University, Corvallis, OR 97331, USA
| | - Julius O B Jacobsen
- William Harvey Research Institute, Barts & The London School of Medicine & Dentistry, Queen Mary University of London, Charterhouse Square, London EC1M 6BQ, UK
| | - Julie A McMurry
- College of Public Health and Human Sciences, Oregon State University, Corvallis, OR 97331, USA
| | | | - Sebastian Köhler
- Institute for Medical Genetics and Human Genetics, Charité-Universitätsmedizin Berlin, Augustenburger Platz 1, 13353 Berlin, Germany
| | - Damian Smedley
- William Harvey Research Institute, Barts & The London School of Medicine & Dentistry, Queen Mary University of London, Charterhouse Square, London EC1M 6BQ, UK
| | - Peter N Robinson
- The Jackson Laboratory For Genomic Medicine, Farmington, CT 06032, USA
| | - Christopher J Mungall
- Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94710, USA
| | - Melissa A Haendel
- Center for Genome Research and Biocomputing, Environmental and Molecular Toxicology, Oregon State University, Corvallis, OR 97331, USA.,Oregon Clinical and Translational Research Institute, Oregon Health & Science University, Portland, OR 97239, USA
| | - Monica C Munoz-Torres
- Center for Genome Research and Biocomputing, Environmental and Molecular Toxicology, Oregon State University, Corvallis, OR 97331, USA
| | - David Osumi-Sutherland
- European Bioinformatics Institute (EMBL-EBI), European Molecular Biology Laboratory, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
9
Wang RL. Semantic characterization of adverse outcome pathways. Aquat Toxicol 2020; 222:105478. [PMID: 32278258 PMCID: PMC7393770 DOI: 10.1016/j.aquatox.2020.105478] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 12/27/2019] [Revised: 03/17/2020] [Accepted: 03/23/2020] [Indexed: 05/09/2023]
Abstract
This study was undertaken to systematically assess the utility and performance of ontology-based semantic analysis in adverse outcome pathway (AOP) research. With an increasing number of AOPs developed by scientific domain experts to organize toxicity information and facilitate chemical risk assessment, there is a pressing need for objective approaches to evaluate the biological coherence and quality of these AOPs. Powered by ontologies covering a wide range of biological domains, abundant ontologically annotated phenotypic data, and sophisticated knowledge-computing tools, semantic analysis has great potential in this area of application. With the events in the AOP-Wiki first annotated into logical definitions and then grouped into phenotypic profiles by individual AOPs, the coherence and quality of AOPs were assessed at several levels: paired key event relationships (KERs), all possible event pair combinations within AOPs, and the phenotypic profiles of AOPs, genes, biological pathways, human diseases, and selected chemicals. The semantic similarities were assessed at all these levels based on a unified cross-species vertebrate phenotype ontology encompassing the logical definitions of AOP events as well as many other domain ontologies. A substantial number of KERs and AOPs in the AOP-Wiki were found to be semantically coherent. These same coherent AOPs also mapped to many more genes, pathways, and diseases biologically aligned with the intended chain of events leading to their respective adverse outcomes. Significantly, these findings imply that semantic analysis should also be useful in developing future AOPs by selecting candidate events from either the existing AOP-Wiki events or a broader collection of ontology terms semantically similar to the molecular initiating events or adverse outcomes of interest.
In addition, semantic analysis enabled AOP networks to be constructed at the level of phenotypic profiles based on similarities, complementing those based on event sharing by also bringing genes, pathways, diseases, and chemicals into the networks, thus greatly expanding the biological scope and our understanding of AOPs.
Affiliation(s)
- Rong-Lin Wang
- Great Lakes Toxicology & Ecology Division, Center for Computational Toxicology & Exposure, U.S. Environmental Protection Agency, Cincinnati, OH, 45268, USA.
10
Kafkas Ş, Abdelhakim M, Hashish Y, Kulmanov M, Abdellatif M, Schofield PN, Hoehndorf R. PathoPhenoDB, linking human pathogens to their phenotypes in support of infectious disease research. Sci Data 2019; 6:79. [PMID: 31160594 PMCID: PMC6546783 DOI: 10.1038/s41597-019-0090-x] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 12/10/2018] [Accepted: 05/07/2019] [Indexed: 12/11/2022]
Abstract
Understanding the relationship between the pathophysiology of infectious disease, the biology of the causative agent and the development of therapeutic and diagnostic approaches is dependent on the synthesis of a wide range of types of information. Provision of a comprehensive and integrated disease phenotype knowledgebase has the potential to provide novel and orthogonal sources of information for the understanding of infectious agent pathogenesis, and support for research on disease mechanisms. We have developed PathoPhenoDB, a database containing pathogen-to-phenotype associations. PathoPhenoDB relies on manual curation of pathogen-disease relations, on ontology-based text mining as well as manual curation to associate host disease phenotypes with infectious agents. Using Semantic Web technologies, PathoPhenoDB also links to knowledge about drug resistance mechanisms and drugs used in the treatment of infectious diseases. PathoPhenoDB is accessible at http://patho.phenomebrowser.net/ , and the data are freely available through a public SPARQL endpoint.
Affiliation(s)
- Şenay Kafkas
- Computer, Electrical and Mathematical Sciences & Engineering Division, Computational Bioscience Research Center, King Abdullah University of Science and Technology, Thuwal, 23955, Saudi Arabia
- Marwa Abdelhakim
- Computer, Electrical and Mathematical Sciences & Engineering Division, Computational Bioscience Research Center, King Abdullah University of Science and Technology, Thuwal, 23955, Saudi Arabia
- Yasmeen Hashish
- Computer, Electrical and Mathematical Sciences & Engineering Division, Computational Bioscience Research Center, King Abdullah University of Science and Technology, Thuwal, 23955, Saudi Arabia
- Maxat Kulmanov
- Computer, Electrical and Mathematical Sciences & Engineering Division, Computational Bioscience Research Center, King Abdullah University of Science and Technology, Thuwal, 23955, Saudi Arabia
- Marwa Abdellatif
- Computer, Electrical and Mathematical Sciences & Engineering Division, Computational Bioscience Research Center, King Abdullah University of Science and Technology, Thuwal, 23955, Saudi Arabia
- Paul N Schofield
- Department of Physiology, Development & Neuroscience, University of Cambridge, Downing Street, Cambridge, CB2 3EG, United Kingdom
- Robert Hoehndorf
- Computer, Electrical and Mathematical Sciences & Engineering Division, Computational Bioscience Research Center, King Abdullah University of Science and Technology, Thuwal, 23955, Saudi Arabia
11
Alghamdi SM, Sundberg BA, Sundberg JP, Schofield PN, Hoehndorf R. Quantitative evaluation of ontology design patterns for combining pathology and anatomy ontologies. Sci Rep 2019; 9:4025. [PMID: 30858527 PMCID: PMC6411989 DOI: 10.1038/s41598-019-40368-1] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 07/31/2018] [Accepted: 02/14/2019] [Indexed: 12/28/2022]
Abstract
Data are increasingly annotated with multiple ontologies to capture rich information about the features of the subject under investigation. Analysis may be performed over each ontology separately, but recently there has been a move to combine multiple ontologies to provide more powerful analytical possibilities. However, it is often not clear how to combine ontologies or how to assess or evaluate the potential design patterns available. Here we use a large and well-characterized dataset of anatomic pathology descriptions from a major study of aging mice. We show how different design patterns based on the MPATH and MA ontologies provide orthogonal axes of analysis, and perform differently in over-representation and semantic similarity applications. We discuss how such a data-driven approach might be used generally to generate and evaluate ontology design patterns.
Affiliation(s)
- Sarah M Alghamdi
- King Abdullah University of Science and Technology, Computer, Electrical & Mathematical Sciences and Engineering Division, Computational Bioscience Research Center, Thuwal, 23955-6900, Saudi Arabia
- King Abdul-Aziz University, Faculty of Computing and Information Technology, Rabigh, 25732, Saudi Arabia
- Beth A Sundberg
- The Jackson Laboratory, 600 Main Street, Bar Harbor, ME 04609, USA
- John P Sundberg
- The Jackson Laboratory, 600 Main Street, Bar Harbor, ME 04609, USA
- Paul N Schofield
- The Jackson Laboratory, 600 Main Street, Bar Harbor, ME 04609, USA
- Department of Physiology, Development & Neuroscience, University of Cambridge, Downing Street, Cambridge, CB2 3EG, UK.
- Robert Hoehndorf
- King Abdullah University of Science and Technology, Computer, Electrical & Mathematical Sciences and Engineering Division, Computational Bioscience Research Center, Thuwal, 23955-6900, Saudi Arabia.
12
Boudellioua I, Kulmanov M, Schofield PN, Gkoutos GV, Hoehndorf R. DeepPVP: phenotype-based prioritization of causative variants using deep learning. BMC Bioinformatics 2019; 20:65. [PMID: 30727941 PMCID: PMC6364462 DOI: 10.1186/s12859-019-2633-8] [Citation(s) in RCA: 44] [Impact Index Per Article: 7.3] [Received: 03/29/2018] [Accepted: 01/17/2019] [Indexed: 02/07/2023]
Abstract
BACKGROUND Prioritization of variants in personal genomic data is a major challenge. Recently, computational methods that rely on comparing phenotype similarity have been shown to be useful for identifying causative variants. In these methods, pathogenicity prediction is combined with a semantic similarity measure to prioritize not only variants that are likely to be dysfunctional but also those that are likely involved in the pathogenesis of a patient's phenotype. RESULTS We have developed DeepPVP, a variant prioritization method that combines automated inference with deep neural networks to identify the likely causative variants in whole exome or whole genome sequence data. We demonstrate that DeepPVP performs significantly better than existing methods, including phenotype-based methods that use similar features. DeepPVP is freely available at https://github.com/bio-ontology-research-group/phenomenet-vp . CONCLUSIONS DeepPVP further improves on existing variant prioritization methods in terms of both speed and accuracy.
Affiliation(s)
- Imane Boudellioua
- Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology, 4700 KAUST, Thuwal, 23955-6900, Kingdom of Saudi Arabia
- Computer, Electrical and Mathematical Sciences & Engineering Division (CEMSE), King Abdullah University of Science and Technology, 4700 KAUST, PO Box 2882, Thuwal, 23955-6900, Kingdom of Saudi Arabia
- Maxat Kulmanov
- Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology, 4700 KAUST, Thuwal, 23955-6900, Kingdom of Saudi Arabia
- Computer, Electrical and Mathematical Sciences & Engineering Division (CEMSE), King Abdullah University of Science and Technology, 4700 KAUST, PO Box 2882, Thuwal, 23955-6900, Kingdom of Saudi Arabia
- Paul N Schofield
- Department of Physiology, Development & Neuroscience, University of Cambridge, Downing Street, Cambridge, CB2 3EG, UK
- Georgios V Gkoutos
- College of Medical and Dental Sciences, Institute of Cancer and Genomic Sciences, Centre for Computational Biology, University of Birmingham, Birmingham, B15 2TT, UK
- Institute of Translational Medicine, University Hospitals Birmingham, NHS Foundation Trust, Birmingham, B15 2TT, UK
- NIHR Experimental Cancer Medicine Centre, Birmingham, B15 2TT, UK
- NIHR Surgical Reconstruction and Microbiology, Birmingham, B15 2TT, UK
- NIHR Biomedical Research Centre, Birmingham, B15 2TT, UK
- MRC Health Data Research UK, Birmingham, B15 2TT, UK
- Robert Hoehndorf
- Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology, 4700 KAUST, Thuwal, 23955-6900, Kingdom of Saudi Arabia
- Computer, Electrical and Mathematical Sciences & Engineering Division (CEMSE), King Abdullah University of Science and Technology, 4700 KAUST, PO Box 2882, Thuwal, 23955-6900, Kingdom of Saudi Arabia
13
Xrare: a machine learning method jointly modeling phenotypes and genetic evidence for rare disease diagnosis. Genet Med 2019; 21:2126-2134. [PMID: 30675030 PMCID: PMC6752318 DOI: 10.1038/s41436-019-0439-8] [Citation(s) in RCA: 44] [Impact Index Per Article: 7.3] [Received: 08/08/2018] [Accepted: 01/07/2019] [Indexed: 12/03/2022]
Abstract
Purpose Despite the success next-generation sequencing technologies have achieved in diagnosing the genetic cause of rare Mendelian diseases, the current diagnostic rate is still far from satisfactory because of heterogeneity, imprecision, and noise in disease phenotype descriptions and insufficient use of expert knowledge in clinical genetics. To overcome these difficulties, we present a novel method called Xrare for the prioritization of causative gene variants in rare disease diagnosis. Methods We propose a new phenotype similarity scoring method called Emission-Reception Information Content (ERIC), which is highly tolerant of noise and imprecision in clinical phenotypes. We use medical genetic domain knowledge by designing genetic features implementing American College of Medical Genetics and Genomics (ACMG) guidelines. Results The ERIC score ranked consistently higher for disease genes than other phenotypic similarity scores in the presence of imprecise and noisy phenotypes. Extensive simulations and real clinical data demonstrated that Xrare outperforms existing alternative methods by 10–40% in various genetic diagnosis scenarios. Conclusion The Xrare model is learned from a large database of clinical variants, and derives its strength from the tight integration of medical genetics features and phenotypic feature similarity scores. Xrare provides the clinical community with a robust and powerful tool for variant prioritization.
14
Gourdine JPF, Brush MH, Vasilevsky NA, Shefchek K, Köhler S, Matentzoglu N, Munoz-Torres MC, McMurry JA, Zhang XA, Robinson PN, Haendel MA. Representing glycophenotypes: semantic unification of glycobiology resources for disease discovery. Database (Oxford) 2019; 2019:baz114. [PMID: 31735951 PMCID: PMC6859258 DOI: 10.1093/database/baz114] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 07/30/2019] [Revised: 08/27/2019] [Accepted: 08/28/2019] [Indexed: 12/11/2022]
Abstract
While abnormalities related to carbohydrates (glycans) are frequent in patients with rare and undiagnosed diseases as well as in many common diseases, these glycan-related phenotypes (glycophenotypes) are not well represented in knowledge bases (KBs). If glycan-related diseases were more robustly represented and curated with glycophenotypes, these could be used for molecular phenotyping to help realize the goals of precision medicine. Diagnosis of rare diseases by computational cross-species comparison of genotype-phenotype data has been facilitated by leveraging ontological representations of clinical phenotypes, using the Human Phenotype Ontology (HPO), and model organism ontologies such as the Mammalian Phenotype Ontology (MP) in the context of the Monarch Initiative. In this article, we discuss the importance and complexity of glycobiology and review the structure of glycan-related content from existing KBs and biological ontologies. We show how semantically structuring knowledge about the annotation of glycophenotypes could enhance disease diagnosis, and propose a solution to integrate glycophenotypes and related diseases into the Unified Phenotype Ontology (uPheno), HPO, Monarch and other KBs. We encourage the community to practice good identifier hygiene for glycans in support of semantic analysis, and clinicians to add glycomics to their diagnostic analyses of rare diseases.
Affiliation(s)
- Jean-Philippe F Gourdine
- Oregon Clinical & Translational Research Institute, Oregon Health & Science University, Portland, OR 97239, USA
- OHSU Library, Oregon Health & Science University Library, Portland, OR 97239, USA
- Monarch Initiative, monarchinitiative.org
- Matthew H Brush
- Oregon Clinical & Translational Research Institute, Oregon Health & Science University, Portland, OR 97239, USA
- Monarch Initiative, monarchinitiative.org
- Nicole A Vasilevsky
- Oregon Clinical & Translational Research Institute, Oregon Health & Science University, Portland, OR 97239, USA
- Monarch Initiative, monarchinitiative.org
- Kent Shefchek
- Monarch Initiative, monarchinitiative.org
- Linus Pauling Institute, Oregon State University, Corvallis, OR 97331, USA
- Sebastian Köhler
- Monarch Initiative, monarchinitiative.org
- Charité Centrum für Therapieforschung, Charité-Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin, Humboldt-Universität zu Berlin and Berlin Institute of Health, Berlin 10117, Germany
- Nicolas Matentzoglu
- Monarch Initiative, monarchinitiative.org
- European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Cambridge, UK
- Monica C Munoz-Torres
- Monarch Initiative, monarchinitiative.org
- Linus Pauling Institute, Oregon State University, Corvallis, OR 97331, USA
- Julie A McMurry
- Monarch Initiative, monarchinitiative.org
- Linus Pauling Institute, Oregon State University, Corvallis, OR 97331, USA
- Xingmin Aaron Zhang
- Monarch Initiative, monarchinitiative.org
- The Jackson Laboratory for Genomic Medicine, Farmington, CT 06032, USA
- Peter N Robinson
- Monarch Initiative, monarchinitiative.org
- The Jackson Laboratory for Genomic Medicine, Farmington, CT 06032, USA
- Melissa A Haendel
- Oregon Clinical & Translational Research Institute, Oregon Health & Science University, Portland, OR 97239, USA
- Monarch Initiative, monarchinitiative.org
- Linus Pauling Institute, Oregon State University, Corvallis, OR 97331, USA
15
Kafkas Ş, Hoehndorf R. Ontology based text mining of gene-phenotype associations: application to candidate gene prediction. Database (Oxford) 2019; 2019:baz019. [PMID: 30809638 PMCID: PMC6391585 DOI: 10.1093/database/baz019] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 10/25/2018] [Revised: 01/09/2019] [Accepted: 01/26/2019] [Indexed: 01/07/2023]
Abstract
Gene-phenotype associations play an important role in understanding disease mechanisms, which is a requirement for treatment development. Some gene-phenotype associations are observed mainly experimentally and made publicly available through several standard resources such as MGI. However, a vast number of gene-phenotype associations remain buried in the biomedical literature. Given the large amount of literature data, automated text mining tools are needed to reduce the burden of manual curation of gene-phenotype associations and to develop comprehensive resources. In this study, we present an ontology-based approach in combination with statistical methods to text mine gene-phenotype associations from the literature. Our method achieved AUC values of 0.90 and 0.75 in recovering known gene-phenotype associations from HPO and MGI, respectively. We posit that candidate genes and their relevant diseases should be described by similar phenotypes in publications. Thus, we demonstrate the utility of our approach by predicting disease candidate genes based on the semantic similarities of phenotypes associated with genes and diseases. To the best of our knowledge, this is the first study using an ontology-based approach to extract gene-phenotype associations from the literature. We evaluated our disease candidate prediction model on the gene-disease associations from MGI. Our model achieved AUC values of 0.90 and 0.87 on OMIM (human) and MGI (mouse) datasets of gene-disease associations, respectively. Our manual analysis of the text-mined data revealed that our method can accurately extract gene-phenotype associations that are not currently covered by the existing public gene-phenotype resources. Overall, the results indicate that our method can precisely extract known as well as new gene-phenotype associations from the literature. All the data and methods are available at https://github.com/bio-ontology-research-group/genepheno.
Affiliation(s)
- Şenay Kafkas
- Computer, Electrical and Mathematical Sciences & Engineering Division, Computational Bioscience Research Center, King Abdullah University of Science and Technology, Thuwal, Kingdom of Saudi Arabia
- Robert Hoehndorf
- Computer, Electrical and Mathematical Sciences & Engineering Division, Computational Bioscience Research Center, King Abdullah University of Science and Technology, Thuwal, Kingdom of Saudi Arabia
16
Wang RL, Edwards S, Ives C. Ontology-based semantic mapping of chemical toxicities. Toxicology 2018; 412:89-100. [PMID: 30468866 DOI: 10.1016/j.tox.2018.11.005] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 08/24/2018] [Revised: 11/11/2018] [Accepted: 11/19/2018] [Indexed: 12/15/2022]
Abstract
This study was undertaken to evaluate the use of ontology-based semantic mapping (OS-Mapping) in chemical toxicity assessment. Nineteen chemical-species phenotypic profiles (CSPPs) were constructed by ontologically annotating the toxicity responses reported in more than seven hundred published studies of ten chemicals in six vertebrate species. The CSPPs were semantically compared to more than 29,000 publicly available phenotypic profiles of genes, KEGG (Kyoto Encyclopedia of Genes and Genomes) pathways, and diseases based on a cross-species phenotype ontology. OS-Mapping was shown to differentiate chemical toxicities among chemicals as well as within and across species. It also revealed cases of chemical-by-species interactions. In addition to confirming similar MOAs (mechanisms of action) for a few chemicals, OS-Mapping also generated novel insights into the MOAs underlying some seemingly different, yet phenotypically similar, classes of chemicals. The nature of a unified cross-species phenotype ontology and its representation of diverse knowledge domains allowed the construction of a complete phenotypic continuum for the 17α-ethynylestradiol_fathead minnow across the biological levels of organization, which complemented a similar one derived from the Comparative Toxicogenomics Database but based primarily on 17α-ethynylestradiol-induced molecular phenotypes. Overall, OS-Mapping has been demonstrated to offer a powerful approach to help bridge the gap between the molecular and non-molecular phenotypes of chemicals characterized using high-throughput or traditional omics methods and their apical endpoints of greater regulatory relevance, which are typically phenotypes found at the higher levels of biological organization. OS-Mapping also enables comparative toxicity assessment among chemicals, both within and across species.
Furthermore, the semantic analysis of phenotypes can reveal additional novel MOAs for some well-known chemicals and discover candidate MOAs for chemicals that are less molecularly characterized. A full phenotypic continuum based on OS-Mapping will also be conducive to the future development of adverse outcome pathways. As phenomics continues to advance and the ontological annotation of literature becomes more automated, the power of OS-Mapping will be further enhanced.
Affiliation(s)
- Rong-Lin Wang
- Exposure Methods and Measurements Division, National Exposure Research Laboratory, US EPA, Cincinnati, OH 45268, USA.
- Stephen Edwards
- Research Computing Division, RTI International, Research Triangle Park, NC 27709, USA
- Cataia Ives
- Research Computing Division, RTI International, Research Triangle Park, NC 27709, USA
17
Howe DG, Blake JA, Bradford YM, Bult CJ, Calvi BR, Engel SR, Kadin JA, Kaufman TC, Kishore R, Laulederkind SJF, Lewis SE, Moxon SAT, Richardson JE, Smith C. Model organism data evolving in support of translational medicine. Lab Anim (NY) 2018; 47:277-289. [PMID: 30224793 PMCID: PMC6322546 DOI: 10.1038/s41684-018-0150-4] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.9] [Received: 03/14/2018] [Accepted: 08/13/2018] [Indexed: 02/07/2023]
Abstract
Model organism databases (MODs) have been collecting and integrating biomedical research data for 30 years and were designed to meet specific needs of each model organism research community. The contributions of model organism research to understanding biological systems would be hard to overstate. Modern molecular biology methods and cost reductions in nucleotide sequencing have opened avenues for direct application of model organism research to elucidating mechanisms of human diseases. Thus, the mandate for model organism research and databases has now grown to include facilitating use of these data in translational applications. Challenges in meeting this opportunity include the distribution of research data across many databases and websites, a lack of data format standards for some data types, and sustainability of scale and cost for genomic database resources like MODs. The issues of widely distributed data and application of data standards are some of the challenges addressed by FAIR (Findable, Accessible, Interoperable, and Re-usable) data principles. The Alliance of Genome Resources is now moving to address these challenges by bringing together expertly curated research data from fly, mouse, rat, worm, yeast, zebrafish, and the Gene Ontology consortium. Centralized multi-species data access, integration, and format standardization will lower the data utilization barrier in comparative genomics and translational applications and will provide a framework in which sustainable scale and cost can be addressed. This article presents a brief historical perspective on how the Alliance model organisms are complementary and how they have already contributed to understanding the etiology of human diseases. In addition, we discuss four challenges for using data from MODs in translational applications and how the Alliance is working to address them, in part by applying FAIR data principles. 
Ultimately, combined data from these animal models are more powerful than the sum of the parts.
Affiliation(s)
- Douglas G Howe
- The Institute of Neuroscience, University of Oregon, Eugene, OR, USA.
- Yvonne M Bradford
- The Institute of Neuroscience, University of Oregon, Eugene, OR, USA
- Brian R Calvi
- Department of Biology, Indiana University, Bloomington, IN, USA
- Stacia R Engel
- Department of Genetics, Stanford University, Palo Alto, CA, USA
- Ranjana Kishore
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, USA
- Stanley J F Laulederkind
- Department of Biomedical Engineering, Medical College of Wisconsin and Marquette University, Milwaukee, WI, USA
- Suzanna E Lewis
- Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Sierra A T Moxon
- The Institute of Neuroscience, University of Oregon, Eugene, OR, USA
18
Gkoutos GV, Schofield PN, Hoehndorf R. The anatomy of phenotype ontologies: principles, properties and applications. Brief Bioinform 2018; 19:1008-1021. [PMID: 28387809 PMCID: PMC6169674 DOI: 10.1093/bib/bbx035] [Citation(s) in RCA: 48] [Impact Index Per Article: 6.9] [Received: 01/01/2017] [Revised: 02/05/2017] [Indexed: 12/14/2022]
Abstract
The past decade has seen an explosion in the collection of genotype data in domains as diverse as medicine, ecology, livestock and plant breeding. Along with this comes the challenge of dealing with the related phenotype data, which is not only large but also highly multidimensional. Computational analysis of phenotypes has therefore become critical for our ability to understand the biological meaning of genomic data in the biological sciences. At the heart of computational phenotype analysis are the phenotype ontologies. A large number of these ontologies have been developed across many domains, and we are now at a point where the knowledge captured in the structure of these ontologies can be used for the integration and analysis of large interrelated data sets. The Phenotype And Trait Ontology framework provides a method for formal definitions of phenotypes and associated data sets and has proved to be key to our ability to develop methods for the integration and analysis of phenotype data. Here, we describe the development and products of the ontological approach to phenotype capture, the formal content of phenotype ontologies and how their content can be used computationally.
Affiliation(s)
- Robert Hoehndorf
- Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology, Thuwal
19
Cornish AJ, David A, Sternberg MJE. PhenoRank: reducing study bias in gene prioritization through simulation. Bioinformatics 2018; 34:2087-2095. [PMID: 29360927 PMCID: PMC5949213 DOI: 10.1093/bioinformatics/bty028] [Received: 09/21/2017] [Revised: 01/10/2018] [Accepted: 01/16/2018]
Abstract
Motivation Genome-wide association studies have identified thousands of loci associated with human disease, but identifying the causal genes at these loci is often difficult. Several methods prioritize the genes most likely to be disease causing by integrating biological data, including protein-protein interaction and phenotypic data. Data availability is not the same for all genes, however, which may influence the performance of these methods. Results We demonstrate that whilst disease genes tend to be associated with more data, this may be at least partially a result of them being better studied. Building on this observation, we develop PhenoRank, which prioritizes disease genes whilst avoiding bias towards genes with more available data. Bias is avoided by comparing gene scores generated for the query disease against gene scores generated using simulated sets of phenotype terms, which ensures that differences in data availability do not affect the ranking of genes. We demonstrate that whilst existing prioritization methods are biased by data availability, PhenoRank is not similarly biased. Avoiding this bias allows PhenoRank to effectively prioritize genes with less available data and improves its overall performance. PhenoRank outperforms three available prioritization methods in cross-validation (PhenoRank area under receiver operating characteristic curve [AUC] = 0.89, DADA AUC = 0.87, EXOMISER AUC = 0.71, PRINCE AUC = 0.83, P < 2.2 × 10⁻¹⁶). Availability and implementation PhenoRank is freely available for download at https://github.com/alexjcornish/PhenoRank. Supplementary information Supplementary data are available at Bioinformatics online.
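The simulation idea in this abstract can be illustrated with a generic empirical p-value. Each gene is scored against the query phenotype set and then re-scored against many random term sets of the same size. Genes are ranked by how rarely chance matches or beats the real score. The Python sketch below is a simplified, hypothetical illustration and not PhenoRank's implementation. The overlap score, the uniform sampling of terms, and the function names are all assumptions; PhenoRank's actual scoring and simulation procedure are described in the paper and repository.

```python
import random
from typing import Dict, List, Set


def overlap_score(gene_terms: Set[str], query_terms: Set[str]) -> float:
    """Hypothetical gene score: fraction of query terms annotated to the gene."""
    if not query_terms:
        raise ValueError("query_terms must not be empty")
    return len(gene_terms & query_terms) / len(query_terms)


def empirical_pvalues(
    annotations: Dict[str, Set[str]],
    query_terms: Set[str],
    all_terms: List[str],
    n_sims: int = 1000,
    seed: int = 0,
) -> Dict[str, float]:
    """Compare each gene's observed score with its scores on random term sets.

    Each simulated set has the same size as the query and is drawn uniformly
    without replacement from all_terms (a simplifying assumption).
    """
    rng = random.Random(seed)
    observed = {g: overlap_score(t, query_terms) for g, t in annotations.items()}
    exceed = {g: 0 for g in annotations}
    for _ in range(n_sims):
        simulated = set(rng.sample(all_terms, len(query_terms)))
        for gene, terms in annotations.items():
            # Count simulations in which chance alone matches or beats the real score.
            if overlap_score(terms, simulated) >= observed[gene]:
                exceed[gene] += 1
    # Add-one correction keeps every p-value strictly above zero.
    return {g: (exceed[g] + 1) / (n_sims + 1) for g in annotations}


if __name__ == "__main__":
    terms = [f"T{i}" for i in range(20)]  # toy phenotype vocabulary
    genes = {
        "WELL_STUDIED": set(terms),  # annotated to every term
        "SPARSE": {"T0", "T1"},      # annotated to just two terms
    }
    pvals = empirical_pvalues(genes, {"T0", "T1"}, terms, n_sims=1000, seed=1)
    for gene, p in sorted(pvals.items(), key=lambda kv: kv[1]):
        print(gene, round(p, 4))
```

In the toy run, both genes tie on the raw overlap score. The heavily annotated gene reaches that score on every random set, however, so its p-value is 1.0. The sparsely annotated gene rarely does, so it ranks first. This is the sense in which comparing against simulated term sets removes the advantage of simply having more annotations.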
Affiliation(s)
- Alex J Cornish
- Department of Life Sciences, Center of Bioinformatics and Systems Biology, Imperial College London, London, UK
- Alessia David
- Department of Life Sciences, Center of Bioinformatics and Systems Biology, Imperial College London, London, UK
- Michael J E Sternberg
- Department of Life Sciences, Center of Bioinformatics and Systems Biology, Imperial College London, London, UK
20
Rodríguez-García MÁ, Gkoutos GV, Schofield PN, Hoehndorf R. Integrating phenotype ontologies with PhenomeNET. J Biomed Semantics 2017; 8:58. [PMID: 29258588 PMCID: PMC5735523 DOI: 10.1186/s13326-017-0167-4] [Received: 03/17/2017] [Accepted: 11/22/2017]
Abstract
BACKGROUND Integration and analysis of phenotype data from humans and model organisms is a key challenge in building our understanding of normal biology and pathophysiology. However, the range of phenotypes and anatomical details captured in clinical and model organism databases presents complex problems when attempting to match classes across species and across phenotypes as diverse as behaviour and neoplasia. We have previously developed PhenomeNET, a system for disease gene prioritization that includes, as one of its components, an ontology designed to integrate phenotype ontologies. While not applicable to matching arbitrary ontologies, PhenomeNET can be used to identify related phenotypes in different species, including human, mouse, zebrafish, nematode worm, fruit fly, and yeast. RESULTS Here, we apply PhenomeNET to identify related classes from two phenotype and two disease ontologies using automated reasoning. We demonstrate that we can identify a large number of mappings, some of which require automated reasoning and cannot easily be identified through lexical approaches alone. Combining automated reasoning with lexical matching further improves results in aligning ontologies. CONCLUSIONS PhenomeNET can be used to align and integrate phenotype ontologies. The results can be utilized for biomedical analyses in which phenomena observed in model organisms are used to identify causative genes and mutations underlying human disease.
Affiliation(s)
- Miguel Ángel Rodríguez-García
- Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology, 4700 KAUST, Thuwal, 23955-6900, Saudi Arabia
- Computer, Electrical and Mathematical Sciences & Engineering Division (CEMSE), King Abdullah University of Science and Technology, 4700 KAUST, PO Box 2882, Thuwal, 23955-6900, Saudi Arabia
- Georgios V Gkoutos
- College of Medical and Dental Sciences, Institute of Cancer and Genomic Sciences, Centre for Computational Biology, University of Birmingham, Birmingham, B15 2TT, UK
- Institute of Translational Medicine, University Hospitals Birmingham, NHS Foundation Trust, Birmingham, B15 2TT, UK
- Institute of Biological, Environmental and Rural Sciences, Aberystwyth University, Aberystwyth, SY23 2AX, UK
- Paul N Schofield
- Department of Physiology, Development & Neuroscience, University of Cambridge, Downing Street, Cambridge, CB2 3EG, UK
- Robert Hoehndorf
- Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology, 4700 KAUST, Thuwal, 23955-6900, Saudi Arabia
- Computer, Electrical and Mathematical Sciences & Engineering Division (CEMSE), King Abdullah University of Science and Technology, 4700 KAUST, PO Box 2882, Thuwal, 23955-6900, Saudi Arabia
21
Abstract
The principles of genetics apply across the entire tree of life. At the cellular level we share biological mechanisms with species from which we diverged millions, even billions of years ago. We can exploit this common ancestry to learn about health and disease, by analyzing DNA and protein sequences, but also through the observable outcomes of genetic differences, i.e. phenotypes. To solve challenging disease problems we need to unify the heterogeneous data that relates genomics to disease traits. Without a big-picture view of phenotypic data, many questions in genetics are difficult or impossible to answer. The Monarch Initiative (https://monarchinitiative.org) provides tools for genotype-phenotype analysis, genomic diagnostics, and precision medicine across broad areas of disease.
22
Improved Diagnosis and Care for Rare Diseases through Implementation of Precision Public Health Framework. Adv Exp Med Biol 2017; 1031:55-94. [PMID: 29214566 DOI: 10.1007/978-3-319-67144-4_4]
Abstract
Public health relies on technologies to produce and analyse data, as well as to effectively develop and implement policies and practices. An example is the public health practice of epidemiology, which relies on computational technology to monitor the health status of populations, identify disadvantaged or at-risk population groups, and thereby inform health policy and priority setting. Critical to achieving health improvements for the underserved population of people living with rare diseases are early diagnosis and best care. In the rare diseases field, the vast majority of diseases are caused by destructive but previously difficult-to-identify protein-coding gene mutations. The reduction in the cost of genetic testing and advances in the clinical use of genome sequencing, data science and imaging are converging to provide a more precise understanding of the 'person-time-place' triad. That is: who is affected (people); when the disease is occurring (time); and where the disease is occurring (place). Consequently, we are witnessing a paradigm shift in public health policy and practice towards 'precision public health'.
Patient and stakeholder engagement has informed the need for a national public health policy framework for rare diseases. The engagement approach in different countries has produced highly comparable outcomes and objectives. Knowledge and experience sharing across the international rare diseases networks and partnerships has informed the development of the Western Australian Rare Diseases Strategic Framework 2015-2018 (RD Framework) and Australian government health briefings on the need for a National plan.
The RD Framework is guiding the translation of genomic and other technologies into the Western Australian health system, leading to greater precision in diagnostic pathways and care, and is an example of how a precision public health framework can improve health outcomes for the rare diseases population.
Five vignettes are used to illustrate how policy decisions provide the scaffolding for translating new genomics knowledge and catalyze transformative change in the delivery of clinical services. The vignettes presented here are from an Australian perspective and are not intended to be comprehensive, but rather to provide insights into how a new and emerging 'precision public health' paradigm can improve the experiences of patients living with rare diseases, their caregivers and families.
The conclusion is that genomic public health is informed by individual and family needs and by the population health imperative of an early and accurate diagnosis, which is the portal to best-practice care. Knowledge sharing is critical for public health policy development and for improving the lives of people living with rare diseases.
23
Mungall CJ, McMurry JA, Köhler S, Balhoff JP, Borromeo C, Brush M, Carbon S, Conlin T, Dunn N, Engelstad M, Foster E, Gourdine JP, Jacobsen JOB, Keith D, Laraway B, Lewis SE, NguyenXuan J, Shefchek K, Vasilevsky N, Yuan Z, Washington N, Hochheiser H, Groza T, Smedley D, Robinson PN, Haendel MA. The Monarch Initiative: an integrative data and analytic platform connecting phenotypes to genotypes across species. Nucleic Acids Res 2016; 45:D712-D722. [PMID: 27899636 PMCID: PMC5210586 DOI: 10.1093/nar/gkw1128] [Received: 08/22/2016] [Revised: 10/26/2016] [Accepted: 11/02/2016]
Abstract
The correlation of phenotypic outcomes with genetic variation and environmental factors is a core pursuit in biology and biomedicine. Numerous challenges impede our progress: patient phenotypes may not match known diseases, candidate variants may be in genes that have not been characterized, model organisms may not recapitulate human or veterinary diseases, filling evolutionary gaps is difficult, and many resources must be queried to find potentially significant genotype–phenotype associations. Non-human organisms have proven instrumental in revealing biological mechanisms. Advanced informatics tools can identify phenotypically relevant disease models in research and diagnostic contexts. Large-scale integration of model organism and clinical research data can provide a breadth of knowledge not available from individual sources and can provide contextualization of data back to these sources. The Monarch Initiative (monarchinitiative.org) is a collaborative, open science effort that aims to semantically integrate genotype–phenotype data from many species and sources in order to support precision medicine, disease modeling, and mechanistic exploration. Our integrated knowledge graph, analytic tools, and web services enable diverse users to explore relationships between phenotypes and genotypes across species.
Affiliation(s)
- Christopher J Mungall
- Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
- Julie A McMurry
- Department of Medical Informatics and Clinical Epidemiology and OHSU Library, Oregon Health & Science University, Portland, OR, 97239, USA
- Sebastian Köhler
- Institute for Medical Genetics and Human Genetics, Charité-Universitätsmedizin Berlin, Augustenburger Platz 1, 13353 Berlin, Germany
- Charles Borromeo
- Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA, 15260, USA
- Matthew Brush
- Department of Medical Informatics and Clinical Epidemiology and OHSU Library, Oregon Health & Science University, Portland, OR, 97239, USA
- Seth Carbon
- Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
- Tom Conlin
- Department of Medical Informatics and Clinical Epidemiology and OHSU Library, Oregon Health & Science University, Portland, OR, 97239, USA
- Nathan Dunn
- Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
- Mark Engelstad
- Department of Medical Informatics and Clinical Epidemiology and OHSU Library, Oregon Health & Science University, Portland, OR, 97239, USA
- Erin Foster
- Department of Medical Informatics and Clinical Epidemiology and OHSU Library, Oregon Health & Science University, Portland, OR, 97239, USA
- J P Gourdine
- Department of Medical Informatics and Clinical Epidemiology and OHSU Library, Oregon Health & Science University, Portland, OR, 97239, USA
- Julius O B Jacobsen
- William Harvey Research Institute, Barts & The London School of Medicine & Dentistry, Queen Mary University of London, Charterhouse Square, London EC1M 6BQ, UK
- Dan Keith
- Department of Medical Informatics and Clinical Epidemiology and OHSU Library, Oregon Health & Science University, Portland, OR, 97239, USA
- Bryan Laraway
- Department of Medical Informatics and Clinical Epidemiology and OHSU Library, Oregon Health & Science University, Portland, OR, 97239, USA
- Suzanna E Lewis
- Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
- Jeremy NguyenXuan
- Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
- Kent Shefchek
- Department of Medical Informatics and Clinical Epidemiology and OHSU Library, Oregon Health & Science University, Portland, OR, 97239, USA
- Nicole Vasilevsky
- Department of Medical Informatics and Clinical Epidemiology and OHSU Library, Oregon Health & Science University, Portland, OR, 97239, USA
- Zhou Yuan
- Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA, 15260, USA
- Nicole Washington
- Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
- Harry Hochheiser
- Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA, 15260, USA
- Tudor Groza
- Kinghorn Centre for Clinical Genomics, Garvan Institute of Medical Research, Darlinghurst, NSW 2010, Australia
- Damian Smedley
- William Harvey Research Institute, Barts & The London School of Medicine & Dentistry, Queen Mary University of London, Charterhouse Square, London EC1M 6BQ, UK
- Peter N Robinson
- Institute for Medical Genetics and Human Genetics, Charité-Universitätsmedizin Berlin, Augustenburger Platz 1, 13353 Berlin, Germany
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, 06032, USA
- Melissa A Haendel
- Department of Medical Informatics and Clinical Epidemiology and OHSU Library, Oregon Health & Science University, Portland, OR, 97239, USA
24
Köhler S, Vasilevsky NA, Engelstad M, Foster E, McMurry J, Aymé S, Baynam G, Bello SM, Boerkoel CF, Boycott KM, Brudno M, Buske OJ, Chinnery PF, Cipriani V, Connell LE, Dawkins HJS, DeMare LE, Devereau AD, de Vries BBA, Firth HV, Freson K, Greene D, Hamosh A, Helbig I, Hum C, Jähn JA, James R, Krause R, F Laulederkind SJ, Lochmüller H, Lyon GJ, Ogishima S, Olry A, Ouwehand WH, Pontikos N, Rath A, Schaefer F, Scott RH, Segal M, Sergouniotis PI, Sever R, Smith CL, Straub V, Thompson R, Turner C, Turro E, Veltman MWM, Vulliamy T, Yu J, von Ziegenweidt J, Zankl A, Züchner S, Zemojtel T, Jacobsen JOB, Groza T, Smedley D, Mungall CJ, Haendel M, Robinson PN. The Human Phenotype Ontology in 2017. Nucleic Acids Res 2016; 45:D865-D876. [PMID: 27899602 PMCID: PMC5210535 DOI: 10.1093/nar/gkw1039] [Received: 09/20/2016] [Accepted: 10/28/2016]
Abstract
Deep phenotyping has been defined as the precise and comprehensive analysis of phenotypic abnormalities in which the individual components of the phenotype are observed and described. The three components of the Human Phenotype Ontology (HPO; www.human-phenotype-ontology.org) project are the phenotype vocabulary, disease-phenotype annotations and the algorithms that operate on these. These components are being used for computational deep phenotyping and precision medicine as well as integration of clinical data into translational research. The HPO is being increasingly adopted as a standard for phenotypic abnormalities by diverse groups such as international rare disease organizations, registries, clinical labs, biomedical resources, and clinical software tools and will thereby contribute toward nascent efforts at global data exchange for identifying disease etiologies. This update article reviews the progress of the HPO project since the debut Nucleic Acids Research database article in 2014, including specific areas of expansion such as common (complex) disease, new algorithms for phenotype driven genomic discovery and diagnostics, integration of cross-species mapping efforts with the Mammalian Phenotype Ontology, an improved quality control pipeline, and the addition of patient-friendly terminology.
Affiliation(s)
- Sebastian Köhler
- Institute for Medical Genetics and Human Genetics, Charité-Universitätsmedizin Berlin, Augustenburger Platz 1, 13353 Berlin, Germany
- Nicole A Vasilevsky
- Library and Department of Medical Informatics and Clinical Epidemiology, Oregon Health & Science University, Portland, OR 97239, USA
- Mark Engelstad
- Library and Department of Medical Informatics and Clinical Epidemiology, Oregon Health & Science University, Portland, OR 97239, USA
- Erin Foster
- Library and Department of Medical Informatics and Clinical Epidemiology, Oregon Health & Science University, Portland, OR 97239, USA
- Julie McMurry
- Library and Department of Medical Informatics and Clinical Epidemiology, Oregon Health & Science University, Portland, OR 97239, USA
- Ségolène Aymé
- Institut du Cerveau et de la Moelle épinière-ICM, CNRS UMR 7225-Inserm U 1127-UPMC-P6 UMR S 1127, Hôpital Pitié-Salpêtrière, 47, bd de l'Hôpital, 75013 Paris, France
- Gareth Baynam
- Western Australian Register of Developmental Anomalies and Genetic Services of Western Australia, King Edward Memorial Hospital Department of Health, Government of Western Australia, Perth, WA 6008, Australia
- School of Paediatrics and Child Health, University of Western Australia, Perth, WA 6008, Australia
- Susan M Bello
- The Jackson Laboratory, 600 Main St, Bar Harbor, ME 04609, USA
- Cornelius F Boerkoel
- Imagenetics Research, Sanford Health, PO Box 5039, Route 5001, Sioux Falls, SD 57117-5039, USA
- Kym M Boycott
- Children's Hospital of Eastern Ontario Research Institute, University of Ottawa, Ottawa, Ontario, Canada
- Michael Brudno
- Department of Computer Science, University of Toronto, Toronto, ON M5S 2E4, Canada
- Centre for Computational Medicine, Hospital for Sick Children, Toronto, ON M5G 1L7, Canada
- Orion J Buske
- Department of Computer Science, University of Toronto, Toronto, ON M5S 2E4, Canada
- Centre for Computational Medicine, Hospital for Sick Children, Toronto, ON M5G 1L7, Canada
- Patrick F Chinnery
- Department of Clinical Neurosciences, School of Clinical Medicine, University of Cambridge, Cambridge CB2 0QQ, UK
- NIHR Rare Diseases Translational Research Collaboration, Cambridge Biomedical Campus, Cambridge CB2 0QQ, UK
- Valentina Cipriani
- UCL Institute of Ophthalmology, Department of Ocular Biology and Therapeutics, 11-43 Bath Street, London EC1V 9EL, UK
- UCL Genetics Institute, University College London, London WC1E 6BT, UK
- Hugh J S Dawkins
- Office of Population Health Genomics, Public Health Division, Health Department of Western Australia, 189 Royal Street, Perth, WA, 6004 Australia
- Laura E DeMare
- Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY, USA
- Andrew D Devereau
- Genomics England, Queen Mary University of London, Dawson Hall, Charterhouse Square, London EC1M 6BQ, UK
- Bert B A de Vries
- Department of Human Genetics, Radboud University, University Medical Centre, Nijmegen, The Netherlands
- Helen V Firth
- Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK
- Kathleen Freson
- Department of Cardiovascular Sciences, Center for Molecular and Vascular Biology, University of Leuven, Leuven, Belgium
- Daniel Greene
- Department of Haematology, University of Cambridge, NHS Blood and Transplant Centre, Long Road, Cambridge CB2 0PT, UK
- Medical Research Council Biostatistics Unit, Cambridge Institute of Public Health, Cambridge Biomedical Campus, Cambridge, UK
- Ada Hamosh
- McKusick-Nathans Institute of Genetic Medicine, Department of Pediatrics, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Ingo Helbig
- Division of Neurology, The Children's Hospital of Philadelphia, 3501 Civic Center Blvd, Philadelphia, PA 19104, USA
- Department of Neuropediatrics, University Medical Center Schleswig-Holstein (UKSH), Kiel, Germany
- Courtney Hum
- Centre for Computational Medicine, The Hospital for Sick Children, Toronto, ON M5G 1H3, Canada
- Johanna A Jähn
- Department of Neuropediatrics, University Medical Center Schleswig-Holstein (UKSH), Kiel, Germany
- Roger James
- NIHR Rare Diseases Translational Research Collaboration, Cambridge Biomedical Campus, Cambridge CB2 0QQ, UK
- Medical Research Council Biostatistics Unit, Cambridge Institute of Public Health, Cambridge Biomedical Campus, Cambridge, UK
- Roland Krause
- Luxembourg Centre for Systems Biomedicine, University of Luxembourg, 7, avenue des Hauts-Fourneaux, L-4362 Esch-sur-Alzette, Luxembourg
- Hanns Lochmüller
- John Walton Muscular Dystrophy Research Centre, MRC Centre for Neuromuscular Diseases, Institute of Genetic Medicine, University of Newcastle, Newcastle upon Tyne, UK
- Gholson J Lyon
- Stanley Institute for Cognitive Genomics, Cold Spring Harbor Laboratory, New York, NY 11797, USA
- Soichi Ogishima
- Dept of Bioclinical Informatics, Tohoku Medical Megabank Organization, Tohoku University, Tohoku Medical Megabank Organization Bldg 7F room #741,736, Seiryo 2-1, Aoba-ku, Sendai Miyagi 980-8573 Japan
- Annie Olry
- Orphanet-INSERM, US14, Plateforme Maladies Rares, 96 rue Didot, 75014 Paris, France
- Willem H Ouwehand
- Medical Research Council Biostatistics Unit, Cambridge Institute of Public Health, Cambridge Biomedical Campus, Cambridge, UK
- Nikolas Pontikos
- UCL Institute of Ophthalmology, Department of Ocular Biology and Therapeutics, 11-43 Bath Street, London EC1V 9EL, UK
- UCL Genetics Institute, University College London, London WC1E 6BT, UK
- Ana Rath
- Orphanet-INSERM, US14, Plateforme Maladies Rares, 96 rue Didot, 75014 Paris, France
- Franz Schaefer
- Division of Pediatric Nephrology and KFH Children's Kidney Center, Center for Pediatrics and Adolescent Medicine, 69120 Heidelberg, Germany
- Richard H Scott
- Genomics England, Queen Mary University of London, Dawson Hall, Charterhouse Square, London EC1M 6BQ, UK
- Michael Segal
- SimulConsult Inc., 27 Crafts Road, Chestnut Hill, MA 02467, USA
- Richard Sever
- Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY, USA
- Cynthia L Smith
- The Jackson Laboratory, 600 Main St, Bar Harbor, ME 04609, USA
- Volker Straub
- John Walton Muscular Dystrophy Research Centre, MRC Centre for Neuromuscular Diseases, Institute of Genetic Medicine, University of Newcastle, Newcastle upon Tyne, UK
- Rachel Thompson
- John Walton Muscular Dystrophy Research Centre, MRC Centre for Neuromuscular Diseases, Institute of Genetic Medicine, University of Newcastle, Newcastle upon Tyne, UK
- Catherine Turner
- John Walton Muscular Dystrophy Research Centre, MRC Centre for Neuromuscular Diseases, Institute of Genetic Medicine, University of Newcastle, Newcastle upon Tyne, UK
- Ernest Turro
- Department of Haematology, University of Cambridge, NHS Blood and Transplant Centre, Long Road, Cambridge CB2 0PT, UK
- Medical Research Council Biostatistics Unit, Cambridge Institute of Public Health, Cambridge Biomedical Campus, Cambridge, UK
- Marijcke W M Veltman
- NIHR Rare Diseases Translational Research Collaboration, Cambridge Biomedical Campus, Cambridge CB2 0QQ, UK
- Tom Vulliamy
- Blizard Institute, Barts and The London School of Medicine and Dentistry, Queen Mary University of London, London E1 2AT, UK
- Jing Yu
- Nuffield Department of Clinical Neurosciences, University of Oxford, Level 6, West Wing, John Radcliffe Hospital, Oxford OX3 9DU, UK
- Julie von Ziegenweidt
- Department of Haematology, University of Cambridge, NHS Blood and Transplant Centre, Long Road, Cambridge CB2 0PT, UK
- Andreas Zankl
- Discipline of Genetic Medicine, Sydney Medical School, The University of Sydney, Australia
- Academic Department of Medical Genetics, Sydney Children's Hospitals Network (Westmead), Australia
- Stephan Züchner
- JD McDonald Department of Human Genetics and Hussman Institute for Human Genomics, University of Miami, Miami, FL, USA
- Tomasz Zemojtel
- Institute for Medical Genetics and Human Genetics, Charité-Universitätsmedizin Berlin, Augustenburger Platz 1, 13353 Berlin, Germany
- Julius O B Jacobsen
- Genomics England, Queen Mary University of London, Dawson Hall, Charterhouse Square, London EC1M 6BQ, UK
- Tudor Groza
- Garvan Institute of Medical Research, Darlinghurst, Sydney, NSW 2010, Australia
- St Vincent's Clinical School, Faculty of Medicine, UNSW Australia
- Damian Smedley
- Genomics England, Queen Mary University of London, Dawson Hall, Charterhouse Square, London EC1M 6BQ, UK
- Christopher J Mungall
- Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley, CA 94720, USA
- Melissa Haendel
- Library and Department of Medical Informatics and Clinical Epidemiology, Oregon Health & Science University, Portland, OR 97239, USA
- Peter N Robinson
- The Jackson Laboratory for Genomic Medicine, 10 Discovery Drive, Farmington, CT 06032, USA
- Institute for Systems Genomics, University of Connecticut, Farmington, CT 06032, USA
25
Hoehndorf R, Alshahrani M, Gkoutos GV, Gosline G, Groom Q, Hamann T, Kattge J, de Oliveira SM, Schmidt M, Sierra S, Smets E, Vos RA, Weiland C. The flora phenotype ontology (FLOPO): tool for integrating morphological traits and phenotypes of vascular plants. J Biomed Semantics 2016; 7:65. [PMID: 27842607 PMCID: PMC5109718 DOI: 10.1186/s13326-016-0107-8] [Received: 12/02/2015] [Accepted: 11/01/2016]
Abstract
BACKGROUND The systematic analysis of large amounts of comparable plant trait data can support investigations into phylogenetics and ecological adaptation, with broad applications in evolutionary biology, agriculture, conservation, and the functioning of ecosystems. Floras, i.e., books collecting the information on all known plant species found within a region, are a potentially rich source of such plant trait data. Floras describe plant traits with a focus on morphology and other traits relevant for species identification. They also record other characteristics of plant species, such as ecological affinities, distribution, economic value, health applications, and traditional uses. However, a key limitation in systematically analyzing information in Floras is the lack of a standardized vocabulary for the described traits, as well as the difficulty of extracting structured information from free text. RESULTS We have developed the Flora Phenotype Ontology (FLOPO), an ontology for describing traits of plant species found in Floras. We used the Plant Ontology (PO) and the Phenotype And Trait Ontology (PATO) to extract entity-quality relationships from digitized taxon descriptions in Floras. We then used a formal ontological approach based on phenotype description patterns and automated reasoning to generate the FLOPO. The resulting ontology consists of 25,407 classes and is based on the PO and PATO. The classified ontology closely follows the structure of the Plant Ontology in that the primary axis of classification is the observed plant anatomical structure. More specific traits are then classified based on parthood and subclass relations between anatomical structures, as well as subclass relations between phenotypic qualities. CONCLUSIONS The FLOPO is primarily intended as a framework within which plant traits can be integrated computationally across all species and higher taxa of flowering plants. Importantly, it is not intended to replace established vocabularies or ontologies. Rather, it is meant to serve as an overarching framework within which different application- and domain-specific ontologies, thesauri and vocabularies of phenotypes observed in flowering plants can be integrated.
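The FLOPO abstract gives a two-part classification rule for traits. The entity is classified first, through parthood and subclass relations between anatomical structures. The quality is then classified through subclass relations. The Python sketch below mimics that rule on toy data, under heavy simplifying assumptions. The term names and hierarchies are invented, and subclass and part-of links are merged into one parent map for brevity. FLOPO itself derives its classification with OWL description patterns and an automated reasoner, not with code like this.

```python
from typing import Dict, Set, Tuple

# Hypothetical toy hierarchies (not actual PO or PATO content). For brevity,
# each entity maps to both its superclasses and the structures it is part of.
ENTITY_PARENTS: Dict[str, Set[str]] = {
    "petal": {"corolla"},       # part of
    "corolla": {"flower"},      # part of
    "leaf apex": {"leaf"},      # part of
    "compound leaf": {"leaf"},  # subclass of
}
# Each quality maps to its superclasses.
QUALITY_PARENTS: Dict[str, Set[str]] = {
    "red": {"color"},
    "white": {"color"},
    "acute": {"shape"},
}

Trait = Tuple[str, str]  # (anatomical entity, phenotypic quality)


def ancestors(term: str, parents: Dict[str, Set[str]]) -> Set[str]:
    """Return the term together with all of its transitive parents."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in parents.get(stack.pop(), set()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def is_more_specific(a: Trait, b: Trait) -> bool:
    """True if trait a would be classified under trait b in this toy model.

    The entity of b must be reachable from the entity of a (the primary axis),
    and the quality of b must be reachable from the quality of a.
    """
    entity_ok = b[0] in ancestors(a[0], ENTITY_PARENTS)
    quality_ok = b[1] in ancestors(a[1], QUALITY_PARENTS)
    return entity_ok and quality_ok


if __name__ == "__main__":
    print(is_more_specific(("petal", "red"), ("flower", "color")))          # True
    print(is_more_specific(("petal", "red"), ("leaf", "color")))            # False
    print(is_more_specific(("compound leaf", "acute"), ("leaf", "shape")))  # True
```

A real ontology keeps subclass and part-of as distinct relations, and the reasoner decides which combinations yield subsumption. Collapsing them here only keeps the example short.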
Affiliation(s)
- Robert Hoehndorf
- Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology, 4700 KAUST, Thuwal, 23955–6900 Kingdom of Saudi Arabia
- Computer, Electrical and Mathematical Sciences & Engineering Division (CEMSE), King Abdullah University of Science and Technology, 4700 KAUST, Thuwal, 23955–6900 Kingdom of Saudi Arabia
- Mona Alshahrani
- Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology, 4700 KAUST, Thuwal, 23955–6900 Kingdom of Saudi Arabia
- Computer, Electrical and Mathematical Sciences & Engineering Division (CEMSE), King Abdullah University of Science and Technology, 4700 KAUST, Thuwal, 23955–6900 Kingdom of Saudi Arabia
- Georgios V. Gkoutos
- College of Medical and Dental Sciences, Institute of Cancer and Genomic Sciences, Centre for Computational Biology, University of Birmingham, Birmingham, B15 2TT United Kingdom
- Institute of Translational Medicine, University Hospitals Birmingham, NHS Foundation Trust, Birmingham, B15 2TT United Kingdom
- Institute of Biological, Environmental and Rural Sciences, Aberystwyth University, Aberystwyth, SY23 2AX United Kingdom
- George Gosline
- Royal Botanic Gardens, Kew, Richmond, Surrey, TW9 3AB United Kingdom
- Quentin Groom
- Botanic Garden Meise, Nieuwelaan 38, Meise, 1860 Belgium
- Thomas Hamann
- Naturalis Biodiversity Center, P.O. Box 9517, Leiden, 2300 RA The Netherlands
- Jens Kattge
- Max Planck Institute for Biogeochemistry, Hans Knoell Str. 10, Jena, 07745 Germany
- German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, Deutscher Platz 5e, Leipzig, 04103 Germany
- Marco Schmidt
- Senckenberg Biodiversity and Climate Research Centre (BiK-F), Senckenberganlage 25, Frankfurt am Main, 60325 Germany
- Soraya Sierra
- Naturalis Biodiversity Center, P.O. Box 9517, Leiden, 2300 RA The Netherlands
- Erik Smets
- Naturalis Biodiversity Center, P.O. Box 9517, Leiden, 2300 RA The Netherlands
- Rutger A. Vos
- Naturalis Biodiversity Center, P.O. Box 9517, Leiden, 2300 RA The Netherlands
- Claus Weiland
- Senckenberg Biodiversity and Climate Research Centre (BiK-F), Senckenberganlage 25, Frankfurt am Main, 60325 Germany
26
Soul J, Dunn SL, Hardingham TE, Boot-Handford RP, Schwartz JM. PhenomeScape: a cytoscape app to identify differentially regulated sub-networks using known disease associations. Bioinformatics 2016; 32:3847-3849. [PMID: 27559157 PMCID: PMC5167065 DOI: 10.1093/bioinformatics/btw545] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Received: 06/20/2016] [Revised: 07/29/2016] [Accepted: 08/15/2016] [Indexed: 01/12/2023]
Abstract
Summary: PhenomeScape is a Cytoscape app that provides easy access to the PhenomeExpress algorithm for interpreting gene expression data. PhenomeExpress integrates protein interaction networks with known phenotype-to-gene associations to find active sub-networks enriched in differentially expressed genes. It also incorporates cross-species phenotypes and associations to include results from animal models of disease. With expression data imported into PhenomeScape, the user can quickly generate and visualise interactive sub-networks. PhenomeScape thus enables researchers to use prior knowledge of a disease to identify differentially regulated sub-networks and to generate an overview of altered biological processes specific to that disease. Availability and Implementation: Freely available for download at https://github.com/soulj/PhenomeScape Contact: jamie.soul@postgrad.manchester.ac.uk or jean-marc.schwartz@manchester.ac.uk
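A common family of heuristics for finding "active" sub-networks scores a connected set of genes by its aggregate differential-expression z-score (the sum of node z-scores divided by the square root of the module size, as in jActiveModules) and grows the set greedily from a seed. The sketch below applies that generic idea with a phenotype-associated gene as the seed. It is an illustrative assumption, not the PhenomeExpress algorithm, and the gene names and z-scores are hypothetical.

```python
import math


def aggregate_score(module, z):
    """Sum of node z-scores scaled by 1/sqrt(module size)."""
    return sum(z[gene] for gene in module) / math.sqrt(len(module))


def grow_module(seed, adjacency, z):
    """Greedily add the neighbouring gene that most improves the aggregate score."""
    module = {seed}
    best = aggregate_score(module, z)
    while True:
        frontier = set().union(*(adjacency[g] for g in module)) - module
        if not frontier:
            return module, best
        # sorted() makes tie-breaking deterministic across runs
        candidate = max(sorted(frontier), key=lambda g: aggregate_score(module | {g}, z))
        score = aggregate_score(module | {candidate}, z)
        if score <= best:
            return module, best
        module.add(candidate)
        best = score


# Hypothetical toy interaction network and expression z-scores.
adjacency = {"SEED": {"A", "B"}, "A": {"SEED", "C"}, "B": {"SEED"}, "C": {"A"}}
z = {"SEED": 2.0, "A": 3.0, "B": -1.0, "C": 2.5}
module, score = grow_module("SEED", adjacency, z)
print(sorted(module), round(score, 3))  # ['A', 'C', 'SEED'] 4.33
```

The down-regulated neighbour "B" is left out because adding it would lower the size-normalised score, which is the behaviour that keeps such modules focused on coherently perturbed genes.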
Affiliation(s)
- Jamie Soul
- Faculty of Biology, Medicine and Health, University of Manchester, Manchester M13 9PT, UK
- Sara L Dunn
- Faculty of Biology, Medicine and Health, University of Manchester, Manchester M13 9PT, UK
- Tim E Hardingham
- Faculty of Biology, Medicine and Health, University of Manchester, Manchester M13 9PT, UK
- Ray P Boot-Handford
- Faculty of Biology, Medicine and Health, University of Manchester, Manchester M13 9PT, UK
- Jean-Marc Schwartz
- Faculty of Biology, Medicine and Health, University of Manchester, Manchester M13 9PT, UK
27
Wang Z, Clark NR, Ma'ayan A. Drug-induced adverse events prediction with the LINCS L1000 data. Bioinformatics 2016; 32:2338-45. [PMID: 27153606 PMCID: PMC4965635 DOI: 10.1093/bioinformatics/btw168] [Citation(s) in RCA: 107] [Impact Index Per Article: 11.9] [Received: 10/20/2015] [Revised: 03/05/2016] [Accepted: 03/23/2016] [Indexed: 01/22/2023]
Abstract
MOTIVATION Adverse drug reactions (ADRs) are a central consideration during drug development. Here we present a machine learning classifier to prioritize ADRs for approved drugs and pre-clinical small-molecule compounds by combining chemical structure (CS) and gene expression (GE) features. The GE data are from the Library of Integrated Network-based Cellular Signatures (LINCS) L1000 dataset, which measured changes in GE before and after treatment of human cells with over 20 000 small-molecule compounds, including most of the FDA-approved drugs. Using various benchmarking methods, we show that integrating GE data with the CS of the drugs can significantly improve the predictability of ADRs. Moreover, transforming GE features into enrichment vectors of biological terms further improves the predictive capability of the classifiers. The most predictive biological-term features can help in understanding the drugs' mechanisms of action. Finally, we applied the classifier to all >20 000 profiled small molecules and developed a web portal for browsing and searching predicted small-molecule/ADR connections. AVAILABILITY AND IMPLEMENTATION The interface for the adverse event predictions for the >20 000 LINCS compounds is available at http://maayanlab.net/SEP-L1000/ CONTACT: avi.maayan@mssm.edu SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
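The abstract does not say how GE features are turned into "enrichment vectors of biological terms". One common approach, sketched here as an assumption, is a one-sided hypergeometric test of each term's gene set against the set of perturbed genes, keeping -log10(p) as one feature per term. The gene sets, gene names and universe size below are hypothetical, and in a classifier these features would be only one block alongside the chemical-structure descriptors.

```python
import math


def hypergeom_sf(k, population, successes, draws):
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws)."""
    upper = min(successes, draws)
    hits = sum(
        math.comb(successes, i) * math.comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return hits / math.comb(population, draws)


def enrichment_vector(signature, term_sets, universe_size):
    """Map a set of perturbed genes to one -log10(p) feature per biological term."""
    features = {}
    for term, genes in sorted(term_sets.items()):
        overlap = len(signature & genes)
        p = hypergeom_sf(overlap, universe_size, len(genes), len(signature))
        # max() turns -0.0 (when p == 1) into a plain 0.0
        features[term] = max(0.0, -math.log10(p))
    return features


# Hypothetical signature, gene sets and a 20-gene universe.
signature = {"g1", "g2", "g3", "g4"}
term_sets = {"TERM_A": {"g1", "g2", "g3", "g5"}, "TERM_B": {"g10", "g11"}}
print(enrichment_vector(signature, term_sets, universe_size=20))
```

TERM_A shares three of its four genes with the signature and gets a high score, while TERM_B has no overlap and scores zero; a fixed, sorted term list keeps the feature order identical across compounds.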
Affiliation(s)
- Zichen Wang
- Department of Pharmacology and Systems Therapeutics, One Gustave L. Levy Place Box 1215, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Neil R Clark
- Department of Pharmacology and Systems Therapeutics, One Gustave L. Levy Place Box 1215, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Avi Ma'ayan
- Department of Pharmacology and Systems Therapeutics, One Gustave L. Levy Place Box 1215, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
28
Bone WP, Washington NL, Buske OJ, Adams DR, Davis J, Draper D, Flynn ED, Girdea M, Godfrey R, Golas G, Groden C, Jacobsen J, Köhler S, Lee EMJ, Links AE, Markello TC, Mungall CJ, Nehrebecky M, Robinson PN, Sincan M, Soldatos AG, Tifft CJ, Toro C, Trang H, Valkanas E, Vasilevsky N, Wahl C, Wolfe LA, Boerkoel CF, Brudno M, Haendel MA, Gahl WA, Smedley D. Computational evaluation of exome sequence data using human and model organism phenotypes improves diagnostic efficiency. Genet Med 2016; 18:608-17. [PMID: 26562225 PMCID: PMC4916229 DOI: 10.1038/gim.2015.137] [Citation(s) in RCA: 73] [Impact Index Per Article: 8.1] [Received: 03/01/2015] [Accepted: 08/27/2015] [Indexed: 01/18/2023]
Abstract
PURPOSE Medical diagnosis and molecular or biochemical confirmation typically rely on the knowledge of the clinician. Although this is very difficult in extremely rare diseases, we hypothesized that the recording of patient phenotypes in Human Phenotype Ontology (HPO) terms and computationally ranking putative disease-associated sequence variants improves diagnosis, particularly for patients with atypical clinical profiles. METHODS Using simulated exomes and the National Institutes of Health Undiagnosed Diseases Program (UDP) patient cohort and associated exome sequence, we tested our hypothesis using Exomiser. Exomiser ranks candidate variants based on patient phenotype similarity to (i) known disease-gene phenotypes, (ii) model organism phenotypes of candidate orthologs, and (iii) phenotypes of protein-protein association neighbors. RESULTS Benchmarking showed Exomiser ranked the causal variant as the top hit in 97% of known disease-gene associations and ranked the correct seeded variant in up to 87% when detectable disease-gene associations were unavailable. Using UDP data, Exomiser ranked the causative variant(s) within the top 10 variants for 11 previously diagnosed variants and achieved a diagnosis for 4 of 23 cases undiagnosed by clinical evaluation. CONCLUSION Structured phenotyping of patients and computational analysis are effective adjuncts for diagnosing patients with genetic disorders.
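Ranking candidates by the similarity between a patient's HPO terms and the phenotypes annotated to each gene can be illustrated with a generic semantic-similarity sketch: Resnik similarity (information content of the most informative common ancestor), combined over term sets by a best-match average. This measure is an assumption chosen for clarity, not Exomiser's own phenotype scoring, and it covers only comparison type (i) above. The toy ontology, term labels and gene annotations are hypothetical.

```python
import math

# Hypothetical toy phenotype ontology: term -> direct parents.
PARENTS = {
    "phenotypic abnormality": set(),
    "abnormal heart": {"phenotypic abnormality"},
    "septal defect": {"abnormal heart"},
    "atrial septal defect": {"septal defect"},
    "ventricular septal defect": {"septal defect"},
    "abnormal skeleton": {"phenotypic abnormality"},
    "scoliosis": {"abnormal skeleton"},
}

# Hypothetical gene -> phenotype annotations.
GENE_ANNOTATIONS = {
    "GENE_A": {"atrial septal defect"},
    "GENE_B": {"ventricular septal defect"},
    "GENE_C": {"scoliosis"},
}


def ancestors(term):
    """Return the term and all of its ancestors."""
    result, stack = {term}, [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in result:
                result.add(parent)
                stack.append(parent)
    return result


def information_content(annotations):
    """IC(t) = -log(fraction of genes annotated with t or a descendant of t)."""
    counts = dict.fromkeys(PARENTS, 0)
    for terms in annotations.values():
        for t in set().union(*(ancestors(x) for x in terms)):
            counts[t] += 1
    n = len(annotations)
    return {t: -math.log(c / n) for t, c in counts.items() if c > 0}


def resnik(a, b, ic):
    """IC of the most informative common ancestor of terms a and b."""
    return max(ic.get(t, 0.0) for t in ancestors(a) & ancestors(b))


def best_match_average(patient_terms, gene_terms, ic):
    """Average, over patient terms, of the best Resnik match among gene terms."""
    best = [max(resnik(p, g, ic) for g in gene_terms) for p in patient_terms]
    return sum(best) / len(best)


def rank_genes(patient_terms, annotations):
    ic = information_content(annotations)
    scores = {g: best_match_average(patient_terms, t, ic) for g, t in annotations.items()}
    return sorted(scores, key=scores.get, reverse=True)


print(rank_genes({"atrial septal defect"}, GENE_ANNOTATIONS))
# ['GENE_A', 'GENE_B', 'GENE_C']
```

GENE_B still outranks GENE_C because its phenotype shares the informative ancestor "septal defect" with the patient's term, which is how ontology-based similarity rewards near misses rather than requiring exact term matches.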
Affiliation(s)
- William P. Bone
- Undiagnosed Diseases Program, Common Fund, Office of the Director, National Institutes of Health, Bethesda, Maryland, USA
- Nicole L. Washington
- Genomics Division, Lawrence Berkeley National Laboratory, Berkeley, California, USA
- Orion J. Buske
- Centre for Computational Medicine Hospital for Sick Children, Toronto, Ontario, Canada
- Department of Computer Science, University of Toronto, Toronto, Ontario, Canada
- David R. Adams
- Undiagnosed Diseases Program, Common Fund, Office of the Director, National Institutes of Health, Bethesda, Maryland, USA
- Medical Genetics Branch, National Human Genome Research Institute, Bethesda, Maryland, USA
- Joie Davis
- Undiagnosed Diseases Program, Common Fund, Office of the Director, National Institutes of Health, Bethesda, Maryland, USA
- David Draper
- Undiagnosed Diseases Program, Common Fund, Office of the Director, National Institutes of Health, Bethesda, Maryland, USA
- Elise D. Flynn
- Undiagnosed Diseases Program, Common Fund, Office of the Director, National Institutes of Health, Bethesda, Maryland, USA
- Marta Girdea
- Centre for Computational Medicine Hospital for Sick Children, Toronto, Ontario, Canada
- Department of Computer Science, University of Toronto, Toronto, Ontario, Canada
- Rena Godfrey
- Undiagnosed Diseases Program, Common Fund, Office of the Director, National Institutes of Health, Bethesda, Maryland, USA
- Gretchen Golas
- Undiagnosed Diseases Program, Common Fund, Office of the Director, National Institutes of Health, Bethesda, Maryland, USA
- Catherine Groden
- Undiagnosed Diseases Program, Common Fund, Office of the Director, National Institutes of Health, Bethesda, Maryland, USA
- Julius Jacobsen
- Skarnes Faculty group, Wellcome Trust Sanger Institute, Hinxton, UK
- Sebastian Köhler
- Institute for Medical Genetics and Human Genetics, Charité-Universitätsmedizin Berlin, Berlin, Germany
- Elizabeth M. J. Lee
- Undiagnosed Diseases Program, Common Fund, Office of the Director, National Institutes of Health, Bethesda, Maryland, USA
- Amanda E. Links
- Undiagnosed Diseases Program, Common Fund, Office of the Director, National Institutes of Health, Bethesda, Maryland, USA
- Thomas C. Markello
- Undiagnosed Diseases Program, Common Fund, Office of the Director, National Institutes of Health, Bethesda, Maryland, USA
- Michele Nehrebecky
- Undiagnosed Diseases Program, Common Fund, Office of the Director, National Institutes of Health, Bethesda, Maryland, USA
- Peter N. Robinson
- Institute for Medical Genetics and Human Genetics, Charité-Universitätsmedizin Berlin, Berlin, Germany
- Murat Sincan
- Undiagnosed Diseases Program, Common Fund, Office of the Director, National Institutes of Health, Bethesda, Maryland, USA
- Ariane G. Soldatos
- Undiagnosed Diseases Program, Common Fund, Office of the Director, National Institutes of Health, Bethesda, Maryland, USA
- Cynthia J. Tifft
- Undiagnosed Diseases Program, Common Fund, Office of the Director, National Institutes of Health, Bethesda, Maryland, USA
- Medical Genetics Branch, National Human Genome Research Institute, Bethesda, Maryland, USA
- Camilo Toro
- Undiagnosed Diseases Program, Common Fund, Office of the Director, National Institutes of Health, Bethesda, Maryland, USA
- Heather Trang
- Centre for Computational Medicine Hospital for Sick Children, Toronto, Ontario, Canada
- Department of Computer Science, University of Toronto, Toronto, Ontario, Canada
- Elise Valkanas
- Undiagnosed Diseases Program, Common Fund, Office of the Director, National Institutes of Health, Bethesda, Maryland, USA
- Nicole Vasilevsky
- Library; and Department of Medical Informatics and Epidemiology, Oregon Health & Science University, Portland, Oregon, USA
- Colleen Wahl
- Undiagnosed Diseases Program, Common Fund, Office of the Director, National Institutes of Health, Bethesda, Maryland, USA
- Lynne A. Wolfe
- Undiagnosed Diseases Program, Common Fund, Office of the Director, National Institutes of Health, Bethesda, Maryland, USA
- Cornelius F. Boerkoel
- Undiagnosed Diseases Program, Common Fund, Office of the Director, National Institutes of Health, Bethesda, Maryland, USA
- Michael Brudno
- Centre for Computational Medicine Hospital for Sick Children, Toronto, Ontario, Canada
- Department of Computer Science, University of Toronto, Toronto, Ontario, Canada
- Melissa A. Haendel
- Library; and Department of Medical Informatics and Epidemiology, Oregon Health & Science University, Portland, Oregon, USA
- William A. Gahl
- Undiagnosed Diseases Program, Common Fund, Office of the Director, National Institutes of Health, Bethesda, Maryland, USA
- Medical Genetics Branch, National Human Genome Research Institute, Bethesda, Maryland, USA
- Damian Smedley
- Skarnes Faculty group, Wellcome Trust Sanger Institute, Hinxton, UK
29
Jupp S, Malone J, Burdett T, Heriche JK, Williams E, Ellenberg J, Parkinson H, Rustici G. The cellular microscopy phenotype ontology. J Biomed Semantics 2016; 7:28. [PMID: 27195102 PMCID: PMC4870745 DOI: 10.1186/s13326-016-0074-0] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.2] [Received: 12/16/2015] [Accepted: 05/10/2016] [Indexed: 11/17/2022]
Abstract
Background Phenotypic data derived from high-content screening are currently annotated using free text, which prevents the integration of independent datasets, including those generated in different biological domains, such as cell lines, mouse and human tissues. Description We present the Cellular Microscopy Phenotype Ontology (CMPO), a species-neutral ontology for describing phenotypic observations relating to the whole cell, cellular components, cellular processes and cell populations. CMPO is compatible with related ontology efforts, allowing for future cross-species integration of phenotypic data. CMPO was developed following a curator-driven approach in which phenotype data were annotated by expert biologists following the Entity-Quality (EQ) pattern. These EQs were subsequently transformed into new CMPO terms following an established post-composition process. Conclusion CMPO is currently being used to annotate phenotypes associated with high-content screening datasets stored in several image repositories, including the Image Data Repository (IDR), the MitoSys project database and the Cellular Phenotype Database, to facilitate data browsing and discoverability.
Affiliation(s)
- Simon Jupp
- European Bioinformatics Institute (EMBL-EBI), European Molecular Biology Laboratory, Wellcome Trust Genome Campus, Hinxton Cambridge, CB10 1SD UK
- James Malone
- European Bioinformatics Institute (EMBL-EBI), European Molecular Biology Laboratory, Wellcome Trust Genome Campus, Hinxton Cambridge, CB10 1SD UK
- Tony Burdett
- European Bioinformatics Institute (EMBL-EBI), European Molecular Biology Laboratory, Wellcome Trust Genome Campus, Hinxton Cambridge, CB10 1SD UK
- Jean-Karim Heriche
- European Molecular Biology Laboratory, Meyerhofstrasse 1, 69117 Heidelberg, Germany
- Eleanor Williams
- Centre for Gene Regulation and Expression, University of Dundee, Dundee, DD1 5EH UK
- Jan Ellenberg
- European Molecular Biology Laboratory, Meyerhofstrasse 1, 69117 Heidelberg, Germany
- Helen Parkinson
- European Bioinformatics Institute (EMBL-EBI), European Molecular Biology Laboratory, Wellcome Trust Genome Campus, Hinxton Cambridge, CB10 1SD UK
- Gabriella Rustici
- European Bioinformatics Institute (EMBL-EBI), European Molecular Biology Laboratory, Wellcome Trust Genome Campus, Hinxton Cambridge, CB10 1SD UK
30
Robinson PN, Mungall CJ, Haendel M. Capturing phenotypes for precision medicine. Cold Spring Harb Mol Case Stud 2016; 1:a000372. [PMID: 27148566 PMCID: PMC4850887 DOI: 10.1101/mcs.a000372] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.0] [Indexed: 12/15/2022]
Abstract
Deep phenotyping followed by integrated computational analysis of genotype and phenotype is becoming ever more important for many areas of genomic diagnostics and translational research. The overwhelming majority of clinical descriptions in the medical literature are available only as natural language text, meaning that searching, analyzing, and integrating medically relevant information in databases such as PubMed are challenging. The new journal Cold Spring Harbor Molecular Case Studies will require authors to select Human Phenotype Ontology terms for research papers that will be displayed alongside the manuscript, thereby providing a foundation for ontology-based indexing and searching of articles that contain descriptions of phenotypic abnormalities. This is an important step toward improving the ability of researchers and clinicians to obtain biomedical information that is critical for clinical care or translational research.
Affiliation(s)
- Peter N Robinson
- Institute for Medical Genetics and Human Genetics, Charité-Universitätsmedizin Berlin, 10117 Berlin, Germany; Max Planck Institute for Molecular Genetics, 14195 Berlin, Germany; Berlin Brandenburg Center for Regenerative Therapies (BCRT), Charité-Universitätsmedizin Berlin, 13353 Berlin, Germany; Institute for Bioinformatics, Department of Mathematics and Computer Science, Freie Universität Berlin, 14195 Berlin, Germany
- Melissa Haendel
- Oregon Health and Science University, Portland, Oregon 97239, USA
31
Greene D, Richardson S, Turro E. Phenotype Similarity Regression for Identifying the Genetic Determinants of Rare Diseases. Am J Hum Genet 2016; 98:490-499. [PMID: 26924528 PMCID: PMC4827100 DOI: 10.1016/j.ajhg.2016.01.008] [Citation(s) in RCA: 41] [Impact Index Per Article: 4.6] [Received: 08/25/2015] [Accepted: 01/08/2016] [Indexed: 12/31/2022]
Abstract
Rare genetic disorders, which can now be studied systematically with affordable genome sequencing, are often caused by high-penetrance rare variants. Such disorders are often heterogeneous and characterized by abnormalities spanning multiple organ systems ascertained with variable clinical precision. Existing methods for identifying genes with variants responsible for rare diseases summarize phenotypes with unstructured binary or quantitative variables. The Human Phenotype Ontology (HPO) allows composite phenotypes to be represented systematically but association methods accounting for the ontological relationship between HPO terms do not exist. We present a Bayesian method to model the association between an HPO-coded patient phenotype and genotype. Our method estimates the probability of an association together with an HPO-coded phenotype characteristic of the disease. We thus formalize a clinical approach to phenotyping that is lacking in standard regression techniques for rare disease research. We demonstrate the power of our method by uncovering a number of true associations in a large collection of genome-sequenced and HPO-coded cases with rare diseases.
Affiliation(s)
- Ernest Turro
- Department of Haematology, University of Cambridge, NHS Blood and Transplant, Cambridge Biomedical Campus, Cambridge CB2 0PT, UK; Medical Research Council Biostatistics Unit, Cambridge Biomedical Campus, Cambridge CB2 0SR, UK.
32
Turro E, Greene D, Wijgaerts A, Thys C, Lentaigne C, Bariana TK, Westbury SK, Kelly AM, Selleslag D, Stephens JC, Papadia S, Simeoni I, Penkett CJ, Ashford S, Attwood A, Austin S, Bakchoul T, Collins P, Deevi SVV, Favier R, Kostadima M, Lambert MP, Mathias M, Millar CM, Peerlinck K, Perry DJ, Schulman S, Whitehorn D, Wittevrongel C, De Maeyer M, Rendon A, Gomez K, Erber WN, Mumford AD, Nurden P, Stirrups K, Bradley JR, Raymond FL, Laffan MA, Van Geet C, Richardson S, Freson K, Ouwehand WH. A dominant gain-of-function mutation in universal tyrosine kinase SRC causes thrombocytopenia, myelofibrosis, bleeding, and bone pathologies. Sci Transl Med 2016; 8:328ra30. [PMID: 26936507 DOI: 10.1126/scitranslmed.aad7666] [Citation(s) in RCA: 74] [Impact Index Per Article: 8.2] [Received: 10/28/2015] [Accepted: 01/21/2016] [Indexed: 12/14/2022]
Abstract
The Src family kinase (SFK) member SRC is a major target in drug development because it is activated in many human cancers, yet deleterious SRC germline mutations have not been reported. We used genome sequencing and Human Phenotype Ontology patient coding to identify a gain-of-function mutation in SRC causing thrombocytopenia, myelofibrosis, bleeding, and bone pathologies in nine cases. Modeling of the E527K substitution predicts loss of SRC's self-inhibitory capacity, which we confirmed with in vitro studies showing increased SRC kinase activity and enhanced Tyr(419) phosphorylation in COS-7 cells overexpressing E527K SRC. The active form of SRC predominates in patients' platelets, resulting in enhanced overall tyrosine phosphorylation. Patients with myelofibrosis have hypercellular bone marrow with trilineage dysplasia, and their stem cells grown in vitro form more myeloid and megakaryocyte (MK) colonies than control cells. These MKs generate platelets that are dysmorphic, low in number, highly variable in size, and have a paucity of α-granules. Overactive SRC in patient-derived MKs causes a reduction in proplatelet formation, which can be rescued by SRC kinase inhibition. Stem cells transduced with lentiviral E527K SRC form MKs with a similar defect and enhanced tyrosine phosphorylation levels. Patient-derived and E527K-transduced MKs show Y419 SRC-positive stained podosomes that induce altered actin organization. Expression of mutated src in zebrafish recapitulates patients' blood and bone phenotypes. Similar studies of platelets and MKs may reveal the mechanism underlying the severe bleeding frequently observed in cancer patients treated with next-generation SFK inhibitors.
Affiliation(s)
- Ernest Turro
- Department of Haematology, University of Cambridge, Cambridge Biomedical Campus, Cambridge CB2 0PT, UK. National Health Service (NHS) Blood and Transplant, Cambridge Biomedical Campus, Cambridge CB2 0PT, UK. Medical Research Council Biostatistics Unit, Cambridge Institute of Public Health, Cambridge Biomedical Campus, Cambridge CB2 0SR, UK. National Institute for Health Research (NIHR) BioResource-Rare Diseases, Cambridge University Hospitals, Cambridge Biomedical Campus, Cambridge CB2 0PT, UK
- Daniel Greene
- Department of Haematology, University of Cambridge, Cambridge Biomedical Campus, Cambridge CB2 0PT, UK. Medical Research Council Biostatistics Unit, Cambridge Institute of Public Health, Cambridge Biomedical Campus, Cambridge CB2 0SR, UK. National Institute for Health Research (NIHR) BioResource-Rare Diseases, Cambridge University Hospitals, Cambridge Biomedical Campus, Cambridge CB2 0PT, UK
- Anouck Wijgaerts
- Department of Cardiovascular Sciences, Center for Molecular and Vascular Biology, University of Leuven, 3000 Leuven, Belgium
- Chantal Thys
- Department of Cardiovascular Sciences, Center for Molecular and Vascular Biology, University of Leuven, 3000 Leuven, Belgium
- Claire Lentaigne
- Centre for Haematology, Hammersmith Campus, Imperial College Academic Health Sciences Centre, Imperial College London, London W12 0HS, UK. Imperial College Healthcare NHS Trust, Du Cane Road, London W12 0HS, UK
- Tadbir K Bariana
- Department of Haematology, University College London Cancer Institute, London WC1E 6BT, UK. Katharine Dormandy Haemophilia Centre and Thrombosis Unit, Royal Free London NHS Foundation Trust, London NW3 2QG, UK
- Sarah K Westbury
- School of Clinical Sciences, University of Bristol, Bristol BS2 8DZ, UK
- Anne M Kelly
- Department of Haematology, University of Cambridge, Cambridge Biomedical Campus, Cambridge CB2 0PT, UK. National Health Service (NHS) Blood and Transplant, Cambridge Biomedical Campus, Cambridge CB2 0PT, UK
- Dominik Selleslag
- Academisch Ziekenhuis Sint-Jan Brugge-Oostende, 8000 Brugge, Belgium
- Jonathan C Stephens
- Department of Haematology, University of Cambridge, Cambridge Biomedical Campus, Cambridge CB2 0PT, UK. National Health Service (NHS) Blood and Transplant, Cambridge Biomedical Campus, Cambridge CB2 0PT, UK. National Institute for Health Research (NIHR) BioResource-Rare Diseases, Cambridge University Hospitals, Cambridge Biomedical Campus, Cambridge CB2 0PT, UK
- Sofia Papadia
- Department of Haematology, University of Cambridge, Cambridge Biomedical Campus, Cambridge CB2 0PT, UK. National Institute for Health Research (NIHR) BioResource-Rare Diseases, Cambridge University Hospitals, Cambridge Biomedical Campus, Cambridge CB2 0PT, UK
- Ilenia Simeoni
- Department of Haematology, University of Cambridge, Cambridge Biomedical Campus, Cambridge CB2 0PT, UK. National Institute for Health Research (NIHR) BioResource-Rare Diseases, Cambridge University Hospitals, Cambridge Biomedical Campus, Cambridge CB2 0PT, UK
- Christopher J Penkett
- Department of Haematology, University of Cambridge, Cambridge Biomedical Campus, Cambridge CB2 0PT, UK. National Institute for Health Research (NIHR) BioResource-Rare Diseases, Cambridge University Hospitals, Cambridge Biomedical Campus, Cambridge CB2 0PT, UK
- Sofie Ashford
- Department of Haematology, University of Cambridge, Cambridge Biomedical Campus, Cambridge CB2 0PT, UK. National Institute for Health Research (NIHR) BioResource-Rare Diseases, Cambridge University Hospitals, Cambridge Biomedical Campus, Cambridge CB2 0PT, UK
- Antony Attwood
- Department of Haematology, University of Cambridge, Cambridge Biomedical Campus, Cambridge CB2 0PT, UK. National Health Service (NHS) Blood and Transplant, Cambridge Biomedical Campus, Cambridge CB2 0PT, UK. National Institute for Health Research (NIHR) BioResource-Rare Diseases, Cambridge University Hospitals, Cambridge Biomedical Campus, Cambridge CB2 0PT, UK
- Steve Austin
- Department of Haematology, Guy's and St Thomas' NHS Foundation Trust, London SE1 7EH, UK
- Tamam Bakchoul
- Institute for Immunology and Transfusion Medicine, Universitätsmedizin Greifswald, 17475 Greifswald, Germany
- Peter Collins
- Arthur Bloom Haemophilia Centre, Institute of Infection and Immunity, School of Medicine, Cardiff University, Cardiff CF14 4XN, UK
- Sri V V Deevi
- Department of Haematology, University of Cambridge, Cambridge Biomedical Campus, Cambridge CB2 0PT, UK. National Institute for Health Research (NIHR) BioResource-Rare Diseases, Cambridge University Hospitals, Cambridge Biomedical Campus, Cambridge CB2 0PT, UK
- Rémi Favier
- Assistance Publique-Hôpitaux de Paris, Armand Trousseau Children Hospital, 75012 Paris, France. INSERM U1170, 94805 Villejuif, France
- Myrto Kostadima
- Department of Haematology, University of Cambridge, Cambridge Biomedical Campus, Cambridge CB2 0PT, UK. National Health Service (NHS) Blood and Transplant, Cambridge Biomedical Campus, Cambridge CB2 0PT, UK
- Michele P Lambert
- Division of Hematology, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA. Department of Pediatrics, Perelman School of Medicine at the University of Pennsylvania, Philadelphia, PA 19104, USA
- Mary Mathias
- Department of Haematology, Great Ormond Street Hospital for Children NHS Foundation Trust, London WC1N 3JH, UK
- Carolyn M Millar
- Centre for Haematology, Hammersmith Campus, Imperial College Academic Health Sciences Centre, Imperial College London, London W12 0HS, UK. Imperial College Healthcare NHS Trust, Du Cane Road, London W12 0HS, UK
- Kathelijne Peerlinck
- Department of Cardiovascular Sciences, Center for Molecular and Vascular Biology, University of Leuven, 3000 Leuven, Belgium
- David J Perry
- Department of Haematology, Addenbrooke's Hospital, Cambridge University Hospitals NHS Foundation Trust, Cambridge Biomedical Campus, Cambridge CB2 0QQ, UK
- Sol Schulman
- Beth Israel Deaconess Medical Centre, Harvard Medical School, Boston, MA 02215, USA
- Deborah Whitehorn
- Department of Haematology, University of Cambridge, Cambridge Biomedical Campus, Cambridge CB2 0PT, UK. National Health Service (NHS) Blood and Transplant, Cambridge Biomedical Campus, Cambridge CB2 0PT, UK
- Christine Wittevrongel
- Department of Cardiovascular Sciences, Center for Molecular and Vascular Biology, University of Leuven, 3000 Leuven, Belgium
- Marc De Maeyer
- Biochemistry, Molecular and Structural Biology Section, University of Leuven, 3001 Leuven, Belgium
- Augusto Rendon
- Department of Haematology, University of Cambridge, Cambridge Biomedical Campus, Cambridge CB2 0PT, UK. Genomics England Ltd., London EC1M 6BQ, UK
- Keith Gomez
- Department of Haematology, University College London Cancer Institute, London WC1E 6BT, UK. Katharine Dormandy Haemophilia Centre and Thrombosis Unit, Royal Free London NHS Foundation Trust, London NW3 2QG, UK
- Wendy N Erber
- Pathology and Laboratory Medicine, University of Western Australia, Crawley, Western Australia WA 6009, Australia
- Andrew D Mumford
- School of Clinical Sciences, University of Bristol, Bristol BS2 8DZ, UK. School of Cellular and Molecular Medicine, University of Bristol, Bristol BS8 1TD, UK
- Paquita Nurden
- Institut Hospitalo-Universitaire LIRYC, PTIB, Hôpital Xavier Arnozan, 33600 Pessac, France
- Kathleen Stirrups
- Department of Haematology, University of Cambridge, Cambridge Biomedical Campus, Cambridge CB2 0PT, UK. National Institute for Health Research (NIHR) BioResource-Rare Diseases, Cambridge University Hospitals, Cambridge Biomedical Campus, Cambridge CB2 0PT, UK
- John R Bradley
- National Institute for Health Research (NIHR) BioResource-Rare Diseases, Cambridge University Hospitals, Cambridge Biomedical Campus, Cambridge CB2 0PT, UK. Research and Development, Cambridge University Hospitals NHS Foundation Trust, Cambridge CB2 0QQ, UK
- F Lucy Raymond
- National Institute for Health Research (NIHR) BioResource-Rare Diseases, Cambridge University Hospitals, Cambridge Biomedical Campus, Cambridge CB2 0PT, UK. Department of Medical Genetics, Cambridge Institute for Medical Research, University of Cambridge, Cambridge CB2 0XY, UK
- Michael A Laffan
- Centre for Haematology, Hammersmith Campus, Imperial College Academic Health Sciences Centre, Imperial College London, London W12 0HS, UK. Imperial College Healthcare NHS Trust, Du Cane Road, London W12 0HS, UK
| | - Chris Van Geet
- Department of Cardiovascular Sciences, Center for Molecular and Vascular Biology, University of Leuven, 3000 Leuven, Belgium
| | - Sylvia Richardson
- Medical Research Council Biostatistics Unit, Cambridge Institute of Public Health, Cambridge Biomedical Campus, Cambridge CB2 0SR, UK
| | - Kathleen Freson
- Department of Cardiovascular Sciences, Center for Molecular and Vascular Biology, University of Leuven, 3000 Leuven, Belgium.
| | - Willem H Ouwehand
- Department of Haematology, University of Cambridge, Cambridge Biomedical Campus, Cambridge CB2 0PT, UK. National Health Service (NHS) Blood and Transplant, Cambridge Biomedical Campus, Cambridge CB2 0PT, UK. National Institute for Health Research (NIHR) BioResource-Rare Diseases, Cambridge University Hospitals, Cambridge Biomedical Campus, Cambridge CB2 0PT, UK. Human Genetics, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK
33
Smedley D, Jacobsen JOB, Jäger M, Köhler S, Holtgrewe M, Schubach M, Siragusa E, Zemojtel T, Buske OJ, Washington NL, Bone WP, Haendel MA, Robinson PN. Next-generation diagnostics and disease-gene discovery with the Exomiser. Nat Protoc 2015; 10:2004-15. [PMID: 26562621 DOI: 10.1038/nprot.2015.124]
Abstract
Exomiser is an application that prioritizes genes and variants in next-generation sequencing (NGS) projects for novel disease-gene discovery or differential diagnostics of Mendelian disease. Exomiser comprises a suite of algorithms for prioritizing exome sequences using random-walk analysis of protein interaction networks, clinical relevance and cross-species phenotype comparisons, as well as a wide range of other computational filters for variant frequency, predicted pathogenicity and pedigree analysis. In this protocol, we provide a detailed explanation of how to install Exomiser and use it to prioritize exome sequences in a number of scenarios. Exomiser requires ∼3 GB of RAM and roughly 15-90 s of computing time on a standard desktop computer to analyze a variant call format (VCF) file. Exomiser is freely available for academic use from http://www.sanger.ac.uk/science/tools/exomiser.
Affiliation(s)
- Damian Smedley
- Skarnes Faculty Group, Wellcome Trust Sanger Institute, Hinxton, UK
- Marten Jäger
- Institute for Medical and Human Genetics, Charité-Universitätsmedizin Berlin, Berlin, Germany; Berlin Brandenburg Center for Regenerative Therapies (BCRT), Charité-Universitätsmedizin Berlin, Berlin, Germany
- Sebastian Köhler
- Institute for Medical and Human Genetics, Charité-Universitätsmedizin Berlin, Berlin, Germany
- Manuel Holtgrewe
- Institute for Medical and Human Genetics, Charité-Universitätsmedizin Berlin, Berlin, Germany; Berlin Institute for Health, Berlin, Germany
- Max Schubach
- Institute for Medical and Human Genetics, Charité-Universitätsmedizin Berlin, Berlin, Germany
- Enrico Siragusa
- Institute for Medical and Human Genetics, Charité-Universitätsmedizin Berlin, Berlin, Germany; Berlin Institute for Health, Berlin, Germany; Max Planck Institute for Molecular Genetics, Berlin, Germany
- Tomasz Zemojtel
- Institute for Medical and Human Genetics, Charité-Universitätsmedizin Berlin, Berlin, Germany; Institute of Bioorganic Chemistry, Polish Academy of Sciences, Poznan, Poland; Labor Berlin - Charité Vivantes, Humangenetik, Berlin, Germany
- Orion J Buske
- Department of Computer Science, University of Toronto, Toronto, Ontario, Canada; Genetics and Genome Biology, The Hospital for Sick Children, Toronto, Ontario, Canada
- Nicole L Washington
- Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, California, USA
- William P Bone
- The National Institutes of Health (NIH) Undiagnosed Diseases Program, Common Fund, Office of the Director, NIH, Bethesda, Maryland, USA
- Melissa A Haendel
- Department of Medical Informatics and Clinical Epidemiology, Oregon Health & Science University, Portland, Oregon, USA
- Peter N Robinson
- Institute for Medical and Human Genetics, Charité-Universitätsmedizin Berlin, Berlin, Germany; Berlin Brandenburg Center for Regenerative Therapies (BCRT), Charité-Universitätsmedizin Berlin, Berlin, Germany; Max Planck Institute for Molecular Genetics, Berlin, Germany; Department of Mathematics and Computer Science, Institute for Bioinformatics, Freie Universität Berlin, Berlin, Germany
34
Hoehndorf R, Schofield PN, Gkoutos GV. The role of ontologies in biological and biomedical research: a functional perspective. Brief Bioinform 2015; 16:1069-80. [PMID: 25863278 PMCID: PMC4652617 DOI: 10.1093/bib/bbv011] [Received: 11/26/2014] [Revised: 01/20/2015]
Abstract
Ontologies are widely used in biological and biomedical research. Their success lies in their combination of four main features present in almost all ontologies: provision of standard identifiers for classes and relations that represent the phenomena within a domain; provision of a vocabulary for a domain; provision of metadata that describes the intended meaning of the classes and relations in ontologies; and the provision of machine-readable axioms and definitions that enable computational access to some aspects of the meaning of classes and relations. While each of these features enables applications that facilitate data integration, data access and analysis, a great potential lies in the possibility of combining these four features to support integrative analysis and interpretation of multimodal data. Here, we provide a functional perspective on ontologies in biology and biomedicine, focusing on what ontologies can do and describing how they can be used in support of integrative research. We also outline perspectives for using ontologies in data-driven science, in particular their application in structured data mining and machine learning applications.
35
Mungall CJ, Washington NL, Nguyen-Xuan J, Condit C, Smedley D, Köhler S, Groza T, Shefchek K, Hochheiser H, Robinson PN, Lewis SE, Haendel MA. Use of model organism and disease databases to support matchmaking for human disease gene discovery. Hum Mutat 2015; 36:979-84. [PMID: 26269093 PMCID: PMC5473253 DOI: 10.1002/humu.22857] [Received: 05/04/2015] [Accepted: 07/22/2015]
Abstract
The Matchmaker Exchange application programming interface (API) allows searching a patient's genotypic or phenotypic profiles across clinical sites, for the purposes of cohort discovery and validation of variant-disease causality. This API can be used not only to search for matching patients, but also to match against public disease and model organism data. These public disease data enable matching of known diseases and variant-phenotype associations using phenotype semantic similarity algorithms developed by the Monarch Initiative. The model data can provide additional evidence to aid diagnosis, suggest relevant models for disease mechanism and treatment exploration, and identify collaborators across the translational divide. The Monarch Initiative provides an implementation of this API for searching multiple integrated sources of data that contextualize the knowledge about any given patient or patient family within the greater biomedical knowledge landscape. While this corpus of data can aid diagnosis, it is also the beginning of research to improve understanding of rare human diseases.
Affiliation(s)
- Nicole L. Washington
- Genomics Division, Lawrence Berkeley National Laboratory, Berkeley, California, USA
- Jeremy Nguyen-Xuan
- Genomics Division, Lawrence Berkeley National Laboratory, Berkeley, California, USA
- Christopher Condit
- San Diego Supercomputing Center, UC San Diego, La Jolla, California, USA
- Damian Smedley
- Wellcome Trust Sanger Institute, Mouse Informatics group, Hinxton, UK
- Sebastian Köhler
- Charité - Universitätsmedizin Berlin, Institute for Medical and Human Genetics, Berlin, Germany
- Tudor Groza
- Garvan Institute, Kinghorn Centre for Clinical Genomics, Sydney, Australia
- Kent Shefchek
- Department of Biomedical Informatics and Clinical Epidemiology, Oregon Health and Science University
- Harry Hochheiser
- Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, Pennsylvania, USA
- Peter N. Robinson
- Charité - Universitätsmedizin Berlin, Institute for Medical and Human Genetics, Berlin, Germany
- Suzanna E. Lewis
- Genomics Division, Lawrence Berkeley National Laboratory, Berkeley, California, USA
- Melissa A. Haendel
- Department of Biomedical Informatics and Clinical Epidemiology, Oregon Health and Science University
36
Oellrich A, Collier N, Groza T, Rebholz-Schuhmann D, Shah N, Bodenreider O, Boland MR, Georgiev I, Liu H, Livingston K, Luna A, Mallon AM, Manda P, Robinson PN, Rustici G, Simon M, Wang L, Winnenburg R, Dumontier M. The digital revolution in phenotyping. Brief Bioinform 2015; 17:819-30. [PMID: 26420780 PMCID: PMC5036847 DOI: 10.1093/bib/bbv083] [Received: 06/03/2015]
Abstract
Phenotypes have gained increased prominence in the clinical and biological domains owing to their application in numerous areas such as the discovery of disease genes and drug targets, phylogenetics and pharmacogenomics. Phenotypes, defined as observable characteristics of organisms, can be seen as one of the bridges that lead to a translation of experimental findings into clinical applications and thereby support 'bench to bedside' efforts. However, to build this translational bridge, a common and universal understanding of phenotypes is required that goes beyond domain-specific definitions. To achieve this ambitious goal, a digital revolution is ongoing that enables the encoding of data in computer-readable formats and their storage in specialized repositories, ready for integration, enabling translational research. While phenome research is an ongoing endeavor, the true potential hidden in the currently available data still needs to be unlocked, offering exciting opportunities for the forthcoming years. Here, we provide insights into the state of the art in digital phenotyping, covering the representation, acquisition and analysis of phenotype data. In addition, we provide visions of this field for future research work that could enable better applications of phenotype data.
37
Applications of comparative evolution to human disease genetics. Curr Opin Genet Dev 2015; 35:16-24. [PMID: 26338499 DOI: 10.1016/j.gde.2015.08.004] [Received: 05/18/2015] [Revised: 08/11/2015] [Accepted: 08/12/2015]
Abstract
Direct comparison of human diseases with model phenotypes allows exploration of key areas of human biology which are often inaccessible for practical or ethical reasons. We review recent developments in comparative evolutionary approaches for finding models for genetic disease, including high-throughput generation of gene/phenotype relationship data, the linking of orthologous genes and phenotypes across species, and statistical methods for linking human diseases to model phenotypes.
38
Antanaviciute A, Watson CM, Harrison SM, Lascelles C, Crinnion L, Markham AF, Bonthron DT, Carr IM. OVA: integrating molecular and physical phenotype data from multiple biomedical domain ontologies with variant filtering for enhanced variant prioritization. Bioinformatics 2015; 31:3822-9. [PMID: 26272982 PMCID: PMC4653395 DOI: 10.1093/bioinformatics/btv473] [Received: 05/19/2015] [Accepted: 08/09/2015]
Abstract
MOTIVATION Exome sequencing has become a de facto standard method for Mendelian disease gene discovery in recent years, yet identifying disease-causing mutations among thousands of candidate variants remains a non-trivial task. RESULTS Here we describe a new variant prioritization tool, OVA (ontology variant analysis), in which user-provided phenotypic information is exploited to infer deeper biological context. OVA combines a knowledge-based approach with a variant-filtering framework. It reduces the number of candidate variants by considering genotype and predicted effect on protein sequence, and scores the remainder on biological relevance to the query phenotype. We take advantage of several ontologies in order to bridge knowledge across multiple biomedical domains and facilitate computational analysis of annotations pertaining to genes, diseases, phenotypes, tissues and pathways. In this way, OVA combines information regarding molecular and physical phenotypes and integrates both human and model organism data to effectively prioritize variants. By assessing performance on both known and novel disease mutations, we show that OVA performs biologically meaningful candidate variant prioritization and can be more accurate than another recently published candidate variant prioritization tool. AVAILABILITY AND IMPLEMENTATION OVA is freely accessible at http://dna2.leeds.ac.uk:8080/OVA/index.jsp. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online. CONTACT umaan@leeds.ac.uk.
Affiliation(s)
- Agne Antanaviciute
- Section of Genetics, Institute of Biomedical and Clinical Sciences, School of Medicine, University of Leeds and
- Christopher M Watson
- Section of Genetics, Institute of Biomedical and Clinical Sciences, School of Medicine, University of Leeds and Yorkshire Regional Genetics Service, St James's University Hospital, Leeds, UK
- Sally M Harrison
- Section of Genetics, Institute of Biomedical and Clinical Sciences, School of Medicine, University of Leeds and
- Carolina Lascelles
- Section of Genetics, Institute of Biomedical and Clinical Sciences, School of Medicine, University of Leeds and
- Laura Crinnion
- Section of Genetics, Institute of Biomedical and Clinical Sciences, School of Medicine, University of Leeds and Yorkshire Regional Genetics Service, St James's University Hospital, Leeds, UK
- Alexander F Markham
- Section of Genetics, Institute of Biomedical and Clinical Sciences, School of Medicine, University of Leeds and
- David T Bonthron
- Section of Genetics, Institute of Biomedical and Clinical Sciences, School of Medicine, University of Leeds and
- Ian M Carr
- Section of Genetics, Institute of Biomedical and Clinical Sciences, School of Medicine, University of Leeds and
39
Lyne R, Sullivan J, Butano D, Contrino S, Heimbach J, Hu F, Kalderimis A, Lyne M, Smith RN, Štěpán R, Balakrishnan R, Binkley G, Harris T, Karra K, Moxon SAT, Motenko H, Neuhauser S, Ruzicka L, Cherry M, Richardson J, Stein L, Westerfield M, Worthey E, Micklem G. Cross-organism analysis using InterMine. Genesis 2015; 53:547-60. [PMID: 26097192 PMCID: PMC4545681 DOI: 10.1002/dvg.22869] [Received: 04/01/2015] [Revised: 06/17/2015] [Accepted: 06/17/2015]
Abstract
InterMine is a data integration warehouse and analysis software system developed for large and complex biological data sets. Designed for integrative analysis, it can be accessed through a user-friendly web interface. For bioinformaticians, extensive web services as well as programming interfaces for most common scripting languages support access to all features. The web interface includes a useful identifier look-up system, and both simple and sophisticated search options. Interactive results tables enable exploration, and data can be filtered, summarized, and browsed. A set of graphical analysis tools provides a rich environment for data exploration, including statistical enrichment of sets of genes or other entities. InterMine databases have been developed for the major model organisms (budding yeast, nematode worm, fruit fly, zebrafish, mouse, and rat), together with a newly developed human database. Here, we describe how this has facilitated interoperation and development of cross-organism analysis tools and reports. InterMine as a data exploration and analysis tool is also described. All the InterMine-based systems described in this article are resources freely available to the scientific community.
Affiliation(s)
- Rachel Lyne
- Cambridge Systems Biology Centre, University of Cambridge, Cambridge CB2 1QR, United Kingdom
- Department of Genetics, University of Cambridge, Cambridge CB2 3EH, United Kingdom
- Julie Sullivan
- Cambridge Systems Biology Centre, University of Cambridge, Cambridge CB2 1QR, United Kingdom
- Department of Genetics, University of Cambridge, Cambridge CB2 3EH, United Kingdom
- Daniela Butano
- Cambridge Systems Biology Centre, University of Cambridge, Cambridge CB2 1QR, United Kingdom
- Department of Genetics, University of Cambridge, Cambridge CB2 3EH, United Kingdom
- Sergio Contrino
- Cambridge Systems Biology Centre, University of Cambridge, Cambridge CB2 1QR, United Kingdom
- Department of Genetics, University of Cambridge, Cambridge CB2 3EH, United Kingdom
- Josh Heimbach
- Cambridge Systems Biology Centre, University of Cambridge, Cambridge CB2 1QR, United Kingdom
- Department of Genetics, University of Cambridge, Cambridge CB2 3EH, United Kingdom
- Fengyuan Hu
- Cambridge Systems Biology Centre, University of Cambridge, Cambridge CB2 1QR, United Kingdom
- Department of Genetics, University of Cambridge, Cambridge CB2 3EH, United Kingdom
- Alex Kalderimis
- Cambridge Systems Biology Centre, University of Cambridge, Cambridge CB2 1QR, United Kingdom
- Department of Genetics, University of Cambridge, Cambridge CB2 3EH, United Kingdom
- Mike Lyne
- Cambridge Systems Biology Centre, University of Cambridge, Cambridge CB2 1QR, United Kingdom
- Department of Genetics, University of Cambridge, Cambridge CB2 3EH, United Kingdom
- Richard N. Smith
- Cambridge Systems Biology Centre, University of Cambridge, Cambridge CB2 1QR, United Kingdom
- Department of Genetics, University of Cambridge, Cambridge CB2 3EH, United Kingdom
- Radek Štěpán
- Cambridge Systems Biology Centre, University of Cambridge, Cambridge CB2 1QR, United Kingdom
- Department of Genetics, University of Cambridge, Cambridge CB2 3EH, United Kingdom
- Rama Balakrishnan
- Department of Genetics, Stanford University, Stanford, CA 94305-5120, USA
- Gail Binkley
- Department of Genetics, Stanford University, Stanford, CA 94305-5120, USA
- Todd Harris
- Ontario Institute for Cancer Research, Toronto, ON, M5G0A3, Canada
- Kalpana Karra
- Department of Genetics, Stanford University, Stanford, CA 94305-5120, USA
- Howie Motenko
- The Jackson Laboratory, Bar Harbor, Maine, 04609, USA
- Mike Cherry
- Department of Genetics, Stanford University, Stanford, CA 94305-5120, USA
- Lincoln Stein
- Ontario Institute for Cancer Research, Toronto, ON, M5G0A3, Canada
- Monte Westerfield
- ZFIN, University of Oregon, Eugene, OR, 97403, USA
- Institute of Neuroscience, University of Oregon, Eugene, OR, 97403, USA
- Elizabeth Worthey
- Human and Molecular Genetics Center, Medical College of Wisconsin, Milwaukee, WI, 53226, USA
- Gos Micklem
- Cambridge Systems Biology Centre, University of Cambridge, Cambridge CB2 1QR, United Kingdom
- Department of Genetics, University of Cambridge, Cambridge CB2 3EH, United Kingdom
40
Groza T, Köhler S, Moldenhauer D, Vasilevsky N, Baynam G, Zemojtel T, Schriml LM, Kibbe WA, Schofield PN, Beck T, Vasant D, Brookes AJ, Zankl A, Washington NL, Mungall CJ, Lewis SE, Haendel MA, Parkinson H, Robinson PN. The Human Phenotype Ontology: Semantic Unification of Common and Rare Disease. Am J Hum Genet 2015; 97:111-24. [PMID: 26119816 PMCID: PMC4572507 DOI: 10.1016/j.ajhg.2015.05.020] [Received: 03/20/2015] [Accepted: 05/22/2015]
Abstract
The Human Phenotype Ontology (HPO) is widely used in the rare disease community for differential diagnostics, phenotype-driven analysis of next-generation sequence-variation data, and translational research, but a comparable resource has not been available for common disease. Here, we have developed a concept-recognition procedure that analyzes the frequencies of HPO disease annotations as identified in over five million PubMed abstracts by employing an iterative procedure to optimize precision and recall of the identified terms. We derived disease models for 3,145 common human diseases comprising a total of 132,006 HPO annotations. The HPO now comprises over 250,000 phenotypic annotations for over 10,000 rare and common diseases and can be used for examining the phenotypic overlap among common diseases that share risk alleles, as well as between Mendelian diseases and common diseases linked by genomic location. The annotations, as well as the HPO itself, are freely available.
Affiliation(s)
- Tudor Groza
- School of Information Technology and Electrical Engineering, University of Queensland, St. Lucia, QLD 4072, Australia; Garvan Institute of Medical Research, Darlinghurst, Sydney, NSW 2010, Australia
- Sebastian Köhler
- Institute for Medical and Human Genetics, Charité-Universitätsmedizin Berlin, Augustenburger Platz 1, 13353 Berlin, Germany
- Dawid Moldenhauer
- Institute for Medical and Human Genetics, Charité-Universitätsmedizin Berlin, Augustenburger Platz 1, 13353 Berlin, Germany; University of Applied Sciences, Wiesenstrasse 14, 35390 Giessen, Germany
- Nicole Vasilevsky
- Library, Oregon Health & Science University, Portland, OR 97239, USA
- Gareth Baynam
- School of Paediatrics and Child Health, University of Western Australia, Perth, WA 6840, Australia; Institute for Immunology and Infectious Diseases, Murdoch University, Perth, WA 6150, Australia; Office of Population Health Genomics, Public Health and Clinical Services Division, Department of Health, Perth, WA 6004, Australia; Genetic Services of Western Australia, King Edward Memorial Hospital, Perth, WA 6008, Australia; Telethon Kids Institute, Perth, WA 6008, Australia
- Tomasz Zemojtel
- Institute for Medical and Human Genetics, Charité-Universitätsmedizin Berlin, Augustenburger Platz 1, 13353 Berlin, Germany; Institute of Bioorganic Chemistry, Polish Academy of Sciences, 61-704 Poznań, Poland
- Lynn Marie Schriml
- Department of Epidemiology and Public Health, School of Medicine, University of Maryland, Baltimore, MD 21201, USA; Institute for Genome Sciences, School of Medicine, University of Maryland, Baltimore, MD 21201, USA
- Warren Alden Kibbe
- Center for Biomedical Informatics and Information Technology, National Cancer Institute, 9609 Medical Center Drive, Rockville, MD 20850, USA
- Paul N Schofield
- Department of Physiology, Development and Neuroscience, University of Cambridge, Downing Street, Cambridge CB2 3EG, UK; The Jackson Laboratory, Bar Harbor, ME 04609, USA
- Tim Beck
- Department of Genetics, University of Leicester, Leicester LE1 7RH, UK
- Drashtti Vasant
- European Bioinformatics Institute, European Molecular Biology Laboratory, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Anthony J Brookes
- Department of Genetics, University of Leicester, Leicester LE1 7RH, UK
- Andreas Zankl
- Garvan Institute of Medical Research, Darlinghurst, Sydney, NSW 2010, Australia; Academic Department of Medical Genetics, The Children's Hospital at Westmead, Sydney, NSW 2145, Australia; Discipline of Genetic Medicine, Sydney Medical School, University of Sydney, Sydney, NSW 2145, Australia
- Nicole L Washington
- Genomics Division, Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley, CA 94720, USA
- Christopher J Mungall
- Genomics Division, Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley, CA 94720, USA
- Suzanna E Lewis
- Genomics Division, Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley, CA 94720, USA
- Melissa A Haendel
- Library, Oregon Health & Science University, Portland, OR 97239, USA
- Helen Parkinson
- European Bioinformatics Institute, European Molecular Biology Laboratory, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Peter N Robinson
- Institute for Medical and Human Genetics, Charité-Universitätsmedizin Berlin, Augustenburger Platz 1, 13353 Berlin, Germany; Max Planck Institute for Molecular Genetics, Ihnestrasse 63-73, 14195 Berlin, Germany; Berlin Brandenburg Center for Regenerative Therapies, Charité-Universitätsmedizin Berlin, Augustenburger Platz 1, 13353 Berlin, Germany; Institute of Bioinformatics, Department of Mathematics and Computer Science, Freie Universität Berlin, Takustrasse 9, 14195 Berlin, Germany
41
Haendel MA, Vasilevsky N, Brush M, Hochheiser HS, Jacobsen J, Oellrich A, Mungall CJ, Washington N, Köhler S, Lewis SE, Robinson PN, Smedley D. Disease insights through cross-species phenotype comparisons. Mamm Genome 2015; 26:548-55. [PMID: 26092691 PMCID: PMC4602072 DOI: 10.1007/s00335-015-9577-8] [Received: 03/27/2015] [Accepted: 05/20/2015]
Abstract
New sequencing technologies have ushered in a new era for diagnosis and discovery of new causative mutations for rare diseases. However, the sheer number of candidate variants that require interpretation in an exome or genome analysis remains a challenge. A powerful approach is the comparison of the patient’s set of phenotypes (phenotypic profile) to known phenotypic profiles caused by mutations in orthologous genes associated with these variants. The most abundant source of relevant data for this task is available through the efforts of the Mouse Genome Informatics group and the International Mouse Phenotyping Consortium. In this review, we highlight the challenges in comparing human clinical phenotypes with mouse phenotypes and some of the solutions that have been developed by members of the Monarch Initiative. These tools allow the identification of mouse models for known disease-gene associations that may otherwise have been overlooked, as well as the prioritization of candidate genes for novel associations. The culmination of these efforts is the Exomiser software package, which allows clinical researchers to analyse patient exomes in the context of variant frequency and predicted pathogenicity as well as the phenotypic similarity of the patient to any given candidate orthologous gene.
Affiliation(s)
- Melissa A Haendel
- University Library and Department of Medical Informatics and Epidemiology, Oregon Health & Science University, Portland, OR, USA
- Nicole Vasilevsky
- University Library and Department of Medical Informatics and Epidemiology, Oregon Health & Science University, Portland, OR, USA
- Matthew Brush
- University Library and Department of Medical Informatics and Epidemiology, Oregon Health & Science University, Portland, OR, USA
- Harry S Hochheiser
- Department of Biomedical Informatics and Intelligent Systems Program, University of Pittsburgh, Pittsburgh, PA, 15206, USA
- Julius Jacobsen
- Skarnes Faculty Group, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK
- Anika Oellrich
- Skarnes Faculty Group, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK
- Christopher J Mungall
- Genomics Division, Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley, CA, 94720, USA
- Nicole Washington
- Genomics Division, Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley, CA, 94720, USA
- Sebastian Köhler
- Computational Biology Group, Institute for Medical Genetics and Human Genetics, Universitätsklinikum Charité, Augustenburger Platz 1, 13353, Berlin, Germany
- Suzanna E Lewis
- Genomics Division, Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley, CA, 94720, USA
- Peter N Robinson
- Computational Biology Group, Institute for Medical Genetics and Human Genetics, Universitätsklinikum Charité, Augustenburger Platz 1, 13353, Berlin, Germany
- Damian Smedley
- Skarnes Faculty Group, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK
42
Antanaviciute A, Daly C, Crinnion LA, Markham AF, Watson CM, Bonthron DT, Carr IM. GeneTIER: prioritization of candidate disease genes using tissue-specific gene expression profiles. Bioinformatics 2015; 31:2728-35. [PMID: 25861967 PMCID: PMC4528628 DOI: 10.1093/bioinformatics/btv196] [Received: 08/18/2014] [Accepted: 04/01/2015]
Abstract
Motivation: In attempts to determine the genetic causes of human disease, researchers are often faced with a large number of candidate genes. Linkage studies can point to a genomic region containing hundreds of genes, while the high-throughput sequencing approach will often identify a great number of non-synonymous genetic variants. Since systematic experimental verification of each such candidate gene is not feasible, a method is needed to decide which genes are worth investigating further. Computational gene prioritization presents itself as a solution to this problem, systematically analyzing and sorting each gene from the most to least likely to be the disease-causing gene, in a fraction of the time it would take a researcher to perform such queries manually. Results: Here, we present Gene TIssue Expression Ranker (GeneTIER), a new web-based application for candidate gene prioritization. GeneTIER replaces knowledge-based inference traditionally used in candidate disease gene prioritization applications with experimental data from tissue-specific gene expression datasets and thus largely overcomes the bias toward better-characterized genes/diseases that commonly afflicts other methods. We show that our approach is capable of accurate candidate gene prioritization and illustrate its strengths and weaknesses using case study examples. Availability and Implementation: Freely available on the web at http://dna.leeds.ac.uk/GeneTIER/. Contact: umaan@leeds.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Agne Antanaviciute
- Section of Genetics, Institute of Biomedical and Clinical Sciences, School of Medicine, University of Leeds, St James's University Hospital and
- Catherine Daly
- Section of Genetics, Institute of Biomedical and Clinical Sciences, School of Medicine, University of Leeds, St James's University Hospital and
- Laura A Crinnion
- Yorkshire Regional Genetics Service, St James's University Hospital, Leeds, UK
- Alexander F Markham
- Section of Genetics, Institute of Biomedical and Clinical Sciences, School of Medicine, University of Leeds, St James's University Hospital and
- David T Bonthron
- Section of Genetics, Institute of Biomedical and Clinical Sciences, School of Medicine, University of Leeds, St James's University Hospital and
- Ian M Carr
- Section of Genetics, Institute of Biomedical and Clinical Sciences, School of Medicine, University of Leeds, St James's University Hospital and

43
Groza T, Köhler S, Doelken S, Collier N, Oellrich A, Smedley D, Couto FM, Baynam G, Zankl A, Robinson PN. Automatic concept recognition using the human phenotype ontology reference and test suite corpora. Database: The Journal of Biological Databases and Curation 2015; 2015:bav005. [PMID: 25725061 PMCID: PMC4343077 DOI: 10.1093/database/bav005] [Citation(s) in RCA: 38] [Impact Index Per Article: 3.8] [Indexed: 01/06/2023]
Abstract
Concept recognition tools rely on the availability of textual corpora to assess their performance and enable the identification of areas for improvement. Typically, corpora are developed for specific purposes, such as gene name recognition. Gene and protein name identification are longstanding goals of biomedical text mining, and therefore a number of different corpora exist. However, phenotypes only recently became an entity of interest for specialized concept recognition systems, and hardly any annotated text is available for performance testing and training. Here, we present a unique corpus, capturing text spans from 228 abstracts manually annotated with Human Phenotype Ontology (HPO) concepts and harmonized by three curators, which can be used as a reference standard for free text annotation of human phenotypes. Furthermore, we developed a test suite for standardized concept recognition error analysis, incorporating 32 different types of test cases corresponding to 2164 HPO concepts. Finally, three established phenotype concept recognizers (NCBO Annotator, OBO Annotator and Bio-LarK CR) were comprehensively evaluated, and results are reported against both the text corpus and the test suites. The gold standard and test suite corpora are available from http://bio-lark.org/hpo_res.html. Database URL: http://bio-lark.org/hpo_res.html
Affiliation(s)
- Tudor Groza
- School of ITEE, The University of Queensland, St. Lucia, QLD 4072, Australia, Garvan Institute of Medical Research, Darlinghurst, Sydney, NSW 2010, Australia, Institute for Medical Genetics and Human Genetics, Charité-Universitätsmedizin Berlin, 13353 Berlin, Germany, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, UK, National Institute of Informatics, Hitotsubashi, Tokyo, Japan, Mouse Informatics Group, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton CB10 1SA, UK, LASIGE, Departamento de Informática, Faculdade de Ciências, Universidade de Lisboa, 1749-016 Lisboa, Portugal, Genetic Services of Western Australia, King Edward Memorial Hospital, WA 6008, Australia, School of Paediatrics and Child Health, University of Western Australia, WA 6008, Australia, Institute for Immunology and Infectious Diseases, Murdoch University, WA 6150, Australia, Office of Population Health, Public Health and Clinical Services Division, Western Australian Department of Health, WA 6004, Australia, Academic Department of Medical Genetics, Sydney Children's Hospitals Network (Westmead), NSW 2145, Australia, Discipline of Genetic Medicine, Sydney Medical School, The University of Sydney, NSW 2006, Australia, Max Planck Institute for Molecular Genetics, 14195 Berlin, Germany, Institute for Bioinformatics, Department of Mathematics and Computer Science, Freie Universität Berlin, 14195 Berlin, Germany and Berlin Brandenburg Center for Regenerative Therapies, 13353 Berlin, Germany
- Sebastian Köhler
- School of ITEE, The University of Queensland, St. Lucia, QLD 4072, Australia, Garvan Institute of Medical Research, Darlinghurst, Sydney, NSW 2010, Australia, Institute for Medical Genetics and Human Genetics, Charité-Universitätsmedizin Berlin, 13353 Berlin, Germany, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, UK, National Institute of Informatics, Hitotsubashi, Tokyo, Japan, Mouse Informatics Group, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton CB10 1SA, UK, LASIGE, Departamento de Informática, Faculdade de Ciências, Universidade de Lisboa, 1749-016 Lisboa, Portugal, Genetic Services of Western Australia, King Edward Memorial Hospital, WA 6008, Australia, School of Paediatrics and Child Health, University of Western Australia, WA 6008, Australia, Institute for Immunology and Infectious Diseases, Murdoch University, WA 6150, Australia, Office of Population Health, Public Health and Clinical Services Division, Western Australian Department of Health, WA 6004, Australia, Academic Department of Medical Genetics, Sydney Children's Hospitals Network (Westmead), NSW 2145, Australia, Discipline of Genetic Medicine, Sydney Medical School, The University of Sydney, NSW 2006, Australia, Max Planck Institute for Molecular Genetics, 14195 Berlin, Germany, Institute for Bioinformatics, Department of Mathematics and Computer Science, Freie Universität Berlin, 14195 Berlin, Germany and Berlin Brandenburg Center for Regenerative Therapies, 13353 Berlin, Germany
- Sandra Doelken
- School of ITEE, The University of Queensland, St. Lucia, QLD 4072, Australia, Garvan Institute of Medical Research, Darlinghurst, Sydney, NSW 2010, Australia, Institute for Medical Genetics and Human Genetics, Charité-Universitätsmedizin Berlin, 13353 Berlin, Germany, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, UK, National Institute of Informatics, Hitotsubashi, Tokyo, Japan, Mouse Informatics Group, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton CB10 1SA, UK, LASIGE, Departamento de Informática, Faculdade de Ciências, Universidade de Lisboa, 1749-016 Lisboa, Portugal, Genetic Services of Western Australia, King Edward Memorial Hospital, WA 6008, Australia, School of Paediatrics and Child Health, University of Western Australia, WA 6008, Australia, Institute for Immunology and Infectious Diseases, Murdoch University, WA 6150, Australia, Office of Population Health, Public Health and Clinical Services Division, Western Australian Department of Health, WA 6004, Australia, Academic Department of Medical Genetics, Sydney Children's Hospitals Network (Westmead), NSW 2145, Australia, Discipline of Genetic Medicine, Sydney Medical School, The University of Sydney, NSW 2006, Australia, Max Planck Institute for Molecular Genetics, 14195 Berlin, Germany, Institute for Bioinformatics, Department of Mathematics and Computer Science, Freie Universität Berlin, 14195 Berlin, Germany and Berlin Brandenburg Center for Regenerative Therapies, 13353 Berlin, Germany
- Nigel Collier
- School of ITEE, The University of Queensland, St. Lucia, QLD 4072, Australia, Garvan Institute of Medical Research, Darlinghurst, Sydney, NSW 2010, Australia, Institute for Medical Genetics and Human Genetics, Charité-Universitätsmedizin Berlin, 13353 Berlin, Germany, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, UK, National Institute of Informatics, Hitotsubashi, Tokyo, Japan, Mouse Informatics Group, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton CB10 1SA, UK, LASIGE, Departamento de Informática, Faculdade de Ciências, Universidade de Lisboa, 1749-016 Lisboa, Portugal, Genetic Services of Western Australia, King Edward Memorial Hospital, WA 6008, Australia, School of Paediatrics and Child Health, University of Western Australia, WA 6008, Australia, Institute for Immunology and Infectious Diseases, Murdoch University, WA 6150, Australia, Office of Population Health, Public Health and Clinical Services Division, Western Australian Department of Health, WA 6004, Australia, Academic Department of Medical Genetics, Sydney Children's Hospitals Network (Westmead), NSW 2145, Australia, Discipline of Genetic Medicine, Sydney Medical School, The University of Sydney, NSW 2006, Australia, Max Planck Institute for Molecular Genetics, 14195 Berlin, Germany, Institute for Bioinformatics, Department of Mathematics and Computer Science, Freie Universität Berlin, 14195 Berlin, Germany and Berlin Brandenburg Center for Regenerative Therapies, 13353 Berlin, Germany
- Anika Oellrich
- School of ITEE, The University of Queensland, St. Lucia, QLD 4072, Australia, Garvan Institute of Medical Research, Darlinghurst, Sydney, NSW 2010, Australia, Institute for Medical Genetics and Human Genetics, Charité-Universitätsmedizin Berlin, 13353 Berlin, Germany, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, UK, National Institute of Informatics, Hitotsubashi, Tokyo, Japan, Mouse Informatics Group, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton CB10 1SA, UK, LASIGE, Departamento de Informática, Faculdade de Ciências, Universidade de Lisboa, 1749-016 Lisboa, Portugal, Genetic Services of Western Australia, King Edward Memorial Hospital, WA 6008, Australia, School of Paediatrics and Child Health, University of Western Australia, WA 6008, Australia, Institute for Immunology and Infectious Diseases, Murdoch University, WA 6150, Australia, Office of Population Health, Public Health and Clinical Services Division, Western Australian Department of Health, WA 6004, Australia, Academic Department of Medical Genetics, Sydney Children's Hospitals Network (Westmead), NSW 2145, Australia, Discipline of Genetic Medicine, Sydney Medical School, The University of Sydney, NSW 2006, Australia, Max Planck Institute for Molecular Genetics, 14195 Berlin, Germany, Institute for Bioinformatics, Department of Mathematics and Computer Science, Freie Universität Berlin, 14195 Berlin, Germany and Berlin Brandenburg Center for Regenerative Therapies, 13353 Berlin, Germany
- Damian Smedley
- School of ITEE, The University of Queensland, St. Lucia, QLD 4072, Australia, Garvan Institute of Medical Research, Darlinghurst, Sydney, NSW 2010, Australia, Institute for Medical Genetics and Human Genetics, Charité-Universitätsmedizin Berlin, 13353 Berlin, Germany, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, UK, National Institute of Informatics, Hitotsubashi, Tokyo, Japan, Mouse Informatics Group, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton CB10 1SA, UK, LASIGE, Departamento de Informática, Faculdade de Ciências, Universidade de Lisboa, 1749-016 Lisboa, Portugal, Genetic Services of Western Australia, King Edward Memorial Hospital, WA 6008, Australia, School of Paediatrics and Child Health, University of Western Australia, WA 6008, Australia, Institute for Immunology and Infectious Diseases, Murdoch University, WA 6150, Australia, Office of Population Health, Public Health and Clinical Services Division, Western Australian Department of Health, WA 6004, Australia, Academic Department of Medical Genetics, Sydney Children's Hospitals Network (Westmead), NSW 2145, Australia, Discipline of Genetic Medicine, Sydney Medical School, The University of Sydney, NSW 2006, Australia, Max Planck Institute for Molecular Genetics, 14195 Berlin, Germany, Institute for Bioinformatics, Department of Mathematics and Computer Science, Freie Universität Berlin, 14195 Berlin, Germany and Berlin Brandenburg Center for Regenerative Therapies, 13353 Berlin, Germany
- Francisco M Couto
- School of ITEE, The University of Queensland, St. Lucia, QLD 4072, Australia, Garvan Institute of Medical Research, Darlinghurst, Sydney, NSW 2010, Australia, Institute for Medical Genetics and Human Genetics, Charité-Universitätsmedizin Berlin, 13353 Berlin, Germany, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, UK, National Institute of Informatics, Hitotsubashi, Tokyo, Japan, Mouse Informatics Group, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton CB10 1SA, UK, LASIGE, Departamento de Informática, Faculdade de Ciências, Universidade de Lisboa, 1749-016 Lisboa, Portugal, Genetic Services of Western Australia, King Edward Memorial Hospital, WA 6008, Australia, School of Paediatrics and Child Health, University of Western Australia, WA 6008, Australia, Institute for Immunology and Infectious Diseases, Murdoch University, WA 6150, Australia, Office of Population Health, Public Health and Clinical Services Division, Western Australian Department of Health, WA 6004, Australia, Academic Department of Medical Genetics, Sydney Children's Hospitals Network (Westmead), NSW 2145, Australia, Discipline of Genetic Medicine, Sydney Medical School, The University of Sydney, NSW 2006, Australia, Max Planck Institute for Molecular Genetics, 14195 Berlin, Germany, Institute for Bioinformatics, Department of Mathematics and Computer Science, Freie Universität Berlin, 14195 Berlin, Germany and Berlin Brandenburg Center for Regenerative Therapies, 13353 Berlin, Germany
- Gareth Baynam
- School of ITEE, The University of Queensland, St. Lucia, QLD 4072, Australia, Garvan Institute of Medical Research, Darlinghurst, Sydney, NSW 2010, Australia, Institute for Medical Genetics and Human Genetics, Charité-Universitätsmedizin Berlin, 13353 Berlin, Germany, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, UK, National Institute of Informatics, Hitotsubashi, Tokyo, Japan, Mouse Informatics Group, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton CB10 1SA, UK, LASIGE, Departamento de Informática, Faculdade de Ciências, Universidade de Lisboa, 1749-016 Lisboa, Portugal, Genetic Services of Western Australia, King Edward Memorial Hospital, WA 6008, Australia, School of Paediatrics and Child Health, University of Western Australia, WA 6008, Australia, Institute for Immunology and Infectious Diseases, Murdoch University, WA 6150, Australia, Office of Population Health, Public Health and Clinical Services Division, Western Australian Department of Health, WA 6004, Australia, Academic Department of Medical Genetics, Sydney Children's Hospitals Network (Westmead), NSW 2145, Australia, Discipline of Genetic Medicine, Sydney Medical School, The University of Sydney, NSW 2006, Australia, Max Planck Institute for Molecular Genetics, 14195 Berlin, Germany, Institute for Bioinformatics, Department of Mathematics and Computer Science, Freie Universität Berlin, 14195 Berlin, Germany and Berlin Brandenburg Center for Regenerative Therapies, 13353 Berlin, Germany
- Andreas Zankl
- School of ITEE, The University of Queensland, St. Lucia, QLD 4072, Australia, Garvan Institute of Medical Research, Darlinghurst, Sydney, NSW 2010, Australia, Institute for Medical Genetics and Human Genetics, Charité-Universitätsmedizin Berlin, 13353 Berlin, Germany, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, UK, National Institute of Informatics, Hitotsubashi, Tokyo, Japan, Mouse Informatics Group, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton CB10 1SA, UK, LASIGE, Departamento de Informática, Faculdade de Ciências, Universidade de Lisboa, 1749-016 Lisboa, Portugal, Genetic Services of Western Australia, King Edward Memorial Hospital, WA 6008, Australia, School of Paediatrics and Child Health, University of Western Australia, WA 6008, Australia, Institute for Immunology and Infectious Diseases, Murdoch University, WA 6150, Australia, Office of Population Health, Public Health and Clinical Services Division, Western Australian Department of Health, WA 6004, Australia, Academic Department of Medical Genetics, Sydney Children's Hospitals Network (Westmead), NSW 2145, Australia, Discipline of Genetic Medicine, Sydney Medical School, The University of Sydney, NSW 2006, Australia, Max Planck Institute for Molecular Genetics, 14195 Berlin, Germany, Institute for Bioinformatics, Department of Mathematics and Computer Science, Freie Universität Berlin, 14195 Berlin, Germany and Berlin Brandenburg Center for Regenerative Therapies, 13353 Berlin, Germany
- Peter N Robinson
- School of ITEE, The University of Queensland, St. Lucia, QLD 4072, Australia, Garvan Institute of Medical Research, Darlinghurst, Sydney, NSW 2010, Australia, Institute for Medical Genetics and Human Genetics, Charité-Universitätsmedizin Berlin, 13353 Berlin, Germany, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, UK, National Institute of Informatics, Hitotsubashi, Tokyo, Japan, Mouse Informatics Group, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton CB10 1SA, UK, LASIGE, Departamento de Informática, Faculdade de Ciências, Universidade de Lisboa, 1749-016 Lisboa, Portugal, Genetic Services of Western Australia, King Edward Memorial Hospital, WA 6008, Australia, School of Paediatrics and Child Health, University of Western Australia, WA 6008, Australia, Institute for Immunology and Infectious Diseases, Murdoch University, WA 6150, Australia, Office of Population Health, Public Health and Clinical Services Division, Western Australian Department of Health, WA 6004, Australia, Academic Department of Medical Genetics, Sydney Children's Hospitals Network (Westmead), NSW 2145, Australia, Discipline of Genetic Medicine, Sydney Medical School, The University of Sydney, NSW 2006, Australia, Max Planck Institute for Molecular Genetics, 14195 Berlin, Germany, Institute for Bioinformatics, Department of Mathematics and Computer Science, Freie Universität Berlin, 14195 Berlin, Germany and Berlin Brandenburg Center for Regenerative Therapies, 13353 Berlin, Germany

44
Soul J, Hardingham TE, Boot-Handford RP, Schwartz JM. PhenomeExpress: a refined network analysis of expression datasets by inclusion of known disease phenotypes. Sci Rep 2015; 5:8117. [PMID: 25631385 PMCID: PMC4822650 DOI: 10.1038/srep08117] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.5] [Received: 09/26/2014] [Accepted: 12/19/2014] [Indexed: 12/19/2022]
Abstract
We describe a new method, PhenomeExpress, for the analysis of transcriptomic datasets to identify pathogenic disease mechanisms. Our analysis method includes input from both protein-protein interaction and phenotype similarity networks. This introduces valuable information from disease relevant phenotypes, which aids the identification of sub-networks that are significantly enriched in differentially expressed genes and are related to the disease relevant phenotypes. This contrasts with many active sub-network detection methods, which rely solely on protein-protein interaction networks derived from compounded data of many unrelated biological conditions and which are therefore not specific to the context of the experiment. PhenomeExpress thus exploits readily available animal model and human disease phenotype information. It combines this prior evidence of disease phenotypes with the experimentally derived disease data sets to provide a more targeted analysis. Two case studies, in subchondral bone in osteoarthritis and in Pax5 in acute lymphoblastic leukaemia, demonstrate that PhenomeExpress identifies core disease pathways in both mouse and human disease expression datasets derived from different technologies. We also validate the approach by comparison to state-of-the-art active sub-network detection methods, which reveals how it may enhance the detection of molecular phenotypes and provide a more detailed context to those previously identified as possible candidates.
Affiliation(s)
- Jamie Soul
- Wellcome Trust Centre for Cell-Matrix Research, Faculty of Life Sciences, University of Manchester, Manchester M13 9PT, UK
- Timothy E Hardingham
- Wellcome Trust Centre for Cell-Matrix Research, Faculty of Life Sciences, University of Manchester, Manchester M13 9PT, UK
- Raymond P Boot-Handford
- Wellcome Trust Centre for Cell-Matrix Research, Faculty of Life Sciences, University of Manchester, Manchester M13 9PT, UK
- Jean-Marc Schwartz
- Wellcome Trust Centre for Cell-Matrix Research, Faculty of Life Sciences, University of Manchester, Manchester M13 9PT, UK

45
Hoehndorf R, Slater L, Schofield PN, Gkoutos GV. Aber-OWL: a framework for ontology-based data access in biology. BMC Bioinformatics 2015; 16:26. [PMID: 25627673 PMCID: PMC4384359 DOI: 10.1186/s12859-015-0456-9] [Citation(s) in RCA: 58] [Impact Index Per Article: 5.8] [Received: 08/18/2014] [Accepted: 01/09/2015] [Indexed: 11/10/2022]
Abstract
Background: Many ontologies have been developed in biology and these ontologies increasingly contain large volumes of formalized knowledge commonly expressed in the Web Ontology Language (OWL). Computational access to the knowledge contained within these ontologies relies on the use of automated reasoning. Results: We have developed the Aber-OWL infrastructure that provides reasoning services for bio-ontologies. Aber-OWL consists of an ontology repository, a set of web services and web interfaces that enable ontology-based semantic access to biological data and literature. Aber-OWL is freely available at http://aber-owl.net. Conclusions: Aber-OWL provides a framework for automatically accessing information that is annotated with ontologies or contains terms used to label classes in ontologies. When using Aber-OWL, access to ontologies and data annotated with them is not merely based on class names or identifiers but rather on the knowledge the ontologies contain and the inferences that can be drawn from it.
Affiliation(s)
- Robert Hoehndorf
- Computational Bioscience Research Center, King Abdullah University of Science and Technology, 4700 KAUST, Thuwal, 23955-6900, Saudi Arabia; Computer, Electrical and Mathematical Sciences & Engineering Division, King Abdullah University of Science and Technology, 4700 KAUST, Thuwal, 23955-6900, Saudi Arabia.
- Luke Slater
- Computational Bioscience Research Center, King Abdullah University of Science and Technology, 4700 KAUST, Thuwal, 23955-6900, Saudi Arabia; Computer, Electrical and Mathematical Sciences & Engineering Division, King Abdullah University of Science and Technology, 4700 KAUST, Thuwal, 23955-6900, Saudi Arabia; Department of Computer Science, Aberystwyth University, Llandinam Building, Aberystwyth, SY23 3DB, UK.
- Paul N Schofield
- Department of Physiology, Development & Neuroscience, University of Cambridge, Downing Street, Cambridge, CB2 3EG, UK.
- Georgios V Gkoutos
- Department of Computer Science, Aberystwyth University, Llandinam Building, Aberystwyth, SY23 3DB, UK.

46
Deans AR, Lewis SE, Huala E, Anzaldo SS, Ashburner M, Balhoff JP, Blackburn DC, Blake JA, Burleigh JG, Chanet B, Cooper LD, Courtot M, Csösz S, Cui H, Dahdul W, Das S, Dececchi TA, Dettai A, Diogo R, Druzinsky RE, Dumontier M, Franz NM, Friedrich F, Gkoutos GV, Haendel M, Harmon LJ, Hayamizu TF, He Y, Hines HM, Ibrahim N, Jackson LM, Jaiswal P, James-Zorn C, Köhler S, Lecointre G, Lapp H, Lawrence CJ, Le Novère N, Lundberg JG, Macklin J, Mast AR, Midford PE, Mikó I, Mungall CJ, Oellrich A, Osumi-Sutherland D, Parkinson H, Ramírez MJ, Richter S, Robinson PN, Ruttenberg A, Schulz KS, Segerdell E, Seltmann KC, Sharkey MJ, Smith AD, Smith B, Specht CD, Squires RB, Thacker RW, Thessen A, Fernandez-Triana J, Vihinen M, Vize PD, Vogt L, Wall CE, Walls RL, Westerfeld M, Wharton RA, Wirkner CS, Woolley JB, Yoder MJ, Zorn AM, Mabee P. Finding our way through phenotypes. PLoS Biol 2015; 13:e1002033. [PMID: 25562316 PMCID: PMC4285398 DOI: 10.1371/journal.pbio.1002033] [Citation(s) in RCA: 124] [Impact Index Per Article: 12.4] [Indexed: 01/04/2023]
Abstract
Despite a large and multifaceted effort to understand the vast landscape of phenotypic data, their current form inhibits productive data analysis. The lack of a community-wide, consensus-based, human- and machine-interpretable language for describing phenotypes and their genomic and environmental contexts is perhaps the most pressing scientific bottleneck to integration across many key fields in biology, including genomics, systems biology, development, medicine, evolution, ecology, and systematics. Here we survey the current phenomics landscape, including data resources and handling, and the progress that has been made to accurately capture relevant data descriptions for phenotypes. We present an example of the kind of integration across domains that computable phenotypes would enable, and we call upon the broader biology community, publishers, and relevant funding agencies to support efforts to surmount today's data barriers and facilitate analytical reproducibility.
Affiliation(s)
- Andrew R. Deans
- Department of Entomology, Pennsylvania State University, University Park, Pennsylvania, United States of America
| | - Suzanna E. Lewis
- Genome Division, Lawrence Berkeley National Lab, Berkeley, California, United States of America
| | - Eva Huala
- Department of Plant Biology, Carnegie Institution for Science, Stanford, California, United States of America
- Phoenix Bioinformatics, Palo Alto, California, United States of America
| | - Salvatore S. Anzaldo
- School of Life Sciences, Arizona State University, Tempe, Arizona, United States of America
| | - Michael Ashburner
- Department of Genetics, University of Cambridge, Cambridge, United Kingdom
| | - James P. Balhoff
- National Evolutionary Synthesis Center, Durham, North Carolina, United States of America
| | - David C. Blackburn
- Department of Vertebrate Zoology and Anthropology, California Academy of Sciences, San Francisco, California, United States of America
| | - Judith A. Blake
- The Jackson Laboratory, Bar Harbor, Maine, United States of America
| | - J. Gordon Burleigh
- Department of Biology, University of Florida, Gainesville, Florida, United States of America
| | - Bruno Chanet
- Muséum national d'Histoire naturelle, Département Systématique et Evolution, Paris, France
| | - Laurel D. Cooper
- Department of Botany and Plant Pathology, Oregon State University, Corvallis, Oregon, United States of America
| | - Mélanie Courtot
- Molecular Biology and Biochemistry Department, Simon Fraser University, Burnaby, British Columbia, Canada
| | - Sándor Csösz
- MTA-ELTE-MTM, Ecology Research Group, Pázmány Péter sétány 1C, Budapest, Hungary
| | - Hong Cui
- School of Information Resources and Library Science, University of Arizona, Tucson, Arizona, United States of America
- Wasila Dahdul
- Department of Biology, University of South Dakota, Vermillion, South Dakota, United States of America
- Sandip Das
- Department of Botany, University of Delhi, Delhi, India
- T. Alexander Dececchi
- Department of Biology, University of South Dakota, Vermillion, South Dakota, United States of America
- Agnes Dettai
- Muséum national d'Histoire naturelle, Département Systématique et Evolution, Paris, France
- Rui Diogo
- Department of Anatomy, Howard University College of Medicine, Washington, D.C., United States of America
- Robert E. Druzinsky
- Department of Oral Biology, College of Dentistry, University of Illinois, Chicago, Illinois, United States of America
- Michel Dumontier
- Stanford Center for Biomedical Informatics Research, Stanford, California, United States of America
- Nico M. Franz
- School of Life Sciences, Arizona State University, Tempe, Arizona, United States of America
- Frank Friedrich
- Biocenter Grindel and Zoological Museum, Hamburg University, Hamburg, Germany
- George V. Gkoutos
- Department of Computer Science, Aberystwyth University, Aberystwyth, Ceredigion, United Kingdom
- Melissa Haendel
- Department of Medical Informatics & Epidemiology, Oregon Health & Science University, Portland, Oregon, United States of America
- Luke J. Harmon
- Department of Biological Sciences, University of Idaho, Moscow, Idaho, United States of America
- Terry F. Hayamizu
- Mouse Genome Informatics, The Jackson Laboratory, Bar Harbor, Maine, United States of America
- Yongqun He
- Unit for Laboratory Animal Medicine, Department of Microbiology and Immunology, Center for Computational Medicine and Bioinformatics, and Comprehensive Cancer Center, University of Michigan Medical School, Ann Arbor, Michigan, United States of America
- Heather M. Hines
- Department of Entomology, Pennsylvania State University, University Park, Pennsylvania, United States of America
- Nizar Ibrahim
- Department of Organismal Biology and Anatomy, University of Chicago, Chicago, Illinois, United States of America
- Laura M. Jackson
- Department of Biology, University of South Dakota, Vermillion, South Dakota, United States of America
- Pankaj Jaiswal
- Department of Botany and Plant Pathology, Oregon State University, Corvallis, Oregon, United States of America
- Christina James-Zorn
- Cincinnati Children's Hospital, Division of Developmental Biology, Cincinnati, Ohio, United States of America
- Sebastian Köhler
- Institute for Medical Genetics and Human Genetics, Charité-Universitätsmedizin Berlin, Berlin, Germany
- Guillaume Lecointre
- Muséum national d'Histoire naturelle, Département Systématique et Evolution, Paris, France
- Hilmar Lapp
- National Evolutionary Synthesis Center, Durham, North Carolina, United States of America
- Carolyn J. Lawrence
- Department of Genetics, Development and Cell Biology and Department of Agronomy, Iowa State University, Ames, Iowa, United States of America
- John G. Lundberg
- Department of Ichthyology, The Academy of Natural Sciences, Philadelphia, Pennsylvania, United States of America
- James Macklin
- Eastern Cereal and Oilseed Research Centre, Ottawa, Ontario, Canada
- Austin R. Mast
- Department of Biological Science, Florida State University, Tallahassee, Florida, United States of America
- István Mikó
- Department of Entomology, Pennsylvania State University, University Park, Pennsylvania, United States of America
- Christopher J. Mungall
- Genome Division, Lawrence Berkeley National Lab, Berkeley, California, United States of America
- Anika Oellrich
- European Molecular Biology Laboratory - European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, United Kingdom
- David Osumi-Sutherland
- European Molecular Biology Laboratory - European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, United Kingdom
- Helen Parkinson
- European Molecular Biology Laboratory - European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, United Kingdom
- Martín J. Ramírez
- Division of Arachnology, Museo Argentino de Ciencias Naturales - CONICET, Buenos Aires, Argentina
- Stefan Richter
- Allgemeine & Spezielle Zoologie, Institut für Biowissenschaften, Universität Rostock, Universitätsplatz 2, Rostock, Germany
- Peter N. Robinson
- Institut für Medizinische Genetik und Humangenetik Charité – Universitätsmedizin Berlin, Berlin, Germany
- Alan Ruttenberg
- School of Dental Medicine, University at Buffalo, Buffalo, New York, United States of America
- Katja S. Schulz
- Smithsonian Institution, National Museum of Natural History, Washington, D.C., United States of America
- Erik Segerdell
- Knight Cancer Institute, Oregon Health & Science University, Portland, Oregon, United States of America
- Katja C. Seltmann
- Division of Invertebrate Zoology, American Museum of Natural History, New York, New York, United States of America
- Michael J. Sharkey
- Department of Entomology, University of Kentucky, Lexington, Kentucky, United States of America
- Aaron D. Smith
- Department of Biological Sciences, Northern Arizona University, Flagstaff, Arizona, United States of America
- Barry Smith
- Department of Philosophy, University at Buffalo, Buffalo, New York, United States of America
- Chelsea D. Specht
- Department of Plant and Microbial Biology, Integrative Biology, and the University and Jepson Herbaria, University of California, Berkeley, California, United States of America
- R. Burke Squires
- Bioinformatics and Computational Biosciences Branch, Office of Cyber Infrastructure and Computational Biology, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, Maryland, United States of America
- Robert W. Thacker
- Department of Biology, University of Alabama at Birmingham, Birmingham, Alabama, United States of America
- Anne Thessen
- The Data Detektiv, 1412 Stearns Hill Road, Waltham, Massachusetts, United States of America
- Mauno Vihinen
- Department of Experimental Medical Science, Lund University, Lund, Sweden
- Peter D. Vize
- Department of Biological Sciences, University of Calgary, Calgary, Alberta, Canada
- Lars Vogt
- Universität Bonn, Institut für Evolutionsbiologie und Ökologie, Bonn, Germany
- Christine E. Wall
- Department of Evolutionary Anthropology, Duke University, Durham, North Carolina, United States of America
- Ramona L. Walls
- iPlant Collaborative, University of Arizona, Thomas J. Keating Bioresearch Building, Tucson, Arizona, United States of America
- Monte Westerfeld
- Institute of Neuroscience, University of Oregon, Eugene, Oregon, United States of America
- Robert A. Wharton
- Department of Entomology, Texas A&M University, College Station, Texas, United States of America
- Christian S. Wirkner
- Allgemeine & Spezielle Zoologie, Institut für Biowissenschaften, Universität Rostock, Universitätsplatz 2, Rostock, Germany
- James B. Woolley
- Department of Entomology, Texas A&M University, College Station, Texas, United States of America
- Matthew J. Yoder
- Illinois Natural History Survey, University of Illinois, Champaign, Illinois, United States of America
- Aaron M. Zorn
- Cincinnati Children's Hospital, Division of Developmental Biology, Cincinnati, Ohio, United States of America
- Paula Mabee
- Department of Biology, University of South Dakota, Vermillion, South Dakota, United States of America
47
Lotan A, Fenckova M, Bralten J, Alttoa A, Dixson L, Williams RW, van der Voet M. Neuroinformatic analyses of common and distinct genetic components associated with major neuropsychiatric disorders. Front Neurosci 2014; 8:331. [PMID: 25414627 PMCID: PMC4222236 DOI: 10.3389/fnins.2014.00331] [Received: 08/06/2014] [Accepted: 10/01/2014]
Abstract
Major neuropsychiatric disorders are highly heritable, with mounting evidence suggesting that these disorders share overlapping sets of molecular and cellular underpinnings. In the current article we systematically test the degree of genetic commonality across six major neuropsychiatric disorders: attention deficit hyperactivity disorder (ADHD), anxiety disorders (Anx), autistic spectrum disorders (ASD), bipolar disorder (BD), major depressive disorder (MDD), and schizophrenia (SCZ). We curated a well-vetted list of genes based on large-scale human genetic studies based on the NHGRI catalog of published genome-wide association studies (GWAS). A total of 180 genes were accepted into the analysis on the basis of low but liberal GWAS p-values (<10^-5). 22% of genes overlapped two or more disorders. The most widely shared subset of genes, common to five of six disorders, included ANK3, AS3MT, CACNA1C, CACNB2, CNNM2, CSMD1, DPCR1, ITIH3, NT5C2, PPP1R11, SYNE1, TCF4, TENM4, TRIM26, and ZNRD1. Using a suite of neuroinformatic resources, we showed that many of the shared genes are implicated in the postsynaptic density (PSD), expressed in immune tissues and co-expressed in developing human brain. Using a translational cross-species approach, we detected two distinct genetic components that were both shared by each of the six disorders; the 1st component is involved in CNS development, neural projections and synaptic transmission, while the 2nd is implicated in various cytoplasmic organelles and cellular processes. Combined, these genetic components account for 20-30% of the genetic load. The remaining risk is conferred by distinct, disorder-specific variants. Our systematic comparative analysis of shared and unique genetic factors highlights key gene sets and molecular processes that may ultimately translate into improved diagnosis and treatment of these debilitating disorders.
Affiliation(s)
- Amit Lotan
- Department of Adult Psychiatry and the Biological Psychiatry Laboratory, Hadassah-Hebrew University Medical Center, Jerusalem, Israel
- Michaela Fenckova
- Department of Human Genetics, Donders Institute for Brain, Cognition and Behaviour, Radboud University Medical Center, Nijmegen, Netherlands
- Janita Bralten
- Department of Human Genetics, Donders Institute for Brain, Cognition and Behaviour, Radboud University Medical Center, Nijmegen, Netherlands; Department of Cognitive Neuroscience, Donders Institute for Brain, Cognition and Behaviour, Radboud University Medical Center, Nijmegen, Netherlands
- Aet Alttoa
- Department of Psychiatry, Psychotherapy and Psychosomatics, Psychiatric Neurobiology Program, University of Würzburg, Würzburg, Germany
- Luanna Dixson
- Department of Psychiatry and Psychotherapy, Medical Faculty Mannheim, Central Institute of Mental Health, University of Heidelberg, Mannheim, Germany
- Robert W Williams
- Department of Genetics, Genomics and Informatics, Center for Integrative and Translational Genomics, University of Tennessee Health Science Center, Memphis, TN, USA
- Monique van der Voet
- Department of Human Genetics, Donders Institute for Brain, Cognition and Behaviour, Radboud University Medical Center, Nijmegen, Netherlands
48
Köhler S, Schoeneberg U, Czeschik JC, Doelken SC, Hehir-Kwa JY, Ibn-Salem J, Mungall CJ, Smedley D, Haendel MA, Robinson PN. Clinical interpretation of CNVs with cross-species phenotype data. J Med Genet 2014; 51:766-772. [PMID: 25280750 DOI: 10.1136/jmedgenet-2014-102633]
Abstract
BACKGROUND Clinical evaluation of CNVs identified via techniques such as array comparative genome hybridisation (aCGH) involves the inspection of lists of known and unknown duplications and deletions with the goal of distinguishing pathogenic from benign CNVs. A key step in this process is the comparison of the individual's phenotypic abnormalities with those associated with Mendelian disorders of the genes affected by the CNV. However, because often there is not much known about these human genes, an additional source of data that could be used is model organism phenotype data. Currently, almost 6000 genes in mouse and zebrafish are, when knocked out, associated with a phenotype in the model organism, but no disease is known to be caused by mutations in the human ortholog. Yet, searching model organism databases and comparing model organism phenotypes with patient phenotypes for identifying novel disease genes and medical evaluation of CNVs is hindered by the difficulty in integrating phenotype information across species and the lack of appropriate software tools. METHODS Here, we present an integrated ranking scheme based on phenotypic matching, degree of overlap with known benign or pathogenic CNVs and the haploinsufficiency score for the prioritisation of CNVs responsible for a patient's clinical findings. RESULTS We show that this scheme leads to significant improvements compared with rankings that do not exploit phenotypic information. We provide a software tool called PhenogramViz, which supports phenotype-driven interpretation of aCGH findings based on multiple data sources, including the integrated cross-species phenotype ontology Uberpheno, in order to visualise gene-to-phenotype relations. CONCLUSIONS Integrating and visualising cross-species phenotype information on the affected genes may help in routine diagnostics of CNVs.
Affiliation(s)
- Sebastian Köhler
- Institute for Medical Genetics and Human Genetics, Charité-Universitätsmedizin Berlin, Berlin, Germany; Berlin-Brandenburg Center for Regenerative Therapies (BCRT), Berlin, Germany
- Uwe Schoeneberg
- Foundation Institute Molecular Biology and Bioinformatics, Freie Universitaet Berlin, Berlin, Germany
- Sandra C Doelken
- Institute for Medical Genetics and Human Genetics, Charité-Universitätsmedizin Berlin, Berlin, Germany
- Jayne Y Hehir-Kwa
- Department of Human Genetics, Radboud University Medical Centre, Nijmegen, The Netherlands
- Jonas Ibn-Salem
- Institute for Medical Genetics and Human Genetics, Charité-Universitätsmedizin Berlin, Berlin, Germany
- Damian Smedley
- The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, UK
- Melissa A Haendel
- Department of Medical Informatics and Epidemiology and OHSU Library, Oregon Health & Science University, Portland, USA
- Peter N Robinson
- Institute for Medical Genetics and Human Genetics, Charité-Universitätsmedizin Berlin, Berlin, Germany; Berlin-Brandenburg Center for Regenerative Therapies (BCRT), Berlin, Germany; Max Planck Institute for Molecular Genetics, Berlin, Germany; Department of Mathematics and Computer Science, Institute for Bioinformatics, Freie Universitaet Berlin, Berlin, Germany
49
Ibn-Salem J, Köhler S, Love MI, Chung HR, Huang N, Hurles ME, Haendel M, Washington NL, Smedley D, Mungall CJ, Lewis SE, Ott CE, Bauer S, Schofield PN, Mundlos S, Spielmann M, Robinson PN. Deletions of chromosomal regulatory boundaries are associated with congenital disease. Genome Biol 2014; 15:423. [PMID: 25315429 PMCID: PMC4180961 DOI: 10.1186/s13059-014-0423-1] [Received: 01/22/2014] [Accepted: 07/24/2014]
Abstract
Background Recent data from genome-wide chromosome conformation capture analysis indicate that the human genome is divided into conserved megabase-sized self-interacting regions called topological domains. These topological domains form the regulatory backbone of the genome and are separated by regulatory boundary elements or barriers. Copy-number variations can potentially alter the topological domain architecture by deleting or duplicating the barriers and thereby allowing enhancers from neighboring domains to ectopically activate genes causing misexpression and disease, a mutational mechanism that has recently been termed enhancer adoption. Results We use the Human Phenotype Ontology database to relate the phenotypes of 922 deletion cases recorded in the DECIPHER database to monogenic diseases associated with genes in or adjacent to the deletions. We identify combinations of tissue-specific enhancers and genes adjacent to the deletion and associated with phenotypes in the corresponding tissue, whereby the phenotype matched that observed in the deletion. We compare this computationally with a gene-dosage pathomechanism that attempts to explain the deletion phenotype based on haploinsufficiency of genes located within the deletions. Up to 11.8% of the deletions could be best explained by enhancer adoption or a combination of enhancer adoption and gene-dosage effects. Conclusions Our results suggest that enhancer adoption caused by deletions of regulatory boundaries may contribute to a substantial minority of copy-number variation phenotypes and should thus be taken into account in their medical interpretation.
50
Filges I, Friedman JM. Exome sequencing for gene discovery in lethal fetal disorders--harnessing the value of extreme phenotypes. Prenat Diagn 2014; 35:1005-1009. [PMID: 25046514 DOI: 10.1002/pd.4464] [Received: 04/07/2014] [Revised: 06/14/2014] [Accepted: 07/16/2014]
Abstract
Massively parallel sequencing has revolutionized our understanding of Mendelian disorders, and many novel genes have been discovered to cause disease phenotypes when mutant. At the same time, next-generation sequencing approaches have enabled non-invasive prenatal testing of free fetal DNA in maternal blood. However, little attention has been paid to using whole exome and genome sequencing strategies for gene identification in fetal disorders that are lethal in utero, because they can appear to be sporadic and Mendelian inheritance may be missed. We present challenges and advantages of applying next-generation sequencing approaches to gene discovery in fetal malformation phenotypes and review recent successful discovery approaches. We discuss the implication and significance of recessive inheritance and cross-species phenotyping in fetal lethal conditions. Whole exome sequencing can be used in individual families with undiagnosed lethal congenital anomaly syndromes to discover causal mutations, provided that prior to data analysis, the fetal phenotype can be correlated to a particular developmental pathway in embryogenesis. Cross-species phenotyping allows providing further evidence for causality of discovered variants in genes involved in those extremely rare phenotypes and will increase our knowledge about normal and abnormal human developmental processes. Ultimately, families will benefit from the option of early prenatal diagnosis.
Affiliation(s)
- Isabel Filges
- Medical Genetics, Department of Biomedicine, University Hospital Basel, University of Basel, Basel, Switzerland; Department of Medical Genetics, Children's and Women's Hospital, Child and Family Research Institute, University of British Columbia, Vancouver, Canada
- Jan M Friedman
- Department of Medical Genetics, Children's and Women's Hospital, Child and Family Research Institute, University of British Columbia, Vancouver, Canada