1
Dhombres F, Morgan P, Chaudhari BP, Filges I, Sparks TN, Lapunzina P, Roscioli T, Agarwal U, Aggarwal S, Beneteau C, Cacheiro P, Carmody LC, Collardeau‐Frachon S, Dempsey EA, Dufke A, Duyzend MH, el Ghosh M, Giordano JL, Glad R, Grinfelde I, Iliescu DG, Ladewig MS, Munoz‐Torres MC, Pollazzon M, Radio FC, Rodo C, Silva RG, Smedley D, Sundaramurthi JC, Toro S, Valenzuela I, Vasilevsky NA, Wapner RJ, Zemet R, Haendel MA, Robinson PN. Prenatal phenotyping: A community effort to enhance the Human Phenotype Ontology. American Journal of Medical Genetics. Part C, Seminars in Medical Genetics 2022; 190:231-242. [PMID: 35872606 PMCID: PMC9588534 DOI: 10.1002/ajmg.c.31989] [Citation(s) in RCA: 19] [Impact Index Per Article: 9.5] [Received: 04/14/2022] [Accepted: 07/01/2022] [Indexed: 01/07/2023]
Abstract
Technological advances in both genome sequencing and prenatal imaging are increasing our ability to accurately recognize and diagnose Mendelian conditions prenatally. Phenotype-driven early genetic diagnosis of fetal genetic disease can help to strategize treatment options and clinical preventive measures during the perinatal period, to plan in utero therapies, and to inform parental decision-making. Fetal phenotypes of genetic diseases are often unique and at present are not well understood; more comprehensive knowledge about prenatal phenotypes and computational resources have an enormous potential to improve diagnostics and translational research. The Human Phenotype Ontology (HPO) has been widely used to support diagnostics and translational research in human genetics. To better support prenatal usage, the HPO consortium conducted a series of workshops with a group of domain experts in a variety of medical specialties, diagnostic techniques, as well as diseases and phenotypes related to prenatal medicine, including perinatal pathology, musculoskeletal anomalies, neurology, medical genetics, hydrops fetalis, craniofacial malformations, cardiology, neonatal-perinatal medicine, fetal medicine, placental pathology, prenatal imaging, and bioinformatics. We expanded the representation of prenatal phenotypes in HPO by adding 95 new phenotype terms under the Abnormality of prenatal development or birth (HP:0001197) grouping term, and revised definitions, synonyms, and disease annotations for most of the 152 terms that existed before the beginning of this effort. The expansion of prenatal phenotypes in HPO will support phenotype-driven prenatal exome and genome sequencing for precision genetic diagnostics of rare diseases to support prenatal care.
Affiliation(s)
- Ferdinand Dhombres
  - Sorbonne University, GRC26, INSERM, Limics, Armand Trousseau Hospital, Fetal Medicine Department, APHP, Paris, France
- Patricia Morgan
  - American College of Medical Genetics and Genomics, Newborn Screening Translational Research Network, Bethesda, Maryland, USA
- Bimal P. Chaudhari
  - Institute for Genomic Medicine, Nationwide Children's Hospital, Columbus, Ohio, USA
- Isabel Filges
  - University Hospital Basel and University of Basel, Medical Genetics, Basel, Switzerland
- Teresa N. Sparks
  - Department of Obstetrics, Gynecology, & Reproductive Sciences, University of California, San Francisco, San Francisco, California, USA
- Pablo Lapunzina
  - CIBERER and Hospital Universitario La Paz, INGEMM‐Institute of Medical and Molecular Genetics, Madrid, Spain
- Tony Roscioli
  - Neuroscience Research Australia (NeuRA), University of New South Wales, Sydney, New South Wales, Australia
- Umber Agarwal
  - Department of Maternal and Fetal Medicine, Liverpool Women's NHS Foundation Trust, Liverpool, UK
- Shagun Aggarwal
  - Department of Medical Genetics, Nizam's Institute of Medical Sciences, Hyderabad, Telangana, India
- Claire Beneteau
  - Service de Génétique Médicale, UF 9321 de Fœtopathologie et Génétique, CHU de Nantes, Nantes, France
- Pilar Cacheiro
  - William Harvey Research Institute, Queen Mary University of London, London, UK
- Leigh C. Carmody
  - Department of Genomic Medicine, The Jackson Laboratory, Farmington, Connecticut, USA
- Esther A. Dempsey
  - St George's University of London, Molecular and Clinical Sciences Research Institute, London, UK
- Andreas Dufke
  - University of Tübingen, Institute of Medical Genetics and Applied Genomics, Tübingen, Germany
- Jessica L. Giordano
  - Department of Obstetrics and Gynecology, Columbia University Irving Medical Center, New York, New York, USA
- Ragnhild Glad
  - Department of Obstetrics and Gynecology, University Hospital of North Norway, Tromsø, Norway
- Ieva Grinfelde
  - Department of Medical Genetics and Prenatal diagnosis, Children's University Hospital, Riga, Latvia
- Dominic G. Iliescu
  - Department of Obstetrics and Gynecology, University of Medicine and Pharmacy Craiova, Craiova, Dolj, Romania
- Markus S. Ladewig
  - Department of Ophthalmology, Klinikum Saarbrücken, Saarbrücken, Saarland, Germany
- Monica C. Munoz‐Torres
  - Department of Biochemistry and Molecular Genetics, University of Colorado Anschutz Medical Campus, Aurora, Colorado, USA
- Marzia Pollazzon
  - Azienda USL‐IRCCS di Reggio Emilia, Medical Genetics Unit, Reggio Emilia, Italy
- Carlota Rodo
  - Vall d'Hebron Hospital Campus, Maternal & Fetal Medicine, Barcelona, Spain
- Raquel Gouveia Silva
  - Hospital Santa Maria, Serviço de Genética, Departamento de Pediatria, Hospital de Santa Maria, Centro Hospitalar Universitário Lisboa Norte, Centro Académico de Medicina de Lisboa, Lisboa, Portugal
- Damian Smedley
  - William Harvey Research Institute, Queen Mary University of London, London, UK
- Sabrina Toro
  - Department of Biochemistry and Molecular Genetics, University of Colorado Anschutz Medical Campus, Aurora, Colorado, USA
- Irene Valenzuela
  - Hospital Vall d'Hebron, Clinical and Molecular Genetics Area, Barcelona, Spain
- Nicole A. Vasilevsky
  - Department of Biochemistry and Molecular Genetics, University of Colorado Anschutz Medical Campus, Aurora, Colorado, USA
- Ronald J. Wapner
  - Department of Obstetrics and Gynecology, Columbia University Irving Medical Center, New York, New York, USA
- Roni Zemet
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, USA
- Melissa A Haendel
  - Department of Biochemistry and Molecular Genetics, University of Colorado Anschutz Medical Campus, Aurora, Colorado, USA
- Peter N. Robinson
  - Department of Genomic Medicine, The Jackson Laboratory, Farmington, Connecticut, USA
2
Seaby EG, Rehm HL, O’Donnell-Luria A. Strategies to Uplift Novel Mendelian Gene Discovery for Improved Clinical Outcomes. Front Genet 2021; 12:674295. [PMID: 34220947 PMCID: PMC8248347 DOI: 10.3389/fgene.2021.674295] [Citation(s) in RCA: 20] [Impact Index Per Article: 6.7] [Received: 03/01/2021] [Accepted: 05/12/2021] [Indexed: 01/31/2023]
Abstract
Rare genetic disorders, while individually rare, are collectively common. They represent some of the most severe disorders affecting patients worldwide with significant morbidity and mortality. Over the last decade, advances in genomic methods have significantly uplifted diagnostic rates for patients and facilitated novel and targeted therapies. However, many patients with rare genetic disorders still remain undiagnosed as the genetic etiology of only a proportion of Mendelian conditions has been discovered to date. This article explores existing strategies to identify novel Mendelian genes and how these discoveries impact clinical care and therapeutics. We discuss the importance of data sharing, phenotype-driven approaches, patient-led approaches, utilization of large-scale genomic sequencing projects, constraint-based methods, integration of multi-omics data, and gene-to-patient methods. We further consider the health economic advantages of novel gene discovery and speculate on potential future methods for improved clinical outcomes.
Affiliation(s)
- Eleanor G. Seaby
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, United States
  - Genomic Informatics Group, University Hospital Southampton, Southampton, United Kingdom
  - Center for Genomic Medicine, Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, United States
  - Division of Genetics and Genomics, Boston Children’s Hospital, Boston, MA, United States
- Heidi L. Rehm
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, United States
  - Center for Genomic Medicine, Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, United States
- Anne O’Donnell-Luria
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, United States
  - Center for Genomic Medicine, Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, United States
  - Division of Genetics and Genomics, Boston Children’s Hospital, Boston, MA, United States
  - Manton Center for Orphan Disease Research, Boston Children’s Hospital, Boston, MA, United States
3
Lewis-Smith D, Galer PD, Balagura G, Kearney H, Ganesan S, Cosico M, O'Brien M, Vaidiswaran P, Krause R, Ellis CA, Thomas RH, Robinson PN, Helbig I. Modeling seizures in the Human Phenotype Ontology according to contemporary ILAE concepts makes big phenotypic data tractable. Epilepsia 2021; 62:1293-1305. [PMID: 33949685 PMCID: PMC8272408 DOI: 10.1111/epi.16908] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 12/16/2020] [Revised: 02/19/2021] [Accepted: 04/01/2021] [Indexed: 01/08/2023]
Abstract
Objective: The clinical features of epilepsy determine how it is defined, which in turn guides management. Therefore, consideration of the fundamental clinical entities that comprise an epilepsy is essential in the study of causes, trajectories, and treatment responses. The Human Phenotype Ontology (HPO) is used widely in clinical and research genetics for concise communication and modeling of clinical features, allowing extracted data to be harmonized using logical inference. We sought to redesign the HPO seizure subontology to improve its consistency with current epileptological concepts, supporting the use of large clinical data sets in high-throughput clinical and research genomics. Methods: We created a new HPO seizure subontology based on the 2017 International League Against Epilepsy (ILAE) Operational Classification of Seizure Types, and integrated concepts of status epilepticus, febrile, reflex, and neonatal seizures at different levels of detail. We compared the HPO seizure subontology prior to, and following, our revision, according to the information that could be inferred about the seizures of 791 individuals from three independent cohorts: 2 previously published and 150 newly recruited individuals. Each cohort’s data were provided in a different format and harmonized using the two versions of the HPO. Results: The new seizure subontology increased the number of descriptive concepts for seizures 5-fold. The number of seizure descriptors that could be annotated to the cohort increased by 40% and the total amount of information about individuals’ seizures increased by 38%. The most important qualitative difference was the relationship of focal to bilateral tonic-clonic seizure to generalized-onset and focal-onset seizures.
Affiliation(s)
- David Lewis-Smith
  - Translational and Clinical Research Institute, Newcastle University, Newcastle-upon-Tyne, UK
  - Department of Clinical Neurosciences, Royal Victoria Infirmary, Newcastle-upon-Tyne, UK
- Peter D Galer
  - Division of Neurology, Children's Hospital of Philadelphia, Philadelphia, PA, USA
  - The Epilepsy NeuroGenetics Initiative (ENGIN), Children's Hospital of Philadelphia, Philadelphia, PA, USA
  - Department of Biomedical and Health Informatics (DBHi), Children's Hospital of Philadelphia, Philadelphia, PA, USA
  - Department of Neurology, University of Pennsylvania, Philadelphia, PA, USA
- Ganna Balagura
  - Medical Genetics Unit, IRCSS Giannina Gaslini Institute, Genoa, Italy
- Hugh Kearney
  - FutureNeuro the SFI Research Centre for Chronic and Rare Neurological Diseases, Royal College of Surgeons in Ireland, Dublin, Ireland
  - Department of Neurology, Beaumont Hospital, Dublin, Ireland
- Shiva Ganesan
  - Division of Neurology, Children's Hospital of Philadelphia, Philadelphia, PA, USA
  - The Epilepsy NeuroGenetics Initiative (ENGIN), Children's Hospital of Philadelphia, Philadelphia, PA, USA
  - Department of Biomedical and Health Informatics (DBHi), Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Mahgenn Cosico
  - Division of Neurology, Children's Hospital of Philadelphia, Philadelphia, PA, USA
  - The Epilepsy NeuroGenetics Initiative (ENGIN), Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Margaret O'Brien
  - Division of Neurology, Children's Hospital of Philadelphia, Philadelphia, PA, USA
  - The Epilepsy NeuroGenetics Initiative (ENGIN), Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Priya Vaidiswaran
  - Division of Neurology, Children's Hospital of Philadelphia, Philadelphia, PA, USA
  - The Epilepsy NeuroGenetics Initiative (ENGIN), Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Roland Krause
  - Luxembourg Centre for Systems Biomedicine, Université du Luxembourg, Esch-sur-Alzette, Luxembourg
- Colin A Ellis
  - The Epilepsy NeuroGenetics Initiative (ENGIN), Children's Hospital of Philadelphia, Philadelphia, PA, USA
  - Department of Neurology, University of Pennsylvania, Philadelphia, PA, USA
- Rhys H Thomas
  - Translational and Clinical Research Institute, Newcastle University, Newcastle-upon-Tyne, UK
  - Department of Clinical Neurosciences, Royal Victoria Infirmary, Newcastle-upon-Tyne, UK
- Peter N Robinson
  - The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
  - Institute for Systems Genomics, University of Connecticut, Farmington, CT, USA
- Ingo Helbig
  - Division of Neurology, Children's Hospital of Philadelphia, Philadelphia, PA, USA
  - The Epilepsy NeuroGenetics Initiative (ENGIN), Children's Hospital of Philadelphia, Philadelphia, PA, USA
  - Department of Biomedical and Health Informatics (DBHi), Children's Hospital of Philadelphia, Philadelphia, PA, USA
  - Department of Neurology, University of Pennsylvania, Philadelphia, PA, USA
4
Rubinstein YR, Robinson PN, Gahl WA, Avillach P, Baynam G, Cederroth H, Goodwin RM, Groft SC, Hansson MG, Harris NL, Huser V, Mascalzoni D, McMurry JA, Might M, Nellaker C, Mons B, Paltoo DN, Pevsner J, Posada M, Rockett-Frase AP, Roos M, Rubinstein TB, Taruscio D, van Enckevort E, Haendel MA. The case for open science: rare diseases. JAMIA Open 2020; 3:472-486. [PMID: 33426479 PMCID: PMC7660964 DOI: 10.1093/jamiaopen/ooaa030] [Citation(s) in RCA: 26] [Impact Index Per Article: 6.5] [Received: 02/21/2020] [Revised: 05/30/2020] [Accepted: 06/23/2020] [Indexed: 01/04/2023]
Abstract
The premise of Open Science is that research and medical management will progress faster if data and knowledge are openly shared. The value of Open Science is nowhere more important and appreciated than in the rare disease (RD) community. Research into RDs has been limited by insufficient patient data and resources, a paucity of trained disease experts, and lack of therapeutics, leading to long delays in diagnosis and treatment. These issues can be ameliorated by following the principles and practices of sharing that are intrinsic to Open Science. Here, we describe how the RD community has adopted the core pillars of Open Science, adding new initiatives to promote care and research for RD patients and, ultimately, for all of medicine. We also present recommendations that can advance Open Science more globally.
Affiliation(s)
- Yaffa R Rubinstein
  - Special Volunteer in the Office of Strategic Initiatives, National Library of Medicine, Bethesda, Maryland, USA
- Peter N Robinson
  - The Jackson Laboratory for Genomic Medicine, Farmington, Connecticut, USA
- William A Gahl
  - Undiagnosed Diseases Program and Office of the Clinical Director, National Human Genome Research Institute (NHGRI), National Institutes of Health, Bethesda, Maryland, USA
- Paul Avillach
  - Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts, USA
- Gareth Baynam
  - Western Australian Register of Developmental Anomalies and Telethon Kids Institute, Perth, Australia
- Rebecca M Goodwin
  - Department of Health and Human Services, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, USA
- Stephen C Groft
  - NCATS, National Institutes of Health, Bethesda, Maryland, USA
- Mats G Hansson
  - Center for Research Ethics and Bioethics, Uppsala Universitet, Uppsala, Sweden
- Nomi L Harris
  - Department of Environmental Genomics & System Biology, Lawrence Berkeley National Laboratory, Berkeley, California, USA
- Vojtech Huser
  - Department of Health and Human Services, NCBI, National Institutes of Health, Bethesda, Maryland, USA
- Deborah Mascalzoni
  - Center for Research Ethics and Bioethics, Uppsala University, Sweden and EURAC Research, Bolzano, Italy
- Julie A McMurry
  - Linus Pauling Institute, Oregon State University, Corvallis, Oregon, USA
- Matthew Might
  - Hugh Kaul Precision Medicine Institute, The University of Alabama at Birmingham, Birmingham, Alabama, USA
- Christoffer Nellaker
  - Nuffield Department of Women's and Reproductive Health, Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford, UK
- Barend Mons
  - Department of Human Genetics, Leiden University Medical Center, Leiden, Netherlands
- Dina N Paltoo
  - Department of Health and Human Services, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, USA
- Jonathan Pevsner
  - Department of Neurology, Kennedy Krieger Institute and Department of Psychiatry and Behavioral Sciences, Johns Hopkins School of Medicine, Baltimore, Maryland, USA
- Manuel Posada
  - Rare Diseases Research Institute & CIBERER, Instituto de Salud Carlos III, Madrid, Spain
- Marco Roos
  - Human Genetics, Leiden University Medical Center, Leiden, Netherlands
- Tamar B Rubinstein
  - Children Hospital at Montefiore/Albert Einstein College of Medicine, Pediatrics, Bronx, New York, USA
- Domenica Taruscio
  - National Centre for Rare Diseases, Istituto Superiore di Sanità, Rome, Italy
- Esther van Enckevort
  - Department of Genetics, University Medical Center Groningen, University of Groningen, Leiden, Netherlands
- Melissa A Haendel
  - Linus Pauling Institute, Oregon State University, Corvallis, Oregon, USA
5
Köhler S. Improved ontology-based similarity calculations using a study-wise annotation model. Database: The Journal of Biological Databases and Curation 2018; 2018:4953405. [PMID: 29688377 PMCID: PMC5868182 DOI: 10.1093/database/bay026] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 12/01/2017] [Accepted: 02/20/2018] [Indexed: 11/13/2022]
Abstract
A typical use case of ontologies is the calculation of similarity scores between items that are annotated with classes of the ontology. For example, in differential diagnostics and disease gene prioritization, the human phenotype ontology (HPO) is often used to compare a query phenotype profile against gold-standard phenotype profiles of diseases or genes. The latter have long been constructed as flat lists of ontology classes, which, as we show in this work, can be improved by exploiting existing structure and information in annotation datasets or full text disease descriptions. We derive a study-wise annotation model of diseases and genes and show that this can improve the performance of semantic similarity measures. Inferred weights of individual annotations are one reason for this improvement, but more importantly using the study-wise structure further boosts the results of the algorithms according to precision-recall analyses. We test the study-wise annotation model for diseases annotated with classes from the HPO and for genes annotated with gene ontology (GO) classes. We incorporate this annotation model into similarity algorithms and show how this leads to improved performance. This work adds weight to the need for enhancing simple list-based representations of disease or gene annotations. We show how study-wise annotations can be automatically derived from full text summaries of disease descriptions and from the annotation data provided by the GO Consortium and how semantic similarity measure can utilize this extended annotation model. Database URL: https://phenomics.github.io/
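The flat-list comparison that this abstract takes as its baseline can be sketched as follows. This is a minimal illustration and not the study-wise model from the paper: the toy hierarchy, the term names, the disease profiles, and the choice of a Resnik-style score with a one-sided best-match average are all assumptions made for the example.

```python
import math

# Toy is-a hierarchy (child -> parents). All term names are hypothetical.
PARENTS = {
    "root": [],
    "neuro": ["root"],
    "seizure": ["neuro"],
    "focal_seizure": ["seizure"],
    "ataxia": ["neuro"],
    "cardiac": ["root"],
    "arrhythmia": ["cardiac"],
}

# Toy "gold-standard" profiles stored as flat lists of terms (hypothetical).
DISEASES = {
    "disease_A": {"focal_seizure", "ataxia"},
    "disease_B": {"arrhythmia"},
    "disease_C": {"seizure", "arrhythmia"},
}


def ancestors(term):
    """Return the term together with all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content():
    """IC(t) = log(N / n_t), where n_t counts diseases annotated with t or a descendant."""
    counts = {t: 0 for t in PARENTS}
    for terms in DISEASES.values():
        implied = set().union(*(ancestors(t) for t in terms))
        for t in implied:
            counts[t] += 1
    n = len(DISEASES)
    return {t: math.log(n / c) for t, c in counts.items() if c > 0}


IC = information_content()


def resnik(t1, t2):
    """IC of the most informative common ancestor of two terms."""
    return max(IC.get(t, 0.0) for t in ancestors(t1) & ancestors(t2))


def profile_similarity(query, profile):
    """One-sided best-match average of the query terms against a profile."""
    return sum(max(resnik(q, d) for d in profile) for q in query) / len(query)


def rank(query):
    """Order diseases from most to least similar to the query profile."""
    return sorted(DISEASES, key=lambda d: profile_similarity(query, DISEASES[d]), reverse=True)
```

Calling `rank({"focal_seizure"})` orders the toy diseases by how specific their best shared ancestor with the query is. A study-wise model would replace each flat set in `DISEASES` with a richer structure, which this sketch does not attempt.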
Affiliation(s)
- Sebastian Köhler
  - NeuroCure Clinical Research Center, Charité Universitätsklinikum, Charitéplatz 1, 10117 Berlin, Germany
6
Abstract
Diagnosing rare diseases can be challenging for clinicians. This article gives an overview of novel approaches that enable automated phenotype-driven analysis of differential diagnoses for rare diseases as well as of genomic variation data from affected individuals. The focus lies on reliable methods for collating clinical phenotypic data and on new algorithms for precise and robust assessment of the similarity between phenotypic profiles. The Human Phenotype Ontology project (HPO; www.human-phenotype-ontology.org) provides an ontology for collating symptoms and clinical phenotypic abnormalities. Using ontologies makes it possible to capture these data in a precise and comprehensive fashion and to apply reliable and robust automated analyses. Tools such as the Phenomizer enable the algorithmic calculation of similarity values among patients or between patients and disease descriptions. Such digital tools represent a solid foundation for differential diagnostic applications. Many rare diseases have a strong genetic component, but the analysis of coding DNA variants in rare disease patients is an enormously complex procedure, which often impedes successful molecular diagnostics. In this situation, a combined analysis of the patient's HPO-coded phenotypic features and the genomic characteristics of the variants can be of substantial help. The HPO project and its associated algorithms are therefore an important component of phenotype-driven translational research and of the prioritization of disease-relevant genomic variations.
Affiliation(s)
- S Köhler
  - Berlin Institute of Health (BIH), Anna-Louisa-Karsch-Str. 2, 10178, Berlin, Germany
  - Einstein Center Digital Future, Wilhelmstr. 67, 10117, Berlin, Germany
  - NeuroCure Clinical Research Center, Charité-Universitätsmedizin Berlin, Augustenburger Platz 1, 13353, Berlin, Germany
7
Exploring Approaches for Detecting Protein Functional Similarity within an Orthology-based Framework. Sci Rep 2017; 7:381. [PMID: 28336965 PMCID: PMC5428484 DOI: 10.1038/s41598-017-00465-5] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 07/08/2016] [Accepted: 02/28/2017] [Indexed: 11/21/2022]
Abstract
Protein functional similarity based on gene ontology (GO) annotations serves as a powerful tool when comparing proteins on a functional level in applications such as protein-protein interaction prediction, gene prioritization, and disease gene discovery. Functional similarity (FS) is usually quantified by combining the GO hierarchy with an annotation corpus that links genes and gene products to GO terms. One large group of algorithms involves calculation of GO term semantic similarity (SS) between all the terms annotating the two proteins, followed by a second step, described as “mixing strategy”, which involves combining the SS values to yield the final FS value. Due to the variability of protein annotation caused e.g. by annotation bias, this value cannot be reliably compared on an absolute scale. We therefore introduce a similarity z-score that takes into account the FS background distribution of each protein. For a selection of popular SS measures and mixing strategies we demonstrate moderate accuracy improvement when using z-scores in a benchmark that aims to separate orthologous cases from random gene pairs and discuss in this context the impact of annotation corpus choice. The approach has been implemented in Frela, a fast high-throughput public web server for protein FS calculation and interpretation.
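The idea of rescaling a raw functional similarity (FS) value against a protein's own background distribution can be sketched as below. This is only an illustration under stated assumptions: the abstract does not give the exact formula, so the plain standard score, the averaging of the two proteins' z-scores in `symmetric_z`, and all numbers are hypothetical.

```python
from statistics import mean, pstdev


def z_score(raw_fs, background):
    """Standardize a raw FS value against one protein's background FS values."""
    mu = mean(background)
    sigma = pstdev(background)
    if sigma == 0:
        return 0.0  # a constant background carries no scale information
    return (raw_fs - mu) / sigma


def symmetric_z(raw_fs, background_a, background_b):
    """Average the z-scores from both proteins' backgrounds (an assumed combination rule)."""
    return (z_score(raw_fs, background_a) + z_score(raw_fs, background_b)) / 2


# Hypothetical backgrounds: FS of each protein against random partners.
rich_background = [0.6, 0.7, 0.8, 0.7, 0.7]    # richly annotated protein
sparse_background = [0.1, 0.2, 0.3, 0.2, 0.2]  # sparsely annotated protein

z_rich = z_score(0.8, rich_background)
z_sparse = z_score(0.4, sparse_background)
```

In this toy case the raw value 0.8 looks larger than 0.4, yet 0.4 lies further above its own background, so `z_sparse` exceeds `z_rich`. That reversal is the kind of effect a background-aware score is meant to expose.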
8
Kulmanov M, Hoehndorf R. Evaluating the effect of annotation size on measures of semantic similarity. J Biomed Semantics 2017; 8:7. [PMID: 28193260 PMCID: PMC5307803 DOI: 10.1186/s13326-017-0119-z] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Received: 10/15/2016] [Accepted: 02/01/2017] [Indexed: 01/29/2023]
Abstract
Background: Ontologies are widely used as metadata in biological and biomedical datasets. Measures of semantic similarity utilize ontologies to determine how similar two entities annotated with classes from ontologies are, and semantic similarity is increasingly applied in applications ranging from diagnosis of disease to investigation in gene networks and functions of gene products. Results: Here, we analyze a large number of semantic similarity measures and the sensitivity of similarity values to the number of annotations of entities, difference in annotation size and to the depth or specificity of annotation classes. We find that most similarity measures are sensitive to the number of annotations of entities, difference in annotation size as well as to the depth of annotation classes; well-studied and richly annotated entities will usually show higher similarity than entities with only few annotations even in the absence of any biological relation. Conclusions: Our findings may have significant impact on the interpretation of results that rely on measures of semantic similarity, and we demonstrate how the sensitivity to annotation size can lead to a bias when using semantic similarity to predict protein-protein interactions. Electronic supplementary material: The online version of this article (doi:10.1186/s13326-017-0119-z) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Maxat Kulmanov
  - Computational Bioscience Research Center, King Abdullah University of Science and Technology, Thuwal, 23955-6900, Saudi Arabia
  - Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology, Thuwal, 23955-6900, Saudi Arabia
- Robert Hoehndorf
  - Computational Bioscience Research Center, King Abdullah University of Science and Technology, Thuwal, 23955-6900, Saudi Arabia
  - Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology, Thuwal, 23955-6900, Saudi Arabia
9
Smedley D, Robinson PN. Phenotype-driven strategies for exome prioritization of human Mendelian disease genes. Genome Med 2015; 7:81. [PMID: 26229552 PMCID: PMC4520011 DOI: 10.1186/s13073-015-0199-2] [Citation(s) in RCA: 77] [Impact Index Per Article: 8.6] [Indexed: 01/16/2023]
Abstract
Whole exome sequencing has altered the way in which rare diseases are diagnosed and disease genes identified. Hundreds of novel disease-associated genes have been characterized by whole exome sequencing in the past five years, yet the identification of disease-causing mutations is often challenging because of the large number of rare variants that are being revealed. Gene prioritization aims to rank the most probable candidate genes towards the top of a list of potentially pathogenic variants. A promising new approach involves the computational comparison of the phenotypic abnormalities of the individual being investigated with those previously associated with human diseases or genetically modified model organisms. In this review, we compare and contrast the strengths and weaknesses of current phenotype-driven computational algorithms, including Phevor, Phen-Gen, eXtasy and two algorithms developed by our groups called PhenIX and Exomiser. Computational phenotype analysis can substantially improve the performance of exome analysis pipelines.
Affiliation(s)
- Damian Smedley
  - Skarnes Faculty Group, Wellcome Trust Sanger Institute, Hinxton, UK
- Peter N. Robinson
  - Institute for Medical Genetics and Human Genetics, Charité-Universitätsmedizin Berlin, Berlin, Germany
  - Max Planck Institute for Molecular Genetics, Ihnestrasse, 14195 Berlin, Germany
  - Berlin Brandenburg Center for Regenerative Therapies (BCRT), Charité-Universitätsmedizin Berlin, Augustenburger Platz, 13353 Berlin, Germany
  - Institute for Bioinformatics, Department of Mathematics and Computer Science, Freie Universität Berlin, Takustrasse, 14195 Berlin, Germany
10
Bauer S, Köhler S, Schulz MH, Robinson PN. Bayesian ontology querying for accurate and noise-tolerant semantic searches. Bioinformatics 2012; 28:2502-8. [PMID: 22843981 DOI: 10.1093/bioinformatics/bts471] [Citation(s) in RCA: 36] [Impact Index Per Article: 3.0] [Indexed: 11/13/2022]
Abstract
Motivation: Ontologies provide a structured representation of the concepts of a domain of knowledge as well as the relations between them. Attribute ontologies are used to describe the characteristics of the items of a domain, such as the functions of proteins or the signs and symptoms of disease, which opens the possibility of searching a database of items for the best match to a list of observed or desired attributes. However, naive search methods do not perform well on realistic data because of noise in the data, imprecision in typical queries and because individual items may not display all attributes of the category they belong to. Results: We present a method for combining ontological analysis with Bayesian networks to deal with noise, imprecision and attribute frequencies and demonstrate an application of our method as a differential diagnostic support system for human genetics. Availability: We provide an implementation for the algorithm and the benchmark at http://compbio.charite.de/boqa/. Contact: Sebastian.Bauer@charite.de or Peter.Robinson@charite.de. Supplementary information: Supplementary Material for this article is available at Bioinformatics online.
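To make the notion of noise-tolerant matching concrete, the sketch below scores items with a simple naive-Bayes-style observation model that allows false-positive and false-negative attribute observations. It is a deliberate simplification and not the Bayesian network described in the abstract: it ignores the ontology structure and attribute frequencies, and the attribute names, item profiles, uniform prior, and the rates `alpha` and `beta` are all hypothetical.

```python
import math

# Hypothetical attribute vocabulary and item profiles (no ontology structure here).
ALL_TERMS = ["t1", "t2", "t3", "t4", "t5", "t6"]
ITEMS = {
    "item_A": {"t1", "t2", "t3"},
    "item_B": {"t4", "t5"},
    "item_C": {"t1", "t5", "t6"},
}


def log_likelihood(observed, true_terms, alpha=0.01, beta=0.2):
    """Log P(observed | item), with false-positive rate alpha and false-negative rate beta."""
    total = 0.0
    for term in ALL_TERMS:
        present = term in true_terms
        seen = term in observed
        if present and seen:
            p = 1 - beta   # attribute correctly reported
        elif present:
            p = beta       # attribute of the item that was missed
        elif seen:
            p = alpha      # spurious attribute in the query
        else:
            p = 1 - alpha  # attribute correctly absent
        total += math.log(p)
    return total


def posterior(observed, alpha=0.01, beta=0.2):
    """Posterior over items under a uniform prior, normalized in log space."""
    lls = {name: log_likelihood(observed, terms, alpha, beta) for name, terms in ITEMS.items()}
    top = max(lls.values())
    weights = {name: math.exp(ll - top) for name, ll in lls.items()}
    norm = sum(weights.values())
    return {name: w / norm for name, w in weights.items()}


# A noisy query: one attribute of item_A is missing (t3) and one is spurious (t4).
result = posterior({"t1", "t2", "t4"})
best = max(result, key=result.get)
```

Even though the query omits `t3` and includes the unrelated `t4`, `item_A` still receives the highest posterior in this toy setting, because a single miss and a single spurious term are cheaper under the noise model than the mismatches required by the other items.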
Affiliation(s)
- Sebastian Bauer
  - Institute for Medical Genetics and Human Genetics, Charité-Universitätsmedizin Berlin, Augustenburger Platz 1, 13353 Berlin, Germany