1
Cacheiro P, Lawson S, Van den Veyver IB, Marengo G, Zocche D, Murray SA, Duyzend M, Robinson PN, Smedley D. Lethal phenotypes in Mendelian disorders. Genet Med 2024; 26:101141. [PMID: 38629401 DOI: 10.1016/j.gim.2024.101141] [Received: 02/05/2024] [Revised: 04/08/2024] [Accepted: 04/09/2024] [Indexed: 04/26/2024] Open
Abstract
PURPOSE: Existing resources that characterize the essentiality status of genes are based on either proliferation assessment in human cell lines, viability evaluation in mouse knockouts, or constraint metrics derived from human population sequencing studies. Several repositories document phenotypic annotations for rare disorders; however, there is a lack of comprehensive reporting on lethal phenotypes. METHODS: We queried Online Mendelian Inheritance in Man for terms related to lethality and classified all Mendelian genes according to the earliest age of death recorded for the associated disorders, from prenatal death to no reports of premature death. We characterized the genes across these lethality categories, examined the evidence on viability from mouse models and explored how this information could be used for novel gene discovery. RESULTS: We developed the Lethal Phenotypes Portal to showcase this curated catalog of human essential genes. Differences in the mode of inheritance, physiological systems affected, and disease class were found for genes in different lethality categories, as well as discrepancies between the lethal phenotypes observed in mouse and human. CONCLUSION: We anticipate that this resource will aid clinicians in the diagnosis of early lethal conditions and assist researchers in investigating the properties that make these genes essential for human development.
Affiliation(s)
- Pilar Cacheiro
  - William Harvey Research Institute, Queen Mary University of London, London, United Kingdom
- Samantha Lawson
  - ITS Research, Queen Mary University of London, London, United Kingdom
- Ignatia B Van den Veyver
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX; Department of Obstetrics and Gynecology, Baylor College of Medicine, Houston, TX
- Gabriel Marengo
  - William Harvey Research Institute, Queen Mary University of London, London, United Kingdom
- David Zocche
  - North West Thames Regional Genetics Service, Northwick Park and St Mark's Hospitals, London, United Kingdom
- Michael Duyzend
  - Massachusetts General Hospital, Boston, MA; Program in Medical and Population Genetics, The Broad Institute of MIT and Harvard, Cambridge, MA; Division of Genetics and Genomics, Department of Pediatrics, Boston Children's Hospital and Harvard Medical School, Boston, MA
- Peter N Robinson
  - Berlin Institute of Health at Charité-Universitätsmedizin Berlin, Berlin, Germany
- Damian Smedley
  - William Harvey Research Institute, Queen Mary University of London, London, United Kingdom
2
Bridges Y, de Souza V, Cortes KG, Haendel M, Harris NL, Korn DR, Marinakis NM, Matentzoglu N, McLaughlin JA, Mungall CJ, Osumi-Sutherland D, Robinson PN, Smedley D, Jacobsen JO. Towards a standard benchmark for variant and gene prioritisation algorithms: PhEval - Phenotypic inference Evaluation framework. bioRxiv 2024:2024.06.13.598672. [PMID: 38915571 PMCID: PMC11195176 DOI: 10.1101/2024.06.13.598672] [Indexed: 06/26/2024]
Abstract
Background: Computational approaches to support rare disease diagnosis are challenging to build, requiring the integration of complex data types such as ontologies, gene-to-phenotype associations, and cross-species data into variant and gene prioritisation algorithms (VGPAs). However, the performance of VGPAs has been difficult to measure and is impacted by many factors, for example, ontology structure, annotation completeness or changes to the underlying algorithm. Assertions of the capabilities of VGPAs are often not reproducible, in part because there is no standardised, empirical framework and openly available patient data to assess the efficacy of VGPAs, ultimately hindering the development of effective prioritisation tools. Results: In this paper, we present our benchmarking tool, PhEval, which aims to provide a standardised and empirical framework to evaluate phenotype-driven VGPAs. The inclusion of standardised test corpora and test corpus generation tools in the PhEval suite of tools allows open benchmarking and comparison of methods on standardised data sets. Conclusions: PhEval and the standardised test corpora solve the issues of patient data availability and experimental tooling configuration when benchmarking and comparing rare disease VGPAs. By providing standardised data on patient cohorts from real-world case-reports and controlling the configuration of evaluated VGPAs, PhEval enables transparent, portable, comparable and reproducible benchmarking of VGPAs. As these tools are often a key component of many rare disease diagnostic pipelines, a thorough and standardised method of assessment is essential for improving patient diagnosis and care.
Affiliation(s)
- Yasemin Bridges
  - William Harvey Research Institute, Barts and the London School of Medicine and Dentistry, Queen Mary University of London, London, EC1M 6BQ, UK
- Vinicius de Souza
  - European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridgeshire, CB10 1SD, UK
- Katherina G Cortes
  - School of Public Health, University of Colorado Anschutz Medical Campus, Aurora, CO, 80045, USA
- Melissa Haendel
  - Department of Genetics, University of North Carolina, Chapel Hill, Chapel Hill, NC, 27599, USA
- Nomi L Harris
  - Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
- Daniel R Korn
  - Department of Genetics, University of North Carolina, Chapel Hill, Chapel Hill, NC, 27599, USA
- Nikolaos M Marinakis
  - Laboratory of Medical Genetics, National and Kapodistrian University of Athens, Athens, 11527, Greece
- James A McLaughlin
  - Samples, Phenotypes, and Ontologies (SPOT), European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridgeshire, CB10 1SD, UK
- Christopher J Mungall
  - Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
- Peter N Robinson
  - Berlin Institute of Health, Charité - Universitätsmedizin Berlin, Berlin, 10117, Germany
- Damian Smedley
  - William Harvey Research Institute, Barts and the London School of Medicine and Dentistry, Queen Mary University of London, London, EC1M 6BQ, UK
- Julius O B Jacobsen
  - William Harvey Research Institute, Barts and the London School of Medicine and Dentistry, Queen Mary University of London, London, EC1M 6BQ, UK
3
Mullen KR, Tammen I, Matentzoglu NA, Mather M, Mungall CJ, Haendel MA, Nicholas FW, Toro S. The Vertebrate Breed Ontology: Towards Effective Breed Data Standardization. arXiv 2024:arXiv:2406.02623v1. [PMID: 38883236 PMCID: PMC11177956] [Indexed: 06/18/2024]
Abstract
Background – Limited universally adopted data standards in veterinary science hinders data interoperability and therefore integration and comparison; this ultimately impedes application of existing information-based tools to support advancement in veterinary diagnostics, treatments, and precision medicine. Hypothesis/Objectives – Creation of a Vertebrate Breed Ontology (VBO) as a single, coherent logic-based standard for documenting breed names in animal health, production and research-related records will improve data use capabilities in veterinary and comparative medicine. Animals – No live animals were used in this study. Methods – A list of breed names and related information was compiled from relevant sources, organizations, communities, and experts using manual and computational approaches to create VBO. Each breed is represented by a VBO term that includes all provenance and the breed's related information as metadata. VBO terms are classified using description logic to allow computational applications and Artificial Intelligence-readiness. Results – VBO is an open, community-driven ontology representing over 19,000 livestock and companion animal breeds covering 41 species. Breeds are classified based on community and expert conventions (e.g., horse breed, cattle breed). This classification is supported by relations to the breeds' genus and species indicated by NCBI Taxonomy terms. Relationships between VBO terms, e.g. relating breeds to their foundation stock, provide additional context to support advanced data analytics. VBO term metadata includes common names and synonyms, breed identifiers/codes, and attributed cross-references to other databases. Conclusion and clinical importance – Veterinary data interoperability and computability can be enhanced by the adoption of VBO as a source of standard breed names in databases and veterinary electronic health records.
4
Danis D, Bamshad MJ, Bridges Y, Cacheiro P, Carmody LC, Chong JX, Coleman B, Dalgleish R, Freeman PJ, Graefe ASL, Groza T, Jacobsen JOB, Klocperk A, Kusters M, Ladewig MS, Marcello AJ, Mattina T, Mungall CJ, Munoz-Torres MC, Reese JT, Rehburg F, Reis BCS, Schuetz C, Smedley D, Strauss T, Sundaramurthi JC, Thun S, Wissink K, Wagstaff JF, Zocche D, Haendel MA, Robinson PN. A corpus of GA4GH Phenopackets: case-level phenotyping for genomic diagnostics and discovery. medRxiv 2024:2024.05.29.24308104. [PMID: 38854034 PMCID: PMC11160806 DOI: 10.1101/2024.05.29.24308104] [Indexed: 06/11/2024]
Abstract
The Global Alliance for Genomics and Health (GA4GH) Phenopacket Schema was released in 2022 and approved by ISO as a standard for sharing clinical and genomic information about an individual, including phenotypic descriptions, numerical measurements, genetic information, diagnoses, and treatments. A phenopacket can be used as an input file for software that supports phenotype-driven genomic diagnostics and for algorithms that facilitate patient classification and stratification for identifying new diseases and treatments. There has been a great need for a collection of phenopackets to test software pipelines and algorithms. Here, we present phenopacket-store. Version 0.1.12 of phenopacket-store includes 4916 phenopackets representing 277 Mendelian and chromosomal diseases associated with 236 genes, and 2872 unique pathogenic alleles curated from 605 different publications. This represents the first large-scale collection of case-level, standardized phenotypic information derived from case reports in the literature with detailed descriptions of the clinical data and will be useful for many purposes, including the development and testing of software for prioritizing genes and diseases in diagnostic genomics, machine learning analysis of clinical phenotype data, patient stratification, and genotype-phenotype correlations. This corpus also provides best-practice examples for curating literature-derived data using the GA4GH Phenopacket Schema.
Affiliation(s)
- Daniel Danis
  - The Jackson Institute for Genomic Medicine, 10 Discovery Drive, Farmington CT 06032, USA
  - Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Berlin, Germany
- Michael J Bamshad
  - Department of Pediatrics, Division of Genetic Medicine, University of Washington, 1959 NE Pacific Street, Box 357371, Seattle, WA 98195, USA
  - Brotman-Baty Institute for Precision Medicine, 1959 NE Pacific Street, Box 357657, Seattle WA 98195, USA
  - Department of Pediatrics, Division of Genetic Medicine, Seattle Children's Hospital, Seattle, WA 98195, USA
- Yasemin Bridges
  - William Harvey Research Institute, Queen Mary University of London, London, UK
- Pilar Cacheiro
  - William Harvey Research Institute, Queen Mary University of London, London, UK
- Leigh C Carmody
  - The Jackson Institute for Genomic Medicine, 10 Discovery Drive, Farmington CT 06032, USA
- Jessica X Chong
  - Department of Pediatrics, Division of Genetic Medicine, University of Washington, 1959 NE Pacific Street, Box 357371, Seattle, WA 98195, USA
  - Brotman-Baty Institute for Precision Medicine, 1959 NE Pacific Street, Box 357657, Seattle WA 98195, USA
- Ben Coleman
  - Department of Genetics and Genome Sciences, University of Connecticut Health Center, Farmington, CT, USA
  - The Jackson Institute for Genomic Medicine, 10 Discovery Drive, Farmington CT 06032, USA
- Raymond Dalgleish
  - Department of Genetics and Genome Biology, University of Leicester, Leicester, UK
- Peter J Freeman
  - Division of Informatics, Imaging and Data Science, The University of Manchester, Manchester, UK
- Adam S L Graefe
  - Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Berlin, Germany
- Tudor Groza
  - Rare Care Centre, Perth Children's Hospital, Nedlands, WA 6009, Australia
  - SingHealth Duke-NUS Institute of Precision Medicine, 5 Hospital Drive Level 9, Singapore 169609, Singapore
  - Telethon Kids Institute, Nedlands, WA 6009, Australia
- Julius O B Jacobsen
  - William Harvey Research Institute, Queen Mary University of London, London, UK
- Adam Klocperk
  - Department of Immunology, 2nd Faculty of Medicine, Charles University and University Hospital in Motol, Prague, Czech Republic
- Maaike Kusters
  - Department of Paediatric Immunology, Great Ormond Street Hospital for Children NHS Foundation Trust, London, UK
  - University College London Institute of Child Health, London, United Kingdom
- Markus S Ladewig
  - Department of Ophthalmology, University Clinic Marburg - Campus Fulda, Fulda, Germany
- Anthony J Marcello
  - Department of Pediatrics, Division of Genetic Medicine, University of Washington, 1959 NE Pacific Street, Box 357371, Seattle, WA 98195, USA
- Teresa Mattina
  - Medical Genetics, University of Catania, Italy
  - Morgagni Foundation and Clinic, Catania, Italy
- Christopher J Mungall
  - Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Monica C Munoz-Torres
  - Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus
- Justin T Reese
  - Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Filip Rehburg
  - Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Berlin, Germany
- Bárbara C S Reis
  - Department of Immunology, National Institute of Women's, Children's and Adolescents' Health Fernandes Figueira, Rio de Janeiro, Brazil
  - High Complexity Laboratory, National Institute of Women's, Children's and Adolescents' Health Fernandes Figueira, Rio de Janeiro, Brazil
- Catharina Schuetz
  - Department of Pediatrics, Faculty of Medicine and University Hospital Carl Gustav Carus, Technische Universität Dresden, Dresden, Germany
  - University Center for Rare Diseases, Faculty of Medicine and University Hospital Carl Gustav Carus, Technische Universität Dresden, Dresden, Germany
- Damian Smedley
  - William Harvey Research Institute, Queen Mary University of London, London, UK
- Timmy Strauss
  - Department of Pediatrics, Faculty of Medicine and University Hospital Carl Gustav Carus, Technische Universität Dresden, Dresden, Germany
  - University Center for Rare Diseases, Faculty of Medicine and University Hospital Carl Gustav Carus, Technische Universität Dresden, Dresden, Germany
- Sylvia Thun
  - Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Berlin, Germany
- Kyran Wissink
  - Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Berlin, Germany
  - Utrecht University, Utrecht, the Netherlands
- David Zocche
  - North West Thames Regional Genetics Service, Northwick Park & St Mark's Hospitals, London, UK
- Peter N Robinson
  - Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Berlin, Germany
  - The Jackson Institute for Genomic Medicine, 10 Discovery Drive, Farmington CT 06032, USA
  - ELLIS-European Laboratory for Learning and Intelligent Systems
5
Cacheiro P, Lawson S, Van den Veyver IB, Marengo G, Zocche D, Murray SA, Duyzend M, Robinson PN, Smedley D. Lethal phenotypes in Mendelian disorders. medRxiv 2024:2024.01.12.24301168. [PMID: 38260283 PMCID: PMC10802756 DOI: 10.1101/2024.01.12.24301168] [Indexed: 01/24/2024]
Abstract
Essential genes are those whose function is required for cell proliferation and/or organism survival. A gene's intolerance to loss-of-function can be allocated within a spectrum, as opposed to being considered a binary feature, since this function might be essential at different stages of development, genetic backgrounds or other contexts. Existing resources that collect and characterise the essentiality status of genes are based on either proliferation assessment in human cell lines, embryonic and postnatal viability evaluation in different model organisms, and gene metrics such as intolerance to variation scores derived from human population sequencing studies. There are also several repositories available that document phenotypic annotations for rare disorders in humans such as the Online Mendelian Inheritance in Man (OMIM) and the Human Phenotype Ontology (HPO) knowledgebases. This raises the prospect of being able to use clinical data, including lethality as the most severe phenotypic manifestation, to further our characterisation of gene essentiality. Here we queried OMIM for terms related to lethality and classified all Mendelian genes into categories, according to the earliest age of death recorded for the associated disorders, from prenatal death to no reports of premature death. To showcase this curated catalogue of human essential genes, we developed the Lethal Phenotypes Portal (https://lethalphenotypes.research.its.qmul.ac.uk), where we also explore the relationships between these lethality categories, constraint metrics and viability in cell lines and mouse. Further analysis of the genes in these categories reveals differences in the mode of inheritance of the associated disorders, physiological systems affected and disease class. We highlight how the phenotypic similarity between genes in the same lethality category combined with gene family/group information can be used for novel disease gene discovery. 
Finally, we explore the overlaps and discrepancies between the lethal phenotypes observed in mouse and human and discuss potential explanations that include differences in transcriptional regulation, functional compensation and molecular disease mechanisms. We anticipate that this resource will aid clinicians in the diagnosis of early lethal conditions and assist researchers in investigating the properties that make these genes essential for human development.
Affiliation(s)
- Pilar Cacheiro
  - William Harvey Research Institute, Queen Mary University of London, London, UK
- Ignatia B. Van den Veyver
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
  - Department of Obstetrics and Gynecology, Baylor College of Medicine, Houston, TX, USA
- Gabriel Marengo
  - William Harvey Research Institute, Queen Mary University of London, London, UK
- David Zocche
  - North West Thames Regional Genetics Service, Northwick Park & St Mark’s Hospitals, London, UK
- Peter N. Robinson
  - Berlin Institute of Health at Charité-Universitätsmedizin Berlin, Berlin, Germany
- Damian Smedley
  - William Harvey Research Institute, Queen Mary University of London, London, UK
6
Rigden DJ, Fernández XM. The 2024 Nucleic Acids Research database issue and the online molecular biology database collection. Nucleic Acids Res 2024; 52:D1-D9. [PMID: 38035367 PMCID: PMC10767945 DOI: 10.1093/nar/gkad1173] [Received: 11/22/2023] [Accepted: 11/23/2023] [Indexed: 12/02/2023] Open
Abstract
The 2024 Nucleic Acids Research database issue contains 180 papers from across biology and neighbouring disciplines. There are 90 papers reporting on new databases and 83 updates from resources previously published in the Issue. Updates from databases most recently published elsewhere account for a further seven. Nucleic acid databases include the new NAKB for structural information and updates from Genbank, ENA, GEO, Tarbase and JASPAR. The Issue's Breakthrough Article concerns NMPFamsDB for novel prokaryotic protein families and the AlphaFold Protein Structure Database has an important update. Metabolism is covered by updates from Reactome, Wikipathways and Metabolights. Microbes are covered by RefSeq, UNITE, SPIRE and P10K; viruses by ViralZone and PhageScope. Medically-oriented databases include the familiar COSMIC, Drugbank and TTD. Genomics-related resources include Ensembl, UCSC Genome Browser and Monarch. New arrivals cover plant imaging (OPIA and PlantPAD) and crop plants (SoyMD, TCOD and CropGS-Hub). The entire Database Issue is freely available online on the Nucleic Acids Research website (https://academic.oup.com/nar). Over the last year the NAR online Molecular Biology Database Collection has been updated, reviewing 1060 entries, adding 97 new resources and eliminating 388 discontinued URLs bringing the current total to 1959 databases. It is available at http://www.oxfordjournals.org/nar/database/c/.
Affiliation(s)
- Daniel J Rigden
  - Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Crown Street, Liverpool L69 7ZB, UK