1. Hramyka D, Sczakiel HL, Zhao MX, Stolpe O, Nieminen M, Adam R, Danyel M, Einicke L, Hägerling R, Knaus A, Mundlos S, Schwartzmann S, Seelow D, Ehmke N, Mensah M, Boschann F, Beule D, Holtgrewe M. REEV: review, evaluate and explain variants. Nucleic Acids Res 2024; 52:W148-W158. [PMID: 38769069 PMCID: PMC11223839 DOI: 10.1093/nar/gkae366] [Received: 02/13/2024] [Revised: 04/07/2024] [Accepted: 05/03/2024] [Indexed: 05/22/2024]
Abstract
In the era of high-throughput sequencing, special software is required for the clinical evaluation of genetic variants. We developed REEV (Review, Evaluate and Explain Variants), a user-friendly platform for clinicians and researchers in the field of rare disease genetics. Supporting data were aggregated from public data sources. We compared REEV with seven other tools for clinical variant evaluation. REEV (semi-)automatically fills individual ACMG criteria, facilitating variant interpretation. REEV can store disease and phenotype data related to a case and use them for phenotype similarity measures. Users can create public permanent links for individual variants that can be saved as browser bookmarks and shared. REEV may help in the fast diagnostic assessment of genetic variants in clinical as well as research contexts. REEV (https://reev.bihealth.org/) is free and open to all users, and there is no login requirement.
Affiliation(s)
- Dzmitry Hramyka
  - Berlin Institute of Health, Core Unit Bioinformatics, Berlin, Germany
- Henrike Lisa Sczakiel
  - Institute of Medical Genetics and Human Genetics, Charité - Universitätsmedizin Berlin, Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany
  - BIH Biomedical Innovation Academy, Clinician Scientist Program, Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Berlin, Germany
  - RG Development & Disease, Max Planck Institute for Molecular Genetics, Berlin, Germany
- Max Xiaohang Zhao
  - Berlin Institute of Health, Core Unit Bioinformatics, Berlin, Germany
  - Institute of Medical Genetics and Human Genetics, Charité - Universitätsmedizin Berlin, Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany
  - Berlin Institute of Health, Berlin, Germany
- Oliver Stolpe
  - Berlin Institute of Health, Core Unit Bioinformatics, Berlin, Germany
- Mikko Nieminen
  - Berlin Institute of Health, Core Unit Bioinformatics, Berlin, Germany
- Ronja Adam
  - Institute of Medical Genetics and Human Genetics, Charité - Universitätsmedizin Berlin, Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany
  - Berlin Institute of Health, Berlin, Germany
- Magdalena Danyel
  - Institute of Medical Genetics and Human Genetics, Charité - Universitätsmedizin Berlin, Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany
  - BIH Biomedical Innovation Academy, Clinician Scientist Program, Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Berlin, Germany
- Lara Einicke
  - Institute of Medical Genetics and Human Genetics, Charité - Universitätsmedizin Berlin, Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany
  - Berlin Institute of Health, Berlin, Germany
- René Hägerling
  - Institute of Medical Genetics and Human Genetics, Charité - Universitätsmedizin Berlin, Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany
  - BIH Biomedical Innovation Academy, Clinician Scientist Program, Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Berlin, Germany
  - RG Development & Disease, Max Planck Institute for Molecular Genetics, Berlin, Germany
  - Berlin Institute of Health, BIH Center for Regenerative Therapies, Berlin, Germany
- Alexej Knaus
  - Institute for Genomic Statistics and Bioinformatics, University Hospital Bonn, Germany
- Stefan Mundlos
  - Institute of Medical Genetics and Human Genetics, Charité - Universitätsmedizin Berlin, Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany
  - RG Development & Disease, Max Planck Institute for Molecular Genetics, Berlin, Germany
- Sarina Schwartzmann
  - Institute of Medical Genetics and Human Genetics, Charité - Universitätsmedizin Berlin, Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany
- Dominik Seelow
  - Institute of Medical Genetics and Human Genetics, Charité - Universitätsmedizin Berlin, Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany
  - Berlin Institute of Health, Berlin, Germany
- Nadja Ehmke
  - Institute of Medical Genetics and Human Genetics, Charité - Universitätsmedizin Berlin, Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany
  - Berlin Institute of Health, Berlin, Germany
- Martin Atta Mensah
  - Institute of Medical Genetics and Human Genetics, Charité - Universitätsmedizin Berlin, Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany
  - BIH Biomedical Innovation Academy, Digital Clinician Scientist Program, Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Berlin, Germany
- Felix Boschann
  - Institute of Medical Genetics and Human Genetics, Charité - Universitätsmedizin Berlin, Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany
  - BIH Biomedical Innovation Academy, Clinician Scientist Program, Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Berlin, Germany
  - Berlin Institute of Health, Berlin, Germany
- Dieter Beule
  - Berlin Institute of Health, Core Unit Bioinformatics, Berlin, Germany
  - Max Delbrück Center for Molecular Medicine in the Helmholtz Association, Berlin, Germany
- Manuel Holtgrewe
  - Berlin Institute of Health, Core Unit Bioinformatics, Berlin, Germany
2. Bridges Y, de Souza V, Cortes KG, Haendel M, Harris NL, Korn DR, Marinakis NM, Matentzoglu N, McLaughlin JA, Mungall CJ, Osumi-Sutherland D, Robinson PN, Smedley D, Jacobsen JO. Towards a standard benchmark for variant and gene prioritisation algorithms: PhEval - Phenotypic inference Evaluation framework. bioRxiv 2024:2024.06.13.598672. [PMID: 38915571 PMCID: PMC11195176 DOI: 10.1101/2024.06.13.598672] [Indexed: 06/26/2024]
Abstract
Background: Computational approaches to support rare disease diagnosis are challenging to build, requiring the integration of complex data types such as ontologies, gene-to-phenotype associations, and cross-species data into variant and gene prioritisation algorithms (VGPAs). However, the performance of VGPAs has been difficult to measure and is impacted by many factors, for example, ontology structure, annotation completeness or changes to the underlying algorithm. Assertions of the capabilities of VGPAs are often not reproducible, in part because there is no standardised, empirical framework and openly available patient data to assess the efficacy of VGPAs, ultimately hindering the development of effective prioritisation tools.
Results: In this paper, we present our benchmarking tool, PhEval, which aims to provide a standardised and empirical framework to evaluate phenotype-driven VGPAs. The inclusion of standardised test corpora and test corpus generation tools in the PhEval suite of tools allows open benchmarking and comparison of methods on standardised data sets.
Conclusions: PhEval and the standardised test corpora solve the issues of patient data availability and experimental tooling configuration when benchmarking and comparing rare disease VGPAs. By providing standardised data on patient cohorts from real-world case-reports and controlling the configuration of evaluated VGPAs, PhEval enables transparent, portable, comparable and reproducible benchmarking of VGPAs. As these tools are often a key component of many rare disease diagnostic pipelines, a thorough and standardised method of assessment is essential for improving patient diagnosis and care.
Affiliation(s)
- Yasemin Bridges
  - William Harvey Research Institute, Barts and the London School of Medicine and Dentistry, Queen Mary University of London, London, EC1M 6BQ, UK
- Vinicius de Souza
  - European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridgeshire, CB10 1SD, UK
- Katherina G Cortes
  - School of Public Health, University of Colorado Anschutz Medical Campus, Aurora, CO, 80045, USA
- Melissa Haendel
  - Department of Genetics, University of North Carolina, Chapel Hill, Chapel Hill, NC, 27599, USA
- Nomi L Harris
  - Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
- Daniel R Korn
  - Department of Genetics, University of North Carolina, Chapel Hill, Chapel Hill, NC, 27599, USA
- Nikolaos M Marinakis
  - Laboratory of Medical Genetics, National and Kapodistrian University of Athens, Athens, 11527, Greece
- James A McLaughlin
  - Samples, Phenotypes, and Ontologies (SPOT), European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridgeshire, CB10 1SD, UK
- Christopher J Mungall
  - Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
- Peter N Robinson
  - Berlin Institute of Health, Charité - Universitätsmedizin Berlin, Berlin, 10117, Germany
- Damian Smedley
  - William Harvey Research Institute, Barts and the London School of Medicine and Dentistry, Queen Mary University of London, London, EC1M 6BQ, UK
- Julius O B Jacobsen
  - William Harvey Research Institute, Barts and the London School of Medicine and Dentistry, Queen Mary University of London, London, EC1M 6BQ, UK
3. Danis D, Bamshad MJ, Bridges Y, Cacheiro P, Carmody LC, Chong JX, Coleman B, Dalgleish R, Freeman PJ, Graefe ASL, Groza T, Jacobsen JOB, Klocperk A, Kusters M, Ladewig MS, Marcello AJ, Mattina T, Mungall CJ, Munoz-Torres MC, Reese JT, Rehburg F, Reis BCS, Schuetz C, Smedley D, Strauss T, Sundaramurthi JC, Thun S, Wissink K, Wagstaff JF, Zocche D, Haendel MA, Robinson PN. A corpus of GA4GH Phenopackets: case-level phenotyping for genomic diagnostics and discovery. medRxiv 2024:2024.05.29.24308104. [PMID: 38854034 PMCID: PMC11160806 DOI: 10.1101/2024.05.29.24308104] [Indexed: 06/11/2024]
Abstract
The Global Alliance for Genomics and Health (GA4GH) Phenopacket Schema was released in 2022 and approved by ISO as a standard for sharing clinical and genomic information about an individual, including phenotypic descriptions, numerical measurements, genetic information, diagnoses, and treatments. A phenopacket can be used as an input file for software that supports phenotype-driven genomic diagnostics and for algorithms that facilitate patient classification and stratification for identifying new diseases and treatments. There has been a great need for a collection of phenopackets to test software pipelines and algorithms. Here, we present phenopacket-store. Version 0.1.12 of phenopacket-store includes 4916 phenopackets representing 277 Mendelian and chromosomal diseases associated with 236 genes, and 2872 unique pathogenic alleles curated from 605 different publications. This represents the first large-scale collection of case-level, standardized phenotypic information derived from case reports in the literature with detailed descriptions of the clinical data and will be useful for many purposes, including the development and testing of software for prioritizing genes and diseases in diagnostic genomics, machine learning analysis of clinical phenotype data, patient stratification, and genotype-phenotype correlations. This corpus also provides best-practice examples for curating literature-derived data using the GA4GH Phenopacket Schema.
Affiliation(s)
- Daniel Danis
  - The Jackson Institute for Genomic Medicine, 10 Discovery Drive, Farmington CT 06032, USA
  - Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Berlin, Germany
- Michael J Bamshad
  - Department of Pediatrics, Division of Genetic Medicine, University of Washington, 1959 NE Pacific Street, Box 357371, Seattle, WA 98195, USA
  - Brotman-Baty Institute for Precision Medicine, 1959 NE Pacific Street, Box 357657, Seattle WA 98195, USA
  - Department of Pediatrics, Division of Genetic Medicine, Seattle Children's Hospital, Seattle, WA 98195, USA
- Yasemin Bridges
  - William Harvey Research Institute, Queen Mary University of London, London, UK
- Pilar Cacheiro
  - William Harvey Research Institute, Queen Mary University of London, London, UK
- Leigh C Carmody
  - The Jackson Institute for Genomic Medicine, 10 Discovery Drive, Farmington CT 06032, USA
- Jessica X Chong
  - Department of Pediatrics, Division of Genetic Medicine, University of Washington, 1959 NE Pacific Street, Box 357371, Seattle, WA 98195, USA
  - Brotman-Baty Institute for Precision Medicine, 1959 NE Pacific Street, Box 357657, Seattle WA 98195, USA
- Ben Coleman
  - Department of Genetics and Genome Sciences, University of Connecticut Health Center, Farmington, CT, USA
  - The Jackson Institute for Genomic Medicine, 10 Discovery Drive, Farmington CT 06032, USA
- Raymond Dalgleish
  - Department of Genetics and Genome Biology, University of Leicester, Leicester, UK
- Peter J Freeman
  - Division of Informatics, Imaging and Data Science, The University of Manchester, Manchester, UK
- Adam S L Graefe
  - Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Berlin, Germany
- Tudor Groza
  - Rare Care Centre, Perth Children's Hospital, Nedlands, WA 6009, Australia
  - SingHealth Duke-NUS Institute of Precision Medicine, 5 Hospital Drive Level 9, Singapore 169609, Singapore
  - Telethon Kids Institute, Nedlands, WA 6009, Australia
- Julius O B Jacobsen
  - William Harvey Research Institute, Queen Mary University of London, London, UK
- Adam Klocperk
  - Department of Immunology, 2nd Faculty of Medicine, Charles University and University Hospital in Motol, Prague, Czech Republic
- Maaike Kusters
  - Department of Paediatric Immunology, Great Ormond Street Hospital for Children NHS Foundation Trust, London, UK
  - University College London Institute of Child Health, London, United Kingdom
- Markus S Ladewig
  - Department of Ophthalmology, University Clinic Marburg - Campus Fulda, Fulda, Germany
- Anthony J Marcello
  - Department of Pediatrics, Division of Genetic Medicine, University of Washington, 1959 NE Pacific Street, Box 357371, Seattle, WA 98195, USA
- Teresa Mattina
  - Medical Genetics, University of Catania, Italy
  - Morgagni foundation and Clinic, Catania, Italy
- Christopher J Mungall
  - Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Monica C Munoz-Torres
  - Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus
- Justin T Reese
  - Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Filip Rehburg
  - Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Berlin, Germany
- Bárbara C S Reis
  - Department of Immunology, National Institute of Women's, Children's and Adolescents' Health Fernandes Figueira, Rio de Janeiro, Brazil
  - High Complexity Laboratory, National Institute of Women's, Children's and Adolescents' Health Fernandes Figueira, Rio de Janeiro, Brazil
- Catharina Schuetz
  - Department of Pediatrics, Faculty of Medicine and University Hospital Carl Gustav Carus, Technische Universität Dresden, Dresden, Germany
  - University Center for Rare Diseases, Faculty of Medicine and University Hospital Carl Gustav Carus, Technische Universität Dresden, Dresden, Germany
- Damian Smedley
  - William Harvey Research Institute, Queen Mary University of London, London, UK
- Timmy Strauss
  - Department of Pediatrics, Faculty of Medicine and University Hospital Carl Gustav Carus, Technische Universität Dresden, Dresden, Germany
  - University Center for Rare Diseases, Faculty of Medicine and University Hospital Carl Gustav Carus, Technische Universität Dresden, Dresden, Germany
- Sylvia Thun
  - Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Berlin, Germany
- Kyran Wissink
  - Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Berlin, Germany
  - Utrecht University, Utrecht, the Netherlands
- David Zocche
  - North West Thames Regional Genetics Service, Northwick Park & St Mark's Hospitals, London, UK
- Peter N Robinson
  - Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Berlin, Germany
  - The Jackson Institute for Genomic Medicine, 10 Discovery Drive, Farmington CT 06032, USA
  - ELLIS-European Laboratory for Learning and Intelligent Systems
4. Zhang D, Zhao R, Xian G, Kou Y, Ma W. A new model construction based on the knowledge graph for mining elite polyphenotype genes in crops. Front Plant Sci 2024; 15:1361716. [PMID: 38571713 PMCID: PMC10987776 DOI: 10.3389/fpls.2024.1361716] [Received: 12/26/2023] [Accepted: 03/04/2024] [Indexed: 04/05/2024]
Abstract
Identifying polyphenotype genes that simultaneously regulate important agronomic traits (e.g., plant height, yield, and disease resistance) is critical for developing novel high-quality crop varieties. Predicting the associations between genes and traits requires the organization and analysis of multi-dimensional scientific data. Existing methods for relating genomic data to phenotypic data can only elucidate the associations between genes and individual traits, and there are relatively few methods for detecting elite polyphenotype genes. In this study, a knowledge graph of trait-regulating genes was constructed by collecting data from the PubMed database and eight other databases related to the staple food crops rice, maize, and wheat as well as the model plant Arabidopsis thaliana. On the basis of the knowledge graph, a model for predicting trait-regulating genes was constructed by combining the data attributes of the gene nodes with their topological relationship attributes. Additionally, a scoring method for predicting the genes regulating specific traits was developed to screen for elite polyphenotype genes. A total of 125,591 nodes and 547,224 semantic relationships were included in the knowledge graph. The accuracy of the knowledge graph-based model for predicting trait-regulating genes was 0.89, the precision was 0.91, the recall was 0.96, and the F1 value was 0.94. Moreover, 4,447 polyphenotype genes for 31 trait combinations were identified, among which the rice polyphenotype gene IPA1 and the A. thaliana polyphenotype gene CUC2 were verified via a literature search. Furthermore, the wheat gene TraesCS5A02G275900 was revealed as a potential polyphenotype gene that will need to be further characterized.
Meanwhile, Venn diagram analysis of the polyphenotype gene datasets (genes predicted by our model) against the transcriptome gene datasets (genes differentially expressed in response to disease, drought, or salt) showed that approximately 70% and 54% of the polyphenotype genes were also identified in the transcriptome datasets of Arabidopsis and rice, respectively. Applying a knowledge graph-driven model to predict trait-regulating genes represents a novel method for detecting elite polyphenotype genes.
Affiliation(s)
- Dandan Zhang
  - Agricultural Information Institute of Chinese Academy of Agricultural Sciences, Beijing, China
- Ruixue Zhao
  - Agricultural Information Institute of Chinese Academy of Agricultural Sciences, Beijing, China
  - Key Laboratory of Agricultural Integration Publishing Knowledge Mining and Knowledge Service, National Press and Publication Administration, Beijing, China
- Guojian Xian
  - Agricultural Information Institute of Chinese Academy of Agricultural Sciences, Beijing, China
  - Key Laboratory of Agricultural Big Data, Ministry of Agriculture and Rural Affairs, Beijing, China
- Yuantao Kou
  - Agricultural Information Institute of Chinese Academy of Agricultural Sciences, Beijing, China
  - Key Laboratory of Agricultural Integration Publishing Knowledge Mining and Knowledge Service, National Press and Publication Administration, Beijing, China
- Weilu Ma
  - Agricultural Information Institute of Chinese Academy of Agricultural Sciences, Beijing, China
5. Bhasin MA, Knaus A, Incardona P, Schmid A, Holtgrewe M, Elbracht M, Krawitz PM, Hsieh TC. Enhancing Variant Prioritization in VarFish through On-Premise Computational Facial Analysis. Genes (Basel) 2024; 15:370. [PMID: 38540429 PMCID: PMC10969976 DOI: 10.3390/genes15030370] [Received: 02/06/2024] [Revised: 03/03/2024] [Accepted: 03/13/2024] [Indexed: 06/14/2024]
Abstract
Genomic variant prioritization is crucial for identifying disease-associated genetic variations. Integrating facial and clinical feature analyses into this process enhances performance. This study demonstrates the integration of facial analysis (GestaltMatcher) and Human Phenotype Ontology analysis (CADA) within VarFish, an open-source variant analysis framework. Challenges related to non-open-source components were addressed by providing an open-source version of GestaltMatcher, facilitating on-premise facial analysis to address data privacy concerns. Performance evaluation on 163 patients recruited from a German multi-center study of rare diseases showed PEDIA's superior accuracy in variant prioritization compared to individual scores. This study highlights the importance of further benchmarking and future integration of advanced facial analysis approaches aligned with ACMG guidelines to enhance variant classification.
Affiliation(s)
- Meghna Ahuja Bhasin
  - Institute for Genomic Statistics and Bioinformatics, University Hospital Bonn, Rheinische Friedrich-Wilhelms-Universität Bonn, 53127 Bonn, Germany
- Alexej Knaus
  - Institute for Genomic Statistics and Bioinformatics, University Hospital Bonn, Rheinische Friedrich-Wilhelms-Universität Bonn, 53127 Bonn, Germany
- Pietro Incardona
  - Institute for Genomic Statistics and Bioinformatics, University Hospital Bonn, Rheinische Friedrich-Wilhelms-Universität Bonn, 53127 Bonn, Germany
  - Core Unit for Bioinformatics Data Analysis, Medical Faculty, University of Bonn, 53127 Bonn, Germany
- Alexander Schmid
  - Institute for Genomic Statistics and Bioinformatics, University Hospital Bonn, Rheinische Friedrich-Wilhelms-Universität Bonn, 53127 Bonn, Germany
- Manuel Holtgrewe
  - CUBI—Core Unit Bioinformatics, Berlin Institute of Health, 10117 Berlin, Germany
- Miriam Elbracht
  - Institute for Human Genetics and Genomic Medicine, Medical Faculty, RWTH Aachen University, 52062 Aachen, Germany
- Peter M. Krawitz
  - Institute for Genomic Statistics and Bioinformatics, University Hospital Bonn, Rheinische Friedrich-Wilhelms-Universität Bonn, 53127 Bonn, Germany
- Tzung-Chien Hsieh
  - Institute for Genomic Statistics and Bioinformatics, University Hospital Bonn, Rheinische Friedrich-Wilhelms-Universität Bonn, 53127 Bonn, Germany
6. Balachandran S, Prada-Medina CA, Mensah MA, Kakar N, Nagel I, Pozojevic J, Audain E, Hitz MP, Kircher M, Sreenivasan VKA, Spielmann M. STIGMA: Single-cell tissue-specific gene prioritization using machine learning. Am J Hum Genet 2024; 111:338-349. [PMID: 38228144 PMCID: PMC10870135 DOI: 10.1016/j.ajhg.2023.12.011] [Received: 08/04/2023] [Revised: 12/01/2023] [Accepted: 12/07/2023] [Indexed: 01/18/2024]
Abstract
Clinical exome and genome sequencing have revolutionized the understanding of human disease genetics. Yet many genes remain functionally uncharacterized, complicating the establishment of causal disease links for genetic variants. While several scoring methods have been devised to prioritize these candidate genes, these methods fall short of capturing the expression heterogeneity across cell subpopulations within tissues. Here, we introduce single-cell tissue-specific gene prioritization using machine learning (STIGMA), an approach that leverages single-cell RNA-seq (scRNA-seq) data to prioritize candidate genes associated with rare congenital diseases. STIGMA prioritizes genes by learning the temporal dynamics of gene expression across cell types during healthy organogenesis. To assess the efficacy of our framework, we applied STIGMA to mouse limb and human fetal heart scRNA-seq datasets. In a cohort of individuals with congenital limb malformation, STIGMA prioritized 469 variants in 345 genes, with UBA2 as a notable example. For congenital heart defects, we detected 34 genes harboring nonsynonymous de novo variants (nsDNVs) in two or more individuals from a set of 7,958 individuals, including the ortholog of Prdm1, which is associated with hypoplastic left ventricle and hypoplastic aortic arch. Overall, our findings demonstrate that STIGMA effectively prioritizes tissue-specific candidate genes by utilizing single-cell transcriptome data. The ability to capture the heterogeneity of gene expression across cell populations makes STIGMA a powerful tool for the discovery of disease-associated genes and facilitates the identification of causal variants underlying human genetic disorders.
Affiliation(s)
- Saranya Balachandran
  - Institute of Human Genetics, University Hospital Schleswig-Holstein, University of Lübeck and Kiel University, Lübeck, Germany
- Cesar A Prada-Medina
  - Human Molecular Genetics Group, Max Planck Institute for Molecular Genetics, 14195 Berlin, Germany
- Martin A Mensah
  - Institut für Medizinische Genetik und Humangenetik, Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Augustenburger Platz 1, 13353 Berlin, Germany
  - BIH Charité Digital Clinician Scientist Program, BIH Biomedical Innovation Academy, Anna-Louisa-Karsch-Strasse 2, 10178 Berlin, Germany
  - RG Development & Disease, Max Planck Institute for Molecular Genetics, 14195 Berlin, Germany
- Naseebullah Kakar
  - Institute of Human Genetics, University Hospital Schleswig-Holstein, University of Lübeck and Kiel University, Lübeck, Germany
  - Department of Biotechnology, BUITEMS, Quetta, Pakistan
- Inga Nagel
  - Institute of Human Genetics, University Hospital Schleswig-Holstein, University of Lübeck and Kiel University, Lübeck, Germany
- Jelena Pozojevic
  - Institute of Human Genetics, University Hospital Schleswig-Holstein, University of Lübeck and Kiel University, Lübeck, Germany
- Enrique Audain
  - Institute of Medical Genetics, Carl von Ossietzky University, 26129 Oldenburg, Germany
  - DZHK e.V. (German Center for Cardiovascular Research), Partner Site Hamburg/Kiel/Lübeck
  - Department of Congenital Heart Disease and Pediatric Cardiology, University Hospital of Schleswig-Holstein, 24105 Kiel, Germany
- Marc-Phillip Hitz
  - Institute of Medical Genetics, Carl von Ossietzky University, 26129 Oldenburg, Germany
  - DZHK e.V. (German Center for Cardiovascular Research), Partner Site Hamburg/Kiel/Lübeck
  - Department of Congenital Heart Disease and Pediatric Cardiology, University Hospital of Schleswig-Holstein, 24105 Kiel, Germany
- Martin Kircher
  - Institute of Human Genetics, University Hospital Schleswig-Holstein, University of Lübeck and Kiel University, Lübeck, Germany
- Varun K A Sreenivasan
  - Institute of Human Genetics, University Hospital Schleswig-Holstein, University of Lübeck and Kiel University, Lübeck, Germany
- Malte Spielmann
  - Institute of Human Genetics, University Hospital Schleswig-Holstein, University of Lübeck and Kiel University, Lübeck, Germany
  - Human Molecular Genetics Group, Max Planck Institute for Molecular Genetics, 14195 Berlin, Germany
  - DZHK e.V. (German Center for Cardiovascular Research), Partner Site Hamburg/Kiel/Lübeck
7. Yang J, Liu C, Deng W, Wu D, Weng C, Zhou Y, Wang K. Enhancing phenotype recognition in clinical notes using large language models: PhenoBCBERT and PhenoGPT. Patterns (N Y) 2024; 5:100887. [PMID: 38264716 PMCID: PMC10801236 DOI: 10.1016/j.patter.2023.100887] [Received: 08/14/2023] [Revised: 10/25/2023] [Accepted: 11/06/2023] [Indexed: 01/25/2024]
Abstract
To enhance phenotype recognition in clinical notes of genetic diseases, we developed two models, PhenoBCBERT and PhenoGPT, for expanding the vocabularies of Human Phenotype Ontology (HPO) terms. While HPO offers a standardized vocabulary for phenotypes, existing tools often fail to capture the full scope of phenotypes due to limitations of traditional heuristic or rule-based approaches. Our models leverage large language models to automate the detection of phenotype terms, including those not in the current HPO. We compared these models with PhenoTagger, another HPO recognition tool, and found that our models identify a wider range of phenotype concepts, including previously uncharacterized ones. Our models also show strong performance in case studies on biomedical literature. We evaluate the strengths and weaknesses of BERT- and GPT-based models in aspects such as architecture and accuracy. Overall, our models enhance automated phenotype detection from clinical texts, improving downstream analyses on human diseases.
Affiliation(s)
- Jingye Yang
  - Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
  - Department of Mathematics, University of Pennsylvania, Philadelphia, PA 19104, USA
- Cong Liu
  - Department of Biomedical Informatics, Columbia University, New York, NY 10032, USA
- Wendy Deng
  - Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Da Wu
  - Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Chunhua Weng
  - Department of Biomedical Informatics, Columbia University, New York, NY 10032, USA
- Yunyun Zhou
  - Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
  - Biostatistics and Bioinformatics Facility, Fox Chase Cancer Center, Philadelphia, PA 19111, USA
- Kai Wang
  - Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
  - Department of Pathology and Laboratory Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
| |
|
8
|
Su C, Hou Y, Levin M, Zhang R, Wang F. Protocol to implement a computational pipeline for biomedical discovery based on a biomedical knowledge graph. STAR Protoc 2023; 4:102666. [PMID: 37883224 PMCID: PMC10630678 DOI: 10.1016/j.xpro.2023.102666] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/31/2023] [Revised: 09/06/2023] [Accepted: 10/03/2023] [Indexed: 10/28/2023]
Abstract
Biomedical knowledge graphs (BKGs) provide a new paradigm for managing abundant biomedical knowledge efficiently. Today's artificial intelligence techniques enable mining BKGs to discover new knowledge. Here, we present a protocol for implementing a computational pipeline for biomedical knowledge discovery (BKD) based on a BKG. We describe steps of the pipeline including data processing, implementing BKD based on knowledge graph embeddings, and prediction result interpretation. We detail how our pipeline can be used for drug repurposing hypothesis generation for Parkinson's disease. For complete details on the use and execution of this protocol, please refer to Su et al. (reference 1).
Affiliation(s)
- Chang Su
- Department of Population Health Sciences, Weill Cornell Medicine, New York, NY 10065, USA.
| | - Yu Hou
- Department of Surgery, University of Minnesota, Minneapolis, MN 55455, USA
| | - Michael Levin
- Bioengineering Department, College of Engineering, Temple University, Philadelphia, PA 19122, USA
| | - Rui Zhang
- Department of Surgery, University of Minnesota, Minneapolis, MN 55455, USA
| | - Fei Wang
- Department of Population Health Sciences, Weill Cornell Medicine, New York, NY 10065, USA.
| |
|
9
|
Xu Z, Li Q, Marchionni L, Wang K. PhenoSV: interpretable phenotype-aware model for the prioritization of genes affected by structural variants. Nat Commun 2023; 14:7805. [PMID: 38016949 PMCID: PMC10684511 DOI: 10.1038/s41467-023-43651-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/05/2023] [Accepted: 11/15/2023] [Indexed: 11/30/2023]
Abstract
Structural variants (SVs) represent a major source of genetic variation associated with phenotypic diversity and disease susceptibility. While long-read sequencing can discover over 20,000 SVs per human genome, interpreting their functional consequences remains challenging. Existing methods for identifying disease-related SVs focus on deletion/duplication only and cannot prioritize individual genes affected by SVs, especially for noncoding SVs. Here, we introduce PhenoSV, a phenotype-aware machine-learning model that interprets all major types of SVs and genes affected. PhenoSV segments and annotates SVs with diverse genomic features and employs a transformer-based architecture to predict their impacts under a multiple-instance learning framework. With phenotype information, PhenoSV further utilizes gene-phenotype associations to prioritize phenotype-related SVs. Evaluation on extensive human SV datasets covering all SV types demonstrates PhenoSV's superior performance over competing methods. Applications in diseases suggest that PhenoSV can determine disease-related genes from SVs. A web server and a command-line tool for PhenoSV are available at https://phenosv.wglab.org.
Affiliation(s)
- Zhuoran Xu
- Graduate Group in Genomics and Computational Biology, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, 19104, USA
- Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA
- Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY, 10065, USA
| | - Quan Li
- Princess Margaret Cancer Centre, University Health Network, University of Toronto, Toronto, ON, M5G2C1, Canada
| | - Luigi Marchionni
- Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY, 10065, USA
| | - Kai Wang
- Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA.
- Department of Pathology and Laboratory Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA.
| |
|
10
|
Yang J, Liu C, Deng W, Wu D, Weng C, Zhou Y, Wang K. Enhancing Phenotype Recognition in Clinical Notes Using Large Language Models: PhenoBCBERT and PhenoGPT. ARXIV 2023:arXiv:2308.06294v2. [PMID: 37986722 PMCID: PMC10659449] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/22/2023]
Abstract
To enhance phenotype recognition in clinical notes of genetic diseases, we developed two models - PhenoBCBERT and PhenoGPT - for expanding the vocabularies of Human Phenotype Ontology (HPO) terms. While HPO offers a standardized vocabulary for phenotypes, existing tools often fail to capture the full scope of phenotypes, due to limitations from traditional heuristic or rule-based approaches. Our models leverage large language models (LLMs) to automate the detection of phenotype terms, including those not in the current HPO. We compared these models to PhenoTagger, another HPO recognition tool, and found that our models identify a wider range of phenotype concepts, including previously uncharacterized ones. Our models also showed strong performance in case studies on biomedical literature. We evaluated the strengths and weaknesses of BERT-based and GPT-based models in aspects such as architecture and accuracy. Overall, our models enhance automated phenotype detection from clinical texts, improving downstream analyses on human diseases.
Affiliation(s)
- Jingye Yang
- Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Department of Mathematics, University of Pennsylvania, Philadelphia, PA 19104, USA
| | - Cong Liu
- Department of Biomedical Informatics, Columbia University, New York, NY 10032, USA
| | - Wendy Deng
- Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
| | - Da Wu
- Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
| | - Chunhua Weng
- Department of Biomedical Informatics, Columbia University, New York, NY 10032, USA
| | - Yunyun Zhou
- Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Biostatistics and Bioinformatics Facility, Fox Chase Cancer Center, Philadelphia, PA 19111, USA
| | - Kai Wang
- Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Department of Pathology and Laboratory Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
| |
|
11
|
Caufield JH, Putman T, Schaper K, Unni DR, Hegde H, Callahan TJ, Cappelletti L, Moxon SAT, Ravanmehr V, Carbon S, Chan LE, Cortes K, Shefchek KA, Elsarboukh G, Balhoff J, Fontana T, Matentzoglu N, Bruskiewich RM, Thessen AE, Harris NL, Munoz-Torres MC, Haendel MA, Robinson PN, Joachimiak MP, Mungall CJ, Reese JT. KG-Hub-building and exchanging biological knowledge graphs. Bioinformatics 2023; 39:btad418. [PMID: 37389415 PMCID: PMC10336030 DOI: 10.1093/bioinformatics/btad418] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/30/2023] [Revised: 05/09/2023] [Accepted: 06/29/2023] [Indexed: 07/01/2023]
Abstract
MOTIVATION: Knowledge graphs (KGs) are a powerful approach for integrating heterogeneous data and making inferences in biology and many other domains, but a coherent solution for constructing, exchanging, and facilitating the downstream use of KGs is lacking. RESULTS: Here we present KG-Hub, a platform that enables standardized construction, exchange, and reuse of KGs. Features include a simple, modular extract-transform-load pattern for producing graphs compliant with Biolink Model (a high-level data model for standardizing biological data), easy integration of any OBO (Open Biological and Biomedical Ontologies) ontology, cached downloads of upstream data sources, versioned and automatically updated builds with stable URLs, web-browsable storage of KG artifacts on cloud infrastructure, and easy reuse of transformed subgraphs across projects. Current KG-Hub projects span use cases including COVID-19 research, drug repurposing, microbial-environmental interactions, and rare disease research. KG-Hub is equipped with tooling to easily analyze and manipulate KGs. KG-Hub is also tightly integrated with graph machine learning (ML) tools which allow automated graph ML, including node embeddings and training of models for link prediction and node classification. AVAILABILITY AND IMPLEMENTATION: https://kghub.org.
Affiliation(s)
- J Harry Caufield
- Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, United States
| | - Tim Putman
- Anschutz Medical Campus, University of Colorado, Aurora, CO 80045, United States
| | - Kevin Schaper
- Anschutz Medical Campus, University of Colorado, Aurora, CO 80045, United States
| | - Deepak R Unni
- SIB Swiss Institute of Bioinformatics, Basel 1015, Switzerland
| | - Harshad Hegde
- Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, United States
| | - Tiffany J Callahan
- Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, NY 10032, United States
| | - Luca Cappelletti
- Department of Computer Science, University of Milano, Milan 20126, Italy
| | - Sierra A T Moxon
- Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, United States
| | - Vida Ravanmehr
- Department of Lymphoma-Myeloma, MD Anderson Cancer Center, Houston, TX 77030, United States
| | - Seth Carbon
- Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, United States
| | - Lauren E Chan
- College of Public Health and Human Sciences, Oregon State University, Corvallis, OR 97331, United States
| | - Katherina Cortes
- Anschutz Medical Campus, University of Colorado, Aurora, CO 80045, United States
| | - Kent A Shefchek
- Anschutz Medical Campus, University of Colorado, Aurora, CO 80045, United States
| | - Glass Elsarboukh
- Anschutz Medical Campus, University of Colorado, Aurora, CO 80045, United States
| | - Jim Balhoff
- Renaissance Computing Institute, University of North Carolina, Chapel Hill, NC 27517, United States
| | - Tommaso Fontana
- Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano, Milan 20133, Italy
| | | | | | - Anne E Thessen
- Anschutz Medical Campus, University of Colorado, Aurora, CO 80045, United States
| | - Nomi L Harris
- Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, United States
| | | | - Melissa A Haendel
- Anschutz Medical Campus, University of Colorado, Aurora, CO 80045, United States
| | - Peter N Robinson
- The Jackson Laboratory for Genomic Medicine, Farmington, CT 06032, United States
| | - Marcin P Joachimiak
- Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, United States
| | - Christopher J Mungall
- Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, United States
| | - Justin T Reese
- Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, United States
| |
|
12
|
Lesmann H, Klinkhammer H, Krawitz PM. The future role of facial image analysis in ACMG classification guidelines. MED GENET-BERLIN 2023; 35:115-121. [PMID: 38840866 PMCID: PMC10842539 DOI: 10.1515/medgen-2023-2014] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Indexed: 06/07/2024]
Abstract
The use of next-generation sequencing (NGS) has dramatically improved the diagnosis of rare diseases. However, the analysis of genomic data has become complex with the increasing detection of variants by exome and genome sequencing. The American College of Medical Genetics and Genomics (ACMG) and the Association for Molecular Pathology (AMP) developed a 5-tier classification scheme in 2015 for variant interpretation, which has since been widely adopted. Despite efforts to minimise discrepancies in the application of these criteria, inconsistencies still occur. Further specifications for individual genes were developed by Variant Curation Expert Panels (VCEPs) of the Clinical Genome Resource (ClinGen) consortium, which also take into consideration gene- or disease-specific features. For instance, in disorders with a highly characteristic facial gestalt, a "phenotypic match" (PP4) carries stronger pathogenic evidence than, e.g., in a non-syndromic form of intellectual disability. With computational approaches for quantifying the similarity of dysmorphic features, the results of such analyses can now be used in a refined Bayesian framework for the ACMG/AMP criteria.
Affiliation(s)
- Hellen Lesmann
- University of Bonn, Medical Faculty & University Hospital Bonn, Institute of Human Genetics, Venusberg-Campus 1, 53127 Bonn, Germany
| | - Hannah Klinkhammer
- University of Bonn, Institute for Genomic Statistics and Bioinformatics, Bonn, Germany
| | | |
|
13
|
Su C, Hou Y, Zhou M, Rajendran S, Maasch JRA, Abedi Z, Zhang H, Bai Z, Cuturrufo A, Guo W, Chaudhry FF, Ghahramani G, Tang J, Cheng F, Li Y, Zhang R, DeKosky ST, Bian J, Wang F. Biomedical discovery through the integrative biomedical knowledge hub (iBKH). iScience 2023; 26:106460. [PMID: 37020958 PMCID: PMC10068563 DOI: 10.1016/j.isci.2023.106460] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/16/2022] [Revised: 09/20/2022] [Accepted: 03/16/2023] [Indexed: 04/01/2023]
Abstract
The abundance of biomedical knowledge gained from biological experiments and clinical practices is an invaluable resource for biomedicine. The emerging biomedical knowledge graphs (BKGs) provide an efficient and effective way to manage the abundant knowledge in biomedical and life science. In this study, we created a comprehensive BKG called the integrative Biomedical Knowledge Hub (iBKH) by harmonizing and integrating information from diverse biomedical resources. To make iBKH easily accessible for biomedical research, we developed a web-based, user-friendly graphical portal that allows fast and interactive knowledge retrieval. We also implemented an efficient and scalable graph learning pipeline for discovering novel biomedical knowledge in iBKH. As a proof of concept, we applied our iBKH-based method to in-silico drug repurposing for Alzheimer's disease. The iBKH is publicly available.
Affiliation(s)
- Chang Su
- Department of Health Service Administration and Policy, College of Public Health, Temple University, Philadelphia, PA 19122, USA
| | - Yu Hou
- Department of Population Health Sciences, Weill Cornell Medicine, New York, NY 10065, USA
- Department of Surgery, University of Minnesota, Minneapolis, MN 55455, USA
| | - Manqi Zhou
- Department of Computational Biology, Cornell University, Ithaca, NY 14850, USA
| | - Suraj Rajendran
- Tri-Institutional Computational Biology & Medicine Program, Cornell University, New York, NY 10065, USA
| | | | - Zehra Abedi
- Department of Population Health Sciences, Weill Cornell Medicine, New York, NY 10065, USA
| | - Haotan Zhang
- Department of Physiology and Biophysics, Weill Cornell Medicine, New York, NY 10065, USA
| | - Zilong Bai
- Department of Population Health Sciences, Weill Cornell Medicine, New York, NY 10065, USA
| | | | - Winston Guo
- Department of Medicine, Weill Cornell Medicine, New York, NY 10021, USA
| | - Fayzan F. Chaudhry
- Department of Physiology and Biophysics, Weill Cornell Medicine, New York, NY 10065, USA
| | - Gregory Ghahramani
- Department of Physiology and Biophysics, Weill Cornell Medicine, New York, NY 10065, USA
| | - Jian Tang
- Mila-Quebec AI Institute and HEC Montreal, Montreal, QC H2S 3H1, Canada
| | - Feixiong Cheng
- Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, Cleveland, OH 44195, USA
- Department of Molecular Medicine, Cleveland Clinic Lerner College of Medicine, Case Western Reserve University, Cleveland, OH 44195, USA
- Case Comprehensive Cancer Center, Case Western Reserve University School of Medicine, Cleveland, OH 44106, USA
| | - Yue Li
- School of Computer Science, McGill University, Montreal, QC H3A 0C6, Canada
| | - Rui Zhang
- Department of Surgery, University of Minnesota, Minneapolis, MN 55455, USA
| | - Steven T. DeKosky
- Department of Neurology, College of Medicine, University of Florida, Gainesville, FL 32610, USA
| | - Jiang Bian
- Department of Health Outcomes & Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL 32610, USA
| | - Fei Wang
- Department of Population Health Sciences, Weill Cornell Medicine, New York, NY 10065, USA
| |
|
14
|
Ladewig MS, Jacobsen JOB, Wagner AH, Danis D, El Kassaby B, Gargano M, Groza T, Baudis M, Steinhaus R, Seelow D, Bechrakis NE, Mungall CJ, Schofield PN, Elemento O, Smith L, McMurry JA, Munoz‐Torres M, Haendel MA, Robinson PN. GA4GH Phenopackets: A Practical Introduction. ADVANCED GENETICS (HOBOKEN, N.J.) 2023; 4:2200016. [PMID: 36910590 PMCID: PMC10000265 DOI: 10.1002/ggn2.202200016] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 04/28/2022] [Revised: 06/30/2022] [Indexed: 11/08/2022]
Abstract
The Global Alliance for Genomics and Health (GA4GH) is developing a suite of coordinated standards for genomics for healthcare. The Phenopacket is a new GA4GH standard for sharing disease and phenotype information that characterizes an individual person, linking that individual to detailed phenotypic descriptions, genetic information, diagnoses, and treatments. A detailed example is presented that illustrates how to use the schema to represent the clinical course of a patient with retinoblastoma, including demographic information, the clinical diagnosis, phenotypic features and clinical measurements, an examination of the extirpated tumor, therapies, and the results of genomic analysis. The Phenopacket Schema, together with other GA4GH data and technical standards, will enable data exchange and provide a foundation for the computational analysis of disease and phenotype information to improve our ability to diagnose and conduct research on all types of disorders, including cancer and rare diseases.
Affiliation(s)
- Markus S. Ladewig
- Department of Ophthalmology, Klinikum Saarbrücken, 66119 Saarbrücken, Germany
| | - Julius O. B. Jacobsen
- William Harvey Research Institute, Charterhouse Square, Barts and the London School of Medicine and Dentistry, Queen Mary University of London, London EC1M 6BQ, UK
| | - Alex H. Wagner
- Departments of Pediatrics and Biomedical Informatics, The Ohio State University College of Medicine, Columbus, OH 43210, USA
- The Steve and Cindy Rasmussen Institute for Genomic Medicine, Nationwide Children's Hospital, Columbus, OH 43215, USA
| | - Daniel Danis
- The Jackson Laboratory for Genomic Medicine, 10 Discovery Drive, Farmington, CT 06032, USA
| | - Baha El Kassaby
- The Jackson Laboratory for Genomic Medicine, 10 Discovery Drive, Farmington, CT 06032, USA
| | - Michael Gargano
- The Jackson Laboratory for Genomic Medicine, 10 Discovery Drive, Farmington, CT 06032, USA
| | - Tudor Groza
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL‐EBI), Cambridge CB10 1SD, UK
| | - Michael Baudis
- Department of Molecular Life Sciences and Swiss Institute of Bioinformatics, University of Zurich, Zurich, Switzerland
| | - Robin Steinhaus
- Exploratory Diagnostic Sciences, Berlin Institute of Health at Charité – Universitätsmedizin Berlin, 10178 Berlin, Germany
- Institute of Medical Genetics and Human Genetics, Charité – Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin and Humboldt‐Universität zu Berlin, 13353 Berlin, Germany
| | - Dominik Seelow
- Exploratory Diagnostic Sciences, Berlin Institute of Health at Charité – Universitätsmedizin Berlin, 10178 Berlin, Germany
- Institute of Medical Genetics and Human Genetics, Charité – Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin and Humboldt‐Universität zu Berlin, 13353 Berlin, Germany
| | | | - Christopher J. Mungall
- Lawrence Berkeley National Laboratory, Environmental Genomics and Systems Biology, Berkeley, CA 94720, USA
| | - Paul N. Schofield
- Department of Physiology Development and Neuroscience, University of Cambridge, Downing Street, Cambridge CB2 3EG, UK
- The Jackson Laboratory, Bar Harbor, ME 04609, USA
| | - Olivier Elemento
- Caryl and Israel Englander Institute for Precision Medicine, Weill Cornell Medicine, New York, NY 10021, USA
| | - Lindsay Smith
- Ontario Institute for Cancer Research, Adaptive Oncology, Toronto, CA M5G0A3, USA
- Global Alliance for Genomics and Health, Toronto, CA M5G0A3, USA
| | - Julie A. McMurry
- Center for Health AI, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
| | - Monica Munoz‐Torres
- Center for Health AI, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
| | - Melissa A. Haendel
- Center for Health AI, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
| | - Peter N. Robinson
- The Jackson Laboratory for Genomic Medicine, 10 Discovery Drive, Farmington, CT 06032, USA
- Institute for Systems Genomics, University of Connecticut, Farmington, CT 06032, USA
| |
|
15
|
Deep Learning with Graph Convolutional Networks: An Overview and Latest Applications in Computational Intelligence. INT J INTELL SYST 2023. [DOI: 10.1155/2023/8342104] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/06/2023]
Abstract
Convolutional neural networks (CNNs) have received widespread attention due to their powerful modeling capabilities and have been successfully applied in natural language processing, image recognition, and other fields. However, traditional CNNs can only deal with Euclidean spatial data, whereas many real-life scenarios, such as transportation networks, social networks, and reference networks, exist as graph data. The creation of graph convolution operators and graph pooling is at the heart of migrating CNNs to graph data analysis and processing. With the advancement of the Internet and technology, the graph convolutional network (GCN), as an innovative technology in artificial intelligence (AI), has received more and more attention. GCNs have been widely used in fields such as image processing, intelligent recommender systems, and knowledge graphs due to their excellent characteristics in processing non-Euclidean spatial data. At the same time, communication networks have also embraced AI technology in recent years, and AI serves as the brain of the future network and realizes the comprehensive intelligence of the future grid. Many complex communication network problems can be abstracted as graph-based optimization problems and solved by GCNs, thus overcoming the limitations of traditional methods. This survey briefly describes the definition of graph-based machine learning, introduces different types of graph networks, summarizes the application of GCNs in various research fields, analyzes the research status, and gives future research directions.
|
16
|
Guo L, Park J, Yi E, Marchi E, Hsieh TC, Kibalnyk Y, Moreno-Sáez Y, Biskup S, Puk O, Beger C, Li Q, Wang K, Voronova A, Krawitz PM, Lyon GJ. KBG syndrome: videoconferencing and use of artificial intelligence driven facial phenotyping in 25 new patients. Eur J Hum Genet 2022; 30:1244-1254. [PMID: 35970914 PMCID: PMC9626563 DOI: 10.1038/s41431-022-01171-1] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 02/10/2022] [Revised: 05/26/2022] [Accepted: 07/26/2022] [Indexed: 02/04/2023]
Abstract
Genetic variants in Ankyrin Repeat Domain 11 (ANKRD11) and deletions in 16q24.3 are known to cause KBG syndrome, a rare syndrome associated with craniofacial, intellectual, and neurobehavioral anomalies. We report 25 unpublished individuals from 22 families with molecularly confirmed diagnoses. Twelve individuals have de novo variants, three have inherited variants, and one is inherited from a parent with low-level mosaicism. The mode of inheritance was unknown for nine individuals. Twenty are truncating variants, and the remaining five are missense (three of which are found in one family). We present a protocol emphasizing the use of videoconference and artificial intelligence (AI) in collecting and analyzing data for this rare syndrome. A single clinician interviewed 25 individuals throughout eight countries. Participants' medical records were reviewed, and data was uploaded to the Human Disease Gene website using Human Phenotype Ontology (HPO) terms. Photos of the participants were analyzed by the GestaltMatcher and DeepGestalt, Face2Gene platform (FDNA Inc, USA) algorithms. Within our cohort, common traits included short stature, macrodontia, anteverted nares, wide nasal bridge, wide nasal base, thick eyebrows, synophrys and hypertelorism. Behavioral issues and global developmental delays were widely present. Neurologic abnormalities including seizures and/or EEG abnormalities were common (44%), suggesting that early detection and seizure prophylaxis could be an important point of intervention. Almost a quarter (24%) were diagnosed with attention deficit hyperactivity disorder and 28% were diagnosed with autism spectrum disorder. Based on the data, we provide a set of recommendations regarding diagnostic and treatment approaches for KBG syndrome.
Affiliation(s)
- Lily Guo
- Department of Human Genetics, NYS Institute for Basic Research in Developmental Disabilities, 1050 Forest Hill Road, Staten Island, NY 10314, USA
| | - Jiyeon Park
- Department of Human Genetics, NYS Institute for Basic Research in Developmental Disabilities, 1050 Forest Hill Road, Staten Island, NY 10314, USA
| | - Edward Yi
- Department of Human Genetics, NYS Institute for Basic Research in Developmental Disabilities, 1050 Forest Hill Road, Staten Island, NY 10314, USA
| | - Elaine Marchi
- Department of Human Genetics, NYS Institute for Basic Research in Developmental Disabilities, 1050 Forest Hill Road, Staten Island, NY 10314, USA
| | - Tzung-Chien Hsieh
- Institute for Genomic Statistics and Bioinformatics, University Hospital Bonn, Rheinische Friedrich-Wilhelms-Universität Bonn, Bonn, Germany
| | - Yana Kibalnyk
- Department of Medical Genetics, Faculty of Medicine & Dentistry, University of Alberta, Edmonton, AB, Canada
- Department of Cell Biology, Faculty of Medicine & Dentistry, University of Alberta, Edmonton, AB, Canada
| | | | - Saskia Biskup
- CeGaT GmbH, Praxis für Humangenetik Tübingen, Tübingen, Germany
| | - Oliver Puk
- CeGaT GmbH, Praxis für Humangenetik Tübingen, Tübingen, Germany
| | - Carmela Beger
- MVZ Labor Krone GbR, Filialpraxis für Humangenetik, Bielefeld, Germany
| | - Quan Li
- Princess Margaret Cancer Centre, University Health Network, University of Toronto, Toronto, ON M5G2C1, Canada
| | - Kai Wang
- Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
| | - Anastassia Voronova
- Department of Medical Genetics, Faculty of Medicine & Dentistry, University of Alberta, Edmonton, AB, Canada
- Department of Cell Biology, Faculty of Medicine & Dentistry, University of Alberta, Edmonton, AB, Canada
| | - Peter M. Krawitz
- Institute for Genomic Statistics and Bioinformatics, University Hospital Bonn, Rheinische Friedrich-Wilhelms-Universität Bonn, Bonn, Germany
| | - Gholson J. Lyon
- Department of Human Genetics, NYS Institute for Basic Research in Developmental Disabilities, 1050 Forest Hill Road, Staten Island, NY 10314, USA
- George A. Jervis Clinic, NYS Institute for Basic Research in Developmental Disabilities, 1050 Forest Hill Road, Staten Island, NY 10314, USA
- Biology PhD Program, The Graduate Center, The City University of New York, New York, NY, USA
| |
|