1. Parmar JM, Laing NG, Kennerson ML, Ravenscroft G. Genetics of inherited peripheral neuropathies and the next frontier: looking backwards to progress forwards. J Neurol Neurosurg Psychiatry 2024; 95:992-1001. [PMID: 38744462 PMCID: PMC11503175 DOI: 10.1136/jnnp-2024-333436] [Received: 01/18/2024] [Accepted: 04/10/2024] [Indexed: 05/16/2024]
Abstract
Inherited peripheral neuropathies (IPNs) encompass a clinically and genetically heterogeneous group of disorders causing length-dependent degeneration of peripheral autonomic, motor and/or sensory nerves. Despite gold-standard diagnostic testing for pathogenic variants in over 100 known associated genes, many patients with IPN remain genetically unsolved. Providing patients with a diagnosis is critical for reducing their 'diagnostic odyssey', improving clinical care, and for informed genetic counselling. The last decade of massively parallel sequencing technologies has seen a rapid increase in the number of newly described IPN-associated gene variants contributing to IPN pathogenesis. However, the scarcity of additional families and functional data supporting variants in potential novel genes is prolonging patient diagnostic uncertainty and contributing to the missing heritability of IPNs. We review the last decade of IPN disease gene discovery to highlight novel genes, structural variation and short tandem repeat expansions contributing to IPN pathogenesis. From the lessons learnt, we provide our vision for IPN research as we anticipate the future, providing examples of emerging technologies, resources and tools that we propose that will expedite the genetic diagnosis of unsolved IPN families.
Affiliation(s)
- Jevin M Parmar
  - Rare Disease Genetics and Functional Genomics, Harry Perkins Institute of Medical Research, Perth, Western Australia, Australia
  - Centre for Medical Research, Faculty of Health and Medical Sciences, The University of Western Australia, Perth, Western Australia, Australia
- Nigel G Laing
  - Centre for Medical Research, Faculty of Health and Medical Sciences, The University of Western Australia, Perth, Western Australia, Australia
  - Preventive Genetics, Harry Perkins Institute of Medical Research, Perth, Western Australia, Australia
- Marina L Kennerson
  - Northcott Neuroscience Laboratory, ANZAC Research Institute, Concord, New South Wales, Australia
  - Molecular Medicine Laboratory, Concord Hospital, Concord, New South Wales, Australia
- Gianina Ravenscroft
  - Rare Disease Genetics and Functional Genomics, Harry Perkins Institute of Medical Research, Perth, Western Australia, Australia
  - Centre for Medical Research, Faculty of Health and Medical Sciences, The University of Western Australia, Perth, Western Australia, Australia
2. Green DJ, Michaud V, Lasseaux E, Plaisant C, Fitzgerald T, Birney E, Black GC, Arveiler B, Sergouniotis PI. The co-occurrence of genetic variants in the TYR and OCA2 genes confers susceptibility to albinism. Nat Commun 2024; 15:8436. [PMID: 39349469 PMCID: PMC11443028 DOI: 10.1038/s41467-024-52763-y] [Received: 12/13/2022] [Accepted: 09/19/2024] [Indexed: 10/02/2024]
Abstract
Although rare genetic conditions are mostly caused by DNA sequence alterations that functionally disrupt individual genes, large-scale studies using genome sequencing have started to unmask additional complexity. Understanding how combinations of variants in different genes shape human phenotypes is expected to provide important insights into the clinical and genetic heterogeneity of rare disorders. Here, we use albinism, an archetypal rare condition associated with hypopigmentation, as an exemplar for the study of genetic interactions. We analyse data from the Genomics England 100,000 Genomes Project alongside a cohort of 1120 individuals with albinism, and investigate the effect of dual heterozygosity for the combination of two established albinism-related variants: TYR:c.1205G>A (p.Arg402Gln) [rs1126809] and OCA2:c.1327G>A (p.Val443Ile) [rs74653330]. As each of these changes alone is insufficient to cause disease when present in the heterozygous state, we sought evidence of synergistic effects. We show that, when both variants are present, the probability of receiving a diagnosis of albinism is significantly increased (odds ratio 12.8; 95% confidence interval 6.0-24.7; p-value 2.1 × 10⁻⁸). Further analyses in an independent cohort, the UK Biobank, support this finding and highlight that heterozygosity for the TYR:c.1205G>A and OCA2:c.1327G>A variant combination is associated with statistically significant alterations in visual acuity and central retinal thickness (traits that are considered albinism endophenotypes). The approach discussed in this report opens up new avenues for the investigation of oligogenic patterns in apparently Mendelian disorders.
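For readers unfamiliar with how an odds ratio and its confidence interval are obtained, the following is a minimal sketch using a plain 2x2 table and a Wald interval on the log scale. It is not the authors' statistical method (the abstract does not say which model they used), and the function name `odds_ratio_ci` and all counts are hypothetical, chosen only for illustration.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald confidence interval from a 2x2 table.

    a: cases carrying both variants     b: cases not carrying both
    c: controls carrying both variants  d: controls not carrying both
    z: normal quantile (1.96 gives an approximate 95% interval)
    """
    if min(a, b, c, d) <= 0:
        # The Wald interval is undefined when any cell is empty.
        raise ValueError("all four cells must be positive")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) for a 2x2 table.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts, NOT taken from the study.
or_, lo, hi = odds_ratio_ci(20, 80, 10, 90)
print(f"OR = {or_:.2f}, 95% CI {lo:.2f}-{hi:.2f}")
```

An interval that excludes 1 indicates an association at the chosen level; with rare dual-heterozygous genotypes, exact or penalised methods are often preferred over this simple approximation.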
Affiliation(s)
- David J Green
  - Division of Evolution, Infection and Genomics, School of Biological Sciences, Faculty of Biology, Medicine and Health, University of Manchester, Manchester, UK
- Vincent Michaud
  - Department of Medical Genetics, University Hospital of Bordeaux, Bordeaux, France
  - INSERM U1211, Rare Diseases, Genetics and Metabolism, University of Bordeaux, Bordeaux, France
- Eulalie Lasseaux
  - Department of Medical Genetics, University Hospital of Bordeaux, Bordeaux, France
- Claudio Plaisant
  - Department of Medical Genetics, University Hospital of Bordeaux, Bordeaux, France
- Tomas Fitzgerald
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Cambridge, UK
- Ewan Birney
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Cambridge, UK
- Graeme C Black
  - Division of Evolution, Infection and Genomics, School of Biological Sciences, Faculty of Biology, Medicine and Health, University of Manchester, Manchester, UK
  - Manchester Centre for Genomic Medicine, Saint Mary's Hospital, Manchester University NHS Foundation Trust, Manchester, UK
- Benoît Arveiler
  - Department of Medical Genetics, University Hospital of Bordeaux, Bordeaux, France
  - INSERM U1211, Rare Diseases, Genetics and Metabolism, University of Bordeaux, Bordeaux, France
- Panagiotis I Sergouniotis
  - Division of Evolution, Infection and Genomics, School of Biological Sciences, Faculty of Biology, Medicine and Health, University of Manchester, Manchester, UK
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Cambridge, UK
  - Manchester Centre for Genomic Medicine, Saint Mary's Hospital, Manchester University NHS Foundation Trust, Manchester, UK
  - Manchester Royal Eye Hospital, Manchester University NHS Foundation Trust, Manchester, UK
3. Lin YJ, Menon AS, Hu Z, Brenner SE. Variant Impact Predictor database (VIPdb), version 2: trends from three decades of genetic variant impact predictors. Hum Genomics 2024; 18:90. [PMID: 39198917 PMCID: PMC11360829 DOI: 10.1186/s40246-024-00663-z] [Received: 06/22/2024] [Accepted: 08/19/2024] [Indexed: 09/01/2024]
Abstract
BACKGROUND Variant interpretation is essential for identifying patients' disease-causing genetic variants amongst the millions detected in their genomes. Hundreds of Variant Impact Predictors (VIPs), also known as Variant Effect Predictors (VEPs), have been developed for this purpose, with a variety of methodologies and goals. To facilitate the exploration of available VIP options, we have created the Variant Impact Predictor database (VIPdb). RESULTS The Variant Impact Predictor database (VIPdb) version 2 presents a collection of VIPs developed over the past three decades, summarizing their characteristics, ClinGen calibrated scores, CAGI assessment results, publication details, access information, and citation patterns. We previously summarized 217 VIPs and their features in VIPdb in 2019. Building upon this foundation, we identified and categorized an additional 190 VIPs, resulting in a total of 407 VIPs in VIPdb version 2. The majority of the VIPs have the capacity to predict the impacts of single nucleotide variants and nonsynonymous variants. More VIPs tailored to predict the impacts of insertions and deletions have been developed since the 2010s. In contrast, relatively few VIPs are dedicated to the prediction of splicing, structural, synonymous, and regulatory variants. The increasing rate of citations to VIPs reflects the ongoing growth in their use, and the evolving trends in citations reveal development in the field and individual methods. CONCLUSIONS VIPdb version 2 summarizes 407 VIPs and their features, potentially facilitating VIP exploration for various variant interpretation applications. VIPdb is available at https://genomeinterpretation.org/vipdb.
Affiliation(s)
- Yu-Jen Lin
  - Department of Molecular and Cell Biology, University of California, Berkeley, CA, 94720, USA
  - Center for Computational Biology, University of California, Berkeley, CA, 94720, USA
- Arul S Menon
  - Department of Molecular and Cell Biology, University of California, Berkeley, CA, 94720, USA
  - College of Computing, Data Science, and Society, University of California, Berkeley, CA, 94720, USA
- Zhiqiang Hu
  - Department of Plant and Microbial Biology, University of California, 111 Koshland Hall #3102, Berkeley, CA, 94720-3102, USA
  - Illumina, Foster City, CA, 94404, USA
- Steven E Brenner
  - Department of Molecular and Cell Biology, University of California, Berkeley, CA, 94720, USA
  - Center for Computational Biology, University of California, Berkeley, CA, 94720, USA
  - College of Computing, Data Science, and Society, University of California, Berkeley, CA, 94720, USA
  - Department of Plant and Microbial Biology, University of California, 111 Koshland Hall #3102, Berkeley, CA, 94720-3102, USA
4. Moth CW, Sheehan JH, Mamun AA, Sivley RM, Gulsevin A, Rinker D, Capra JA, Meiler J. VUStruct: a compute pipeline for high throughput and personalized structural biology. bioRxiv 2024:2024.08.06.606224. [PMID: 39149406 PMCID: PMC11326201 DOI: 10.1101/2024.08.06.606224] [Indexed: 08/17/2024]
Abstract
Effective diagnosis and treatment of rare genetic disorders requires the interpretation of a patient's genetic variants of unknown significance (VUSs). Today, clinical decision-making is primarily guided by gene-phenotype association databases and DNA-based scoring methods. Our web-accessible variant analysis pipeline, VUStruct, supplements these established approaches by deeply analyzing the downstream molecular impact of variation in context of 3D protein structure. VUStruct's growing impact is fueled by the co-proliferation of protein 3D structural models, gene sequencing, compute power, and artificial intelligence. Contextualizing VUSs in protein 3D structural models also illuminates longitudinal genomics studies and biochemical bench research focused on VUS, and we created VUStruct for clinicians and researchers alike. We now introduce VUStruct to the broad scientific community as a mature, web-facing, extensible, High Performance Computing (HPC) software pipeline. VUStruct maps missense variants onto automatically selected protein structures and launches a broad range of analyses. These include energy-based assessments of protein folding and stability, pathogenicity prediction through spatial clustering analysis, and machine learning (ML) predictors of binding surface disruptions and nearby post-translational modification sites. The pipeline also considers the entire input set of VUS and identifies genes potentially involved in digenic disease. VUStruct's utility in clinical rare disease genome interpretation has been demonstrated through its analysis of over 175 Undiagnosed Disease Network (UDN) Patient cases. VUStruct-leveraged hypotheses have often informed clinicians in their consideration of additional patient testing, and we report here details from two cases where VUStruct was key to their solution. 
We also note successes with academic research collaborators, for whom VUStruct has informed research directions in both computational genomics and wet lab studies.
Affiliation(s)
- Christopher W. Moth
  - Departments of Chemistry, Pharmacology, and Biomedical Informatics; Center for Structural Biology and Institute of Chemical Biology; Vanderbilt Univ., Nashville, TN 37232, USA
- Jonathan H. Sheehan
  - Division of Infection Diseases, Milliken Dept. of Internal Medicine, Washington Univ. of Medicine in St. Louis, MO 63110, USA
- Abdullah Al Mamun
  - Departments of Chemistry, Pharmacology, and Biomedical Informatics; Center for Structural Biology and Institute of Chemical Biology; Vanderbilt Univ., Nashville, TN 37232, USA
- Alican Gulsevin
  - Department of Pharmaceutical Sciences, College of Pharmacy and Health Sciences, Butler University, Indianapolis, IN 46208, USA
- David Rinker
  - Department of Biological Sciences, Evolutionary Studies Initiative; Vanderbilt Univ., Nashville, TN 37232, USA
- John A. Capra
  - Bakar Computational Health Science Institute and Department of Epidemiology and Biostatistics, Univ. of California San Francisco, CA 94143, USA
- Jens Meiler
  - Departments of Chemistry, Pharmacology, and Biomedical Informatics; Center for Structural Biology and Institute of Chemical Biology; Vanderbilt Univ., Nashville, TN 37232, USA
  - Leipzig University Medical School, Institute for Drug Discovery, Brüderstraße 34, 04103 Leipzig, Germany
5. Lu S, Niu Z, Qiao X. Exploring the Genotype-Phenotype Correlations in a Child with Inherited Seizure and Thrombocytopenia by Digenic Network Analysis. Genes (Basel) 2024; 15:1004. [PMID: 39202364 PMCID: PMC11353731 DOI: 10.3390/genes15081004] [Received: 06/23/2024] [Revised: 07/24/2024] [Accepted: 07/27/2024] [Indexed: 09/03/2024]
Abstract
Understanding the correlation between genotype and phenotype remains challenging for modern genetics. Digenic network analysis may provide useful models for understanding complex phenotypes that traditional Mendelian monogenic models cannot explain. Clinical data, whole exome sequencing data, in silico, and machine learning analysis were combined to construct a digenic network that may help unveil the complex genotype-phenotype correlations in a child presenting with inherited seizures and thrombocytopenia. The proband inherited a maternal heterozygous missense variant in SCN1A (NM_001165963.4:c.2722G>A) and a paternal heterozygous missense variant in MYH9 (NM_002473.6:c.3323A>C). In silico analysis showed that these two variants may be pathogenic for inherited seizures and thrombocytopenia in the proband. Moreover, focusing on 230 epilepsy-associated genes and 35 thrombopoiesis genes, variant call format data of the proband were analyzed using machine learning tools (VarCoPP 2.0) and Digenic Effect predictor. A digenic network was constructed, and SCN1A and MYH9 were found to be core genes in the network. Further analysis showed that MYH9 might be a modifier of SCN1A, and the variant in MYH9 might not only influence the severity of SCN1A-related seizure but also lead to thrombocytopenia in the bone marrow. In addition, another eight variants might also be co-factors that account for the proband's complex phenotypes. Our data show that as a supplement to the traditional Mendelian monogenic model, digenic network analysis may provide reasonable models for the explanation of complex genotype-phenotype correlations.
Affiliation(s)
- Xiaohong Qiao
  - Department of Pediatrics, Tongji Hospital, Tongji University School of Medicine, 389 Xincun Road, Shanghai 200065, China; (S.L.); (Z.N.)
6. Lin YJ, Menon AS, Hu Z, Brenner SE. Variant Impact Predictor database (VIPdb), version 2: Trends from 25 years of genetic variant impact predictors. bioRxiv 2024:2024.06.25.600283. [PMID: 38979289 PMCID: PMC11230257 DOI: 10.1101/2024.06.25.600283] [Indexed: 07/10/2024]
Abstract
Background Variant interpretation is essential for identifying patients' disease-causing genetic variants amongst the millions detected in their genomes. Hundreds of Variant Impact Predictors (VIPs), also known as Variant Effect Predictors (VEPs), have been developed for this purpose, with a variety of methodologies and goals. To facilitate the exploration of available VIP options, we have created the Variant Impact Predictor database (VIPdb). Results The Variant Impact Predictor database (VIPdb) version 2 presents a collection of VIPs developed over the past 25 years, summarizing their characteristics, ClinGen calibrated scores, CAGI assessment results, publication details, access information, and citation patterns. We previously summarized 217 VIPs and their features in VIPdb in 2019. Building upon this foundation, we identified and categorized an additional 186 VIPs, resulting in a total of 403 VIPs in VIPdb version 2. The majority of the VIPs have the capacity to predict the impacts of single nucleotide variants and nonsynonymous variants. More VIPs tailored to predict the impacts of insertions and deletions have been developed since the 2010s. In contrast, relatively few VIPs are dedicated to the prediction of splicing, structural, synonymous, and regulatory variants. The increasing rate of citations to VIPs reflects the ongoing growth in their use, and the evolving trends in citations reveal development in the field and individual methods. Conclusions VIPdb version 2 summarizes 403 VIPs and their features, potentially facilitating VIP exploration for various variant interpretation applications. Availability VIPdb version 2 is available at https://genomeinterpretation.org/vipdb.
Affiliation(s)
- Yu-Jen Lin
  - Department of Molecular and Cell Biology, University of California, Berkeley, California 94720, USA
  - Center for Computational Biology, University of California, Berkeley, California 94720, USA
- Arul S. Menon
  - Department of Molecular and Cell Biology, University of California, Berkeley, California 94720, USA
  - College of Computing, Data Science, and Society, University of California, Berkeley, California 94720, USA
- Zhiqiang Hu
  - Department of Plant and Microbial Biology, University of California, Berkeley, California 94720, USA
  - Currently at: Illumina, Foster City, California 94404, USA
- Steven E. Brenner
  - Department of Molecular and Cell Biology, University of California, Berkeley, California 94720, USA
  - Center for Computational Biology, University of California, Berkeley, California 94720, USA
  - College of Computing, Data Science, and Society, University of California, Berkeley, California 94720, USA
  - Department of Plant and Microbial Biology, University of California, Berkeley, California 94720, USA
7. Nachtegael C, De Stefani J, Cnudde A, Lenaerts T. DUVEL: an active-learning annotated biomedical corpus for the recognition of oligogenic combinations. Database (Oxford) 2024; 2024:baae039. [PMID: 38805753 PMCID: PMC11131422 DOI: 10.1093/database/baae039] [Received: 01/04/2024] [Revised: 04/17/2024] [Accepted: 05/13/2024] [Indexed: 05/30/2024]
Abstract
While biomedical relation extraction (bioRE) datasets have been instrumental in the development of methods to support biocuration of single variants from texts, no datasets are currently available for the extraction of digenic or even oligogenic variant relations, despite the reports in literature that epistatic effects between combinations of variants in different loci (or genes) are important to understand disease etiologies. This work presents the creation of a unique dataset of oligogenic variant combinations, geared to train tools to help in the curation of scientific literature. To overcome the hurdles associated with the number of unlabelled instances and the cost of expertise, active learning (AL) was used to optimize the annotation, thus getting assistance in finding the most informative subset of samples to label. By pre-annotating 85 full-text articles containing the relevant relations from the Oligogenic Diseases Database (OLIDA) with PubTator, text fragments featuring potential digenic variant combinations, i.e. gene-variant-gene-variant, were extracted. The resulting fragments of texts were annotated with ALAMBIC, an AL-based annotation platform. The resulting dataset, called DUVEL, is used to fine-tune four state-of-the-art biomedical language models: BiomedBERT, BiomedBERT-large, BioLinkBERT and BioM-BERT. More than 500 000 text fragments were considered for annotation, finally resulting in a dataset with 8442 fragments, 794 of them being positive instances, covering 95% of the original annotated articles. When applied to gene-variant pair detection, BiomedBERT-large achieves the highest F1 score (0.84) after fine-tuning, demonstrating significant improvement compared to the non-fine-tuned model, underlining the relevance of the DUVEL dataset. This study shows how AL may play an important role in the creation of bioRE dataset relevant for biomedical curation applications. 
DUVEL provides a unique biomedical corpus focusing on 4-ary relations between two genes and two variants. It is made freely available for research on GitHub and Hugging Face. Database URL: https://huggingface.co/datasets/cnachteg/duvel or https://doi.org/10.57967/hf/1571.
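To make the "4-ary relation" and the reported F1 score concrete, here is a small sketch. The `DigenicRelation` class, its field names, the placeholder gene and variant strings, and the toy label lists are all hypothetical and are not the DUVEL schema; only the counts 794 and 8442 come from the abstract above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DigenicRelation:
    """Hypothetical container for a gene-variant-gene-variant relation."""
    gene_a: str
    variant_a: str
    gene_b: str
    variant_b: str


def f1_score(gold, predicted):
    """Binary F1 from two equal-length sequences of 0/1 labels."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    tp = sum(1 for g, p in zip(gold, predicted) if g and p)
    fp = sum(1 for g, p in zip(gold, predicted) if not g and p)
    fn = sum(1 for g, p in zip(gold, predicted) if g and not p)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Placeholder names, not real curated data.
example = DigenicRelation("GENE_A", "c.100A>G", "GENE_B", "c.200C>T")

# Toy labels: 1 = fragment expresses a digenic combination, 0 = it does not.
gold = [1, 1, 1, 0, 0]
pred = [1, 1, 0, 1, 0]
print(example, f1_score(gold, pred))

# Class balance implied by the abstract: 794 positives among 8442 fragments.
positive_fraction = 794 / 8442
print(f"{positive_fraction:.1%} of fragments are positive")
```

With roughly one positive fragment in ten, accuracy alone would be a poor yardstick, which is one reason F1 is a natural metric for this kind of corpus.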
Affiliation(s)
- Charlotte Nachtegael
  - Interuniversity Institute of Bioinformatics in Brussels, Université Libre de Bruxelles-Vrije Universiteit Brussel, Boulevard du Triomphe, CP 263, Brussels 1050, Belgium
  - Machine Learning Group, Université Libre de Bruxelles, Boulevard du Triomphe, CP 212, Brussels 1050, Belgium
- Jacopo De Stefani
  - Machine Learning Group, Université Libre de Bruxelles, Boulevard du Triomphe, CP 212, Brussels 1050, Belgium
- Anthony Cnudde
  - Machine Learning Group, Université Libre de Bruxelles, Boulevard du Triomphe, CP 212, Brussels 1050, Belgium
  - Pharmacologie, Pharmacothérapie et Suivi Pharmaceutique, Université Libre de Bruxelles, Boulevard du Triomphe, CP 205, Brussels 1050, Belgium
- Tom Lenaerts
  - Interuniversity Institute of Bioinformatics in Brussels, Université Libre de Bruxelles-Vrije Universiteit Brussel, Boulevard du Triomphe, CP 263, Brussels 1050, Belgium
  - Machine Learning Group, Université Libre de Bruxelles, Boulevard du Triomphe, CP 212, Brussels 1050, Belgium
  - Artificial Intelligence Laboratory, Vrije Universiteit Brussel, Pleinlaan 2, Brussels 1050, Belgium
8. Cowan QT, Gu S, Gu W, Ranzau BL, Simonson TS, Komor AC. Development of multiplexed orthogonal base editor (MOBE) systems. Nat Biotechnol 2024:10.1038/s41587-024-02240-0. [PMID: 38773305 DOI: 10.1038/s41587-024-02240-0] [Received: 05/06/2022] [Accepted: 04/10/2024] [Indexed: 05/23/2024]
Abstract
Base editors (BEs) enable efficient, programmable installation of point mutations while avoiding the use of double-strand breaks. Simultaneous application of two or more different BEs, such as an adenine BE (which converts A·T base pairs to G·C) and a cytosine BE (which converts C·G base pairs to T·A), is not feasible because guide RNA crosstalk results in non-orthogonal editing, with all BEs modifying all target loci. Here we engineer both adenine BEs and cytosine BEs that can be orthogonally multiplexed by using RNA aptamer-coat protein systems to recruit the DNA-modifying enzymes directly to the guide RNAs. We generate four multiplexed orthogonal BE systems that enable rates of precise co-occurring edits of up to 7.1% in the same DNA strand without enrichment or selection strategies. The addition of a fluorescent enrichment strategy increases co-occurring edit rates up to 24.8% in human cells. These systems are compatible with expanded protospacer adjacent motif and high-fidelity Cas9 variants, function well in multiple cell types, have equivalent or reduced off-target propensities compared with their parental systems and can model disease-relevant point mutation combinations.
Affiliation(s)
- Quinn T Cowan
  - Department of Chemistry and Biochemistry, University of California San Diego, La Jolla, CA, USA
- Sifeng Gu
  - Department of Chemistry and Biochemistry, University of California San Diego, La Jolla, CA, USA
- Wanjun Gu
  - Department of Medicine, Division of Pulmonary, Critical Care, Sleep Medicine, and Physiology, University of California San Diego, La Jolla, CA, USA
- Brodie L Ranzau
  - Department of Chemistry and Biochemistry, University of California San Diego, La Jolla, CA, USA
- Tatum S Simonson
  - Department of Medicine, Division of Pulmonary, Critical Care, Sleep Medicine, and Physiology, University of California San Diego, La Jolla, CA, USA
- Alexis C Komor
  - Department of Chemistry and Biochemistry, University of California San Diego, La Jolla, CA, USA
9. Neuhofer CM, Prokisch H. Digenic Inheritance in Rare Disorders and Mitochondrial Disease-Crossing the Frontier to a More Comprehensive Understanding of Etiology. Int J Mol Sci 2024; 25:4602. [PMID: 38731822 PMCID: PMC11083678 DOI: 10.3390/ijms25094602] [Received: 02/13/2024] [Revised: 04/10/2024] [Accepted: 04/12/2024] [Indexed: 05/13/2024]
Abstract
Our understanding of rare disease genetics has been shaped by a monogenic disease model. While the traditional monogenic disease model has been successful in identifying numerous disease-associated genes and significantly enlarged our knowledge in the field of human genetics, it has limitations in explaining phenomena like phenotypic variability and reduced penetrance. Widening the perspective beyond Mendelian inheritance has the potential to enable a better understanding of disease complexity in rare disorders. Digenic inheritance is the simplest instance of a non-Mendelian disorder, characterized by the functional interplay of variants in two disease-contributing genes. Known digenic disease causes show a range of pathomechanisms underlying digenic interplay, including direct and indirect gene product interactions as well as epigenetic modifications. This review aims to systematically explore the background of digenic inheritance in rare disorders, the approaches and challenges when investigating digenic inheritance, and the current evidence for digenic inheritance in mitochondrial disorders.
Affiliation(s)
- Christiane M. Neuhofer
  - Institute of Human Genetics, University Medical Center, Technical University of Munich, Trogerstr. 32, 81675 Munich, Germany
  - Institute of Neurogenomics, Computational Health Center, Helmholtz Centre Munich Neuherberg, Ingolstädter Landstraße 1, 85764 Oberschleißheim, Germany
  - Institute of Human Genetics, Salzburger Landeskliniken, University Hospital of the Paracelsus Medical University, Müllner Hauptstraße 48, 5020 Salzburg, Austria
- Holger Prokisch
  - Institute of Human Genetics, University Medical Center, Technical University of Munich, Trogerstr. 32, 81675 Munich, Germany
  - Institute of Neurogenomics, Computational Health Center, Helmholtz Centre Munich Neuherberg, Ingolstädter Landstraße 1, 85764 Oberschleißheim, Germany
10. Gravel B, Renaux A, Papadimitriou S, Smits G, Nowé A, Lenaerts T. Prioritization of oligogenic variant combinations in whole exomes. Bioinformatics 2024; 40:btae184. [PMID: 38603604 PMCID: PMC11037482 DOI: 10.1093/bioinformatics/btae184] [Received: 07/13/2023] [Revised: 01/29/2024] [Accepted: 04/10/2024] [Indexed: 04/13/2024]
Abstract
MOTIVATION Whole exome sequencing (WES) has emerged as a powerful tool for genetic research, enabling the collection of a tremendous amount of data about human genetic variation. However, properly identifying which variants are causative of a genetic disease remains an important challenge, often due to the number of variants that need to be screened. Expanding the screening to combinations of variants in two or more genes, as would be required under the oligogenic inheritance model, simply blows this problem out of proportion. RESULTS We present here the High-throughput oligogenic prioritizer (Hop), a novel prioritization method that uses direct oligogenic information at the variant, gene and gene pair level to detect digenic variant combinations in WES data. This method leverages information from a knowledge graph, together with specialized pathogenicity predictions in order to effectively rank variant combinations based on how likely they are to explain the patient's phenotype. The performance of Hop is evaluated in cross-validation on 36 120 synthetic exomes for training and 14 280 additional synthetic exomes for independent testing. Whereas the known pathogenic variant combinations are found in the top 20 in approximately 60% of the cross-validation exomes, 71% are found in the same ranking range when considering the independent set. These results provide a significant improvement over alternative approaches that depend simply on a monogenic assessment of pathogenicity, including early attempts for digenic ranking using monogenic pathogenicity scores. AVAILABILITY AND IMPLEMENTATION Hop is available at https://github.com/oligogenic/HOP.
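The "found in the top 20" figures above are a top-k recovery rate. The sketch below shows how such a rate could be computed from per-exome ranks; it is not code from the Hop repository, and the function name `top_k_recovery` and the example ranks are hypothetical.

```python
def top_k_recovery(ranks, k=20):
    """Fraction of exomes whose known causal combination is ranked within the top k.

    ranks: one entry per exome, holding the 1-based rank the prioritizer gave
           to the known pathogenic combination, or None if it was not ranked.
    """
    if not ranks:
        raise ValueError("ranks must not be empty")
    hits = sum(1 for r in ranks if r is not None and r <= k)
    return hits / len(ranks)


# Hypothetical ranks for ten synthetic exomes (not data from the paper).
example_ranks = [1, 5, 20, 21, 150, None, 3, 40, 12, 19]
print(top_k_recovery(example_ranks, k=20))
```

In this toy list six of the ten exomes have the known combination at rank 20 or better, so the recovery rate is 0.6; varying `k` traces out how quickly the true combination surfaces in the ranking.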
Affiliation(s)
- Barbara Gravel
- Interuniversity Institute of Bioinformatics in Brussels, Université Libre de Bruxelles-Vrije Universiteit Brussel, 1050 Brussels, Belgium
- Department of Computer Science, Machine Learning Group, Université Libre de Bruxelles, 1050 Brussels, Belgium
- Department of Computer Science, Artificial Intelligence Laboratory, Vrije Universiteit Brussels, 1050 Brussels, Belgium
| | - Alexandre Renaux
- Interuniversity Institute of Bioinformatics in Brussels, Université Libre de Bruxelles-Vrije Universiteit Brussel, 1050 Brussels, Belgium
- Department of Computer Science, Machine Learning Group, Université Libre de Bruxelles, 1050 Brussels, Belgium
- Department of Computer Science, Artificial Intelligence Laboratory, Vrije Universiteit Brussels, 1050 Brussels, Belgium
- Sofia Papadimitriou
- Interuniversity Institute of Bioinformatics in Brussels, Université Libre de Bruxelles-Vrije Universiteit Brussel, 1050 Brussels, Belgium
- Department of Computer Science, Machine Learning Group, Université Libre de Bruxelles, 1050 Brussels, Belgium
- Brussels Interuniversity Genomics High Throughput core (BRIGHTcore), UZ Brussel, Vrije Universiteit Brussel (VUB) - Université Libre de Bruxelles (ULB), 1090 Brussels, Belgium
- Guillaume Smits
- Interuniversity Institute of Bioinformatics in Brussels, Université Libre de Bruxelles-Vrije Universiteit Brussel, 1050 Brussels, Belgium
- Center of Human Genetics, Hôpital Erasme, Hôpital Universitaire de Bruxelles, Université Libre de Bruxelles, 1070 Brussels, Belgium
- Ann Nowé
- Interuniversity Institute of Bioinformatics in Brussels, Université Libre de Bruxelles-Vrije Universiteit Brussel, 1050 Brussels, Belgium
- Department of Computer Science, Artificial Intelligence Laboratory, Vrije Universiteit Brussel, 1050 Brussels, Belgium
- Tom Lenaerts
- Interuniversity Institute of Bioinformatics in Brussels, Université Libre de Bruxelles-Vrije Universiteit Brussel, 1050 Brussels, Belgium
- Department of Computer Science, Machine Learning Group, Université Libre de Bruxelles, 1050 Brussels, Belgium
- Department of Computer Science, Artificial Intelligence Laboratory, Vrije Universiteit Brussel, 1050 Brussels, Belgium
11
Long P, Wang L, Tan H, Quan R, Hu Z, Zeng M, Deng Z, Huang H, Greenbaum J, Deng H, Xiao H. Oligogenic basis of premature ovarian insufficiency: an observational study. J Ovarian Res 2024; 17:32. [PMID: 38310280 PMCID: PMC10837925 DOI: 10.1186/s13048-024-01351-1] [Received: 09/23/2023] [Accepted: 01/13/2024] [Indexed: 02/05/2024]
Abstract
BACKGROUND The etiology of premature ovarian insufficiency, that is, the loss of ovarian activity before 40 years of age, is complex. Studies suggest that genetic factors are involved in 20-25% of cases. The aim of this study was to explore the oligogenic basis of premature ovarian insufficiency. RESULTS Whole-exome sequencing of 93 patients with POI and whole-genome sequencing of 465 controls were performed. In the gene-burden analysis, multiple genetic variants, including those associated with DNA damage repair and meiosis, were more common in participants with premature ovarian insufficiency than in controls. The ORVAL-platform analysis confirmed the pathogenicity of the RAD52 and MSH6 combination. CONCLUSIONS The results of this study indicate that oligogenic inheritance is an important cause of premature ovarian insufficiency and provide insights into the biological mechanisms underlying premature ovarian insufficiency.
Affiliation(s)
- Panpan Long
- Institute of Reproductive & Stem Cell Engineering, School of Basic Medical Science, Central South University, 88 Xiangya Road, Changsha, 410008, Hunan, China
- Center of Reproductive Health, School of Basic Medical Science, Central South University, Changsha, China
- Le Wang
- Institute of Reproductive & Stem Cell Engineering, School of Basic Medical Science, Central South University, 88 Xiangya Road, Changsha, 410008, Hunan, China
- Center of Reproductive Health, School of Basic Medical Science, Central South University, Changsha, China
- Biomedical Research Center, Hunan University of Medicine, Huaihua, China
- Hangjing Tan
- Institute of Reproductive & Stem Cell Engineering, School of Basic Medical Science, Central South University, 88 Xiangya Road, Changsha, 410008, Hunan, China
- Center of Reproductive Health, School of Basic Medical Science, Central South University, Changsha, China
- Ruping Quan
- Institute of Reproductive & Stem Cell Engineering, School of Basic Medical Science, Central South University, 88 Xiangya Road, Changsha, 410008, Hunan, China
- Center of Reproductive Health, School of Basic Medical Science, Central South University, Changsha, China
- Zihao Hu
- Institute of Reproductive & Stem Cell Engineering, School of Basic Medical Science, Central South University, 88 Xiangya Road, Changsha, 410008, Hunan, China
- Center of Reproductive Health, School of Basic Medical Science, Central South University, Changsha, China
- Minghua Zeng
- Institute of Reproductive & Stem Cell Engineering, School of Basic Medical Science, Central South University, 88 Xiangya Road, Changsha, 410008, Hunan, China
- Center of Reproductive Health, School of Basic Medical Science, Central South University, Changsha, China
- Ziheng Deng
- Institute of Reproductive & Stem Cell Engineering, School of Basic Medical Science, Central South University, 88 Xiangya Road, Changsha, 410008, Hunan, China
- Center of Reproductive Health, School of Basic Medical Science, Central South University, Changsha, China
- Hualin Huang
- Reproductive Medicine Center, Department of Obstetrics and Gynecology, The Second Xiangya Hospital, Central South University, Changsha, China
- Jonathan Greenbaum
- Center of Biomedical Informatics and Genomics, Deming Department of Medicine, Tulane University School of Medicine, New Orleans, LA, USA
- Hongwen Deng
- Center of Biomedical Informatics and Genomics, Deming Department of Medicine, Tulane University School of Medicine, New Orleans, LA, USA
- Hongmei Xiao
- Institute of Reproductive & Stem Cell Engineering, School of Basic Medical Science, Central South University, 88 Xiangya Road, Changsha, 410008, Hunan, China
- Center of Reproductive Health, School of Basic Medical Science, Central South University, Changsha, China
12
Yun M, Deng Z, Navetta-Modrov B, Xin B, Yang J, Nomani H, Aroniadis O, Gorevic PD, Yao Q. Genetic variations in NLRP3 and NLRP12 genes in adult-onset patients with autoinflammatory diseases: a comparative study. Front Immunol 2024; 14:1321370. [PMID: 38343435 PMCID: PMC10853347 DOI: 10.3389/fimmu.2023.1321370] [Received: 10/14/2023] [Accepted: 12/26/2023] [Indexed: 02/15/2024]
Abstract
Objectives Cryopyrin-associated periodic syndrome or NLRP3-associated autoinflammatory disease (NLRP3-AID) and NLRP12-AID are both Mendelian disorders with autosomal dominant inheritance. Both diseases are rare, primarily reported in the pediatric population, and are thought to be phenotypically indistinguishable. We provide the largest cohort of adult-onset patients and compared these diseases and the gene variant frequency to population controls. Methods A cohort of adult patients with AIDs were retrospectively studied. All underwent molecular testing for periodic fever syndrome gene panels after extensive and negative workups for systemic autoimmune and other related diseases. Patients were divided into Group 1- NLRP3-AID patients with NLRP3 variants (N=15), Group 2- NLRP12-AID with NLRP12 variants (N=14) and Group 3- both NLRP3 and NLRP12 (N=9) variants. Exome sequence data of two large control populations including the ARIC study were used to compare gene variant distribution and frequency. Results All 38 patients were Caucasian with women accounting for 82%. Median age at diagnosis was 41 ± 23 years and the disease duration at diagnosis was 14 ± 13 years. We identified statistically significant differences between the groups, notably that gastrointestinal symptoms as well as evaluations for same were significantly more frequent in patients with NLRP12 variants, and headaches/dizziness were less common among the NLRP12 patients. Livedo reticularis was noted in four patients, exclusively among NLRP12 carriers. Over 50% of patients in Groups 1 and 2 carry low-frequency disease-associated variants, while the remaining carry rare variants. We unprecedently identified digenic variants, i.e., the coexistence of NLRP3 and NLRP12, which were either both low frequency or low frequency/rare. 
Allele frequencies of all variants identified in our cohort were either absent or significantly lower in the control populations, further strengthening the evidence of susceptibility of these variants to SAID phenotypes. Conclusion Our comparative study shows that both NLRP3-AID and NLRP12-AID share similar clinical phenotypes, yet there are significant differences between them with regard to gastrointestinal and neurological symptoms. A spectrum of high to low genetic variations in both genes can contribute to SAID individually or in combination.
Affiliation(s)
- Mark Yun
- Division of Rheumatology, Allergy and Immunology, Stony Brook University Renaissance School of Medicine, Stony Brook, NY, United States
- Zuoming Deng
- Biodata Mining and Discovery Section, National Institute of Arthritis and Musculoskeletal and Skin Diseases, Bethesda, MD, United States
- Brianne Navetta-Modrov
- Division of Rheumatology, Allergy and Immunology, Stony Brook University Renaissance School of Medicine, Stony Brook, NY, United States
- Baozhong Xin
- Molecular Diagnostics Laboratory, DDC Clinic for Special Needs Children, Middlefield, OH, United States
- Jie Yang
- Department of Family, Population and Preventive Medicine, Stony Brook University Renaissance School of Medicine, Stony Brook, NY, United States
- Hafsa Nomani
- Division of Rheumatology, Allergy and Immunology, Stony Brook University Renaissance School of Medicine, Stony Brook, NY, United States
- Olga Aroniadis
- Division of Gastroenterology and Hepatology, Stony Brook University Renaissance School of Medicine, Stony Brook, NY, United States
- Peter D. Gorevic
- Division of Rheumatology, Allergy and Immunology, Stony Brook University Renaissance School of Medicine, Stony Brook, NY, United States
- Qingping Yao
- Division of Rheumatology, Allergy and Immunology, Stony Brook University Renaissance School of Medicine, Stony Brook, NY, United States
13
Horackova K, Janatova M, Kleiblova P, Kleibl Z, Soukupova J. Early-Onset Ovarian Cancer <30 Years: What Do We Know about Its Genetic Predisposition? Int J Mol Sci 2023; 24:17020. [PMID: 38069345 PMCID: PMC10707471 DOI: 10.3390/ijms242317020] [Received: 10/25/2023] [Revised: 11/27/2023] [Accepted: 11/29/2023] [Indexed: 12/18/2023]
Abstract
Ovarian cancer (OC) is one of the leading causes of cancer-related deaths in women. Most patients are diagnosed with advanced epithelial OC in their late 60s, and early-onset adult OC diagnosed ≤30 years is rare, accounting for less than 5% of all OC cases. The most significant risk factors for OC development are germline pathogenic/likely pathogenic variants (GPVs) in OC predisposition genes (including BRCA1, BRCA2, BRIP1, RAD51C, RAD51D, and Lynch syndrome genes), which contribute to the development of over 20% of all OC cases. GPVs in BRCA1/BRCA2 are the most prevalent. The presence of a GPV directs tailored cancer risk-reducing strategies for OC patients and their relatives. Identification of OC patients with GPVs can also have therapeutic consequences. Despite the general assumption that early cancer onset indicates higher involvement of hereditary cancer predisposition, the presence of GPVs in early-onset OC is rare (<10% of patients), and their heritability is uncertain. This review summarizes the current knowledge on the genetic predisposition to early-onset OC, with a special focus on epithelial OC, and suggests alternative genetic factors (digenic, oligogenic, polygenic heritability, genetic mosaicism, imprinting, etc.) that may influence the development of early-onset OC in adult women lacking GPVs in known OC predisposition genes.
Affiliation(s)
- Klara Horackova
- Institute of Medical Biochemistry and Laboratory Diagnostics, First Faculty of Medicine, Charles University and General University Hospital in Prague, 128 00 Prague, Czech Republic
- Marketa Janatova
- Institute of Medical Biochemistry and Laboratory Diagnostics, First Faculty of Medicine, Charles University and General University Hospital in Prague, 128 00 Prague, Czech Republic
- Petra Kleiblova
- Institute of Medical Biochemistry and Laboratory Diagnostics, First Faculty of Medicine, Charles University and General University Hospital in Prague, 128 00 Prague, Czech Republic
- Institute of Biology and Medical Genetics, First Faculty of Medicine, Charles University and General University Hospital in Prague, 128 00 Prague, Czech Republic
- Zdenek Kleibl
- Institute of Medical Biochemistry and Laboratory Diagnostics, First Faculty of Medicine, Charles University and General University Hospital in Prague, 128 00 Prague, Czech Republic
- Institute of Pathological Physiology, First Faculty of Medicine, Charles University, 128 00 Prague, Czech Republic
- Jana Soukupova
- Institute of Medical Biochemistry and Laboratory Diagnostics, First Faculty of Medicine, Charles University and General University Hospital in Prague, 128 00 Prague, Czech Republic
14
Nomani H, Deng Z, Navetta-Modrov B, Yang J, Yun M, Aroniadis O, Gorevic P, Aksentijevich I, Yao Q. Implications of combined NOD2 and other gene mutations in autoinflammatory diseases. Front Immunol 2023; 14:1265404. [PMID: 37928541 PMCID: PMC10620916 DOI: 10.3389/fimmu.2023.1265404] [Received: 07/22/2023] [Accepted: 10/09/2023] [Indexed: 11/07/2023]
Abstract
NOD-like receptors (NLRs) are intracellular sensors associated with systemic autoinflammatory diseases (SAIDs). We investigated the largest monocentric cohort of patients with adult-onset SAIDs for coinheritance of low frequency and rare mutations in NOD2 and other autoinflammatory genes. Sixty-three patients underwent molecular testing for SAID gene panels after extensive clinical workups. Whole exome sequencing data from the large Atherosclerosis Risk in Communities (ARIC) study of individuals of European-American ancestry were used as control. Of 63 patients, 44 (69.8%) were found to carry combined gene variants in NOD2 and another gene (Group 1), and 19 (30.2%) were carriers only for NOD2 variants (Group 2). The genetic variant combinations in SAID patients were digenic in 66% (NOD2/MEFV, NOD2/NLRP12, NOD2/NLRP3, and NOD2/TNFRSF1A) and oligogenic in 34% of cases. These variant combinations were either absent or significantly less frequent in the control population. By phenotype-genotype correlation, approximately 40% of patients met diagnostic criteria for a specific SAID, and 60% had mixed diagnoses. There were no statistically significant differences in clinical manifestations between the two patient groups except for chest pain. Due to overlapping phenotypes and mixed genotypes, we have suggested a new term, "Mixed NLR-associated Autoinflammatory Disease", to describe this disease scenario. Gene variant combinations are significant in patients with SAIDs primarily presenting with mixed clinical phenotypes. Our data support the proposition that immunological disease expression is modified by genetic background and environmental exposure. We provide a preliminary framework in diagnosis, management, and interpretation of the clinical scenario.
Affiliation(s)
- Hafsa Nomani
- Division of Rheumatology, Allergy and Immunology, Stony Brook University Renaissance School of Medicine, Stony Brook, NY, United States
- Zuoming Deng
- Biodata Mining and Discovery Section, National Institute of Arthritis and Musculoskeletal and Skin Diseases, Bethesda, MD, United States
- Brianne Navetta-Modrov
- Division of Rheumatology, Allergy and Immunology, Stony Brook University Renaissance School of Medicine, Stony Brook, NY, United States
- Jie Yang
- Department of Family, Population and Preventive Medicine, Stony Brook University Renaissance School of Medicine, Stony Brook, NY, United States
- Mark Yun
- Division of Rheumatology, Allergy and Immunology, Stony Brook University Renaissance School of Medicine, Stony Brook, NY, United States
- Olga Aroniadis
- Division of Gastroenterology and Hepatology, Stony Brook University Renaissance School of Medicine, Stony Brook, NY, United States
- Peter Gorevic
- Division of Rheumatology, Allergy and Immunology, Stony Brook University Renaissance School of Medicine, Stony Brook, NY, United States
- Ivona Aksentijevich
- Inflammatory Disease Section, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, United States
- Qingping Yao
- Division of Rheumatology, Allergy and Immunology, Stony Brook University Renaissance School of Medicine, Stony Brook, NY, United States
15
Renaux A, Terwagne C, Cochez M, Tiddi I, Nowé A, Lenaerts T. A knowledge graph approach to predict and interpret disease-causing gene interactions. BMC Bioinformatics 2023; 24:324. [PMID: 37644440 PMCID: PMC10463539 DOI: 10.1186/s12859-023-05451-5] [Received: 06/07/2023] [Accepted: 08/22/2023] [Indexed: 08/31/2023]
Abstract
BACKGROUND Understanding the impact of gene interactions on disease phenotypes is increasingly recognised as a crucial aspect of genetic disease research. This trend is reflected by the growing amount of clinical research on oligogenic diseases, where disease manifestations are influenced by combinations of variants on a few specific genes. Although statistical machine-learning methods have been developed to identify relevant genetic variant or gene combinations associated with oligogenic diseases, they rely on abstract features and black-box models, posing challenges to interpretability for medical experts and impeding their ability to comprehend and validate predictions. In this work, we present a novel, interpretable predictive approach based on a knowledge graph that not only provides accurate predictions of disease-causing gene interactions but also offers explanations for these results. RESULTS We introduce BOCK, a knowledge graph constructed to explore disease-causing genetic interactions, integrating curated information on oligogenic diseases from clinical cases with relevant biomedical networks and ontologies. Using this graph, we developed a novel predictive framework based on heterogenous paths connecting gene pairs. This method trains an interpretable decision set model that not only accurately predicts pathogenic gene interactions, but also unveils the patterns associated with these diseases. A unique aspect of our approach is its ability to offer, along with each positive prediction, explanations in the form of subgraphs, revealing the specific entities and relationships that led to each pathogenic prediction. CONCLUSION Our method, built with interpretability in mind, leverages heterogenous path information in knowledge graphs to predict pathogenic gene interactions and generate meaningful explanations. 
This not only broadens our understanding of the molecular mechanisms underlying oligogenic diseases, but also presents a novel application of knowledge graphs in creating more transparent and insightful predictors for genetic research.
Affiliation(s)
- Alexandre Renaux
- Interuniversity Institute of Bioinformatics in Brussels, Université Libre de Bruxelles - Vrije Universiteit Brussel, Brussels, Belgium
- Machine Learning Group, Université Libre de Bruxelles, Brussels, Belgium
- Artificial Intelligence lab, Vrije Universiteit Brussel, Brussels, Belgium
- Chloé Terwagne
- Interuniversity Institute of Bioinformatics in Brussels, Université Libre de Bruxelles - Vrije Universiteit Brussel, Brussels, Belgium
- Machine Learning Group, Université Libre de Bruxelles, Brussels, Belgium
- Michael Cochez
- Computer Science, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
- Discovery Lab, Elsevier, Amsterdam, The Netherlands
- Ilaria Tiddi
- Computer Science, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
- Ann Nowé
- Interuniversity Institute of Bioinformatics in Brussels, Université Libre de Bruxelles - Vrije Universiteit Brussel, Brussels, Belgium
- Artificial Intelligence lab, Vrije Universiteit Brussel, Brussels, Belgium
- Tom Lenaerts
- Interuniversity Institute of Bioinformatics in Brussels, Université Libre de Bruxelles - Vrije Universiteit Brussel, Brussels, Belgium
- Machine Learning Group, Université Libre de Bruxelles, Brussels, Belgium
- Artificial Intelligence lab, Vrije Universiteit Brussel, Brussels, Belgium
16
Versbraegen N, Gravel B, Nachtegael C, Renaux A, Verkinderen E, Nowé A, Lenaerts T, Papadimitriou S. Faster and more accurate pathogenic combination predictions with VarCoPP2.0. BMC Bioinformatics 2023; 24:179. [PMID: 37127601 PMCID: PMC10152795 DOI: 10.1186/s12859-023-05291-3] [Received: 12/06/2022] [Accepted: 04/14/2023] [Indexed: 05/03/2023]
Abstract
BACKGROUND The prediction of potentially pathogenic variant combinations in patients remains a key task in the field of medical genetics for the understanding and detection of oligogenic/multilocus diseases. Models tailored towards such cases can help shorten the gap of missing diagnoses and can aid researchers in dealing with the high complexity of the derived data. The predictor VarCoPP (Variant Combinations Pathogenicity Predictor) that was published in 2019 and identified potentially pathogenic variant combinations in gene pairs (bilocus variant combinations), was the first important step in this direction. Despite its usefulness and applicability, several issues still remained that hindered a better performance, such as its False Positive (FP) rate, the quality of its training set and its complex architecture. RESULTS We present VarCoPP2.0: the successor of VarCoPP that is a simplified, faster and more accurate predictive model identifying potentially pathogenic bilocus variant combinations. Results from cross-validation and on independent data sets reveal that VarCoPP2.0 has improved in terms of both sensitivity (95% in cross-validation and 98% during testing) and specificity (5% FP rate). At the same time, its running time shows a significant 150-fold decrease due to the selection of a simpler Balanced Random Forest model. Its positive training set now consists of variant combinations that are more confidently linked with evidence of pathogenicity, based on the confidence scores present in OLIDA, the Oligogenic Diseases Database ( https://olida.ibsquare.be ). The improvement of its performance is also attributed to a more careful selection of up-to-date features identified via an original wrapper method. We show that the combination of different variant and gene pair features together is important for predictions, highlighting the usefulness of integrating biological information at different levels. 
CONCLUSIONS Through its improved performance and faster execution time, VarCoPP2.0 enables a more accurate analysis of larger data sets linked to oligogenic diseases. Users can access the ORVAL platform ( https://orval.ibsquare.be ) to apply VarCoPP2.0 on their data.
Affiliation(s)
- Nassim Versbraegen
- Machine Learning Group, Université Libre de Bruxelles, 1050, Brussels, Belgium
- Interuniversity Institute of Bioinformatics in Brussels, Université Libre de Bruxelles-Vrije Universiteit Brussel, 1050, Brussels, Belgium
- Barbara Gravel
- Machine Learning Group, Université Libre de Bruxelles, 1050, Brussels, Belgium
- Interuniversity Institute of Bioinformatics in Brussels, Université Libre de Bruxelles-Vrije Universiteit Brussel, 1050, Brussels, Belgium
- Artificial Intelligence Laboratory, Vrije Universiteit Brussel, 1050, Brussels, Belgium
- Charlotte Nachtegael
- Machine Learning Group, Université Libre de Bruxelles, 1050, Brussels, Belgium
- Interuniversity Institute of Bioinformatics in Brussels, Université Libre de Bruxelles-Vrije Universiteit Brussel, 1050, Brussels, Belgium
- Alexandre Renaux
- Machine Learning Group, Université Libre de Bruxelles, 1050, Brussels, Belgium
- Interuniversity Institute of Bioinformatics in Brussels, Université Libre de Bruxelles-Vrije Universiteit Brussel, 1050, Brussels, Belgium
- Artificial Intelligence Laboratory, Vrije Universiteit Brussel, 1050, Brussels, Belgium
- Emma Verkinderen
- Machine Learning Group, Université Libre de Bruxelles, 1050, Brussels, Belgium
- Interuniversity Institute of Bioinformatics in Brussels, Université Libre de Bruxelles-Vrije Universiteit Brussel, 1050, Brussels, Belgium
- Ann Nowé
- Interuniversity Institute of Bioinformatics in Brussels, Université Libre de Bruxelles-Vrije Universiteit Brussel, 1050, Brussels, Belgium
- Artificial Intelligence Laboratory, Vrije Universiteit Brussel, 1050, Brussels, Belgium
- Tom Lenaerts
- Machine Learning Group, Université Libre de Bruxelles, 1050, Brussels, Belgium
- Interuniversity Institute of Bioinformatics in Brussels, Université Libre de Bruxelles-Vrije Universiteit Brussel, 1050, Brussels, Belgium
- Artificial Intelligence Laboratory, Vrije Universiteit Brussel, 1050, Brussels, Belgium
- Sofia Papadimitriou
- Machine Learning Group, Université Libre de Bruxelles, 1050, Brussels, Belgium
- Interuniversity Institute of Bioinformatics in Brussels, Université Libre de Bruxelles-Vrije Universiteit Brussel, 1050, Brussels, Belgium
- Artificial Intelligence Laboratory, Vrije Universiteit Brussel, 1050, Brussels, Belgium
17
Papadimitriou S, Gravel B, Nachtegael C, De Baere E, Loeys B, Vikkula M, Smits G, Lenaerts T. Toward reporting standards for the pathogenicity of variant combinations involved in multilocus/oligogenic diseases. HGG ADVANCES 2022; 4:100165. [PMID: 36578772 PMCID: PMC9791921 DOI: 10.1016/j.xhgg.2022.100165] [Indexed: 12/03/2022]
Abstract
Although standards and guidelines for the interpretation of variants identified in genes that cause Mendelian disorders have been developed, this is not the case for more complex genetic models including variant combinations in multiple genes. During a large curation process conducted on 318 research articles presenting oligogenic variant combinations, we encountered several recurring issues concerning their proper reporting and pathogenicity assessment. These mainly concern the absence of strong evidence that refutes a monogenic model and the lack of a proper genetic and functional assessment of the joint effect of the involved variants. With the increasing accumulation of such cases, it has become essential to develop standards and guidelines on how these oligogenic/multilocus variant combinations should be interpreted, validated, and reported in order to provide high-quality data and supporting evidence to the scientific community.
Affiliation(s)
- Sofia Papadimitriou
- Interuniversity Institute of Bioinformatics in Brussels, Université Libre de Bruxelles-Vrije Universiteit Brussel, 1050 Brussels, Belgium; Machine Learning Group, Université Libre de Bruxelles, 1050 Brussels, Belgium; Artificial Intelligence Laboratory, Vrije Universiteit Brussel, 1050 Brussels, Belgium; Corresponding author
- Barbara Gravel
- Interuniversity Institute of Bioinformatics in Brussels, Université Libre de Bruxelles-Vrije Universiteit Brussel, 1050 Brussels, Belgium; Machine Learning Group, Université Libre de Bruxelles, 1050 Brussels, Belgium; Artificial Intelligence Laboratory, Vrije Universiteit Brussel, 1050 Brussels, Belgium
- Charlotte Nachtegael
- Interuniversity Institute of Bioinformatics in Brussels, Université Libre de Bruxelles-Vrije Universiteit Brussel, 1050 Brussels, Belgium; Machine Learning Group, Université Libre de Bruxelles, 1050 Brussels, Belgium
- Elfride De Baere
- Center for Medical Genetics, Ghent University Hospital, Department of Biomolecular Medicine, Ghent University, 9000 Ghent, Belgium
- Bart Loeys
- Center for Medical Genetics, Antwerp University Hospital/University of Antwerp, 2650 Antwerp, Belgium
- Miikka Vikkula
- Human Molecular Genetics, de Duve Institute, UCLouvain, Brussels, Belgium
- Guillaume Smits
- Interuniversity Institute of Bioinformatics in Brussels, Université Libre de Bruxelles-Vrije Universiteit Brussel, 1050 Brussels, Belgium; Center of Human Genetics, Hôpital Erasme, Université Libre de Bruxelles, 1070 Brussels, Belgium; Hôpital Universitaire des Enfants Reine Fabiola, Université Libre de Bruxelles, 1020 Brussels, Belgium
- Tom Lenaerts
- Interuniversity Institute of Bioinformatics in Brussels, Université Libre de Bruxelles-Vrije Universiteit Brussel, 1050 Brussels, Belgium; Machine Learning Group, Université Libre de Bruxelles, 1050 Brussels, Belgium; Artificial Intelligence Laboratory, Vrije Universiteit Brussel, 1050 Brussels, Belgium; Corresponding author