1
Neuhofer CM, Prokisch H. Digenic Inheritance in Rare Disorders and Mitochondrial Disease-Crossing the Frontier to a More Comprehensive Understanding of Etiology. Int J Mol Sci 2024; 25:4602. [PMID: 38731822 PMCID: PMC11083678 DOI: 10.3390/ijms25094602] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/13/2024] [Revised: 04/10/2024] [Accepted: 04/12/2024] [Indexed: 05/13/2024]
Abstract
Our understanding of rare disease genetics has been shaped by a monogenic disease model. While the traditional monogenic disease model has been successful in identifying numerous disease-associated genes and significantly enlarged our knowledge in the field of human genetics, it has limitations in explaining phenomena like phenotypic variability and reduced penetrance. Widening the perspective beyond Mendelian inheritance has the potential to enable a better understanding of disease complexity in rare disorders. Digenic inheritance is the simplest instance of a non-Mendelian disorder, characterized by the functional interplay of variants in two disease-contributing genes. Known digenic disease causes show a range of pathomechanisms underlying digenic interplay, including direct and indirect gene product interactions as well as epigenetic modifications. This review aims to systematically explore the background of digenic inheritance in rare disorders, the approaches and challenges when investigating digenic inheritance, and the current evidence for digenic inheritance in mitochondrial disorders.
Affiliation(s)
- Christiane M. Neuhofer
- Institute of Human Genetics, University Medical Center, Technical University of Munich, Trogerstr. 32, 81675 Munich, Germany
- Institute of Neurogenomics, Computational Health Center, Helmholtz Centre Munich Neuherberg, Ingolstädter Landstraße 1, 85764 Oberschleißheim, Germany
- Institute of Human Genetics, Salzburger Landeskliniken, University Hospital of the Paracelsus Medical University, Müllner Hauptstraße 48, 5020 Salzburg, Austria
- Holger Prokisch
- Institute of Human Genetics, University Medical Center, Technical University of Munich, Trogerstr. 32, 81675 Munich, Germany
- Institute of Neurogenomics, Computational Health Center, Helmholtz Centre Munich Neuherberg, Ingolstädter Landstraße 1, 85764 Oberschleißheim, Germany
2
Gravel B, Renaux A, Papadimitriou S, Smits G, Nowé A, Lenaerts T. Prioritization of oligogenic variant combinations in whole exomes. Bioinformatics 2024; 40:btae184. [PMID: 38603604 PMCID: PMC11037482 DOI: 10.1093/bioinformatics/btae184] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/13/2023] [Revised: 01/29/2024] [Accepted: 04/10/2024] [Indexed: 04/13/2024]
Abstract
MOTIVATION Whole exome sequencing (WES) has emerged as a powerful tool for genetic research, enabling the collection of a tremendous amount of data about human genetic variation. However, properly identifying which variants are causative of a genetic disease remains an important challenge, often due to the number of variants that need to be screened. Expanding the screening to combinations of variants in two or more genes, as would be required under the oligogenic inheritance model, simply blows this problem out of proportion. RESULTS We present here the High-throughput oligogenic prioritizer (Hop), a novel prioritization method that uses direct oligogenic information at the variant, gene and gene pair level to detect digenic variant combinations in WES data. This method leverages information from a knowledge graph, together with specialized pathogenicity predictions in order to effectively rank variant combinations based on how likely they are to explain the patient's phenotype. The performance of Hop is evaluated in cross-validation on 36 120 synthetic exomes for training and 14 280 additional synthetic exomes for independent testing. Whereas the known pathogenic variant combinations are found in the top 20 in approximately 60% of the cross-validation exomes, 71% are found in the same ranking range when considering the independent set. These results provide a significant improvement over alternative approaches that depend simply on a monogenic assessment of pathogenicity, including early attempts for digenic ranking using monogenic pathogenicity scores. AVAILABILITY AND IMPLEMENTATION Hop is available at https://github.com/oligogenic/HOP.
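A small Python sketch can illustrate two points in this abstract: how quickly the search space grows once genes are considered in pairs, and the top-20 evaluation the authors report. It assumes the simplest bilocus setting, in which each combination contains one candidate variant from each of two genes. The function names `n_gene_pairs`, `n_variant_pairs` and `top_k_recall` are hypothetical helpers and are not part of Hop.

```python
from math import comb
from typing import Hashable, Sequence


def n_gene_pairs(n_genes: int) -> int:
    """Number of unordered gene pairs among n candidate genes."""
    return comb(n_genes, 2)


def n_variant_pairs(variants_per_gene: Sequence[int]) -> int:
    """Count bilocus combinations: one variant from each of two different genes.

    Equivalent to summing v_i * v_j over all gene pairs i < j, computed via
    ((sum v)^2 - sum v^2) / 2 to avoid an explicit double loop.
    """
    total = sum(variants_per_gene)
    squares = sum(v * v for v in variants_per_gene)
    return (total * total - squares) // 2


def top_k_recall(rankings: Sequence[Sequence[Hashable]],
                 causal: Sequence[Hashable], k: int = 20) -> float:
    """Fraction of exomes whose known causal combination is ranked in the top k."""
    hits = sum(1 for ranked, truth in zip(rankings, causal) if truth in ranked[:k])
    return hits / len(causal)


if __name__ == "__main__":
    # Hypothetical exome: 300 genes, each carrying 2 candidate variants.
    print(n_gene_pairs(300), n_variant_pairs([2] * 300))  # 44850 179400
    # Two toy exomes: the causal item is at rank 2 in the first and absent in the second.
    print(top_k_recall([["a", "b", "c"], ["x", "y", "z"]], ["b", "q"], k=2))  # 0.5
```

In a real prioritizer, each ranked item would be a variant combination scored by a model. Here, plain strings stand in for those combinations.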
Affiliation(s)
- Barbara Gravel
- Interuniversity Institute of Bioinformatics in Brussels, Université Libre de Bruxelles-Vrije Universiteit Brussel, 1050 Brussels, Belgium
- Department of Computer Science, Machine Learning Group, Université Libre de Bruxelles, 1050 Brussels, Belgium
- Department of Computer Science, Artificial Intelligence Laboratory, Vrije Universiteit Brussel, 1050 Brussels, Belgium
- Alexandre Renaux
- Interuniversity Institute of Bioinformatics in Brussels, Université Libre de Bruxelles-Vrije Universiteit Brussel, 1050 Brussels, Belgium
- Department of Computer Science, Machine Learning Group, Université Libre de Bruxelles, 1050 Brussels, Belgium
- Department of Computer Science, Artificial Intelligence Laboratory, Vrije Universiteit Brussel, 1050 Brussels, Belgium
- Sofia Papadimitriou
- Interuniversity Institute of Bioinformatics in Brussels, Université Libre de Bruxelles-Vrije Universiteit Brussel, 1050 Brussels, Belgium
- Department of Computer Science, Machine Learning Group, Université Libre de Bruxelles, 1050 Brussels, Belgium
- Brussels Interuniversity Genomics High Throughput core (BRIGHTcore), UZ Brussel, Vrije Universiteit Brussel (VUB) - Université Libre de Bruxelles (ULB), 1090 Brussels, Belgium
- Guillaume Smits
- Interuniversity Institute of Bioinformatics in Brussels, Université Libre de Bruxelles-Vrije Universiteit Brussel, 1050 Brussels, Belgium
- Center of Human Genetics, Hôpital Erasme, Hôpital Universitaire de Bruxelles, Université Libre de Bruxelles, 1070 Brussels, Belgium
- Ann Nowé
- Interuniversity Institute of Bioinformatics in Brussels, Université Libre de Bruxelles-Vrije Universiteit Brussel, 1050 Brussels, Belgium
- Department of Computer Science, Artificial Intelligence Laboratory, Vrije Universiteit Brussel, 1050 Brussels, Belgium
- Tom Lenaerts
- Interuniversity Institute of Bioinformatics in Brussels, Université Libre de Bruxelles-Vrije Universiteit Brussel, 1050 Brussels, Belgium
- Department of Computer Science, Machine Learning Group, Université Libre de Bruxelles, 1050 Brussels, Belgium
- Department of Computer Science, Artificial Intelligence Laboratory, Vrije Universiteit Brussel, 1050 Brussels, Belgium
3
Long P, Wang L, Tan H, Quan R, Hu Z, Zeng M, Deng Z, Huang H, Greenbaum J, Deng H, Xiao H. Oligogenic basis of premature ovarian insufficiency: an observational study. J Ovarian Res 2024; 17:32. [PMID: 38310280 PMCID: PMC10837925 DOI: 10.1186/s13048-024-01351-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/23/2023] [Accepted: 01/13/2024] [Indexed: 02/05/2024]
Abstract
BACKGROUND The etiology of premature ovarian insufficiency (POI), that is, the loss of ovarian activity before 40 years of age, is complex. Studies suggest that genetic factors are involved in 20-25% of cases. The aim of this study was to explore the oligogenic basis of premature ovarian insufficiency. RESULTS Whole-exome sequencing of 93 patients with POI and whole-genome sequencing of 465 controls were performed. In the gene-burden analysis, multiple genetic variants, including those associated with DNA damage repair and meiosis, were more common in participants with premature ovarian insufficiency than in controls. The ORVAL platform analysis confirmed the pathogenicity of the RAD52 and MSH6 combination. CONCLUSIONS The results of this study indicate that oligogenic inheritance is an important cause of premature ovarian insufficiency and provide insights into the biological mechanisms underlying premature ovarian insufficiency.
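A gene-burden comparison of the kind described here can be sketched as a one-sided Fisher's exact test on carrier counts. This is an illustrative assumption: the abstract does not state which burden statistic the authors used. The function name `fisher_greater` is hypothetical, and the carrier counts in the usage line are invented. Only the cohort sizes (93 patients, 465 controls) come from the abstract.

```python
from math import comb


def fisher_greater(case_carriers: int, n_cases: int,
                   ctrl_carriers: int, n_controls: int) -> float:
    """One-sided Fisher's exact p-value for an excess of carriers among cases.

    Conditioning on the total number of carriers, the number of carriers that
    fall among the cases is hypergeometric under the null hypothesis of no
    association. The p-value sums the probability of the observed count and
    of every larger (more extreme) count.
    """
    carriers = case_carriers + ctrl_carriers
    denom = comb(n_cases + n_controls, carriers)
    upper = min(carriers, n_cases)
    # math.comb returns 0 when k > n, so impossible splits contribute nothing.
    numer = sum(comb(n_cases, x) * comb(n_controls, carriers - x)
                for x in range(case_carriers, upper + 1))
    return numer / denom


if __name__ == "__main__":
    # Hypothetical counts for one gene: 6 of 93 patients vs 4 of 465 controls.
    print(fisher_greater(6, 93, 4, 465))
```

In practice, a test like this would be run for each gene or gene set, with correction for multiple testing.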
Affiliation(s)
- Panpan Long
- Institute of Reproductive & Stem Cell Engineering, School of Basic Medical Science, Central South University, 88 Xiangya Road, Changsha, 410008, Hunan, China
- Center of Reproductive Health, School of Basic Medical Science, Central South University, Changsha, China
- Le Wang
- Institute of Reproductive & Stem Cell Engineering, School of Basic Medical Science, Central South University, 88 Xiangya Road, Changsha, 410008, Hunan, China
- Center of Reproductive Health, School of Basic Medical Science, Central South University, Changsha, China
- Biomedical Research Center, Hunan University of Medicine, Huaihua, China
- Hangjing Tan
- Institute of Reproductive & Stem Cell Engineering, School of Basic Medical Science, Central South University, 88 Xiangya Road, Changsha, 410008, Hunan, China
- Center of Reproductive Health, School of Basic Medical Science, Central South University, Changsha, China
- Ruping Quan
- Institute of Reproductive & Stem Cell Engineering, School of Basic Medical Science, Central South University, 88 Xiangya Road, Changsha, 410008, Hunan, China
- Center of Reproductive Health, School of Basic Medical Science, Central South University, Changsha, China
- Zihao Hu
- Institute of Reproductive & Stem Cell Engineering, School of Basic Medical Science, Central South University, 88 Xiangya Road, Changsha, 410008, Hunan, China
- Center of Reproductive Health, School of Basic Medical Science, Central South University, Changsha, China
- Minghua Zeng
- Institute of Reproductive & Stem Cell Engineering, School of Basic Medical Science, Central South University, 88 Xiangya Road, Changsha, 410008, Hunan, China
- Center of Reproductive Health, School of Basic Medical Science, Central South University, Changsha, China
- Ziheng Deng
- Institute of Reproductive & Stem Cell Engineering, School of Basic Medical Science, Central South University, 88 Xiangya Road, Changsha, 410008, Hunan, China
- Center of Reproductive Health, School of Basic Medical Science, Central South University, Changsha, China
- Hualin Huang
- Reproductive Medicine Center, Department of Obstetrics and Gynecology, The Second Xiangya Hospital, Central South University, Changsha, China
- Jonathan Greenbaum
- Center of Biomedical Informatics and Genomics, Deming Department of Medicine, Tulane University School of Medicine, New Orleans, LA, USA
- Hongwen Deng
- Center of Biomedical Informatics and Genomics, Deming Department of Medicine, Tulane University School of Medicine, New Orleans, LA, USA
- Hongmei Xiao
- Institute of Reproductive & Stem Cell Engineering, School of Basic Medical Science, Central South University, 88 Xiangya Road, Changsha, 410008, Hunan, China
- Center of Reproductive Health, School of Basic Medical Science, Central South University, Changsha, China
4
Versbraegen N, Gravel B, Nachtegael C, Renaux A, Verkinderen E, Nowé A, Lenaerts T, Papadimitriou S. Faster and more accurate pathogenic combination predictions with VarCoPP2.0. BMC Bioinformatics 2023; 24:179. [PMID: 37127601 PMCID: PMC10152795 DOI: 10.1186/s12859-023-05291-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/06/2022] [Accepted: 04/14/2023] [Indexed: 05/03/2023]
Abstract
BACKGROUND The prediction of potentially pathogenic variant combinations in patients remains a key task in the field of medical genetics for the understanding and detection of oligogenic/multilocus diseases. Models tailored towards such cases can help shorten the gap of missing diagnoses and can aid researchers in dealing with the high complexity of the derived data. The predictor VarCoPP (Variant Combinations Pathogenicity Predictor) that was published in 2019 and identified potentially pathogenic variant combinations in gene pairs (bilocus variant combinations), was the first important step in this direction. Despite its usefulness and applicability, several issues still remained that hindered a better performance, such as its False Positive (FP) rate, the quality of its training set and its complex architecture. RESULTS We present VarCoPP2.0: the successor of VarCoPP that is a simplified, faster and more accurate predictive model identifying potentially pathogenic bilocus variant combinations. Results from cross-validation and on independent data sets reveal that VarCoPP2.0 has improved in terms of both sensitivity (95% in cross-validation and 98% during testing) and specificity (5% FP rate). At the same time, its running time shows a significant 150-fold decrease due to the selection of a simpler Balanced Random Forest model. Its positive training set now consists of variant combinations that are more confidently linked with evidence of pathogenicity, based on the confidence scores present in OLIDA, the Oligogenic Diseases Database ( https://olida.ibsquare.be ). The improvement of its performance is also attributed to a more careful selection of up-to-date features identified via an original wrapper method. We show that the combination of different variant and gene pair features together is important for predictions, highlighting the usefulness of integrating biological information at different levels. 
CONCLUSIONS Through its improved performance and faster execution time, VarCoPP2.0 enables a more accurate analysis of larger data sets linked to oligogenic diseases. Users can access the ORVAL platform ( https://orval.ibsquare.be ) to apply VarCoPP2.0 on their data.
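The abstract credits part of the improvement to a Balanced Random Forest. In that kind of model, each tree is grown on a bootstrap sample in which every class is equally represented. This keeps the smaller class (presumably the pathogenic combinations here) from being swamped by the larger one. The sketch below shows only that per-tree sampling step, in plain Python. The function name `balanced_bootstrap` is hypothetical, and the code is not VarCoPP2.0's implementation, which the abstract does not describe.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence


def balanced_bootstrap(labels: Sequence[int], rng: random.Random) -> List[int]:
    """Indices of the training sample for one tree of a Balanced Random Forest.

    Every class is sampled with replacement to the size of the smallest
    class, so each tree sees a balanced training set.
    """
    by_class: Dict[int, List[int]] = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    n_min = min(len(idxs) for idxs in by_class.values())
    sample: List[int] = []
    for idxs in by_class.values():
        sample.extend(rng.choices(idxs, k=n_min))
    return sample


if __name__ == "__main__":
    # Toy labels: 10 pathogenic (1) vs 990 neutral (0) combinations.
    labels = [1] * 10 + [0] * 990
    rng = random.Random(42)
    for tree in range(3):
        idx = balanced_bootstrap(labels, rng)
        print(tree, len(idx), sum(labels[i] for i in idx))  # always 20 samples, 10 positives
```

Each tree would then be fitted on its own sample, with the usual random feature subsetting at each split. The forest's prediction is obtained by aggregating the votes of all trees.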
Affiliation(s)
- Nassim Versbraegen
- Machine Learning Group, Université Libre de Bruxelles, 1050, Brussels, Belgium
- Interuniversity Institute of Bioinformatics in Brussels, Université Libre de Bruxelles-Vrije Universiteit Brussel, 1050, Brussels, Belgium
- Barbara Gravel
- Machine Learning Group, Université Libre de Bruxelles, 1050, Brussels, Belgium
- Interuniversity Institute of Bioinformatics in Brussels, Université Libre de Bruxelles-Vrije Universiteit Brussel, 1050, Brussels, Belgium
- Artificial Intelligence Laboratory, Vrije Universiteit Brussel, 1050, Brussels, Belgium
- Charlotte Nachtegael
- Machine Learning Group, Université Libre de Bruxelles, 1050, Brussels, Belgium
- Interuniversity Institute of Bioinformatics in Brussels, Université Libre de Bruxelles-Vrije Universiteit Brussel, 1050, Brussels, Belgium
- Alexandre Renaux
- Machine Learning Group, Université Libre de Bruxelles, 1050, Brussels, Belgium
- Interuniversity Institute of Bioinformatics in Brussels, Université Libre de Bruxelles-Vrije Universiteit Brussel, 1050, Brussels, Belgium
- Artificial Intelligence Laboratory, Vrije Universiteit Brussel, 1050, Brussels, Belgium
- Emma Verkinderen
- Machine Learning Group, Université Libre de Bruxelles, 1050, Brussels, Belgium
- Interuniversity Institute of Bioinformatics in Brussels, Université Libre de Bruxelles-Vrije Universiteit Brussel, 1050, Brussels, Belgium
- Ann Nowé
- Interuniversity Institute of Bioinformatics in Brussels, Université Libre de Bruxelles-Vrije Universiteit Brussel, 1050, Brussels, Belgium
- Artificial Intelligence Laboratory, Vrije Universiteit Brussel, 1050, Brussels, Belgium
- Tom Lenaerts
- Machine Learning Group, Université Libre de Bruxelles, 1050, Brussels, Belgium
- Interuniversity Institute of Bioinformatics in Brussels, Université Libre de Bruxelles-Vrije Universiteit Brussel, 1050, Brussels, Belgium
- Artificial Intelligence Laboratory, Vrije Universiteit Brussel, 1050, Brussels, Belgium
- Sofia Papadimitriou
- Machine Learning Group, Université Libre de Bruxelles, 1050, Brussels, Belgium
- Interuniversity Institute of Bioinformatics in Brussels, Université Libre de Bruxelles-Vrije Universiteit Brussel, 1050, Brussels, Belgium
- Artificial Intelligence Laboratory, Vrije Universiteit Brussel, 1050, Brussels, Belgium
5
New Developments and Possibilities in Reanalysis and Reinterpretation of Whole Exome Sequencing Datasets for Unsolved Rare Diseases Using Machine Learning Approaches. Int J Mol Sci 2022; 23:6792. [PMID: 35743235 PMCID: PMC9224427 DOI: 10.3390/ijms23126792] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/30/2022] [Revised: 06/13/2022] [Accepted: 06/15/2022] [Indexed: 11/21/2022]
Abstract
Rare diseases impact the lives of 300 million people worldwide. Rapid advances in bioinformatics and genomic technologies have enabled the discovery of the causes of 20–30% of rare diseases. However, most rare diseases remain unsolved to date. Newer tools and the availability of high-throughput sequencing data have enabled the reanalysis of previously undiagnosed patients. In this review, we have systematically compiled the latest developments in the discovery of the genetic causes of rare diseases using machine learning methods. Importantly, we have detailed methods available to reanalyze existing whole exome sequencing data of unsolved rare diseases. We have identified different reanalysis methodologies to solve problems associated with sequence alterations/mutations, variation re-annotation, protein stability, splice isoform malfunctions and oligogenic analysis. In addition, we give an overview of new developments in the field of rare disease research using whole genome sequencing data and other omics.
6
Recent innovations and in-depth aspects of post-genome wide association study (Post-GWAS) to understand the genetic basis of complex phenotypes. Heredity (Edinb) 2021; 127:485-497. [PMID: 34689168 PMCID: PMC8626474 DOI: 10.1038/s41437-021-00479-w] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 04/02/2020] [Revised: 10/13/2021] [Accepted: 10/13/2021] [Indexed: 12/13/2022]
Abstract
In the past decade, the high throughput and low cost of sequencing/genotyping approaches have led to the accumulation of a large amount of data from genome-wide association studies (GWASs). The first aim of this review is to highlight how post-GWAS analysis can be used to make sense of the obtained associations. Novel directions for integrating GWAS results with other resources, such as somatic mutation, metabolite-transcript, and transcriptomic data, are also discussed; these approaches can help us move beyond each individual data point and provide valuable information about complex trait genetics. In addition, cross-phenotype association tests, when the loci detected by GWASs have significant associations with multiple traits, are reviewed to provide biologically informative results for use in real-time applications. This review also discusses the challenges of identifying interactions between genetic mutations (epistasis) and mutations of loci affecting more than one trait (pleiotropy) as underlying causes of cross-phenotype associations; these challenges can be overcome using post-GWAS analysis. Genetic similarities between phenotypes that can be revealed using post-GWAS analysis are also discussed. In summary, different methodologies of post-GWAS analysis are now available, enhancing the value of information obtained from GWAS results, and facilitating application in both humans and nonhuman species. However, precise methods still need to be developed to overcome challenges in the field and uncover the genetic underpinnings of complex traits.
7
Mukherjee S, Cogan JD, Newman JH, Phillips JA, Hamid R, Meiler J, Capra JA. Identifying digenic disease genes via machine learning in the Undiagnosed Diseases Network. Am J Hum Genet 2021; 108:1946-1963. [PMID: 34529933 PMCID: PMC8546038 DOI: 10.1016/j.ajhg.2021.08.010] [Citation(s) in RCA: 20] [Impact Index Per Article: 6.7] [Received: 12/23/2020] [Accepted: 08/25/2021] [Indexed: 12/20/2022]
Abstract
Rare diseases affect millions of people worldwide, and discovering their genetic causes is challenging. More than half of the individuals analyzed by the Undiagnosed Diseases Network (UDN) remain undiagnosed. The central hypothesis of this work is that many of these rare genetic disorders are caused by multiple variants in more than one gene. However, given the large number of variants in each individual genome, experimentally evaluating combinations of variants for potential to cause disease is currently infeasible. To address this challenge, we developed the digenic predictor (DiGePred), a random forest classifier for identifying candidate digenic disease gene pairs by features derived from biological networks, genomics, evolutionary history, and functional annotations. We trained the DiGePred classifier by using DIDA, the largest available database of known digenic-disease-causing gene pairs, and several sets of non-digenic gene pairs, including variant pairs derived from unaffected relatives of UDN individuals. DiGePred achieved high precision and recall in cross-validation and on a held-out test set (PR area under the curve > 77%), and we further demonstrate its utility by using digenic pairs from the recent literature. In contrast to other approaches, DiGePred also appropriately controls the number of false positives when applied in realistic clinical settings. Finally, to enable the rapid screening of variant gene pairs for digenic disease potential, we freely provide the predictions of DiGePred on all human gene pairs. Our work enables the discovery of genetic causes for rare non-monogenic diseases by providing a means to rapidly evaluate variant gene pairs for the potential to cause digenic disease.
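DiGePred's headline metric is the area under the precision-recall curve. One common estimator of that area is average precision, sketched below in plain Python. The abstract does not say which estimator the authors used, so treat this as an illustration of the metric rather than a reproduction of their evaluation. The function name `average_precision` is a hypothetical helper.

```python
from typing import Sequence


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise area under the precision-recall curve (average precision).

    Walks candidates from highest to lowest score and averages the precision
    observed at each true positive. Ties keep their input order, because
    Python's sort is stable.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    n_pos = sum(labels)
    tp = 0
    total = 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i]:
            tp += 1
            total += tp / rank  # precision at this cut-off
    return total / n_pos if n_pos else 0.0


if __name__ == "__main__":
    # Toy gene pairs: 1 = known digenic, 0 = non-digenic, with classifier scores.
    labels = [1, 0, 1, 0]
    scores = [0.9, 0.8, 0.7, 0.1]
    print(average_precision(labels, scores))  # about 0.833 (5/6)
```

Unlike ROC AUC, this score depends on class balance. That matters when true digenic pairs make up a tiny fraction of all gene pairs screened.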
Affiliation(s)
- Souhrid Mukherjee
- Department of Biological Sciences, Vanderbilt University, Nashville, TN 37235, USA
- Joy D Cogan
- Department of Pediatrics, Division of Medical Genetics and Genomic Medicine, Vanderbilt University School of Medicine, Nashville, TN 37232, USA
- John H Newman
- Pulmonary Hypertension Center, Division of Allergy, Pulmonary, and Critical Care Medicine, Vanderbilt University Medical Center, Nashville, TN 37232, USA
- John A Phillips
- Department of Pediatrics, Division of Medical Genetics and Genomic Medicine, Vanderbilt University School of Medicine, Nashville, TN 37232, USA
- Rizwan Hamid
- Department of Pediatrics, Division of Medical Genetics and Genomic Medicine, Vanderbilt University School of Medicine, Nashville, TN 37232, USA
- Jens Meiler
- Department of Chemistry, Vanderbilt University, Nashville, TN 37235, USA
- Department of Pharmacology, Vanderbilt University, Nashville, TN 37235, USA
- Center for Structural Biology, Vanderbilt University, Nashville, TN 37235, USA
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37232, USA
- Institute for Drug Discovery, Leipzig University Medical School, Leipzig 04103, Germany
- Department of Chemistry, Leipzig University, Leipzig 04109, Germany
- Department of Computer Science, Leipzig University, Leipzig 04109, Germany
- John A Capra
- Department of Biological Sciences, Vanderbilt University, Nashville, TN 37235, USA
- Center for Structural Biology, Vanderbilt University, Nashville, TN 37235, USA
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37232, USA
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN 37232, USA
- Bakar Computational Health Sciences Institute and Department of Epidemiology and Biostatistics, University of California, San Francisco, San Francisco, CA 94143, USA
8
Kafkas Ş, Althubaiti S, Gkoutos GV, Hoehndorf R, Schofield PN. Linking common human diseases to their phenotypes; development of a resource for human phenomics. J Biomed Semantics 2021; 12:17. [PMID: 34425897 PMCID: PMC8383460 DOI: 10.1186/s13326-021-00249-x] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 05/18/2021] [Accepted: 07/30/2021] [Indexed: 11/11/2022]
Abstract
Background In recent years a large volume of clinical genomics data has become available due to rapid advances in sequencing technologies. Efficient exploitation of this genomics data requires linkage to patient phenotype profiles. Current resources providing disease-phenotype associations are not comprehensive, and they often do not have broad coverage of the disease terminologies, particularly ICD-10, which is still the primary terminology used in clinical settings. Methods We developed two approaches to gather disease-phenotype associations. First, we used a text mining method that utilizes semantic relations in phenotype ontologies, and applies statistical methods to extract associations between diseases in ICD-10 and phenotype ontology classes from the literature. Second, we developed a semi-automatic way to collect ICD-10–phenotype associations from existing resources containing known relationships. Results We generated four datasets. Two of them are independent datasets linking diseases to their phenotypes based on text mining and semi-automatic strategies. The remaining two datasets are generated from these datasets and cover a subset of ICD-10 classes of common diseases contained in UK Biobank. We extensively validated our text mined and semi-automatically curated datasets by: comparing them against an expert-curated validation dataset containing disease–phenotype associations, measuring their similarity to disease–phenotype associations found in public databases, and assessing how well they could be used to recover gene–disease associations using phenotype similarity. Conclusion We find that our text mining method can produce phenotype annotations of diseases that are correct but often too general to have significant information content, or too specific to accurately reflect the typical manifestations of the sporadic disease. 
On the other hand, the datasets generated from integrating multiple knowledgebases are more complete (i.e., cover more of the required phenotype annotations for a given disease). We make all data freely available at 10.5281/zenodo.4726713. Supplementary Information The online version contains supplementary material available at (10.1186/s13326-021-00249-x).
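The last validation step recovers gene–disease associations through phenotype similarity. It can be illustrated with the simplest possible similarity measure. The sketch assumes plain set overlap (the Jaccard index) between phenotype annotations and uses invented gene and phenotype identifiers. Published pipelines often rely on ontology-aware semantic similarity instead, and the abstract does not specify the measure used here. The functions `jaccard` and `rank_genes` are hypothetical helpers.

```python
from typing import Dict, List, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Set overlap between two phenotype profiles (0 = disjoint, 1 = identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_genes(disease_phenotypes: Set[str],
               gene_phenotypes: Dict[str, Set[str]]) -> List[Tuple[str, float]]:
    """Rank candidate genes by phenotype similarity to a disease profile."""
    scored = [(gene, jaccard(disease_phenotypes, phens))
              for gene, phens in gene_phenotypes.items()]
    # Highest similarity first; a known disease gene should appear near the top.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    disease = {"P1", "P2", "P3"}  # hypothetical phenotype classes for one ICD-10 code
    genes = {
        "GENE_A": {"P1", "P2"},
        "GENE_B": {"P4"},
        "GENE_C": {"P1", "P5"},
    }
    for gene, score in rank_genes(disease, genes):
        print(gene, round(score, 3))
```

This prints GENE_A 0.667, then GENE_C 0.25, then GENE_B 0.0. Set overlap ignores the ontology hierarchy, so a gene annotated with a closely related but more specific term would score zero. Semantic similarity measures are designed to address that weakness.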
Affiliation(s)
- Şenay Kafkas
- Computational Bioscience Research Center (CBRC), Computer, Electrical, and Mathematical Sciences & Engineering Division, King Abdullah University of Science and Technology, 4700 KAUST, Thuwal, 23955, Saudi Arabia
- Sara Althubaiti
- Computational Bioscience Research Center (CBRC), Computer, Electrical, and Mathematical Sciences & Engineering Division, King Abdullah University of Science and Technology, 4700 KAUST, Thuwal, 23955, Saudi Arabia
- Georgios V Gkoutos
- Health Data Research UK, Midlands site, Edgbaston, Birmingham, B15 2TT, United Kingdom
- Institute of Cancer and Genomic Sciences, University of Birmingham, Edgbaston, Birmingham, B15 2TT, United Kingdom
- Robert Hoehndorf
- Computational Bioscience Research Center (CBRC), Computer, Electrical, and Mathematical Sciences & Engineering Division, King Abdullah University of Science and Technology, 4700 KAUST, Thuwal, 23955, Saudi Arabia
- Paul N Schofield
- Department of Physiology, Development & Neuroscience, University of Cambridge, Downing Street, Cambridge, CB2 3EG, United Kingdom
9
Rahit KMTH, Tarailo-Graovac M. Genetic Modifiers and Rare Mendelian Disease. Genes (Basel) 2020; 11:E239. [PMID: 32106447 PMCID: PMC7140819 DOI: 10.3390/genes11030239] [Citation(s) in RCA: 83] [Impact Index Per Article: 20.8] [Received: 01/24/2020] [Accepted: 02/21/2020] [Indexed: 12/11/2022]
Abstract
Despite advances in high-throughput sequencing that have revolutionized the discovery of gene defects in rare Mendelian diseases, there are still gaps in translating individual genome variation to observed phenotypic outcomes. While we continue to improve genomics approaches to identify primary disease-causing variants, it is evident that no genetic variant acts alone. In other words, some other variants in the genome (genetic modifiers) may alleviate (suppress) or exacerbate (enhance) the severity of the disease, resulting in the variability of phenotypic outcomes. Thus, to truly understand the disease, we need to consider how the disease-causing variants interact with the rest of the genome in an individual. Here, we review the current state-of-the-field in the identification of genetic modifiers in rare Mendelian diseases and discuss the potential for future approaches that could bridge the existing gap.
Affiliation(s)
- K. M. Tahsin Hassan Rahit
- Departments of Biochemistry, Molecular Biology and Medical Genetics, Cumming School of Medicine, University of Calgary, Calgary, AB T2N 4N1, Canada
- Alberta Children’s Hospital Research Institute, University of Calgary, Calgary, AB T2N 4N1, Canada
- Maja Tarailo-Graovac
- Departments of Biochemistry, Molecular Biology and Medical Genetics, Cumming School of Medicine, University of Calgary, Calgary, AB T2N 4N1, Canada
- Alberta Children’s Hospital Research Institute, University of Calgary, Calgary, AB T2N 4N1, Canada