1. Barbitoff YA, Ushakov MO, Lazareva TE, Nasykhova YA, Glotov AS, Predeus AV. Bioinformatics of germline variant discovery for rare disease diagnostics: current approaches and remaining challenges. Brief Bioinform 2024; 25:bbad508. [PMID: 38271481 PMCID: PMC10810331 DOI: 10.1093/bib/bbad508] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/09/2023] [Revised: 11/18/2023] [Accepted: 12/12/2023] [Indexed: 01/27/2024]
Abstract
Next-generation sequencing (NGS) has revolutionized the field of rare disease diagnostics. Whole exome and whole genome sequencing are now routinely used for diagnostic purposes; however, the overall diagnosis rate remains lower than expected. In this work, we review current approaches used for calling and interpretation of germline genetic variants in the human genome, and discuss the most important challenges that persist in the bioinformatic analysis of NGS data in medical genetics. We describe and attempt to quantitatively assess the remaining problems, such as the quality of the reference genome sequence, reproducible coverage biases, or variant calling accuracy in complex regions of the genome. We also discuss the prospects of switching to the complete human genome assembly or the human pan-genome and important caveats associated with such a switch. We touch on arguably the hardest problem of NGS data analysis for medical genomics, namely, the annotation of genetic variants and their subsequent interpretation. We highlight the most challenging aspects of annotation and prioritization of both coding and non-coding variants. Finally, we demonstrate the persistent prevalence of pathogenic variants in the coding genome, and outline research directions that may enhance the efficiency of NGS-based disease diagnostics.
Affiliation(s)
- Yury A Barbitoff
  - Dpt. of Genomic Medicine, D.O. Ott Research Institute of Obstetrics, Gynaecology, and Reproductology, Mendeleevskaya line 3, 199034, St. Petersburg, Russia
  - Bioinformatics Institute, Kentemirovskaya st. 2A, 197342, St. Petersburg, Russia
- Mikhail O Ushakov
  - Dpt. of Genomic Medicine, D.O. Ott Research Institute of Obstetrics, Gynaecology, and Reproductology, Mendeleevskaya line 3, 199034, St. Petersburg, Russia
- Tatyana E Lazareva
  - Dpt. of Genomic Medicine, D.O. Ott Research Institute of Obstetrics, Gynaecology, and Reproductology, Mendeleevskaya line 3, 199034, St. Petersburg, Russia
- Yulia A Nasykhova
  - Dpt. of Genomic Medicine, D.O. Ott Research Institute of Obstetrics, Gynaecology, and Reproductology, Mendeleevskaya line 3, 199034, St. Petersburg, Russia
- Andrey S Glotov
  - Dpt. of Genomic Medicine, D.O. Ott Research Institute of Obstetrics, Gynaecology, and Reproductology, Mendeleevskaya line 3, 199034, St. Petersburg, Russia
- Alexander V Predeus
  - Bioinformatics Institute, Kentemirovskaya st. 2A, 197342, St. Petersburg, Russia
2. Degalez F, Jehl F, Muret K, Bernard M, Lecerf F, Lagoutte L, Désert C, Pitel F, Klopp C, Lagarrigue S. Watch Out for a Second SNP: Focus on Multi-Nucleotide Variants in Coding Regions and Rescued Stop-Gained. Front Genet 2021; 12:659287. [PMID: 34306009 PMCID: PMC8293744 DOI: 10.3389/fgene.2021.659287] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/27/2021] [Accepted: 05/27/2021] [Indexed: 12/30/2022]
Abstract
Most single-nucleotide polymorphisms (SNPs) are located in non-coding regions, but the fraction usually studied is harbored in protein-coding regions because potential impacts on proteins are relatively easy to predict by popular tools such as the Variant Effect Predictor. These tools annotate variants independently without considering the potential effect of grouped or haplotypic variations, often called "multi-nucleotide variants" (MNVs). Here, we used a large RNA-seq dataset to survey MNVs; it comprises 382 chicken samples from 11 populations, analyzed in the companion paper, in which 9.5M SNPs (including 3.3M SNPs with reliable genotypes) were detected. We focused our study on in-codon MNVs and evaluated their potential mis-annotation. Using GATK HaplotypeCaller read-based phasing results, we identified 2,965 MNVs observed in at least five individuals and located in 1,792 genes. We found that 41.1% of them showed a novel impact compared with the effect of their constituent SNPs analyzed separately. The largest shift in annotated impact concerns the consequences originally annotated as stop-gained, of which around 95% were rescued; it is followed by the missense consequences, of which 37% were reannotated with a different amino acid. We then present the rescued stop-gained MNVs in more depth and give an illustration in the SLC27A4 gene. As previously shown in human datasets, our results in chicken demonstrate the value of haplotype-aware variant annotation and the importance of considering MNVs in coding regions, particularly when searching for severe functional consequences such as stop-gained variants.
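To make the "rescued stop-gained" idea concrete, the following is a minimal Python sketch, not the authors' pipeline. It assumes the standard genetic code and two phased SNPs in one codon; the function names (`consequence`, `apply_snps`, `annotate`) and the example codon `CAG` are hypothetical and chosen only for illustration. It compares the consequence of each SNP annotated on its own with the consequence of the combined haplotype.

```python
from itertools import product

# Standard genetic code, built in the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def consequence(ref_codon, alt_codon):
    """Classify a codon change with a simplified consequence vocabulary."""
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "stop_gained"
    if ref_aa == "*":
        return "stop_lost"
    return "missense"


def apply_snps(codon, snps):
    """Apply (0-based position, alternate base) substitutions to a codon."""
    bases = list(codon)
    for pos, alt in snps:
        bases[pos] = alt
    return "".join(bases)


def annotate(ref_codon, snps):
    """Return (per-SNP consequences, consequence of the joint haplotype)."""
    separate = [consequence(ref_codon, apply_snps(ref_codon, [s])) for s in snps]
    joint = consequence(ref_codon, apply_snps(ref_codon, snps))
    return separate, joint


# Illustrative codon: CAG (Gln). C>T alone gives TAG (stop); A>G alone gives
# CGG (Arg); both on the same haplotype give TGG (Trp), so the stop is rescued.
separate, joint = annotate("CAG", [(0, "T"), (1, "G")])
print(separate, joint)
```

In this toy case the first SNP would be flagged as stop-gained if annotated independently, while the haplotype-aware annotation reports a missense change instead.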
Affiliation(s)
- Fabien Degalez
  - INRAE, INSTITUT AGRO, PEGASE UMR 1348, Saint-Gilles, France
- Frédéric Jehl
  - INRAE, INSTITUT AGRO, PEGASE UMR 1348, Saint-Gilles, France
- Kévin Muret
  - INRAE, INSTITUT AGRO, PEGASE UMR 1348, Saint-Gilles, France
- Maria Bernard
  - INRAE, SIGENAE, Genotoul Bioinfo MIAT, Castanet-Tolosan, France
  - INRAE, AgroParisTech, Université Paris-Saclay, GABI UMR 1313, Jouy-en-Josas, France
- Colette Désert
  - INRAE, INSTITUT AGRO, PEGASE UMR 1348, Saint-Gilles, France
- Frédérique Pitel
  - INRAE, INPT, ENVT, Université de Toulouse, GenPhySE UMR 1388, Castanet-Tolosan, France
3. Whole-genome analysis of noncoding genetic variations identifies multiscale regulatory element perturbations associated with Hirschsprung disease. Genome Res 2020; 30:1618-1632. [PMID: 32948616 PMCID: PMC7605255 DOI: 10.1101/gr.264473.120] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 04/08/2020] [Accepted: 09/14/2020] [Indexed: 12/16/2022]
Abstract
It is widely recognized that noncoding genetic variants play important roles in many human diseases, but there are multiple challenges that hinder the identification of functional disease-associated noncoding variants. The number of noncoding variants can be many times that of coding variants; many of them are not functional but in linkage disequilibrium with the functional ones; different variants can have epistatic effects; different variants can affect the same genes or pathways in different individuals; and some variants are related to each other not by affecting the same gene but by affecting the binding of the same upstream regulator. To overcome these difficulties, we propose a novel analysis framework that considers convergent impacts of different genetic variants on protein binding, which provides multiscale information about disease-associated perturbations of regulatory elements, genes, and pathways. Applying it to our whole-genome sequencing data of 918 short-segment Hirschsprung disease patients and matched controls, we identify various novel genes not detected by standard single-variant and region-based tests, functionally centering on neural crest migration and development. Our framework also identifies upstream regulators whose binding is influenced by the noncoding variants. Using human neural crest cells, we confirm cell stage-specific regulatory roles of three top novel regulatory elements on our list, respectively in the RET, RASGEF1A, and PIK3C2B loci. In the PIK3C2B regulatory element, we further show that a noncoding variant found only in the patients affects the binding of the gliogenesis regulator NFIA, with a corresponding up-regulation of multiple genes in the same topologically associating domain.
4. Wang Q, Pierce-Hoffman E, Cummings BB, Alföldi J, Francioli LC, Gauthier LD, Hill AJ, O'Donnell-Luria AH, Karczewski KJ, MacArthur DG. Landscape of multi-nucleotide variants in 125,748 human exomes and 15,708 genomes. Nat Commun 2020; 11:2539. [PMID: 32461613 PMCID: PMC7253413 DOI: 10.1038/s41467-019-12438-5] [Citation(s) in RCA: 84] [Impact Index Per Article: 21.0] [Received: 04/02/2019] [Accepted: 09/09/2019] [Indexed: 12/31/2022]
Abstract
Multi-nucleotide variants (MNVs), defined as two or more nearby variants existing on the same haplotype in an individual, are a clinically and biologically important class of genetic variation. However, existing tools typically do not accurately classify MNVs, and understanding of their mutational origins remains limited. Here, we systematically survey MNVs in 125,748 whole exomes and 15,708 whole genomes from the Genome Aggregation Database (gnomAD). We identify 1,792,248 MNVs across the genome with constituent variants falling within 2 bp distance of one another, including 18,756 variants with a novel combined effect on protein sequence. Finally, we estimate the relative impact of known mutational mechanisms - CpG deamination, replication error by polymerase zeta, and polymerase slippage at repeat junctions - on the generation of MNVs. Our results demonstrate the value of haplotype-aware variant annotation, and refine our understanding of genome-wide mutational mechanisms of MNVs.
Affiliation(s)
- Qingbo Wang
  - Program in Medical and Population Genetics, The Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
  - Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, 02114, USA
  - Program in Bioinformatics and Integrative Genomics, Harvard Medical School, Boston, MA, 02115, USA
- Emma Pierce-Hoffman
  - Program in Medical and Population Genetics, The Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
- Beryl B Cummings
  - Program in Medical and Population Genetics, The Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
  - Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, 02114, USA
  - Program in Biomedical and Biological Sciences, Harvard Medical School, Boston, MA, 02115, USA
- Jessica Alföldi
  - Program in Medical and Population Genetics, The Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
  - Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, 02114, USA
- Laurent C Francioli
  - Program in Medical and Population Genetics, The Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
  - Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, 02114, USA
- Laura D Gauthier
  - Program in Medical and Population Genetics, The Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
  - Data Sciences Platform, Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
- Andrew J Hill
  - Program in Medical and Population Genetics, The Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
  - Department of Genome Sciences, University of Washington, Seattle, WA, 98195, USA
- Anne H O'Donnell-Luria
  - Program in Medical and Population Genetics, The Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
  - Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, 02114, USA
- Konrad J Karczewski
  - Program in Medical and Population Genetics, The Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
  - Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, 02114, USA
- Daniel G MacArthur
  - Program in Medical and Population Genetics, The Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
  - Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, 02114, USA
  - Centre for Population Genomics, Garvan Institute of Medical Research, and UNSW Sydney, Sydney, Australia
  - Centre for Population Genomics, Murdoch Children's Research Institute, Melbourne, Australia
5. Cheng SJ, Jiang S, Shi FY, Ding Y, Gao G. Systematic identification and annotation of multiple-variant compound effects at transcription factor binding sites in human genome. J Genet Genomics 2018; 45:373-379. [PMID: 30054217 DOI: 10.1016/j.jgg.2018.05.005] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 12/31/2017] [Revised: 05/03/2018] [Accepted: 05/25/2018] [Indexed: 12/21/2022]
Abstract
Understanding the functional effects of genetic variants is crucial in modern genomics and genetics. Transcription factor binding sites (TFBSs) are among the most important cis-regulatory elements. While multiple tools have been developed to assess the functional effects of genetic variants at TFBSs, they usually assume that each variant works in isolation and neglect the potential "interference" among multiple variants within the same TFBS. In this study, we present COPE-TFBS (Context-Oriented Predictor for variant Effect on Transcription Factor Binding Site), a novel method that considers sequence context to accurately predict variant effects on TFBSs. We systematically re-analyzed the sequencing data from both the 1000 Genomes Project and the Genotype-Tissue Expression (GTEx) Project via COPE-TFBS, and identified a number of novel TFBSs, transformed TFBSs and discordantly annotated TFBSs resulting from multiple variants, further highlighting the necessity of sequence context in accurately annotating genetic variants. COPE-TFBS is freely available for academic use at http://cope.cbi.pku.edu.cn/.
Affiliation(s)
- Si-Jin Cheng
  - State Key Laboratory of Protein and Plant Gene Research, School of Life Sciences, Center for Bioinformatics, Peking University, Beijing 100871, China
  - Beijing Advanced Innovation Center for Genomics, Peking University, Beijing 100871, China
- Shuai Jiang
  - State Key Laboratory of Protein and Plant Gene Research, School of Life Sciences, Center for Bioinformatics, Peking University, Beijing 100871, China
  - Beijing Advanced Innovation Center for Genomics, Peking University, Beijing 100871, China
- Fang-Yuan Shi
  - State Key Laboratory of Protein and Plant Gene Research, School of Life Sciences, Center for Bioinformatics, Peking University, Beijing 100871, China
  - Beijing Advanced Innovation Center for Genomics, Peking University, Beijing 100871, China
- Yang Ding
  - State Key Laboratory of Protein and Plant Gene Research, School of Life Sciences, Center for Bioinformatics, Peking University, Beijing 100871, China
  - Beijing Advanced Innovation Center for Genomics, Peking University, Beijing 100871, China
- Ge Gao
  - State Key Laboratory of Protein and Plant Gene Research, School of Life Sciences, Center for Bioinformatics, Peking University, Beijing 100871, China
  - Beijing Advanced Innovation Center for Genomics, Peking University, Beijing 100871, China
6. Duan Y, Dou S, Zhang H, Wu C, Wu M, Lu J. Linkage of A-to-I RNA Editing in Metazoans and the Impact on Genome Evolution. Mol Biol Evol 2018; 35:132-148. [PMID: 29048557 PMCID: PMC5850729 DOI: 10.1093/molbev/msx274] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Indexed: 12/14/2022]
Abstract
The adenosine-to-inosine (A-to-I) RNA editomes have been systematically characterized in various metazoan species, and many editing sites were found in clusters. However, it remains unclear whether the clustered editing sites tend to be linked in the same RNA molecules or not. By adopting a method originally designed to detect linkage disequilibrium of DNA mutations, we examined the editomes of ten metazoan species and detected extensive linkage of editing in Drosophila and cephalopods. The prevalent linkages of editing in these two clades, many of which are conserved between closely related species and might be associated with the adaptive proteomic recoding, are maintained by natural selection at the cost of genome evolution. Nevertheless, in worms and humans, we only detected modest proportions of linked editing events, the majority of which were not conserved. Furthermore, the linkage of editing in coding regions of worms and humans might be overall deleterious, which drives the evolution of DNA sites to escape promiscuous editing. Altogether, our results suggest that the linkage landscape of A-to-I editing has evolved during metazoan evolution. This present study also suggests that linkage of editing should be considered in elucidating the functional consequences of RNA editing.
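As a rough illustration of what a linkage-disequilibrium-style measure looks like when applied to two editing sites, here is a minimal Python sketch. It is a generic textbook calculation of D and r² from read counts, not the specific method used in the study; the function name `linkage_stats` and the four count arguments are assumptions introduced for this example, and it presumes each read covers both sites.

```python
def linkage_stats(n_both, n_a_only, n_b_only, n_neither):
    """Return (D, r^2) for two editing sites from read-level counts.

    n_both    : reads edited at both sites
    n_a_only  : reads edited at site A only
    n_b_only  : reads edited at site B only
    n_neither : reads edited at neither site
    """
    total = n_both + n_a_only + n_b_only + n_neither
    if total == 0:
        raise ValueError("no reads cover both sites")
    p_a = (n_both + n_a_only) / total   # editing level at site A
    p_b = (n_both + n_b_only) / total   # editing level at site B
    p_ab = n_both / total               # fraction of reads edited at both
    d = p_ab - p_a * p_b                # deviation from independence
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, r2


# Perfectly linked sites: every read is edited at both sites or at neither.
print(linkage_stats(50, 0, 0, 50))
# Independent sites: all four read classes are equally frequent.
print(linkage_stats(25, 25, 25, 25))
```

An r² near 1 indicates that editing at the two sites tends to occur on the same RNA molecules, while a value near 0 indicates that the sites are edited independently.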
Affiliation(s)
- Yuange Duan
  - State Key Laboratory of Protein and Plant Gene Research, Center for Bioinformatics, School of Life Sciences, Peking University, Beijing, China
  - Peking-Tsinghua Center for Life Sciences, Peking University, Beijing, China
  - Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, China
- Shengqian Dou
  - State Key Laboratory of Protein and Plant Gene Research, Center for Bioinformatics, School of Life Sciences, Peking University, Beijing, China
  - Peking-Tsinghua Center for Life Sciences, Peking University, Beijing, China
- Hong Zhang
  - State Key Laboratory of Protein and Plant Gene Research, Center for Bioinformatics, School of Life Sciences, Peking University, Beijing, China
  - Peking-Tsinghua Center for Life Sciences, Peking University, Beijing, China
- Changcheng Wu
  - State Key Laboratory of Protein and Plant Gene Research, Center for Bioinformatics, School of Life Sciences, Peking University, Beijing, China
  - Peking-Tsinghua Center for Life Sciences, Peking University, Beijing, China
- Mingming Wu
  - State Key Laboratory of Protein and Plant Gene Research, Center for Bioinformatics, School of Life Sciences, Peking University, Beijing, China
  - Peking-Tsinghua Center for Life Sciences, Peking University, Beijing, China
- Jian Lu
  - State Key Laboratory of Protein and Plant Gene Research, Center for Bioinformatics, School of Life Sciences, Peking University, Beijing, China
  - Peking-Tsinghua Center for Life Sciences, Peking University, Beijing, China
7. Soens ZT, Branch J, Wu S, Yuan Z, Li Y, Li H, Wang K, Xu M, Rajan L, Motta FL, Simões RT, Lopez-Solache I, Ajlan R, Birch DG, Zhao P, Porto FB, Sallum J, Koenekoop RK, Sui R, Chen R. Leveraging splice-affecting variant predictors and a minigene validation system to identify Mendelian disease-causing variants among exon-captured variants of uncertain significance. Hum Mutat 2017; 38:1521-1533. [PMID: 28714225 DOI: 10.1002/humu.23294] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.7] [Received: 05/16/2017] [Revised: 06/20/2017] [Accepted: 07/11/2017] [Indexed: 12/11/2022]
Abstract
The genetic heterogeneity of Mendelian disorders results in a significant proportion of patients that are unable to be assigned a confident molecular diagnosis after conventional exon sequencing and variant interpretation. Here, we evaluated how many patients with an inherited retinal disease (IRD) have variants of uncertain significance (VUS) that are disrupting splicing in a known IRD gene by means other than affecting the canonical dinucleotide splice site. Three in silico splice-affecting variant predictors were leveraged to annotate and prioritize variants for splicing functional validation. An in vitro minigene system was used to assay each variant's effect on splicing. Starting with 745 IRD patients lacking a confident molecular diagnosis, we validated 23 VUS as splicing variants that likely explain disease in 26 patients. Using our results, we optimized in silico score cutoffs to guide future variant interpretation. Variants that alter base pairs other than the canonical GT-AG dinucleotide are often not considered for their potential effect on RNA splicing but in silico tools and a minigene system can be utilized for the prioritization and validation of such splice-disrupting variants. These variants can be overlooked causes of human disease but can be identified using conventional exon sequencing with proper interpretation guidelines.
Affiliation(s)
- Zachry T Soens
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas
- Justin Branch
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas
- Shijing Wu
  - Department of Ophthalmology, Peking Union Medical College Hospital, Peking Union Medical College, Chinese Academy of Medical Sciences, Beijing, China
- Zhisheng Yuan
  - Department of Ophthalmology, Peking Union Medical College Hospital, Peking Union Medical College, Chinese Academy of Medical Sciences, Beijing, China
- Yumei Li
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas
- Hui Li
  - Department of Ophthalmology, Peking Union Medical College Hospital, Peking Union Medical College, Chinese Academy of Medical Sciences, Beijing, China
- Keqing Wang
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas
- Mingchu Xu
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas
- Lavan Rajan
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas
- Fabiana L Motta
  - Department of Ophthalmology and Visual Sciences, Paulista School of Medicine, Federal University of São Paulo, São Paulo, Brazil
- Renata T Simões
  - Department of Retina and Vitreous, Ophthalmologic Center of Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
  - Instituto de Ensino e Pesquisa da Santa Casa de Belo Horizonte Hospital - IEP/SCBH, Belo Horizonte, Minas Gerais, Brazil
- Irma Lopez-Solache
  - McGill Ocular Genetics Laboratory and Centre, Department of Paediatric Surgery, Human Genetics, and Ophthalmology, McGill University Health Centre, Montreal, Quebec, Canada
- Radwan Ajlan
  - McGill Ocular Genetics Laboratory and Centre, Department of Paediatric Surgery, Human Genetics, and Ophthalmology, McGill University Health Centre, Montreal, Quebec, Canada
- David G Birch
  - Retina Foundation of the Southwest and Department of Ophthalmology, University of Texas Southwestern Medical Center, Dallas, Texas
- Peiquan Zhao
  - Department of Ophthalmology, Xin Hua Hospital affiliated to Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Fernanda B Porto
  - Department of Retina and Vitreous, Ophthalmologic Center of Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
  - Instituto de Ensino e Pesquisa da Santa Casa de Belo Horizonte Hospital - IEP/SCBH, Belo Horizonte, Minas Gerais, Brazil
- Juliana Sallum
  - Department of Ophthalmology and Visual Sciences, Paulista School of Medicine, Federal University of São Paulo, São Paulo, Brazil
- Robert K Koenekoop
  - McGill Ocular Genetics Laboratory and Centre, Department of Paediatric Surgery, Human Genetics, and Ophthalmology, McGill University Health Centre, Montreal, Quebec, Canada
- Ruifang Sui
  - Department of Ophthalmology, Peking Union Medical College Hospital, Peking Union Medical College, Chinese Academy of Medical Sciences, Beijing, China
- Rui Chen
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas
  - Department of Biochemistry and Molecular Biology, Baylor College of Medicine, Houston, Texas
  - Department of Structural and Computational Biology & Molecular Biophysics, Baylor College of Medicine, Houston, Texas