1
Alcázar-Fabra M, Østergaard E, Fernández-Ayala DJ, Desbats MA, Morbidoni V, Tomás-Gallado L, García-Corzo L, Blanquer-Roselló MDM, Bartlett AK, Sánchez-Cuesta A, Sena L, Cortés-Rodríguez A, Cascajo-Almenara MV, Pagliarini DJ, Trevisson E, Gronborg SW, Brea-Calvo G. Identification of a new COQ4 spliceogenic variant causing severe primary coenzyme Q deficiency. Mol Genet Metab Rep 2025; 42:101176. [PMID: 39759098 PMCID: PMC11699292 DOI: 10.1016/j.ymgmr.2024.101176] [Received: 08/05/2024] [Revised: 12/06/2024] [Accepted: 12/09/2024] [Indexed: 01/07/2025] Open
Abstract
Background and aims: Primary Coenzyme Q (CoQ) deficiency caused by COQ4 defects is a clinically heterogeneous mitochondrial condition characterized by reduced levels of CoQ10 in tissues. Next-generation sequencing has lately boosted the genetic diagnosis of an increasing number of patients. Still, functional validation of new variants of uncertain significance is essential for an adequate diagnosis, proper clinical management, treatment, and genetic counseling.
Materials and methods: Both fibroblasts from a proband with COQ4 deficiency and a COQ4 knockout cell model have been characterized by a combination of biochemical and genetic analysis (HPLC lipid analysis, Oxygen consumption, minigene analysis, RNAseq, among others).
Results: Here, we report the case of a subject harboring a new variant of the COQ4 gene in compound heterozygosis, which shows severe clinical manifestations. We present the molecular characterization of this new pathogenic variant affecting the splicing of COQ4.
Conclusion: Our results highlight the importance of expanding the genetic analysis beyond the coding sequence to reduce the misdiagnosis of primary CoQ deficiency patients.
Affiliation(s)
- María Alcázar-Fabra
  - Andalusian Center of Developmental Biology (CABD), Universidad Pablo de Olavide-CSIC-JA, 41013 Seville, Spain
  - Centre for Biomedical Research on Rare Diseases (CIBERER), Instituto de Salud Carlos III, 28029 Madrid, Spain
  - Physiology, Anatomy and Cell Biology Department, Universidad Pablo de Olavide, 41013 Seville, Spain
- Elsebet Østergaard
  - Department of Clinical Genetics, Copenhagen University Hospital Rigshospitalet, Blegdamsvej 9, 2100 Copenhagen, Denmark
  - Department of Clinical Medicine, University of Copenhagen, Copenhagen, Denmark
- Daniel J.M. Fernández-Ayala
  - Andalusian Center of Developmental Biology (CABD), Universidad Pablo de Olavide-CSIC-JA, 41013 Seville, Spain
  - Centre for Biomedical Research on Rare Diseases (CIBERER), Instituto de Salud Carlos III, 28029 Madrid, Spain
  - Physiology, Anatomy and Cell Biology Department, Universidad Pablo de Olavide, 41013 Seville, Spain
- María Andrea Desbats
  - Clinical Genetics Unit, Department of Women's and Children's Health, University of Padova, 35128 Padova, Italy
  - Istituto di Ricerca Pediatrica, Fondazione Città della Speranza, 35127 Padova, Italy
- Valeria Morbidoni
  - Clinical Genetics Unit, Department of Women's and Children's Health, University of Padova, 35128 Padova, Italy
  - Istituto di Ricerca Pediatrica, Fondazione Città della Speranza, 35127 Padova, Italy
- Laura Tomás-Gallado
  - Proteomics and Biochemistry Platform, Andalusian Centre for Developmental Biology (CABD), CSIC-Pablo de Olavide University, 41013 Seville, Spain
- Laura García-Corzo
  - Andalusian Center of Developmental Biology (CABD), Universidad Pablo de Olavide-CSIC-JA, 41013 Seville, Spain
  - Centre for Biomedical Research on Rare Diseases (CIBERER), Instituto de Salud Carlos III, 28029 Madrid, Spain
  - Physiology, Anatomy and Cell Biology Department, Universidad Pablo de Olavide, 41013 Seville, Spain
- María del Mar Blanquer-Roselló
  - Andalusian Center of Developmental Biology (CABD), Universidad Pablo de Olavide-CSIC-JA, 41013 Seville, Spain
  - Centre for Biomedical Research on Rare Diseases (CIBERER), Instituto de Salud Carlos III, 28029 Madrid, Spain
  - Physiology, Anatomy and Cell Biology Department, Universidad Pablo de Olavide, 41013 Seville, Spain
- Abigail K. Bartlett
  - Department of Biochemistry, University of Wisconsin–Madison, Madison, WI 53706, USA
  - Department of Cell Biology and Physiology, Washington University School of Medicine, St. Louis, MO 63110, USA
- Ana Sánchez-Cuesta
  - Andalusian Center of Developmental Biology (CABD), Universidad Pablo de Olavide-CSIC-JA, 41013 Seville, Spain
  - Centre for Biomedical Research on Rare Diseases (CIBERER), Instituto de Salud Carlos III, 28029 Madrid, Spain
- Lucía Sena
  - Physiology, Anatomy and Cell Biology Department, Universidad Pablo de Olavide, 41013 Seville, Spain
- Ana Cortés-Rodríguez
  - Bioenergetics and Cell Physiology Service (U729), Central Services of Research, University Pablo de Olavide, 41013 Seville, Spain
- María Victoria Cascajo-Almenara
  - Andalusian Center of Developmental Biology (CABD), Universidad Pablo de Olavide-CSIC-JA, 41013 Seville, Spain
  - Centre for Biomedical Research on Rare Diseases (CIBERER), Instituto de Salud Carlos III, 28029 Madrid, Spain
- David J. Pagliarini
  - Department of Cell Biology and Physiology, Washington University School of Medicine, St. Louis, MO 63110, USA
  - Department of Biochemistry and Molecular Biophysics, Washington University School of Medicine, St. Louis, MO 63110, USA
  - Department of Genetics, Washington University School of Medicine, St. Louis, MO 63110, USA
  - Howard Hughes Medical Institute, Washington University School of Medicine, St. Louis, MO 63110, USA
- Eva Trevisson
  - Clinical Genetics Unit, Department of Women's and Children's Health, University of Padova, 35128 Padova, Italy
  - Istituto di Ricerca Pediatrica, Fondazione Città della Speranza, 35127 Padova, Italy
- Sabine W. Gronborg
  - Center for Inherited Metabolic Diseases, Department of Pediatrics and Adolescent Medicine and Department of Clinical Genetics, Copenhagen University Hospital Rigshospitalet, Blegdamsvej 9, 2100 Copenhagen, Denmark
- Gloria Brea-Calvo
  - Andalusian Center of Developmental Biology (CABD), Universidad Pablo de Olavide-CSIC-JA, 41013 Seville, Spain
  - Centre for Biomedical Research on Rare Diseases (CIBERER), Instituto de Salud Carlos III, 28029 Madrid, Spain
  - Physiology, Anatomy and Cell Biology Department, Universidad Pablo de Olavide, 41013 Seville, Spain

2
Arriaga MT, Mendez R, Ungar RA, Bonner DE, Matalon DR, Lemire G, Goddard PC, Padhi EM, Miller AM, Nguyen JV, Ma J, Smith KS, Scott SA, Liao L, Ng Z, Marwaha S, Bademci G, Bivona SA, Tekin M, Bernstein JA, Montgomery SB, O'Donnell-Luria A, Wheeler MT, Ganesh VS. Transcriptome-wide outlier approach identifies individuals with minor spliceopathies. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2025:2025.01.02.24318941. [PMID: 39802771 PMCID: PMC11722475 DOI: 10.1101/2025.01.02.24318941] [Indexed: 01/23/2025]
Abstract
RNA-sequencing has improved the diagnostic yield of individuals with rare diseases. Current analyses predominantly focus on identifying outliers in single genes that can be attributed to cis-acting variants within the gene locus. This approach overlooks causal variants with trans-acting effects on splicing transcriptome-wide, such as variants impacting spliceosome function. We present a transcriptomics-first method to diagnose individuals with rare diseases by examining transcriptome-wide patterns of splicing outliers. Using splicing outlier detection methods (FRASER and FRASER2) we characterized splicing outliers from whole blood for 390 individuals from the Genomics Research to Elucidate the Genetics of Rare Diseases (GREGoR) and Undiagnosed Diseases Network (UDN) consortia. We examined all samples for excess intron retention outliers in minor intron containing genes (MIGs). Minor introns, which make up about 0.5% of all introns in the human genome, are removed by small nuclear RNAs (snRNAs) in the minor spliceosome. This approach identified five individuals with excess intron retention outliers in MIGs, all of which were found to harbor rare, biallelic variants in minor spliceosome snRNAs. Four individuals had rare, compound heterozygous variants in RNU4ATAC, which aided the reclassification of four variants. Additionally, one individual had rare, highly conserved, compound heterozygous variants in RNU6ATAC that may disrupt the formation of the catalytic spliceosome, suggesting a novel gene-disease candidate. These results demonstrate that examining RNA-sequencing data for transcriptome-wide signatures can increase the diagnostic yield of individuals with rare diseases, provide variant-to-function interpretation of spliceopathies, and uncover novel disease gene associations.
Affiliation(s)
- Rachel A Ungar
  - Dept. of Genetics, Stanford Univ., Stanford, CA
  - Stanford Center for Biomedical Ethics, Stanford Univ., Stanford, CA
- Devon E Bonner
  - Div. of Med. Genetics, Dept. of Pediatrics, Stanford Univ., Stanford, CA
- Dena R Matalon
  - Div. of Med. Genetics, Dept. of Pediatrics, Stanford Univ., Stanford, CA
- Gabrielle Lemire
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA
  - Div. of Genetics and Genomics, Boston Children's Hospital, Boston, MA
- Evin M Padhi
  - Dept. of Pathology, Stanford Univ., Stanford, CA
- Jialan Ma
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA
- Stuart A Scott
  - Dept. of Pathology, Stanford Univ., Stanford, CA
  - Clinical Genomics Laboratory, Stanford Medicine, Stanford, CA
- Linda Liao
  - Clinical Genomics Laboratory, Stanford Medicine, Stanford, CA
- Zena Ng
  - Clinical Genomics Laboratory, Stanford Medicine, Stanford, CA
- Shruti Marwaha
  - Div. of Cardiovascular Medicine, Stanford Univ. School of Medicine, Stanford, CA
- Guney Bademci
  - John T. Macdonald Foundation Dept. of Human Genetics, Univ. of Miami Miller School of Medicine, Miami, FL
- Stephanie A Bivona
  - John T. Macdonald Foundation Dept. of Human Genetics, Univ. of Miami Miller School of Medicine, Miami, FL
- Mustafa Tekin
  - John T. Macdonald Foundation Dept. of Human Genetics, Univ. of Miami Miller School of Medicine, Miami, FL
- Stephen B Montgomery
  - Dept. of Pathology, Stanford Univ., Stanford, CA
  - Dept. of Genetics, Stanford Univ., Stanford, CA
  - Dept. of Biomedical Data Science, Stanford Univ., Stanford, CA
- Anne O'Donnell-Luria
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA
  - Div. of Genetics and Genomics, Boston Children's Hospital, Boston, MA
- Vijay S Ganesh
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA
  - Department of Neurology, Brigham and Women's Hospital, Boston, MA

3
Qu Z, Sakaguchi N, Kikutake C, Suyama M. Identification and analysis of short indels inducing exon extension/shrinkage events. FEBS Open Bio 2024; 14:1682-1690. [PMID: 39085971 PMCID: PMC11452298 DOI: 10.1002/2211-5463.13871] [Received: 04/09/2024] [Revised: 06/24/2024] [Accepted: 07/19/2024] [Indexed: 08/02/2024] Open
Abstract
The search for genetic variants that act as causative factors in human diseases by disrupting the normal splicing process has primarily focused on single nucleotide variants (SNVs). It is worth noting that insertions or deletions (indels) have also been sporadically reported as causative disease variants through their potential impact on the splicing process. In this study, to perform identification of indels inducing exon extension/shrinkage events, we used individual-specific genomes and RNA sequencing (RNA-seq) data pertaining to the corresponding individuals and identified 12 exon extension/shrinkage events that were potentially induced by indels that disrupted authentic splice sites or created novel splice sites in 235 normal individuals. By evaluating the impact of these abnormal splicing events on the resulting transcripts, we found that five events led to the generation of premature termination codons (PTCs), including those occurring within genes associated with genetic disorders. Our analysis revealed that the potential functions of indels have been underexamined, and it is worth considering the possibility that indels may affect splice site usage, using RNA-seq data to discover novel potentially disease-associated mutations.
Affiliation(s)
- Zhuo Qu
  - Division of Bioinformatics, Medical Institute of Bioregulation, Kyushu University, Fukuoka, Japan
- Narumi Sakaguchi
  - Division of Bioinformatics, Medical Institute of Bioregulation, Kyushu University, Fukuoka, Japan
- Chie Kikutake
  - Division of Bioinformatics, Medical Institute of Bioregulation, Kyushu University, Fukuoka, Japan
- Mikita Suyama
  - Division of Bioinformatics, Medical Institute of Bioregulation, Kyushu University, Fukuoka, Japan

4
Holm LL, Doktor TK, Flugt KK, Petersen US, Petersen R, Andresen B. All exons are not created equal-exon vulnerability determines the effect of exonic mutations on splicing. Nucleic Acids Res 2024; 52:4588-4603. [PMID: 38324470 PMCID: PMC11077056 DOI: 10.1093/nar/gkae077] [Received: 07/07/2023] [Revised: 01/05/2024] [Accepted: 01/26/2024] [Indexed: 02/09/2024] Open
Abstract
It is now widely accepted that aberrant splicing of constitutive exons is often caused by mutations affecting cis-acting splicing regulatory elements (SREs), but there is a misconception that all exons have an equal dependency on SREs and thus a similar vulnerability to aberrant splicing. We demonstrate that some exons are more likely to be affected by exonic splicing mutations (ESMs) due to an inherent vulnerability, which is context dependent and influenced by the strength of exon definition. We have developed VulExMap, a tool which is based on empirical data that can designate whether a constitutive exon is vulnerable. Using VulExMap, we find that only 25% of all exons can be categorized as vulnerable, whereas two-thirds of 359 previously reported ESMs in 75 disease genes are located in vulnerable exons. Because VulExMap analysis is based on empirical data on splicing of exons in their endogenous context, it includes all features important in determining the vulnerability. We believe that VulExMap will be an important tool when assessing the effect of exonic mutations by pinpointing whether they are located in exons vulnerable to ESMs.
Affiliation(s)
- Lise L Holm
  - Department of Biochemistry and Molecular Biology, University of Southern Denmark, 5230 Odense M, Denmark
  - Villum Center for Bioanalytical Sciences, University of Southern Denmark, 5230 Odense M, Denmark
- Thomas K Doktor
  - Department of Biochemistry and Molecular Biology, University of Southern Denmark, 5230 Odense M, Denmark
  - Villum Center for Bioanalytical Sciences, University of Southern Denmark, 5230 Odense M, Denmark
- Katharina K Flugt
  - Department of Biochemistry and Molecular Biology, University of Southern Denmark, 5230 Odense M, Denmark
  - Villum Center for Bioanalytical Sciences, University of Southern Denmark, 5230 Odense M, Denmark
- Ulrika S S Petersen
  - Department of Biochemistry and Molecular Biology, University of Southern Denmark, 5230 Odense M, Denmark
  - Villum Center for Bioanalytical Sciences, University of Southern Denmark, 5230 Odense M, Denmark
- Rikke Petersen
  - Department of Biochemistry and Molecular Biology, University of Southern Denmark, 5230 Odense M, Denmark
  - Villum Center for Bioanalytical Sciences, University of Southern Denmark, 5230 Odense M, Denmark
- Brage S Andresen
  - Department of Biochemistry and Molecular Biology, University of Southern Denmark, 5230 Odense M, Denmark
  - Villum Center for Bioanalytical Sciences, University of Southern Denmark, 5230 Odense M, Denmark

5
Smith C, Kitzman JO. Benchmarking splice variant prediction algorithms using massively parallel splicing assays. Genome Biol 2023; 24:294. [PMID: 38129864 PMCID: PMC10734170 DOI: 10.1186/s13059-023-03144-z] [Received: 05/04/2023] [Accepted: 12/13/2023] [Indexed: 12/23/2023] Open
Abstract
BACKGROUND: Variants that disrupt mRNA splicing account for a sizable fraction of the pathogenic burden in many genetic disorders, but identifying splice-disruptive variants (SDVs) beyond the essential splice site dinucleotides remains difficult. Computational predictors are often discordant, compounding the challenge of variant interpretation. Because they are primarily validated using clinical variant sets heavily biased to known canonical splice site mutations, it remains unclear how well their performance generalizes.
RESULTS: We benchmark eight widely used splicing effect prediction algorithms, leveraging massively parallel splicing assays (MPSAs) as a source of experimentally determined ground-truth. MPSAs simultaneously assay many variants to nominate candidate SDVs. We compare experimentally measured splicing outcomes with bioinformatic predictions for 3,616 variants in five genes. Algorithms' concordance with MPSA measurements, and with each other, is lower for exonic than intronic variants, underscoring the difficulty of identifying missense or synonymous SDVs. Deep learning-based predictors trained on gene model annotations achieve the best overall performance at distinguishing disruptive and neutral variants, and controlling for overall call rate genome-wide, SpliceAI and Pangolin have superior sensitivity. Finally, our results highlight two practical considerations when scoring variants genome-wide: finding an optimal score cutoff, and the substantial variability introduced by differences in gene model annotation, and we suggest strategies for optimal splice effect prediction in the face of these issues.
CONCLUSION: SpliceAI and Pangolin show the best overall performance among predictors tested; however, improvements in splice effect prediction are still needed, especially within exons.
Affiliation(s)
- Cathy Smith
  - Department of Human Genetics, University of Michigan Medical School, Ann Arbor, MI, 48109, USA
  - Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, Ann Arbor, MI, 48109, USA
- Jacob O Kitzman
  - Department of Human Genetics, University of Michigan Medical School, Ann Arbor, MI, 48109, USA
  - Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, Ann Arbor, MI, 48109, USA

6
Kurosawa R, Iida K, Ajiro M, Awaya T, Yamada M, Kosaki K, Hagiwara M. PDIVAS: Pathogenicity predictor for Deep-Intronic Variants causing Aberrant Splicing. BMC Genomics 2023; 24:601. [PMID: 37817060 PMCID: PMC10563346 DOI: 10.1186/s12864-023-09645-2] [Received: 04/14/2023] [Accepted: 09/01/2023] [Indexed: 10/12/2023] Open
Abstract
BACKGROUND: Deep-intronic variants that alter RNA splicing were ineffectively evaluated in the search for the cause of genetic diseases. Determination of such pathogenic variants from a vast number of deep-intronic variants (approximately 1,500,000 variants per individual) represents a technical challenge to researchers. Thus, we developed a Pathogenicity predictor for Deep-Intronic Variants causing Aberrant Splicing (PDIVAS) to easily detect pathogenic deep-intronic variants.
RESULTS: PDIVAS was trained on an ensemble machine-learning algorithm to classify pathogenic and benign variants in a curated dataset. The dataset consists of manually curated pathogenic splice-altering variants (SAVs) and commonly observed benign variants within deep introns. Splicing features and a splicing constraint metric were used to maximize the predictive sensitivity and specificity, respectively. PDIVAS showed an average precision of 0.92 and a maximum MCC of 0.88 in classifying these variants, which exceeded those of the previous predictors. When PDIVAS was applied to genome sequencing analysis on a threshold with 95% sensitivity for reported pathogenic SAVs, an average of 27 pathogenic candidates were extracted per individual. Furthermore, the causative variants in simulated patient genomes were more efficiently prioritized than by the previous predictors.
CONCLUSION: Incorporating PDIVAS into variant interpretation pipelines will enable efficient detection of disease-causing deep-intronic SAVs and contribute to improving the diagnostic yield. PDIVAS is publicly available at https://github.com/shiro-kur/PDIVAS.
Affiliation(s)
- Ryo Kurosawa
  - Department of Anatomy and Developmental Biology, Graduate School of Medicine, Kyoto University, Yoshida-Konoe-cho, Sakyo-ku, Kyoto, 606-8501, Japan
- Kei Iida
  - Faculty of Science and Engineering, Kindai University, 3-4-1 Kowakae, Higashi-osaka, Osaka, 577-8502, Japan
  - Medical Research Support Center, Graduate School of Medicine, Kyoto University, Yoshida-Konoe-cho, Sakyo-ku, Kyoto, 606-8501, Japan
- Masahiko Ajiro
  - Division of Cancer RNA Research, National Cancer Center Research Institute, Tokyo, 104-0045, Japan
  - Department of Drug Discovery Medicine, Graduate School of Medicine, Kyoto University, Yoshida-Konoe-cho, Sakyo-ku, Kyoto, 606-8501, Japan
- Tomonari Awaya
  - Department of Anatomy and Developmental Biology, Graduate School of Medicine, Kyoto University, Yoshida-Konoe-cho, Sakyo-ku, Kyoto, 606-8501, Japan
  - Laboratory of Tumor Microenvironment and Immunity, Graduate School of Medicine, Kyoto University, Yoshida-Konoe-cho, Sakyo-ku, Kyoto, 606-8501, Japan
- Mamiko Yamada
  - Center for Medical Genetics, Keio University School of Medicine, Tokyo, 160-8582, Japan
- Kenjiro Kosaki
  - Center for Medical Genetics, Keio University School of Medicine, Tokyo, 160-8582, Japan
- Masatoshi Hagiwara
  - Department of Anatomy and Developmental Biology, Graduate School of Medicine, Kyoto University, Yoshida-Konoe-cho, Sakyo-ku, Kyoto, 606-8501, Japan

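Note on the metric reported in the PDIVAS entry above: the abstract quotes a "maximum MCC", that is, the Matthews correlation coefficient at the best score cutoff. The following is a minimal sketch of how such a value can be computed from predictor scores and binary labels. It is not taken from the PDIVAS code; the function names `mcc_from_counts` and `max_mcc` and the toy scores are hypothetical and serve only as illustration.

```python
import math


def mcc_from_counts(tp, fp, tn, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when any marginal total is zero (a common convention,
    assumed here, since the ratio is undefined in that case).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


def max_mcc(scores, labels):
    """Try every observed score as a cutoff and return (best MCC, cutoff).

    A variant is called pathogenic when score >= cutoff; labels use
    1 for pathogenic and 0 for benign.
    """
    best_mcc, best_cutoff = -1.0, None
    for cutoff in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 0)
        fn = sum(1 for s, y in zip(scores, labels) if s < cutoff and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < cutoff and y == 0)
        mcc = mcc_from_counts(tp, fp, tn, fn)
        if mcc > best_mcc:
            best_mcc, best_cutoff = mcc, cutoff
    return best_mcc, best_cutoff


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    toy_scores = [0.9, 0.8, 0.3, 0.2]
    toy_labels = [1, 1, 0, 0]
    print(max_mcc(toy_scores, toy_labels))
```

Unlike accuracy, MCC uses all four confusion-matrix cells, which makes it a reasonable summary when benign variants greatly outnumber pathogenic ones, as is typical for deep-intronic variants.
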
7
Smith C, Kitzman JO. Benchmarking splice variant prediction algorithms using massively parallel splicing assays. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.05.04.539398. [PMID: 37205456 PMCID: PMC10187268 DOI: 10.1101/2023.05.04.539398] [Indexed: 05/21/2023]
Abstract
Background: Variants that disrupt mRNA splicing account for a sizable fraction of the pathogenic burden in many genetic disorders, but identifying splice-disruptive variants (SDVs) beyond the essential splice site dinucleotides remains difficult. Computational predictors are often discordant, compounding the challenge of variant interpretation. Because they are primarily validated using clinical variant sets heavily biased to known canonical splice site mutations, it remains unclear how well their performance generalizes.
Results: We benchmarked eight widely used splicing effect prediction algorithms, leveraging massively parallel splicing assays (MPSAs) as a source of experimentally determined ground-truth. MPSAs simultaneously assay many variants to nominate candidate SDVs. We compared experimentally measured splicing outcomes with bioinformatic predictions for 3,616 variants in five genes. Algorithms' concordance with MPSA measurements, and with each other, was lower for exonic than intronic variants, underscoring the difficulty of identifying missense or synonymous SDVs. Deep learning-based predictors trained on gene model annotations achieved the best overall performance at distinguishing disruptive and neutral variants. Controlling for overall call rate genome-wide, SpliceAI and Pangolin also showed superior overall sensitivity for identifying SDVs. Finally, our results highlight two practical considerations when scoring variants genome-wide: finding an optimal score cutoff, and the substantial variability introduced by differences in gene model annotation, and we suggest strategies for optimal splice effect prediction in the face of these issues.
Conclusion: SpliceAI and Pangolin showed the best overall performance among predictors tested; however, improvements in splice effect prediction are still needed, especially within exons.
Affiliation(s)
- Cathy Smith
  - Department of Human Genetics, University of Michigan Medical School, Ann Arbor, MI 48109, USA
  - Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, Ann Arbor, MI 48109, USA
- Jacob O. Kitzman
  - Department of Human Genetics, University of Michigan Medical School, Ann Arbor, MI 48109, USA
  - Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, Ann Arbor, MI 48109, USA

8
Barbosa P, Savisaar R, Carmo-Fonseca M, Fonseca A. Computational prediction of human deep intronic variation. Gigascience 2022; 12:giad085. [PMID: 37878682 PMCID: PMC10599398 DOI: 10.1093/gigascience/giad085] [Received: 02/21/2023] [Revised: 06/07/2023] [Accepted: 09/20/2023] [Indexed: 10/27/2023] Open
Abstract
BACKGROUND: The adoption of whole-genome sequencing in genetic screens has facilitated the detection of genetic variation in the intronic regions of genes, far from annotated splice sites. However, selecting an appropriate computational tool to discriminate functionally relevant genetic variants from those with no effect is challenging, particularly for deep intronic regions where independent benchmarks are scarce.
RESULTS: In this study, we have provided an overview of the computational methods available and the extent to which they can be used to analyze deep intronic variation. We leveraged diverse datasets to extensively evaluate tool performance across different intronic regions, distinguishing between variants that are expected to disrupt splicing through different molecular mechanisms. Notably, we compared the performance of SpliceAI, a widely used sequence-based deep learning model, with that of more recent methods that extend its original implementation. We observed considerable differences in tool performance depending on the region considered, with variants generating cryptic splice sites being better predicted than those that potentially affect splicing regulatory elements. Finally, we devised a novel quantitative assessment of tool interpretability and found that tools providing mechanistic explanations of their predictions are often correct with respect to the ground-truth information, but the use of these tools results in decreased predictive power when compared to black box methods.
CONCLUSIONS: Our findings translate into practical recommendations for tool usage and provide a reference framework for applying prediction tools in deep intronic regions, enabling more informed decision-making by practitioners.
Affiliation(s)
- Pedro Barbosa
  - LASIGE, Departamento de Informática, Faculdade de Ciências, Universidade de Lisboa, 1749-016, Lisboa, Portugal
  - Instituto de Medicina Molecular João Lobo Antunes, Faculdade de Medicina, Universidade de Lisboa, 1649-028, Lisboa, Portugal
- Maria Carmo-Fonseca
  - Instituto de Medicina Molecular João Lobo Antunes, Faculdade de Medicina, Universidade de Lisboa, 1649-028, Lisboa, Portugal
- Alcides Fonseca
  - LASIGE, Departamento de Informática, Faculdade de Ciências, Universidade de Lisboa, 1749-016, Lisboa, Portugal