1
Riepe TV, Khan M, Roosing S, Cremers FPM, 't Hoen PAC. Benchmarking deep learning splice prediction tools using functional splice assays. Hum Mutat 2021; 42:799-810. [PMID: 33942434 PMCID: PMC8360004 DOI: 10.1002/humu.24212] [Citation(s) in RCA: 56] [Impact Index Per Article: 18.7] [Received: 09/21/2020] [Revised: 03/16/2021] [Accepted: 04/17/2021] [Indexed: 12/21/2022]
Abstract
Hereditary disorders are frequently caused by genetic variants that affect pre-messenger RNA splicing. Although genetic variants in the canonical splice motifs almost always disrupt splicing, the pathogenicity of variants in noncanonical splice sites (NCSS) and deep intronic (DI) regions is difficult to predict. Multiple splice prediction tools have been developed for this purpose, the latest of which employ deep learning algorithms. We benchmarked established and deep learning splice prediction tools on published gold standard sets of 71 NCSS and 81 DI variants in the ABCA4 gene and 61 NCSS variants in the MYBPC3 gene, all functionally assessed in midigene and minigene splice assays. The selected splice prediction tools were CADD, DSSP, GeneSplicer, MaxEntScan, MMSplice, NNSPLICE, SPIDEX, SpliceAI, SpliceRover, and SpliceSiteFinder-like. Based on the area under the receiver operating characteristic (ROC) curve, the best-performing tool was SpliceRover for ABCA4 NCSS variants, SpliceAI for ABCA4 DI variants, and the Alamut 3/4 consensus approach (GeneSplicer, MaxEntScan, NNSPLICE, and SpliceSiteFinder-like) for MYBPC3 NCSS variants. Overall, performance in a real-time clinical setting is much more modest than that reported by the tools' developers.
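The benchmark ranks tools by the area under the ROC curve against midigene/minigene assay outcomes. As a minimal sketch of that metric, assuming each variant has one numeric tool score and a binary assay label (1 = splicing altered, 0 = no effect), the AUC equals the fraction of (altered, unaltered) variant pairs in which the altered variant scores higher, with ties counted as one half. The `consensus_call` helper is a hypothetical reading of the Alamut "3/4 consensus" as "at least three of the four tools predict an effect". The function names and example scores are illustrative assumptions, not taken from the paper.

```python
from itertools import product


def roc_auc(scores, labels):
    """AUC as the probability that a randomly chosen splice-altering variant
    scores higher than a randomly chosen neutral one (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one altered and one unaltered variant")
    wins = 0.0
    for p, n in product(pos, neg):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos) * len(neg))


def consensus_call(tool_calls, min_agree=3):
    """Hypothetical consensus rule: flag a variant as splice-altering
    if at least `min_agree` of the individual tool calls are positive."""
    return sum(bool(c) for c in tool_calls) >= min_agree


# Hypothetical tool scores for four variants and their assay outcomes.
scores = [0.9, 0.4, 0.4, 0.1]
labels = [1, 1, 0, 0]
print(roc_auc(scores, labels))  # 0.875
```

Because a consensus yields a vote count rather than a continuous score, one option is to pass the count (0 to 4) directly to `roc_auc` as an ordinal score; the pairwise formulation handles the resulting ties.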
Affiliation(s)
- Tabea V. Riepe
- Centre for Molecular and Biomolecular Informatics, Radboud Institute for Molecular Life Sciences, Radboud University Medical Center, Nijmegen, The Netherlands
- Department of Human Genetics and Donders Institute for Brain, Cognition and Behavior, Radboud University Medical Center, Nijmegen, The Netherlands
- Mubeen Khan
- Department of Human Genetics and Donders Institute for Brain, Cognition and Behavior, Radboud University Medical Center, Nijmegen, The Netherlands
- Susanne Roosing
- Department of Human Genetics and Donders Institute for Brain, Cognition and Behavior, Radboud University Medical Center, Nijmegen, The Netherlands
- Frans P. M. Cremers
- Department of Human Genetics and Donders Institute for Brain, Cognition and Behavior, Radboud University Medical Center, Nijmegen, The Netherlands
- Peter A. C. 't Hoen
- Centre for Molecular and Biomolecular Informatics, Radboud Institute for Molecular Life Sciences, Radboud University Medical Center, Nijmegen, The Netherlands
2
Abstract
A systematic framework is described for annotating variations in RNA molecules. The conceptual framework is part of the Variation Ontology (VariO) and facilitates the description of variation types, their functional and structural effects, and other consequences in any RNA molecule in any organism. There are more than 150 RNA-related VariO terms organized in seven levels, which can be combined to generate even more complex and detailed annotations. The terms are described together with examples, usually for variations and their effects in humans and in diseases. The RNA variation type has two subcategories, variation classification and origin, each with subterms. Altogether, six terms are available for function description, and several terms are available for affected RNA properties. The ontology also contains terms for the structural description of the affected RNA type, post-transcriptional RNA modifications, secondary and tertiary structure effects, and RNA sugar variations. Together with the DNA and protein concepts and annotations, the RNA terms allow comprehensive description of variations of genetic and non-genetic origin at all possible levels. VariO annotations are readable by both humans and computer programs, enabling advanced data integration and mining.
Affiliation(s)
- Mauno Vihinen
- Department of Experimental Medical Science, Lund University, Lund, Sweden
3
Understanding human DNA variants affecting pre-mRNA splicing in the NGS era. Adv Genet 2019; 103:39-90. [PMID: 30904096 DOI: 10.1016/bs.adgen.2018.09.002] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Indexed: 12/20/2022]
Abstract
Pre-mRNA splicing, an essential step in eukaryotic gene expression, relies on recognition of short sequences at the intron ends of the primary transcript and takes place co-transcriptionally, alongside transcription by RNA polymerase II. Exonic and intronic auxiliary elements may modify the strength of exon definition and intron recognition. Splicing DNA variants (SV) have been associated with human genetic diseases at canonical intron sites, as well as with exonic substitutions putatively classified as nonsense, missense, or synonymous variants. Their effects on mRNA, which include exon skipping or shortening and partial or complete intron retention, may be modulated by cryptic splice sites associated with the SV allele. Because splicing mRNA outputs result from the combinatorial effects of both intrinsic and extrinsic factors, in vitro functional assays supported by computational analyses are recommended to assist SV pathogenicity assessment in human Mendelian diseases. The increasing use of next-generation sequencing (NGS) targeting full genomic gene sequences has raised awareness of the relevance of deep intronic SV in genetic diseases and of pseudo-exon inclusion in mRNA. Finally, we take advantage of recent advances in sequencing and computational technologies to analyze alternative splicing in cancer. We explore the Catalogue of Somatic Mutations in Cancer (COSMIC) to describe the proportion of splice-site mutations in cis and trans regulatory elements. Genomic data from large cohorts of different cancer types are increasingly available, in addition to repositories of normal and somatic genetic variations. These are likely to bring new insights into the genetic control of alternative splicing by mapping splicing quantitative trait loci in tumors.
4
Hong Y, Shi J, Ge Z, Wu H. Associations between mutations of the cell cycle checkpoint kinase 2 gene and gastric carcinogenesis. Mol Med Rep 2017; 16:4287-4292. [PMID: 29067458 DOI: 10.3892/mmr.2017.7080] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 05/05/2016] [Accepted: 05/18/2017] [Indexed: 11/05/2022]
Abstract
Gastric cancer is the most common malignant tumor of the digestive system. Its etiology is complex, and genetic susceptibility remains to be fully elucidated. In the present study, mutations of the cell cycle checkpoint kinase 2 (CHEK2) gene and their association with gastric cancer were examined. Reverse transcription-quantitative polymerase chain reaction (RT-qPCR) showed that CHEK2 expression was low in gastric cancer. Sequencing analysis revealed that the low expression level of CHEK2 was associated with expression of mutant CHEK2. The study also established a CHEK2-overexpressing mutant and confirmed that CHEK2 promoted gastric cancer cell proliferation. Overexpression of the CHEK2 mutant was also shown to promote cancer cell migration and invasion. Furthermore, western blot analysis revealed that overexpression of the CHEK2 mutant downregulated E-cadherin and upregulated vimentin expression, indicating a mechanism underlying the altered biological behavior. These results suggest a correlation between CHEK2 mutation and gastric cancer and provide an experimental basis for developing antitumor drugs that target the mutation.
Affiliation(s)
- Yan Hong
- Department of General Surgery, The Second Affiliated Hospital of Soochow University, Suzhou, Jiangsu 215004, P.R. China
- Jun Shi
- Department of General Surgery, Yixing People's Hospital, Yixing, Jiangsu 214200, P.R. China
- Zhijun Ge
- Department of General Surgery, Yixing People's Hospital, Yixing, Jiangsu 214200, P.R. China
- Haorong Wu
- Department of General Surgery, The Second Affiliated Hospital of Soochow University, Suzhou, Jiangsu 215004, P.R. China
5
Steric Clash in the SET Domain of Histone Methyltransferase NSD1 as a Cause of Sotos Syndrome and Its Genetic Heterogeneity in a Brazilian Cohort. Genes (Basel) 2016; 7:genes7110096. [PMID: 27834868 PMCID: PMC5126782 DOI: 10.3390/genes7110096] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 08/01/2016] [Revised: 10/08/2016] [Accepted: 10/21/2016] [Indexed: 12/26/2022]
Abstract
Most histone methyltransferases (HMTases) harbor a predicted Su(var)3–9, Enhancer-of-zeste, Trithorax (SET) domain, which transfers a methyl group to a lysine residue in their substrates. Mutations in SET domains have been reported to cause intellectual disability syndromes such as Sotos, Weaver, and Kabuki syndromes. Sotos syndrome is an overgrowth syndrome with intellectual disability caused by haploinsufficiency of the nuclear receptor binding SET domain protein 1 (NSD1) gene, which encodes an HMTase and maps to 5q35.2–35.3. Here, we analyzed NSD1 in 34 Brazilian Sotos patients and identified three novel and eight known mutations. Using protein modeling and bioinformatic approaches, we evaluated the effects of one novel (I2007F) and 21 previously reported missense mutations in the SET domain. For the I2007F mutation, molecular dynamics (MD) simulations showed a conformational change and loss of structural stability, which may lead to loss of function of the SET domain. For six mutations near the ligand-binding site, the simulations showed steric clashes with neighboring side chains near the S-adenosyl methionine (SAM) binding site, which may disrupt the enzymatic activity of NSD1. These results point to a structural mechanism underlying the pathology of NSD1 missense mutations in the SET domain in Sotos syndrome. NSD1 mutations were identified in only 32% of the Brazilian Sotos patients in our cohort, suggesting that other genes (including unknown disease genes) underlie the molecular etiology in the majority of these patients. We also found NSD1 expression to be high in human fetal brain and cerebellum, accounting for the prenatal onset and the hypoplasia of the cerebellar vermis seen in Sotos syndrome.
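The abstract reports steric clashes observed in MD simulations without stating how they were scored. One common geometric criterion (used, for example, by MolProbity's clashscore) flags two non-bonded atoms whose van der Waals spheres overlap by 0.4 Å or more. The sketch below applies that criterion to a single frame of coordinates; the Bondi radii, the 0.4 Å cutoff, the function name and the atom labels are all assumptions for illustration, not the authors' protocol.

```python
import math
from itertools import combinations

# Bondi van der Waals radii in ångströms (assumed values for this sketch).
VDW_RADII = {"C": 1.70, "N": 1.55, "O": 1.52, "S": 1.80, "H": 1.20}


def find_clashes(atoms, overlap_cutoff=0.4):
    """Return (label_i, label_j, overlap) for atom pairs whose vdW spheres
    overlap by at least `overlap_cutoff` Å.

    `atoms` is a list of (label, element, (x, y, z)) tuples. This sketch does
    NOT exclude covalently bonded or hydrogen-bonded pairs; a real clash check
    must skip those before applying the overlap test.
    """
    clashes = []
    for (li, ei, ci), (lj, ej, cj) in combinations(atoms, 2):
        distance = math.dist(ci, cj)
        overlap = VDW_RADII[ei] + VDW_RADII[ej] - distance
        if overlap >= overlap_cutoff:
            clashes.append((li, lj, round(overlap, 2)))
    return clashes


# Hypothetical frame: a mutant side-chain carbon, a close neighbor, a distant atom.
atoms = [
    ("mut_CZ", "C", (0.0, 0.0, 0.0)),
    ("nbr_N", "N", (2.5, 0.0, 0.0)),
    ("far_N", "N", (10.0, 0.0, 0.0)),
]
print(find_clashes(atoms))  # [('mut_CZ', 'nbr_N', 0.75)]
```

Applied over every frame of a trajectory, the same test would show whether a clash persists or is resolved by side-chain rearrangement.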
6
Labonne JDJ, Graves TD, Shen Y, Jones JR, Kong IK, Layman LC, Kim HG. A microdeletion at Xq22.2 implicates a glycine receptor GLRA4 involved in intellectual disability, behavioral problems and craniofacial anomalies. BMC Neurol 2016; 16:132. [PMID: 27506666 PMCID: PMC4979147 DOI: 10.1186/s12883-016-0642-z] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.3] [Received: 04/29/2016] [Accepted: 07/20/2016] [Indexed: 12/03/2022]
Abstract
Background: Among the 21 annotated genes at Xq22.2, PLP1 is the only known gene involved in Xq22.2 microdeletion and microduplication syndromes with intellectual disability. Using an atypical microdeletion that does not encompass PLP1, we implicate a novel gene, GLRA4, in intellectual disability, behavioral problems and craniofacial anomalies.

Case presentation: We report a female patient (DGDP084) with a de novo Xq22.2 microdeletion of at least 110 kb presenting with intellectual disability, motor delay, behavioral problems and craniofacial anomalies. Although her phenotypic features, such as cognitive impairment and motor delay, overlap with Pelizaeus-Merzbacher disease (PMD), which is caused by PLP1 mutations at Xq22.2, this gene is not included in her microdeletion and is not dysregulated by a position effect. Because the microdeletion encompasses only three genes, GLRA4, MORF4L2 and TCEAL1, we investigated their expression levels in various tissues by RT-qPCR and found that all three genes were highly expressed in whole human brain, fetal brain, cerebellum and hippocampus. However, when we examined the transcript levels of GLRA4, MORF4L2 and TCEAL1 in DGDP084's family, only GLRA4 transcripts were reduced in the patient compared with her healthy mother. This suggests that GLRA4 is the plausible candidate gene for the cognitive impairment, behavioral problems and craniofacial anomalies observed in DGDP084. Importantly, glycine receptors mediate inhibitory synaptic transmission in the brain stem and spinal cord and are known to be involved in syndromic intellectual disability.

Conclusion: We hypothesize that GLRA4 is involved in intellectual disability, behavioral problems and craniofacial anomalies, making it the second gene identified for X-linked syndromic intellectual disability at Xq22.2. Additional point mutations or intragenic deletions of GLRA4, as well as functional studies, are needed to further validate our hypothesis.
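The patient-versus-mother comparison in this abstract relies on RT-qPCR, but the quantification model is not stated. A widely used option is the comparative Ct (2^-ΔΔCt) method: the target gene's Ct is normalized to a reference gene in each sample, and the sample's ΔCt is then compared with a calibrator's (here, the mother). Whether the authors used this method is an assumption, the function name is hypothetical, and the Ct values below are invented; the method also assumes near-100% and equal amplification efficiency for target and reference.

```python
def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_calibrator, ct_ref_calibrator):
    """Fold change of a target transcript in a sample relative to a
    calibrator, using the comparative Ct (2^-ddCt) method."""
    # Normalize the target to the reference gene within each sample.
    delta_sample = ct_target_sample - ct_ref_sample
    delta_calibrator = ct_target_calibrator - ct_ref_calibrator
    # Compare the normalized values; each extra cycle means half the template.
    delta_delta = delta_sample - delta_calibrator
    return 2 ** (-delta_delta)


# Hypothetical Ct values: the patient's target amplifies one cycle later than
# the mother's, while the reference gene is unchanged.
fold = relative_expression(26.0, 18.0, 25.0, 18.0)
print(fold)  # 0.5
```

A fold change of 0.5 is what a heterozygous loss of one copy would be expected to give under these idealized assumptions; real data would also need biological and technical replicates.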
Affiliation(s)
- Jonathan D J Labonne
- Department of Obstetrics & Gynecology, Section of Reproductive Endocrinology, Infertility & Genetics, Medical College of Georgia, Augusta University, 1120 15th Street, Augusta, GA, 30912, USA; Department of Neuroscience and Regenerative Medicine, Medical College of Georgia, Augusta University, 1120 15th Street, Augusta, GA, 30912, USA
- Tyler D Graves
- Department of Obstetrics & Gynecology, Section of Reproductive Endocrinology, Infertility & Genetics, Medical College of Georgia, Augusta University, 1120 15th Street, Augusta, GA, 30912, USA
- Yiping Shen
- Department of Laboratory Medicine, Boston Children's Hospital, Harvard Medical School, Boston, MA, 02115, USA
- Il-Keun Kong
- Department of Animal Science, Division of Applied Life Science (BK21plus), Institute of Agriculture and Life Science, Gyeongsang National University, Jinju, Gyeongsangnam-do, South Korea
- Lawrence C Layman
- Department of Obstetrics & Gynecology, Section of Reproductive Endocrinology, Infertility & Genetics, Medical College of Georgia, Augusta University, 1120 15th Street, Augusta, GA, 30912, USA; Department of Neuroscience and Regenerative Medicine, Medical College of Georgia, Augusta University, 1120 15th Street, Augusta, GA, 30912, USA; Neuroscience Program, Medical College of Georgia, Augusta University, Augusta, GA, 30912, USA
- Hyung-Goo Kim
- Department of Obstetrics & Gynecology, Section of Reproductive Endocrinology, Infertility & Genetics, Medical College of Georgia, Augusta University, 1120 15th Street, Augusta, GA, 30912, USA; Department of Neuroscience and Regenerative Medicine, Medical College of Georgia, Augusta University, 1120 15th Street, Augusta, GA, 30912, USA