1
Zhu A, Chiba S, Shimizu Y, Kunitake K, Okuno Y, Aoki Y, Yokota T. Ensemble-Learning and Feature Selection Techniques for Enhanced Antisense Oligonucleotide Efficacy Prediction in Exon Skipping. Pharmaceutics 2023; 15:1808. [PMID: 37513994; PMCID: PMC10384346; DOI: 10.3390/pharmaceutics15071808] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/05/2023] [Revised: 06/13/2023] [Accepted: 06/15/2023] [Indexed: 07/30/2023] [Open access]
Abstract
Antisense oligonucleotide (ASO)-mediated exon skipping has become a valuable tool for investigating gene function and developing gene therapy. Machine-learning-based computational methods, such as eSkip-Finder, have been developed to predict the efficacy of ASOs via exon skipping. However, these methods are computationally demanding, and the accuracy of predictions remains suboptimal. In this study, we propose a new approach to reduce the computational burden and improve the prediction performance by using feature selection within machine-learning algorithms and ensemble-learning techniques. We evaluated our approach using a dataset of experimentally validated exon-skipping events, dividing it into training and testing sets. Our results demonstrate that using a three-way-voting approach with random forest, gradient boosting, and XGBoost can significantly reduce the computation time to under ten seconds while improving prediction performance, as measured by R2 for both 2'-O-methyl nucleotides (2OMe) and phosphorodiamidate morpholino oligomers (PMOs). Additionally, the feature importance ranking derived from our approach is in good agreement with previously published results. Our findings suggest that our approach has the potential to enhance the accuracy and efficiency of predicting ASO efficacy via exon skipping. It could also facilitate the development of novel therapeutic strategies. This study could contribute to the ongoing efforts to improve ASO design and optimize gene therapy approaches.
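The voting step described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: it assumes that "voting" means an unweighted average of the three regressors' predictions, and the observed values and per-model predictions (`observed`, `rf_pred`, `gb_pred`, `xgb_pred`) are hypothetical placeholder numbers rather than data from the study.

```python
from statistics import mean


def vote(*model_predictions):
    """Average per-sample predictions from several regressors (equal weights)."""
    return [mean(sample) for sample in zip(*model_predictions)]


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical skipping efficiencies (%) and hypothetical base-model outputs.
observed = [10.0, 40.0, 70.0, 90.0]
rf_pred = [20.0, 35.0, 60.0, 85.0]   # stand-in for a random forest
gb_pred = [5.0, 50.0, 75.0, 80.0]    # stand-in for gradient boosting
xgb_pred = [11.0, 38.0, 78.0, 96.0]  # stand-in for XGBoost

ensemble = vote(rf_pred, gb_pred, xgb_pred)
print(ensemble, r_squared(observed, ensemble))
```

Averaging tends to cancel errors that the individual models make in different directions, which is one plausible reason an ensemble can score a higher R2 than its members.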
Affiliation(s)
- Alex Zhu
  - Phillips Academy, Andover, MA 01810, USA
  - Department of Medical Genetics, Faculty of Medicine and Dentistry, University of Alberta, Edmonton, AB T6G 2H7, Canada
- Shuntaro Chiba
  - HPC- and AI-Driven Drug Development Platform Division, RIKEN Center for Computational Science, Yokohama 230-0045, Japan
- Yuki Shimizu
  - Department of Biomedical Data Intelligence, Graduate School of Medicine, Kyoto University, Kyoto 606-8507, Japan
- Katsuhiko Kunitake
  - Department of Molecular Therapy, National Institute of Neuroscience, National Center of Neurology and Psychiatry (NCNP), Kodaira, Tokyo 187-8551, Japan
- Yasushi Okuno
  - HPC- and AI-Driven Drug Development Platform Division, RIKEN Center for Computational Science, Yokohama 230-0045, Japan
  - Department of Biomedical Data Intelligence, Graduate School of Medicine, Kyoto University, Kyoto 606-8507, Japan
- Yoshitsugu Aoki
  - Department of Molecular Therapy, National Institute of Neuroscience, National Center of Neurology and Psychiatry (NCNP), Kodaira, Tokyo 187-8551, Japan
- Toshifumi Yokota
  - Department of Medical Genetics, Faculty of Medicine and Dentistry, University of Alberta, Edmonton, AB T6G 2H7, Canada
2
Keegan NP, Wilton SD, Fletcher S. Analysis of Pathogenic Pseudoexons Reveals Novel Mechanisms Driving Cryptic Splicing. Front Genet 2022; 12:806946. [PMID: 35140743; PMCID: PMC8819188; DOI: 10.3389/fgene.2021.806946] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 11/01/2021] [Accepted: 12/09/2021] [Indexed: 12/16/2022] [Open access]
Abstract
Understanding pre-mRNA splicing is crucial to accurately diagnosing and treating genetic diseases. However, mutations that alter splicing can exert highly diverse effects. Of all the known types of splicing mutations, perhaps the rarest and most difficult to predict are those that activate pseudoexons, sometimes also called cryptic exons. Unlike other splicing mutations that either destroy or redirect existing splice events, pseudoexon mutations appear to create entirely new exons within introns. Since exon definition in vertebrates requires coordinated arrangements of numerous RNA motifs, one might expect that pseudoexons would only arise when rearrangements of intronic DNA create novel exons by chance. Surprisingly, although such mutations do occur, a far more common cause of pseudoexons is deep-intronic single nucleotide variants, raising the question of why these latent exon-like tracts near the mutation sites have not already been purged from the genome by the evolutionary advantage of more efficient splicing. Possible answers may lie in deep intronic splicing processes such as recursive splicing or poison exon splicing. Because these processes utilize intronic motifs that benignly engage with the spliceosome, the regions involved may be more susceptible to exonization than other intronic regions would be. We speculated that a comprehensive study of reported pseudoexons might detect alignments with known deep intronic splice sites and could also permit the characterisation of novel pseudoexon categories. In this report, we present and analyse a catalogue of over 400 published pseudoexon splice events. In addition to confirming prior observations of the most common pseudoexon mutation types, the size of this catalogue also enabled us to suggest new categories for some of the rarer types of pseudoexon mutation. 
By comparing our catalogue against published datasets of non-canonical splice events, we also found that 15.7% of pseudoexons exhibit some splicing activity at one or both of their splice sites in non-mutant cells. Importantly, this included seven examples of experimentally confirmed recursive splice sites, confirming for the first time a long-suspected link between these two splicing phenomena. These findings have the potential to improve the fidelity of genetic diagnostics and reveal new targets for splice-modulating therapies.
Affiliation(s)
- Niall P. Keegan
  - Centre for Molecular Medicine and Innovative Therapeutics, Health Futures Institute, Murdoch University, Perth, WA, Australia
  - Centre for Neuromuscular and Neurological Disorders, Perron Institute for Neurological and Translational Science, The University of Western Australia, Perth, WA, Australia
  - *Correspondence: Niall P. Keegan,
- Steve D. Wilton
  - Centre for Molecular Medicine and Innovative Therapeutics, Health Futures Institute, Murdoch University, Perth, WA, Australia
  - Centre for Neuromuscular and Neurological Disorders, Perron Institute for Neurological and Translational Science, The University of Western Australia, Perth, WA, Australia
- Sue Fletcher
  - Centre for Molecular Medicine and Innovative Therapeutics, Health Futures Institute, Murdoch University, Perth, WA, Australia
  - Centre for Neuromuscular and Neurological Disorders, Perron Institute for Neurological and Translational Science, The University of Western Australia, Perth, WA, Australia
3
Abstract
The DMD gene is the largest in the human genome, with a total intron content exceeding 2.2Mb. In the decades since DMD was discovered there have been numerous reported cases of pseudoexons (PEs) arising in the mature DMD transcripts of some individuals, either as the result of mutations or as low-frequency errors of the spliceosome. In this review, I collate from the literature 58 examples of DMD PEs and examine the diversity and commonalities of their features. In particular, I note the high frequency of PEs that arise from deep intronic SNVs and discuss a possible link between PEs induced by distal mutations and the regulation of recursive splicing.
Affiliation(s)
- Niall P Keegan
  - Centre for Molecular Medicine and Innovative Therapeutics, Murdoch University and Perron Institute, Perth, Australia
4
Normal and altered pre-mRNA processing in the DMD gene. Hum Genet 2017; 136:1155-1172. [DOI: 10.1007/s00439-017-1820-9] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.1] [Received: 02/10/2017] [Accepted: 06/02/2017] [Indexed: 12/11/2022]
5
Nishida A, Minegishi M, Takeuchi A, Awano H, Niba ETE, Matsuo M. Neuronal SH-SY5Y cells use the C-dystrophin promoter coupled with exon 78 skipping and display multiple patterns of alternative splicing including two intronic insertion events. Hum Genet 2015; 134:993-1001. [PMID: 26152642; DOI: 10.1007/s00439-015-1581-2] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Received: 04/03/2015] [Accepted: 06/27/2015] [Indexed: 01/01/2023]
Abstract
Duchenne muscular dystrophy (DMD) is a progressive muscle wasting disease caused by mutations in the dystrophin gene. One-third of DMD cases are complicated by mental retardation. Here, we used reverse transcription PCR to analyze the pattern of dystrophin transcripts in neuronal SH-SY5Y cells. Among the three alternative promoters/first exons at the 5'-end, only transcripts containing the brain cortex-specific C1 exon could be amplified. The C-transcript appeared as two products: a major product of the expected size and a minor larger product that contained the cryptic exon 1a between exons C1 and 2. At the 3'-end there was complete exon 78 skipping. Together, these findings indicate that SH-SY5Y cells have neuron-specific characteristics with regard to both promoter activation and alternative splicing. We also revealed partial skipping of exons 9 and 71. Four amplified products were obtained from a fragment covering exons 36-41: a strong expected product, two weak products lacking either exon 37 or exon 38, and a second strong larger product with a 568-bp insertion between exons 40 and 41. The inserted sequence matched the 3'-end of intron 40 perfectly. We concluded that a cryptic splice site was activated in SH-SY5Y cells to create the novel, unusually large, exon 41e (751 bp). In total, we identified seven alternative splicing events in neuronal SH-SY5Y cells, and calculated that 32 dystrophin transcripts could be produced. Our results may provide clues in the analysis of transcriptype-phenotype correlations as regards mental retardation in DMD.
Affiliation(s)
- Atsushi Nishida
  - Department of Medical Rehabilitation, Faculty of Rehabilitation, Kobegakuin University, 518 Arise, Ikawadani, Nishi, Kobe, 651-2180, Japan
6
Nishida A, Minegishi M, Takeuchi A, Niba ETE, Awano H, Lee T, Iijima K, Takeshima Y, Matsuo M. Tissue- and case-specific retention of intron 40 in mature dystrophin mRNA. J Hum Genet 2015; 60:327-33. [PMID: 25833469; DOI: 10.1038/jhg.2015.24] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Received: 12/11/2014] [Revised: 02/10/2015] [Accepted: 02/12/2015] [Indexed: 11/09/2022]
Abstract
The dystrophin gene, which is mutated in Duchenne muscular dystrophy (DMD), comprises 79 exons that show multiple alternative splicing events. Intron retention, a type of alternative splicing, may control gene expression. We examined intron retention in dystrophin introns by reverse-transcription PCR from skeletal muscle, focusing on the nine shortest (all <1000 bp), because these are more likely to be retained. Only one, intron 40, was retained in mRNA; sequencing revealed insertion of a complete intron 40 (851 nt) between exons 40 and 41. The intron 40 retention product accounted for 1.2% of the total product but had a premature stop codon at the fifth intronic codon. Intron 40 retention was most strongly observed in the kidney (36.6%) and was not obtained from the fetal liver, lung, spleen or placenta. This indicated that intron retention is a tissue-specific event whose level varies among tissues. In two DMD patients, intron 40 retention was observed in one patient but not in the other. Examination of splicing regulatory factors revealed that intron 40 had the highest guanine-cytosine content of all examined introns in a 30-nt segment at its 3' end. Further studies are needed to clarify the biological role of intron 40-retained dystrophin mRNA.
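The guanine-cytosine measurement mentioned in this abstract (GC content of a 30-nt segment at the 3' end of an intron) can be illustrated with a short sketch. The function names and `toy_intron` are hypothetical; the toy sequence is invented for illustration and is not dystrophin intron 40.

```python
def gc_content(seq):
    """Fraction of G or C bases in a nucleotide sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(base in "GC" for base in seq) / len(seq)


def three_prime_gc(intron, window=30):
    """GC content of the last `window` nucleotides of an intron."""
    return gc_content(intron[-window:])


# Invented toy intron: an AT-rich body followed by a GC-rich 30-nt tail.
toy_intron = "GT" + "AT" * 20 + "GCGCCGGCCCGGGCGCTTTCCCTGCCCCAG"
print(round(three_prime_gc(toy_intron), 3))
```

Comparing this value across a set of introns would give the kind of ranking the abstract refers to, assuming the same window length is used for each.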
Affiliation(s)
- Atsushi Nishida
  - Department of Medical Rehabilitation, Faculty of Rehabilitation, Kobegakuin University, Kobe, Japan
- Maki Minegishi
  - Department of Clinical Pharmacy, Kobe Pharmaceutical University, Kobe, Japan
- Atsuko Takeuchi
  - Department of Clinical Pharmacy, Kobe Pharmaceutical University, Kobe, Japan
- Emma Tabe Eko Niba
  - Department of Medical Rehabilitation, Faculty of Rehabilitation, Kobegakuin University, Kobe, Japan
- Hiroyuki Awano
  - Department of Pediatrics, Graduate School of Medicine, Kobe University, Kobe, Japan
- Tomoko Lee
  - Department of Pediatrics, Hyogo College of Medicine, Nishinomiya, Japan
- Kazumoto Iijima
  - Department of Pediatrics, Graduate School of Medicine, Kobe University, Kobe, Japan
- Masafumi Matsuo
  - Department of Medical Rehabilitation, Faculty of Rehabilitation, Kobegakuin University, Kobe, Japan
7
Chakravarty D, Chakraborti S, Chakrabarti P. Flexibility in the N-terminal actin-binding domain: clues from in silico mutations and molecular dynamics. Proteins 2015; 83:696-710. [PMID: 25620004; DOI: 10.1002/prot.24767] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 07/16/2014] [Revised: 12/31/2014] [Accepted: 01/10/2015] [Indexed: 01/01/2023]
Abstract
Dystrophin is a long, rod-shaped cytoskeleton protein implicated in muscular dystrophy (MDys). Utrophin is the closest autosomal homolog of dystrophin. Both proteins have N-terminal actin-binding domain (N-ABD), a central rod domain and C-terminal region. N-ABD, composed of two calponin homology (CH) subdomains joined by a helical linker, harbors a few disease causing missense mutations. Although the two proteins share considerable homology (>72%) in N-ABD, recent structural and biochemical studies have shown that there are significant differences (including stability, mode of actin-binding) and their functions are not completely interchangeable. In this investigation, we have used extensive molecular dynamics simulations to understand the differences and the similarities of these two proteins, along with another actin-binding protein, fimbrin. In silico mutations were performed to identify two key residues that might be responsible for the dynamical difference between the molecules. Simulation points to the inherent flexibility of the linker region, which adapts different conformations in the wild type dystrophin. Mutations T220V and G130D in dystrophin constrain the flexibility of the central helical region, while in the two known disease-causing mutants, K18N and L54R, the helicity of the region is compromised. Phylogenetic tree and sequence analysis revealed that dystrophin and utrophin genes have probably originated from the same ancestor. The investigation would provide insight into the functional diversity of two closely related proteins and fimbrin, and contribute to our understanding of the mechanism of MDys.
Affiliation(s)
- Devlina Chakravarty
  - Department of Biochemistry, Bose Institute, Kolkata, West Bengal, 700054, India
8
Echigoya Y, Mouly V, Garcia L, Yokota T, Duddy W. In silico screening based on predictive algorithms as a design tool for exon skipping oligonucleotides in Duchenne muscular dystrophy. PLoS One 2015; 10:e0120058. [PMID: 25816009; PMCID: PMC4376395; DOI: 10.1371/journal.pone.0120058] [Citation(s) in RCA: 38] [Impact Index Per Article: 4.2] [Received: 09/23/2014] [Accepted: 02/03/2015] [Indexed: 12/27/2022] [Open access]
Abstract
The use of antisense 'splice-switching' oligonucleotides to induce exon skipping represents a potential therapeutic approach to various human genetic diseases. It has achieved greatest maturity in exon skipping of the dystrophin transcript in Duchenne muscular dystrophy (DMD), for which several clinical trials are completed or ongoing, and a large body of data exists describing tested oligonucleotides and their efficacy. The rational design of an exon skipping oligonucleotide involves the choice of an antisense sequence, usually between 15 and 32 nucleotides, targeting the exon that is to be skipped. Although parameters describing the target site can be computationally estimated and several have been identified to correlate with efficacy, methods to predict efficacy are limited. Here, an in silico pre-screening approach is proposed, based on predictive statistical modelling. Previous DMD data were compiled together and, for each oligonucleotide, some 60 descriptors were considered. Statistical modelling approaches were applied to derive algorithms that predict exon skipping for a given target site. We confirmed (1) the binding energetics of the oligonucleotide to the RNA, and (2) the distance in bases of the target site from the splice acceptor site, as the two most predictive parameters, and we included these and several other parameters (while discounting many) into an in silico screening process, based on their capacity to predict high or low efficacy in either phosphorodiamidate morpholino oligomers (89% correctly predicted) and/or 2'O Methyl RNA oligonucleotides (76% correctly predicted). Predictions correlated strongly with in vitro testing for sixteen de novo PMO sequences targeting various positions on DMD exons 44 (R² 0.89) and 53 (R² 0.89), one of which represents a potential novel candidate for clinical trials. 
We provide these algorithms together with a computational tool that facilitates screening to predict exon skipping efficacy at each position of a target exon.
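Screening "each position of a target exon" implies enumerating every candidate target site and its antisense sequence before any scoring takes place. The sketch below shows only that enumeration step under stated assumptions: the function names, the default length of 25 nt, and the toy exon are hypothetical, and no efficacy model from the paper is reproduced. The `start` offset is counted from the first base of the exon, that is, from the splice acceptor side.

```python
# Watson-Crick complement table for RNA.
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def antisense(target_rna):
    """Reverse complement of an RNA target site, written 5' to 3'."""
    return target_rna.translate(COMPLEMENT)[::-1]


def candidate_sites(exon_rna, length=25):
    """Yield (start, target, antisense) for every window of `length` bases.

    `start` is the 0-based offset of the window from the first exon base.
    """
    if length < 1:
        raise ValueError("length must be positive")
    for start in range(len(exon_rna) - length + 1):
        target = exon_rna[start:start + length]
        yield start, target, antisense(target)


# Invented toy exon; a real screen would use the actual exon sequence.
toy_exon = "A" * 10 + "C" * 10
for start, target, oligo in candidate_sites(toy_exon, length=15):
    print(start, target, oligo)
```

A descriptor-based predictor of the kind the abstract describes would then be applied to each yielded candidate, for example using the binding energetics and the offset from the acceptor site.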
Affiliation(s)
- Yusuke Echigoya
  - University of Alberta, Faculty of Medicine and Dentistry, Department of Medical Genetics, Edmonton, Alberta, Canada
- Vincent Mouly
  - UPMC-Sorbonne Universités-Univ. Paris 6, UPMC/INSERM UMRS974, CNRS FRE 3617, Center of Research in Myology, Paris, 75651 cedex 13, France
- Luis Garcia
  - UFR des Sciences de la Santé, Université de Versailles Saint-Quentin-en-Yvelines, 78180 Montigny-le-Bretonneux, France
- Toshifumi Yokota
  - University of Alberta, Faculty of Medicine and Dentistry, Department of Medical Genetics, Edmonton, Alberta, Canada; Muscular Dystrophy Canada Research Chair, University of Alberta, Edmonton, Alberta, Canada
- William Duddy
  - UPMC-Sorbonne Universités-Univ. Paris 6, UPMC/INSERM UMRS974, CNRS FRE 3617, Center of Research in Myology, Paris, 75651 cedex 13, France
9
A novel splicing silencer generated by DMD exon 45 deletion junction could explain upstream exon 44 skipping that modifies dystrophinopathy. J Hum Genet 2014; 59:423-9. [PMID: 24871807; DOI: 10.1038/jhg.2014.36] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.0] [Received: 10/31/2013] [Revised: 03/30/2014] [Accepted: 04/18/2014] [Indexed: 12/22/2022]
Abstract
Duchenne muscular dystrophy (DMD), a progressive muscle-wasting disease, is mostly caused by exon deletion mutations in the DMD gene. The reading frame rule explains that out-of-frame deletions lead to muscle dystrophin deficiency in DMD. In outliers to this rule, deletion junction sequences have never previously been explored as splicing modulators. In a Japanese case, we identified a single exon 45 deletion in the patient's DMD gene, indicating out-of-frame mutation. However, immunohistochemical examination disclosed weak dystrophin signals in his muscle. Reverse transcription-PCR amplification of DMD exons 42 to 47 revealed a major normally spliced product with exon 45 deletion and an additional in-frame product with deletion of both exons 44 and 45, indicating upstream exon 44 skipping. We considered the latter to underlie the observed dystrophin expression. Remarkably, the junction sequence cloned by PCR walking abolished the splicing enhancer activity of the upstream intron in a chimeric doublesex gene pre-mRNA in vitro splicing. Furthermore, antisense oligonucleotides directed against the junction site counteracted this effect. These indicated that the junction sequence was a splicing silencer that induced upstream exon 44 skipping. It was strongly suggested that creation of splicing regulator is a modifier of dystrophinopathy.
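The reading frame rule invoked in this abstract reduces to a divisibility check: a deletion preserves the frame only if the total number of removed coding bases is a multiple of three. The sketch below assumes exon lengths of 148 bp for exon 44 and 176 bp for exon 45; these values are not given in the abstract and should be verified against the reference DMD transcript. The names `EXON_LENGTHS` and `removal_is_in_frame` are hypothetical.

```python
# Assumed exon lengths in bp (not stated in the abstract; verify before use).
EXON_LENGTHS = {44: 148, 45: 176}


def removal_is_in_frame(missing_exons, lengths=EXON_LENGTHS):
    """True if removing these exons keeps the downstream reading frame."""
    removed_bases = sum(lengths[exon] for exon in missing_exons)
    return removed_bases % 3 == 0


print(removal_is_in_frame([45]))      # exon 45 deletion alone
print(removal_is_in_frame([44, 45]))  # exon 45 deletion plus exon 44 skipping
```

Under these assumed lengths, the single-exon deletion is out of frame while the transcript lacking both exons is in frame, which matches the abstract's account of why exon 44 skipping could permit some dystrophin expression.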
10
Use of in silico tools for classification of novel missense mutations identified in dystrophin gene in developing countries. Gene 2014; 535:250-4. [DOI: 10.1016/j.gene.2013.11.022] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 07/12/2013] [Revised: 10/20/2013] [Accepted: 11/06/2013] [Indexed: 01/26/2023]
11
Abstract
The first mutation that disrupts BRCA2 mRNA by including a novel, cryptic exon is reported in this issue. The mutation lies deep within an intron and would not have been detected by conventional screening methods. In the future, more mutations may be discovered by direct mRNA analysis.
Affiliation(s)
- James D Fackenthal
  - Center for Clinical Cancer Genetics and Global Health, The University of Chicago Medical Center, Chicago, IL 60637, USA