1
|
Cheng Y, Xu SM, Santucci K, Lindner G, Janitz M. Machine learning and related approaches in transcriptomics. Biochem Biophys Res Commun 2024; 724:150225. [PMID: 38852503 DOI: 10.1016/j.bbrc.2024.150225] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/25/2024] [Revised: 05/18/2024] [Accepted: 06/03/2024] [Indexed: 06/11/2024]
Abstract
Data acquisition for transcriptomic studies used to be the bottleneck in the transcriptomic analytical pipeline. However, recent developments in transcriptome profiling technologies have increased researchers' ability to obtain data, resulting in a shift in focus to data analysis. Incorporating machine learning (ML) into traditional analytical methods allows the possibility of handling larger volumes of complex data more efficiently. Many bioinformaticians, especially those unfamiliar with ML in the study of human transcriptomics and complex biological systems, face a significant barrier stemming from their limited awareness of the current landscape of ML utilisation in this field. To address this gap, this review endeavours to introduce those individuals to the general types of ML, followed by a comprehensive range of more specific techniques, demonstrated through examples of their incorporation into analytical pipelines for human transcriptome investigations. Important computational aspects such as data pre-processing, task formulation, results (performance of ML models), and validation methods are encompassed. In hope of better practical relevance, there is a strong focus on studies published within the last five years, almost exclusively examining human transcriptomes, with outcomes compared with standard non-ML tools.
Affiliation(s)
- Yuning Cheng
  - School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, NSW, 2052, Australia
- Si-Mei Xu
  - School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, NSW, 2052, Australia
- Kristina Santucci
  - School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, NSW, 2052, Australia
- Grace Lindner
  - School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, NSW, 2052, Australia
- Michael Janitz
  - School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, NSW, 2052, Australia
|
2
|
Shen A, Hencel K, Parker MT, Scott R, Skukan R, Adesina AS, Metheringham CL, Miska EA, Nam Y, Haerty W, Simpson GG, Akay A. U6 snRNA m6A modification is required for accurate and efficient splicing of C. elegans and human pre-mRNAs. Nucleic Acids Res 2024:gkae447. [PMID: 38808663 DOI: 10.1093/nar/gkae447] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/15/2023] [Revised: 05/08/2024] [Accepted: 05/13/2024] [Indexed: 05/30/2024] Open
Abstract
pre-mRNA splicing is a critical feature of eukaryotic gene expression. Both cis- and trans-splicing rely on accurately recognising splice site sequences by spliceosomal U snRNAs and associated proteins. Spliceosomal snRNAs carry multiple RNA modifications with the potential to affect different stages of pre-mRNA splicing. Here, we show that the conserved U6 snRNA m6A methyltransferase METT-10 is required for accurate and efficient cis- and trans-splicing of C. elegans pre-mRNAs. The absence of METT-10 in C. elegans and METTL16 in humans primarily leads to alternative splicing at 5' splice sites with an adenosine at +4 position. In addition, METT-10 is required for splicing of weak 3' cis- and trans-splice sites. We identified a significant overlap between METT-10 and the conserved splicing factor SNRNP27K in regulating 5' splice sites with +4A. Finally, we show that editing endogenous 5' splice site +4A positions to +4U restores splicing to wild-type positions in a mett-10 mutant background, supporting a direct role for U6 snRNA m6A modification in 5' splice site recognition. We conclude that the U6 snRNA m6A modification is important for accurate and efficient pre-mRNA splicing.
Affiliation(s)
- Aykut Shen
  - School of Biological Sciences, University of East Anglia, NR4 7TJ Norwich, UK
- Katarzyna Hencel
  - School of Biological Sciences, University of East Anglia, NR4 7TJ Norwich, UK
- Matthew T Parker
  - School of Life Sciences, University of Dundee, Dow Street, Dundee DD1 5EH, UK
- Robyn Scott
  - Department of Biochemistry, University of Texas Southwestern Medical Center, Dallas, TX, USA
- Roberta Skukan
  - School of Biological Sciences, University of East Anglia, NR4 7TJ Norwich, UK
- Eric A Miska
  - Wellcome/CRUK Gurdon Institute, University of Cambridge, Tennis Court Rd, Cambridge CB2 1QN, UK
- Yunsun Nam
  - Department of Biochemistry, University of Texas Southwestern Medical Center, Dallas, TX, USA
  - Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, TX, USA
  - Simmons Comprehensive Cancer Center, University of Texas Southwestern Medical Center, Dallas, TX, USA
- Wilfried Haerty
  - School of Biological Sciences, University of East Anglia, NR4 7TJ Norwich, UK
  - Earlham Institute, Norwich Research Park, Norwich, UK
- Gordon G Simpson
  - School of Life Sciences, University of Dundee, Dow Street, Dundee DD1 5EH, UK
  - Cell & Molecular Sciences, James Hutton Institute, Invergowrie, DD2 5DA, UK
- Alper Akay
  - School of Biological Sciences, University of East Anglia, NR4 7TJ Norwich, UK
|
3
|
Tao S, Hou Y, Diao L, Hu Y, Xu W, Xie S, Xiao Z. Long noncoding RNA study: Genome-wide approaches. Genes Dis 2023; 10:2491-2510. [PMID: 37554208 PMCID: PMC10404890 DOI: 10.1016/j.gendis.2022.10.024] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 06/11/2022] [Revised: 10/09/2022] [Accepted: 10/23/2022] [Indexed: 11/30/2022] Open
Abstract
Long noncoding RNAs (lncRNAs) have been confirmed to play a crucial role in various biological processes across several species. Though many efforts have been devoted to the expansion of the lncRNAs landscape, much about lncRNAs is still unknown due to their great complexity. The development of high-throughput technologies and the constantly improved bioinformatic methods have resulted in a rapid expansion of lncRNA research and relevant databases. In this review, we introduced genome-wide research of lncRNAs in three parts: (i) novel lncRNA identification by high-throughput sequencing and computational pipelines; (ii) functional characterization of lncRNAs by expression atlas profiling, genome-scale screening, and the research of cancer-related lncRNAs; (iii) mechanism research by large-scale experimental technologies and computational analysis. Besides, primary experimental methods and bioinformatic pipelines related to these three parts are summarized. This review aimed to provide a comprehensive and systemic overview of lncRNA genome-wide research strategies and indicate a genome-wide lncRNA research system.
Affiliation(s)
- Shuang Tao
  - The Biotherapy Center, The Third Affiliated Hospital of Sun Yat-sen University, Guangzhou, Guangdong 510630, China
- Yarui Hou
  - The Biotherapy Center, The Third Affiliated Hospital of Sun Yat-sen University, Guangzhou, Guangdong 510630, China
- Liting Diao
  - The Biotherapy Center, The Third Affiliated Hospital of Sun Yat-sen University, Guangzhou, Guangdong 510630, China
- Yanxia Hu
  - The Biotherapy Center, The Third Affiliated Hospital of Sun Yat-sen University, Guangzhou, Guangdong 510630, China
- Wanyi Xu
  - The Biotherapy Center, The Third Affiliated Hospital of Sun Yat-sen University, Guangzhou, Guangdong 510630, China
- Shujuan Xie
  - The Biotherapy Center, The Third Affiliated Hospital of Sun Yat-sen University, Guangzhou, Guangdong 510630, China
  - Institute of Vaccine, The Third Affiliated Hospital of Sun Yat-sen University, Guangzhou, Guangdong 510630, China
- Zhendong Xiao
  - The Biotherapy Center, The Third Affiliated Hospital of Sun Yat-sen University, Guangzhou, Guangdong 510630, China
|
4
|
Shen A, Hencel K, Parker MT, Scott R, Skukan R, Adesina AS, Metheringham CL, Miska EA, Nam Y, Haerty W, Simpson GG, Akay A. U6 snRNA m6A modification is required for accurate and efficient cis- and trans-splicing of C. elegans mRNAs. bioRxiv 2023:2023.09.16.558044. [PMID: 37745402 PMCID: PMC10516052 DOI: 10.1101/2023.09.16.558044] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/26/2023]
Abstract
pre-mRNA splicing is a critical feature of eukaryotic gene expression. Many eukaryotes use cis-splicing to remove intronic sequences from pre-mRNAs. In addition to cis-splicing, many organisms use trans-splicing to replace the 5' ends of mRNAs with a non-coding spliced-leader RNA. Both cis- and trans-splicing rely on accurately recognising splice site sequences by spliceosomal U snRNAs and associated proteins. Spliceosomal snRNAs carry multiple RNA modifications with the potential to affect different stages of pre-mRNA splicing. Here, we show that m6A modification of U6 snRNA A43 by the RNA methyltransferase METT-10 is required for accurate and efficient cis- and trans-splicing of C. elegans pre-mRNAs. The absence of U6 snRNA m6A modification primarily leads to alternative splicing at 5' splice sites. Furthermore, weaker 5' splice site recognition by the unmodified U6 snRNA A43 affects splicing at 3' splice sites. U6 snRNA m6A43 and the splicing factor SNRNP27K function to recognise an overlapping set of 5' splice sites with an adenosine at +4 position. Finally, we show that U6 snRNA m6A43 is required for efficient SL trans-splicing at weak 3' trans-splice sites. We conclude that the U6 snRNA m6A modification is important for accurate and efficient cis- and trans-splicing in C. elegans.
Affiliation(s)
- Aykut Shen
  - School of Biological Sciences, University of East Anglia, NR4 7TJ, Norwich
- Katarzyna Hencel
  - School of Biological Sciences, University of East Anglia, NR4 7TJ, Norwich
  - These authors contributed equally
- Matthew T Parker
  - School of Life Sciences, University of Dundee, Dow Street, Dundee, DD1 5EH, UK
  - These authors contributed equally
- Robyn Scott
  - Department of Biochemistry, University of Texas Southwestern Medical Center, Dallas, TX, USA
- Roberta Skukan
  - School of Biological Sciences, University of East Anglia, NR4 7TJ, Norwich
- Eric A Miska
  - Wellcome/CRUK Gurdon Institute, University of Cambridge, Tennis Court Rd, Cambridge, CB2 1QN, UK
- Yunsun Nam
  - Department of Biochemistry, University of Texas Southwestern Medical Center, Dallas, TX, USA
  - Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, TX, USA
  - Simmons Comprehensive Cancer Center, University of Texas Southwestern Medical Center, Dallas, TX, USA
- Wilfried Haerty
  - School of Biological Sciences, University of East Anglia, NR4 7TJ, Norwich
  - Earlham Institute, Norwich Research Park, Norwich, UK
- Gordon G Simpson
  - School of Life Sciences, University of Dundee, Dow Street, Dundee, DD1 5EH, UK
  - Cell & Molecular Sciences, James Hutton Institute, Invergowrie, DD2 5DA, UK
- Alper Akay
  - School of Biological Sciences, University of East Anglia, NR4 7TJ, Norwich
|
5
|
Xie Y, Chan PL, Kwan HS, Chang J. The Genome-Wide Characterization of Alternative Splicing and RNA Editing in the Development of Coprinopsis cinerea. J Fungi (Basel) 2023; 9:915. [PMID: 37755023 PMCID: PMC10532568 DOI: 10.3390/jof9090915] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 06/14/2023] [Revised: 08/17/2023] [Accepted: 09/07/2023] [Indexed: 09/28/2023] Open
Abstract
Coprinopsis cinerea is one of the model species used in fungal developmental studies. This mushroom-forming Basidiomycetes fungus has several developmental destinies in response to changing environments, with dynamic developmental regulations of the organism. Although the gene expression in C. cinerea development has already been profiled broadly, previous studies have only focused on a specific stage or process of fungal development. A comprehensive perspective across different developmental paths is lacking, and a global view on the dynamic transcriptional regulations in the life cycle and the developmental paths is far from complete. In addition, knowledge on co- and post-transcriptional modifications in this fungus remains rare. In this study, we investigated the transcriptional changes and modifications in C. cinerea during the processes of spore germination, vegetative growth, oidiation, sclerotia formation, and fruiting body formation by inducing different developmental paths of the organism and profiling the transcriptomes using the high-throughput sequencing method. Transitions in the identity and abundance of expressed genes drive the physiological and morphological alterations of the organism, including metabolism and multicellularity construction. Moreover, stage- and tissue-specific alternative splicing and RNA editing took place and functioned in C. cinerea. These modifications were negatively correlated to the conservation features of genes and could provide extra plasticity to the transcriptome during fungal development. We suggest that C. cinerea applies different molecular strategies in its developmental regulation, including shifts in expressed gene sets, diversifications of genetic information, and reversible diversifications of RNA molecules. Such features would increase the fungal adaptability in the rapidly changing environment, especially in the transition of developmental programs and the maintenance and balance of genetic and transcriptomic divergence. The multi-layer regulatory network of gene expression serves as the molecular basis of the functioning of developmental regulation.
Affiliation(s)
- Yichun Xie
  - State Key Laboratory of Agrobiotechnology, Food Research Center, School of Life Sciences, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong SAR, China
- Po-Lam Chan
  - Food Research Center, School of Life Sciences, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong SAR, China
- Hoi-Shan Kwan
  - Food Research Center, School of Life Sciences, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong SAR, China
- Jinhui Chang
  - Department of Food Science and Nutrition, and Research Institute for Future Food, The Hong Kong Polytechnic University, Hong Kong SAR, China
|
6
|
Gao Y, Wang F, Wang R, Kutschera E, Xu Y, Xie S, Wang Y, Kadash-Edmondson KE, Lin L, Xing Y. ESPRESSO: Robust discovery and quantification of transcript isoforms from error-prone long-read RNA-seq data. Sci Adv 2023; 9:eabq5072. [PMID: 36662851 PMCID: PMC9858503 DOI: 10.1126/sciadv.abq5072] [Citation(s) in RCA: 21] [Impact Index Per Article: 21.0] [Received: 04/13/2022] [Accepted: 12/16/2022] [Indexed: 05/20/2023]
Abstract
Long-read RNA sequencing (RNA-seq) holds great potential for characterizing transcriptome variation and full-length transcript isoforms, but the relatively high error rate of current long-read sequencing platforms poses a major challenge. We present ESPRESSO, a computational tool for robust discovery and quantification of transcript isoforms from error-prone long reads. ESPRESSO jointly considers alignments of all long reads aligned to a gene and uses error profiles of individual reads to improve the identification of splice junctions and the discovery of their corresponding transcript isoforms. On both a synthetic spike-in RNA sample and human RNA samples, ESPRESSO outperforms multiple contemporary tools in not only transcript isoform discovery but also transcript isoform quantification. In total, we generated and analyzed ~1.1 billion nanopore RNA-seq reads covering 30 human tissue samples and three human cell lines. ESPRESSO and its companion dataset provide a useful resource for studying the RNA repertoire of eukaryotic transcriptomes.
Affiliation(s)
- Yuan Gao
  - Center for Computational and Genomic Medicine, The Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Feng Wang
  - Center for Computational and Genomic Medicine, The Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Robert Wang
  - Center for Computational and Genomic Medicine, The Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
  - Genomics and Computational Biology Graduate Program, University of Pennsylvania, Philadelphia, PA 19104, USA
- Eric Kutschera
  - Center for Computational and Genomic Medicine, The Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Yang Xu
  - Center for Computational and Genomic Medicine, The Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
  - Genomics and Computational Biology Graduate Program, University of Pennsylvania, Philadelphia, PA 19104, USA
- Stephan Xie
  - Center for Computational and Genomic Medicine, The Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Yuanyuan Wang
  - Center for Computational and Genomic Medicine, The Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Kathryn E. Kadash-Edmondson
  - Center for Computational and Genomic Medicine, The Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Lan Lin
  - Department of Pathology and Laboratory Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
  - Raymond G. Perelman Center for Cellular and Molecular Therapeutics, The Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Yi Xing
  - Center for Computational and Genomic Medicine, The Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
  - Department of Pathology and Laboratory Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
  - Department of Biomedical and Health Informatics, The Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
|
7
|
Uzoechi SC, Rosa BA, Singh KS, Choi YJ, Bracken BK, Brindley PJ, Townsend RR, Sprung R, Zhan B, Bottazzi ME, Hawdon JM, Wong Y, Loukas A, Djuranovic S, Mitreva M. Excretory/Secretory Proteome of Females and Males of the Hookworm Ancylostoma ceylanicum. Pathogens 2023; 12:95. [PMID: 36678443 PMCID: PMC9865600 DOI: 10.3390/pathogens12010095] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/21/2022] [Revised: 12/20/2022] [Accepted: 01/04/2023] [Indexed: 01/09/2023] Open
Abstract
The dynamic host-parasite mechanisms underlying hookworm infection establishment and maintenance in mammalian hosts remain poorly understood but are primarily mediated by hookworm's excretory/secretory products (ESPs), which have a wide spectrum of biological functions. We used ultra-high performance mass spectrometry to comprehensively profile and compare female and male ESPs from the zoonotic human hookworm Ancylostoma ceylanicum, which is a natural parasite of dogs, cats, and humans. We improved the genome annotation, decreasing the number of protein-coding genes by 49% while improving completeness from 92 to 96%. Compared to the previous genome annotation, we detected 11% and 10% more spectra in female and male ESPs, respectively, using this improved version, identifying a total of 795 ESPs (70% in both sexes, with the remaining sex-specific). Using functional databases (KEGG, GO and Interpro), common and sex-specific enriched functions were identified. Comparisons with the exclusively human-infective hookworm Necator americanus identified species-specific and conserved ESPs. This is the first study identifying ESPs from female and male A. ceylanicum. The findings provide a deeper understanding of hookworm protein functions that assure long-term host survival and facilitate future engineering of transgenic hookworms and analysis of regulatory elements mediating the high-level expression of ESPs. Furthermore, the findings expand the list of potential vaccine and diagnostic targets and identify biologics that can be explored for anti-inflammatory potential.
Affiliation(s)
- Samuel C. Uzoechi
  - Division of Infectious Diseases, Department of Medicine, Washington University School of Medicine, St. Louis, MO 63110, USA
- Bruce A. Rosa
  - Division of Infectious Diseases, Department of Medicine, Washington University School of Medicine, St. Louis, MO 63110, USA
- Kumar Sachin Singh
  - Division of Infectious Diseases, Department of Medicine, Washington University School of Medicine, St. Louis, MO 63110, USA
- Young-Jun Choi
  - Division of Infectious Diseases, Department of Medicine, Washington University School of Medicine, St. Louis, MO 63110, USA
- Paul J. Brindley
  - Department of Microbiology, Immunology & Tropical Medicine, Research Center for Neglected Diseases of Poverty, School of Medicine and Health Sciences, George Washington University, Washington, DC 20037, USA
- R. Reid Townsend
  - Division of Endocrinology, Metabolism and Lipid Research, Department of Medicine, Washington University School of Medicine, St. Louis, MO 63110, USA
- Robert Sprung
  - Division of Endocrinology, Metabolism and Lipid Research, Department of Medicine, Washington University School of Medicine, St. Louis, MO 63110, USA
- Bin Zhan
  - Department of Pediatric Tropical Medicine, National School of Tropical Medicine, Baylor College of Medicine, Houston, TX 77030, USA
- Maria-Elena Bottazzi
  - Department of Pediatric Tropical Medicine, National School of Tropical Medicine, Baylor College of Medicine, Houston, TX 77030, USA
- John M. Hawdon
  - Department of Microbiology, Immunology & Tropical Medicine, Research Center for Neglected Diseases of Poverty, School of Medicine and Health Sciences, George Washington University, Washington, DC 20037, USA
- Yide Wong
  - Centre for Molecular Therapeutics, Australian Institute of Tropical Health and Medicine, James Cook University, Cairns 4878, Australia
- Alex Loukas
  - Centre for Molecular Therapeutics, Australian Institute of Tropical Health and Medicine, James Cook University, Cairns 4878, Australia
- Sergej Djuranovic
  - Department of Cell Biology and Physiology, Internal Medicine, Washington University School of Medicine, St. Louis, MO 63110, USA
- Makedonka Mitreva
  - Division of Infectious Diseases, Department of Medicine, Washington University School of Medicine, St. Louis, MO 63110, USA
|
8
|
Parker MT, Soanes BK, Kusakina J, Larrieu A, Knop K, Joy N, Breidenbach F, Sherwood AV, Barton GJ, Fica SM, Davies BH, Simpson GG. m6A modification of U6 snRNA modulates usage of two major classes of pre-mRNA 5' splice site. eLife 2022; 11:e78808. [PMID: 36409063 PMCID: PMC9803359 DOI: 10.7554/elife.78808] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 03/21/2022] [Accepted: 11/20/2022] [Indexed: 11/23/2022] Open
Abstract
Alternative splicing of messenger RNAs is associated with the evolution of developmentally complex eukaryotes. Splicing is mediated by the spliceosome, and docking of the pre-mRNA 5' splice site into the spliceosome active site depends upon pairing with the conserved ACAGA sequence of U6 snRNA. In some species, including humans, the central adenosine of the ACAGA box is modified by N6 methylation, but the role of this m6A modification is poorly understood. Here, we show that m6A modified U6 snRNA determines the accuracy and efficiency of splicing. We reveal that the conserved methyltransferase, FIONA1, is required for Arabidopsis U6 snRNA m6A modification. Arabidopsis fio1 mutants show disrupted patterns of splicing that can be explained by the sequence composition of 5' splice sites and cooperative roles for U5 and U6 snRNA in splice site selection. U6 snRNA m6A influences 3' splice site usage. We generalise these findings to reveal two major classes of 5' splice site in diverse eukaryotes, which display anti-correlated interaction potential with U5 snRNA loop 1 and the U6 snRNA ACAGA box. We conclude that U6 snRNA m6A modification contributes to the selection of degenerate 5' splice sites crucial to alternative splicing.
Affiliation(s)
- Matthew T Parker
  - School of Life Sciences, University of Dundee, Dundee, United Kingdom
- Beth K Soanes
  - Centre for Plant Sciences, School of Biology, Faculty of Biological Sciences, University of Leeds, Leeds, United Kingdom
- Jelena Kusakina
  - Centre for Plant Sciences, School of Biology, Faculty of Biological Sciences, University of Leeds, Leeds, United Kingdom
- Antoine Larrieu
  - Centre for Plant Sciences, School of Biology, Faculty of Biological Sciences, University of Leeds, Leeds, United Kingdom
- Katarzyna Knop
  - School of Life Sciences, University of Dundee, Dundee, United Kingdom
- Nisha Joy
  - School of Life Sciences, University of Dundee, Dundee, United Kingdom
- Friedrich Breidenbach
  - School of Life Sciences, University of Dundee, Dundee, United Kingdom
  - RNA Biology and Molecular Physiology, Faculty of Biology, Bielefeld University, Bielefeld, Germany
- Anna V Sherwood
  - School of Life Sciences, University of Dundee, Dundee, United Kingdom
- Sebastian M Fica
  - Department of Biochemistry, University of Oxford, Oxford, United Kingdom
- Brendan H Davies
  - Centre for Plant Sciences, School of Biology, Faculty of Biological Sciences, University of Leeds, Leeds, United Kingdom
- Gordon G Simpson
  - School of Life Sciences, University of Dundee, Dundee, United Kingdom
  - Cell & Molecular Sciences, James Hutton Institute, Invergowrie, United Kingdom
|
9
|
You Y, Clark MB, Shim H. NanoSplicer: Accurate identification of splice junctions using Oxford Nanopore sequencing. Bioinformatics 2022; 38:3741-3748. [PMID: 35639973 PMCID: PMC9344838 DOI: 10.1093/bioinformatics/btac359] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 10/22/2021] [Revised: 04/02/2022] [Accepted: 05/24/2022] [Indexed: 11/30/2022] Open
Abstract
Motivation: Long-read sequencing methods have considerable advantages for characterizing RNA isoforms. Oxford Nanopore sequencing records changes in electrical current when nucleic acid traverses through a pore. However, basecalling of this raw signal (known as a squiggle) is error prone, making it challenging to accurately identify splice junctions. Existing strategies include utilizing matched short-read data and/or annotated splice junctions to correct nanopore reads but add expense or limit junctions to known (incomplete) annotations. Therefore, a method that could accurately identify splice junctions solely from nanopore data would have numerous advantages. Results: We developed ‘NanoSplicer’ to identify splice junctions using raw nanopore signal (squiggles). For each splice junction, the observed squiggle is compared to candidate squiggles representing potential junctions to identify the correct candidate. Measuring squiggle similarity enables us to compute the probability of each candidate junction and find the most likely one. We tested our method using (i) synthetic mRNAs with known splice junctions and (ii) biological mRNAs from a lung-cancer cell-line. The results from both datasets demonstrate NanoSplicer improves splice junction identification, especially when the basecalling error rate near the splice junction is elevated. Availability and implementation: NanoSplicer is available at https://github.com/shimlab/NanoSplicer and archived at https://doi.org/10.5281/zenodo.6403849. Data is available from ENA: ERS7273757 and ERS7273453. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Yupei You
  - School of Mathematics and Statistics/Melbourne Integrative Genomics, The University of Melbourne, Melbourne, VIC, 3010, Australia
- Michael B Clark
  - Centre for Stem Cell Systems, Department of Anatomy and Physiology, The University of Melbourne, Melbourne, VIC, 3010, Australia
- Heejung Shim
  - School of Mathematics and Statistics/Melbourne Integrative Genomics, The University of Melbourne, Melbourne, VIC, 3010, Australia
|
10
|
Mikheenko A, Prjibelski AD, Joglekar A, Tilgner HU. Sequencing of individual barcoded cDNAs using Pacific Biosciences and Oxford Nanopore technologies reveals platform-specific error patterns. Genome Res 2022; 32:726-737. [PMID: 35301264 PMCID: PMC8997348 DOI: 10.1101/gr.276405.121] [Citation(s) in RCA: 16] [Impact Index Per Article: 8.0] [Received: 11/18/2021] [Accepted: 03/05/2022] [Indexed: 12/04/2022]
Abstract
Long-read transcriptomics require understanding error sources inherent to technologies. Current approaches cannot compare methods for an individual RNA molecule. Here, we present a novel platform-comparison method that combines barcoding strategies and long-read sequencing to sequence cDNA copies representing an individual RNA molecule on both Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT). We compare these long-read pairs in terms of sequence content and isoform patterns. Although individual read pairs show high similarity, we find differences in (1) aligned length, (2) transcription start site (TSS), (3) polyadenylation site (poly(A)-site) assignment, and (4) exon–intron structures. Overall, 25% of read pairs disagree on either TSS, poly(A)-site, or splice site. Intron-chain disagreement typically arises from alignment errors of microexons and complicated splice sites. Our single-molecule technology comparison reveals that inconsistencies are often caused by sequencing error–induced inaccurate ONT alignments, especially to downstream GUNNGU donor motifs. However, annotation-disagreeing upstream shifts in NAGNAG acceptors in ONT are often confirmed by PacBio and are thus likely real. In both barcoded and nonbarcoded ONT reads, we find that intron number and proximity of GU/AGs better predict inconsistencies with the annotation than read quality alone. We summarize these findings in an annotation-based algorithm for spliced alignment correction that improves subsequent transcript construction with ONT reads.
|
11
|
Sahlin K, Mäkinen V. Accurate spliced alignment of long RNA sequencing reads. Bioinformatics 2021; 37:4643-4651. [PMID: 34302453 PMCID: PMC8665758 DOI: 10.1093/bioinformatics/btab540] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 05/24/2021] [Revised: 06/29/2021] [Accepted: 07/20/2021] [Indexed: 11/13/2022] Open
Abstract
MOTIVATION: Long-read RNA sequencing technologies are establishing themselves as the primary techniques to detect novel isoforms, and many such analyses are dependent on read alignments. However, the error rate and sequencing length of the reads create new challenges for accurately aligning them, particularly around small exons. RESULTS: We present an alignment method uLTRA for long RNA sequencing reads based on a novel two-pass collinear chaining algorithm. We show that uLTRA produces higher accuracy over state-of-the-art aligners with substantially higher accuracy for small exons on simulated and synthetic data. On simulated data, uLTRA achieves an accuracy of about 60% for exons of length 10 nucleotides or smaller and close to 90% accuracy for exons of length between 11 and 20 nucleotides. On biological data where true read location is unknown, we show several examples where uLTRA aligns to known and novel isoforms containing small exons that are not detected with other aligners. While uLTRA obtains its accuracy using annotations, it can also be used as a wrapper around minimap2 to align reads outside annotated regions. AVAILABILITY: uLTRA is available at https://github.com/ksahlin/ultra. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Kristoffer Sahlin
- Department of Mathematics, Science for Life Laboratory, Stockholm University, Stockholm, 106 91, Sweden
| | - Veli Mäkinen
- Department of Computer Science, University of Helsinki, P. O. Box 68, Pietari Kalmin katu 5, 00014, Finland
|
12
|
Parker MT, Knop K, Zacharaki V, Sherwood AV, Tomé D, Yu X, Martin PGP, Beynon J, Michaels SD, Barton GJ, Simpson GG. Widespread premature transcription termination of Arabidopsis thaliana NLR genes by the spen protein FPA. eLife 2021; 10:e65537. [PMID: 33904405 PMCID: PMC8116057 DOI: 10.7554/elife.65537] [Citation(s) in RCA: 28] [Impact Index Per Article: 9.3] [Received: 12/07/2020] [Accepted: 04/26/2021] [Indexed: 12/18/2022]
Abstract
Genes involved in disease resistance are some of the fastest evolving and most diverse components of genomes. Large numbers of nucleotide-binding, leucine-rich repeat (NLR) genes are found in plant genomes and are required for disease resistance. However, NLRs can trigger autoimmunity, disrupt beneficial microbiota or reduce fitness. It is therefore crucial to understand how NLRs are controlled. Here, we show that the RNA-binding protein FPA mediates widespread premature cleavage and polyadenylation of NLR transcripts, thereby controlling their functional expression and impacting immunity. Using long-read Nanopore direct RNA sequencing, we resolved the complexity of NLR transcript processing and gene annotation. Our results uncover a co-transcriptional layer of NLR control with implications for understanding the regulatory and evolutionary dynamics of NLRs in the immune responses of plants.
Affiliation(s)
- Matthew T Parker
- School of Life Sciences, University of Dundee, Dundee, United Kingdom
| | - Katarzyna Knop
- School of Life Sciences, University of Dundee, Dundee, United Kingdom
| | | | - Anna V Sherwood
- School of Life Sciences, University of Dundee, Dundee, United Kingdom
| | - Daniel Tomé
- School of Life Sciences, University of Warwick, Coventry, United Kingdom
| | - Xuhong Yu
- Department of Biology, Indiana University, Bloomington, United States
| | - Pascal GP Martin
- Department of Biology, Indiana University, Bloomington, United States
| | - Jim Beynon
- School of Life Sciences, University of Warwick, Coventry, United Kingdom
| | - Scott D Michaels
- Department of Biology, Indiana University, Bloomington, United States
| | | | - Gordon G Simpson
- School of Life Sciences, University of Dundee, Dundee, United Kingdom
- The James Hutton Institute, Invergowrie, United Kingdom
|
13
|
Lim CS, Sozzi V, Littlejohn M, Yuen LK, Warner N, Betz-Stablein B, Luciani F, Revill PA, Brown CM. Quantitative analysis of the splice variants expressed by the major hepatitis B virus genotypes. Microb Genom 2021; 7:mgen000492. [PMID: 33439114 PMCID: PMC8115900 DOI: 10.1099/mgen.0.000492] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 08/20/2020] [Accepted: 11/23/2020] [Indexed: 12/13/2022]
Abstract
Hepatitis B virus (HBV) is a major human pathogen that causes liver diseases. The main HBV RNAs are unspliced transcripts that encode the key viral proteins. Recent studies have shown that some of the HBV spliced transcript isoforms are predictive of liver cancer, yet the roles of these spliced transcripts remain elusive. Furthermore, there are nine major HBV genotypes common in different regions of the world, and these genotypes may express different spliced transcript isoforms. To systematically study the HBV splice variants, we transfected human hepatoma cells, Huh7, with four HBV genotypes (A2, B2, C2 and D3), followed by deep RNA-sequencing. We found that 13-28% of HBV RNAs were splice variants, which were reproducibly detected across independent biological replicates. These comprised 6 novel and 10 previously identified splice variants. In particular, a novel, singly spliced transcript was detected in genotypes A2 and D3 at high levels. The biological relevance of these splice variants was supported by their identification in HBV-positive liver biopsy and serum samples, and in HBV-infected primary human hepatocytes. Interestingly, the levels of HBV splice variants varied across the genotypes, but the spliced pregenomic RNA SP1 and SP9 were the two most abundant splice variants. Counterintuitively, these singly spliced SP1 and SP9 variants had a suboptimal 5' splice site, supporting the idea that splicing of HBV RNAs is tightly controlled by the viral post-transcriptional regulatory RNA element.
Affiliation(s)
- Chun Shen Lim
- Department of Biochemistry, School of Biomedical Sciences, University of Otago, Dunedin, New Zealand
| | - Vitina Sozzi
- Victorian Infectious Diseases Reference Laboratory, Royal Melbourne Hospital at the Peter Doherty Institute for Infection and Immunity, Melbourne, Victoria, Australia
| | - Margaret Littlejohn
- Victorian Infectious Diseases Reference Laboratory, Royal Melbourne Hospital at the Peter Doherty Institute for Infection and Immunity, Melbourne, Victoria, Australia
| | - Lilly K.W. Yuen
- Victorian Infectious Diseases Reference Laboratory, Royal Melbourne Hospital at the Peter Doherty Institute for Infection and Immunity, Melbourne, Victoria, Australia
| | - Nadia Warner
- Victorian Infectious Diseases Reference Laboratory, Royal Melbourne Hospital at the Peter Doherty Institute for Infection and Immunity, Melbourne, Victoria, Australia
| | - Brigid Betz-Stablein
- Systems Medicine, School of Medical Sciences, Faculty of Medicine, University of New South Wales, Sydney, New South Wales, Australia
- Present address: Dermatology Research Centre, Diamantina Institute, University of Queensland, Brisbane, Queensland, Australia
| | - Fabio Luciani
- Systems Medicine, School of Medical Sciences, Faculty of Medicine, University of New South Wales, Sydney, New South Wales, Australia
| | - Peter A. Revill
- Victorian Infectious Diseases Reference Laboratory, Royal Melbourne Hospital at the Peter Doherty Institute for Infection and Immunity, Melbourne, Victoria, Australia
- Department of Microbiology and Immunology, University of Melbourne, Melbourne, Victoria, Australia
| | - Chris M. Brown
- Department of Biochemistry, School of Biomedical Sciences, University of Otago, Dunedin, New Zealand
|