1
Anczukow O, Allain FHT, Angarola BL, Black DL, Brooks AN, Cheng C, Conesa A, Crosse EI, Eyras E, Guccione E, Lu SX, Neugebauer KM, Sehgal P, Song X, Tothova Z, Valcárcel J, Weeks KM, Yeo GW, Thomas-Tikhonenko A. Steering research on mRNA splicing in cancer towards clinical translation. Nat Rev Cancer 2024; 24:887-905. [PMID: 39384951 DOI: 10.1038/s41568-024-00750-2] [Accepted: 08/27/2024] [Indexed: 10/11/2024]
Abstract
Splicing factors are affected by recurrent somatic mutations and copy number variations in several types of haematologic and solid malignancies, which is often seen as prima facie evidence that splicing aberrations can drive cancer initiation and progression. However, numerous spliceosome components also 'moonlight' in DNA repair and other cellular processes, making their precise role in cancer difficult to pinpoint. Still, few would deny that dysregulated mRNA splicing is a pervasive feature of most cancers. Correctly interpreting these molecular fingerprints can reveal novel tumour vulnerabilities and untapped therapeutic opportunities. Yet multiple technological challenges, lingering misconceptions, and outstanding questions hinder clinical translation. To start with, the general landscape of splicing aberrations in cancer is not well defined, due to limitations of short-read RNA sequencing not adept at resolving complete mRNA isoforms, as well as the shallow read depth inherent in long-read RNA-sequencing, especially at single-cell level. Although individual cancer-associated isoforms are known to contribute to cancer progression, widespread splicing alterations could be an equally important and, perhaps, more readily actionable feature of human cancers. This is to say that in addition to 'repairing' mis-spliced transcripts, possible therapeutic avenues include exacerbating splicing aberration with small-molecule spliceosome inhibitors, targeting recurrent splicing aberrations with synthetic lethal approaches, and training the immune system to recognize splicing-derived neoantigens.
Affiliation(s)
- Olga Anczukow
  - The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA.
- Frédéric H-T Allain
  - Department of Biology, Eidgenössische Technische Hochschule (ETH), Zürich, Switzerland
- Douglas L Black
  - Department of Microbiology, Immunology, and Molecular Genetics, University of California Los Angeles, Los Angeles, CA, USA
- Angela N Brooks
  - Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA, USA
- Chonghui Cheng
  - Department of Molecular and Human Genetics, Lester & Sue Smith Breast Center, Baylor College of Medicine, Houston, TX, USA
- Ana Conesa
  - Institute for Integrative Systems Biology, Spanish National Research Council, Paterna, Spain
- Edie I Crosse
  - Basic Sciences Division, Fred Hutchinson Cancer Center, Seattle, WA, USA
- Eduardo Eyras
  - Shine-Dalgarno Centre for RNA Innovation, Australian National University, Canberra, Australian Capital Territory, Australia
- Ernesto Guccione
  - Department of Oncological Sciences, Mount Sinai School of Medicine, New York, NY, USA
- Sydney X Lu
  - Department of Medicine, Stanford Medical School, Palo Alto, CA, USA
- Karla M Neugebauer
  - Department of Molecular Biophysics & Biochemistry, Yale University, New Haven, CT, USA
- Priyanka Sehgal
  - Division of Cancer Pathobiology, Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Xiao Song
  - Department of Neurology, Northwestern University, Chicago, IL, USA
- Zuzana Tothova
  - Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
- Juan Valcárcel
  - Centre for Genomic Regulation, Institució Catalana de Recerca i Estudis Avançats, Barcelona, Spain
- Kevin M Weeks
  - Department of Chemistry, University of North Carolina, Chapel Hill, NC, USA
- Gene W Yeo
  - Department of Cellular and Molecular Medicine, University of California San Diego, La Jolla, CA, USA
- Andrei Thomas-Tikhonenko
  - Division of Cancer Pathobiology, Children's Hospital of Philadelphia, Philadelphia, PA, USA.
  - Department of Pathology & Laboratory Medicine, Perelman School of Medicine at the University of Pennsylvania, Philadelphia, PA, USA.
2
Ritter AJ, Draper JM, Vollmers C, Sanford JR. Long-read subcellular fractionation and sequencing reveals the translational fate of full-length mRNA isoforms during neuronal differentiation. Genome Res 2024; 34:2000-2011. [PMID: 38839373 PMCID: PMC11610577 DOI: 10.1101/gr.279170.124] [Received: 02/20/2024] [Accepted: 05/21/2024] [Indexed: 06/07/2024]
Abstract
Alternative splicing (AS) alters the cis-regulatory landscape of mRNA isoforms, leading to transcripts with distinct localization, stability, and translational efficiency. To rigorously investigate mRNA isoform-specific ribosome association, we generated subcellular fractionation and sequencing (Frac-seq) libraries using both conventional short reads and long reads from human embryonic stem cells (ESCs) and neural progenitor cells (NPCs) derived from the same ESCs. We performed de novo transcriptome assembly from high-confidence long reads from cytosolic, monosomal, light, and heavy polyribosomal fractions and quantified their abundance using short reads from their respective subcellular fractions. Thousands of transcripts in each cell type exhibited association with particular subcellular fractions relative to the cytosol. Of the multi-isoform genes, 27% and 19% exhibited significant differential isoform sedimentation in ESCs and NPCs, respectively. Alternative promoter usage and internal exon skipping accounted for the majority of differences between isoforms from the same gene. Random forest classifiers implicated coding sequence (CDS) and untranslated region (UTR) lengths as important determinants of isoform-specific sedimentation profiles, and motif analyses reveal potential cell type-specific and subcellular fraction-associated RNA-binding protein signatures. Taken together, our data demonstrate that alternative mRNA processing within the CDS and UTRs impacts the translational control of mRNA isoforms during stem cell differentiation, and highlight the utility of using a novel long-read sequencing-based method to study translational control.
Affiliation(s)
- Alexander J Ritter
  - Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, California 95064, USA
- Jolene M Draper
  - Department of Molecular, Cell, and Developmental Biology and Center for Molecular Biology of RNA, University of California Santa Cruz, Santa Cruz, California 95064, USA
- Christopher Vollmers
  - Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, California 95064, USA
- Jeremy R Sanford
  - Department of Molecular, Cell, and Developmental Biology and Center for Molecular Biology of RNA, University of California Santa Cruz, Santa Cruz, California 95064, USA
3
Abood A, Mesner LD, Jeffery ED, Murali M, Lehe MD, Saquing J, Farber CR, Sheynkman GM. Long-read proteogenomics to connect disease-associated sQTLs to the protein isoform effectors of disease. Am J Hum Genet 2024; 111:1914-1931. [PMID: 39079539 PMCID: PMC11393689 DOI: 10.1016/j.ajhg.2024.07.003] [Received: 02/19/2024] [Revised: 07/01/2024] [Accepted: 07/02/2024] [Indexed: 08/07/2024] [Open Access]
Abstract
A major fraction of loci identified by genome-wide association studies (GWASs) mediate alternative splicing, but mechanistic interpretation is hindered by the technical limitations of short-read RNA sequencing (RNA-seq), which cannot directly link splicing events to full-length protein isoforms. Long-read RNA-seq represents a powerful tool to characterize transcript isoforms and, more recently, to infer protein isoform existence. Here, we present an approach that integrates information from GWASs, splicing quantitative trait loci (sQTLs), and PacBio long-read RNA-seq in a disease-relevant model to infer the effects of sQTLs on the ultimate protein isoform products they encode. We demonstrate the utility of our approach using bone mineral density (BMD) GWAS data. We identified 1,863 sQTLs from the Genotype-Tissue Expression (GTEx) project in 732 protein-coding genes that colocalized with BMD associations (H4PP ≥ 0.75). We generated PacBio Iso-Seq data (N = ∼22 million full-length reads) on human osteoblasts, identifying 68,326 protein-coding isoforms, of which 17,375 (25%) were unannotated. By casting the sQTLs onto protein isoforms, we connected 809 sQTLs to 2,029 protein isoforms from 441 genes expressed in osteoblasts. Overall, we found that 74 sQTLs influenced isoforms likely impacted by nonsense-mediated decay and 190 that potentially resulted in the expression of unannotated protein isoforms. Finally, we functionally validated colocalizing sQTLs in TPM2, in which siRNA-mediated knockdown in osteoblasts showed two TPM2 isoforms with opposing effects on mineralization but exhibited no effect upon knockdown of the entire gene. Our approach should generalize across diverse clinical traits and provide insights into protein isoform activities modulated by GWAS loci.
Affiliation(s)
- Abdullah Abood
  - Center for Public Health Genomics, School of Medicine, University of Virginia, Charlottesville, VA 22908, USA; Department of Biochemistry and Molecular Genetics, School of Medicine, University of Virginia, Charlottesville, VA 22908, USA
- Larry D Mesner
  - Center for Public Health Genomics, School of Medicine, University of Virginia, Charlottesville, VA 22908, USA; Department of Public Health Sciences, University of Virginia, Charlottesville, VA 22908, USA
- Erin D Jeffery
  - Department of Molecular Physiology and Biological Physics, University of Virginia, Charlottesville, VA, USA
- Mayank Murali
  - Department of Molecular Physiology and Biological Physics, University of Virginia, Charlottesville, VA, USA
- Micah D Lehe
  - Department of Molecular Physiology and Biological Physics, University of Virginia, Charlottesville, VA, USA
- Jamie Saquing
  - Department of Molecular Physiology and Biological Physics, University of Virginia, Charlottesville, VA, USA
- Charles R Farber
  - Center for Public Health Genomics, School of Medicine, University of Virginia, Charlottesville, VA 22908, USA; Department of Biochemistry and Molecular Genetics, School of Medicine, University of Virginia, Charlottesville, VA 22908, USA; Department of Public Health Sciences, University of Virginia, Charlottesville, VA 22908, USA.
- Gloria M Sheynkman
  - Center for Public Health Genomics, School of Medicine, University of Virginia, Charlottesville, VA 22908, USA; Department of Molecular Physiology and Biological Physics, University of Virginia, Charlottesville, VA, USA; UVA Comprehensive Cancer Center, University of Virginia, Charlottesville, VA, USA.
4
Leung SK, Bamford RA, Jeffries AR, Castanho I, Chioza B, Flaxman CS, Moore K, Dempster EL, Harvey J, Brown JT, Ahmed Z, O'Neill P, Richardson SJ, Hannon E, Mill J. Long-read transcript sequencing identifies differential isoform expression in the entorhinal cortex in a transgenic model of tau pathology. Nat Commun 2024; 15:6458. [PMID: 39095344 PMCID: PMC11297290 DOI: 10.1038/s41467-024-50486-8] [Received: 10/04/2023] [Accepted: 07/10/2024] [Indexed: 08/04/2024] [Open Access]
Abstract
Increasing evidence suggests that alternative splicing plays an important role in Alzheimer's disease (AD) pathology. We used long-read sequencing in combination with a novel bioinformatics tool (FICLE) to profile transcript diversity in the entorhinal cortex of female transgenic (TG) mice harboring a mutant form of human tau. Our analyses revealed hundreds of novel isoforms and identified differentially expressed transcripts - including specific isoforms of Apoe, App, Cd33, Clu, Fyn and Trem2 - associated with the development of tau pathology in TG mice. Subsequent profiling of the human cortex from AD individuals and controls revealed similar patterns of transcript diversity, including the upregulation of the dominant TREM2 isoform in AD paralleling the increased expression of the homologous transcript in TG mice. Our results highlight the importance of differential transcript usage, even in the absence of gene-level expression alterations, as a mechanism underpinning gene regulation in the development of AD neuropathology.
Affiliation(s)
- Szi Kay Leung
  - Department of Clinical and Biomedical Sciences, University of Exeter, Exeter, UK.
- Rosemary A Bamford
  - Department of Clinical and Biomedical Sciences, University of Exeter, Exeter, UK
- Isabel Castanho
  - Department of Clinical and Biomedical Sciences, University of Exeter, Exeter, UK
  - Department of Pathology, Beth Israel Deaconess Medical Center, Boston, MA, USA
  - Harvard Medical School, Boston, MA, USA
- Barry Chioza
  - Department of Clinical and Biomedical Sciences, University of Exeter, Exeter, UK
- Christine S Flaxman
  - Department of Clinical and Biomedical Sciences, University of Exeter, Exeter, UK
- Karen Moore
  - Biosciences, University of Exeter, Exeter, UK
- Emma L Dempster
  - Department of Clinical and Biomedical Sciences, University of Exeter, Exeter, UK
- Joshua Harvey
  - Department of Clinical and Biomedical Sciences, University of Exeter, Exeter, UK
- Jonathan T Brown
  - Department of Clinical and Biomedical Sciences, University of Exeter, Exeter, UK
- Sarah J Richardson
  - Department of Clinical and Biomedical Sciences, University of Exeter, Exeter, UK
- Eilis Hannon
  - Department of Clinical and Biomedical Sciences, University of Exeter, Exeter, UK
- Jonathan Mill
  - Department of Clinical and Biomedical Sciences, University of Exeter, Exeter, UK.
5
Pardo-Palacios FJ, Arzalluz-Luque A, Kondratova L, Salguero P, Mestre-Tomás J, Amorín R, Estevan-Morió E, Liu T, Nanni A, McIntyre L, Tseng E, Conesa A. SQANTI3: curation of long-read transcriptomes for accurate identification of known and novel isoforms. Nat Methods 2024; 21:793-797. [PMID: 38509328 PMCID: PMC11093726 DOI: 10.1038/s41592-024-02229-2] [Received: 06/01/2023] [Accepted: 03/01/2024] [Indexed: 03/22/2024]
Abstract
SQANTI3 is a tool designed for the quality control, curation and annotation of long-read transcript models obtained with third-generation sequencing technologies. Leveraging its annotation framework, SQANTI3 calculates quality descriptors of transcript models, junctions and transcript ends. With this information, potential artifacts can be identified and replaced with reliable sequences. Furthermore, the integrated functional annotation feature enables subsequent functional iso-transcriptomics analyses.
Affiliation(s)
- Francisco J Pardo-Palacios
  - Institute for Integrative Systems Biology, Spanish National Research Council, Paterna, Valencia, Spain
  - Department of Applied Statistics and Operational Research, and Quality, Universitat Politècnica de València, Valencia, Valencia, Spain
- Angeles Arzalluz-Luque
  - Institute for Integrative Systems Biology, Spanish National Research Council, Paterna, Valencia, Spain
  - Department of Applied Statistics and Operational Research, and Quality, Universitat Politècnica de València, Valencia, Valencia, Spain
- Liudmyla Kondratova
  - Horticultural Sciences Department, University of Florida, Gainesville, FL, USA
  - Genetics Institute, University of Florida, Gainesville, FL, USA
- Pedro Salguero
  - Department of Applied Statistics and Operational Research, and Quality, Universitat Politècnica de València, Valencia, Valencia, Spain
- Jorge Mestre-Tomás
  - Institute for Integrative Systems Biology, Spanish National Research Council, Paterna, Valencia, Spain
- Rocío Amorín
  - Genetics Institute, University of Florida, Gainesville, FL, USA
  - Department of Microbiology and Cell Science, University of Florida, Gainesville, FL, USA
- Eva Estevan-Morió
  - Institute for Integrative Systems Biology, Spanish National Research Council, Paterna, Valencia, Spain
- Tianyuan Liu
  - Institute for Integrative Systems Biology, Spanish National Research Council, Paterna, Valencia, Spain
- Adalena Nanni
  - Department of Molecular Genetics and Microbiology, University of Florida, Gainesville, FL, USA
- Lauren McIntyre
  - Genetics Institute, University of Florida, Gainesville, FL, USA
  - Department of Molecular Genetics and Microbiology, University of Florida, Gainesville, FL, USA
- Ana Conesa
  - Institute for Integrative Systems Biology, Spanish National Research Council, Paterna, Valencia, Spain.
6
Kato Y, Nitta JH, Perez CAG, Adhitama N, Religia P, Toyoda A, Iwasaki W, Watanabe H. Identification of gene isoforms and their switching events between male and female embryos of the parthenogenetic crustacean Daphnia magna. Sci Rep 2024; 14:9407. [PMID: 38688940 PMCID: PMC11061156 DOI: 10.1038/s41598-024-59774-1] [Received: 01/30/2024] [Accepted: 04/15/2024] [Indexed: 05/02/2024] [Open Access]
Abstract
The cladoceran crustacean Daphnia exhibits phenotypic plasticity, a phenomenon that leads to diverse phenotypes from one genome. Alternative usage of gene isoforms has been considered a key gene regulation mechanism for controlling different phenotypes. However, gene isoforms have not been comprehensively analyzed in the context of Daphnia phenotypic plasticity. Here we identified 25,654 transcripts derived from the 9710 genes expressed during environmental sex determination of Daphnia magna using long-read RNA-Seq with PacBio Iso-Seq. We found that 14,924 transcripts were previously unidentified and 5713 genes produced two or more isoforms. By combining these data with Illumina short-read RNA-Seq, we detected 824 genes in which the most highly expressed isoform switched between females and males. Among the 824 genes, we found isoform switching of an ortholog of CREB-regulated transcription coactivator, a major regulator of carbohydrate metabolism in animals, and a correlation of this switching event with the sexually dimorphic expression of carbohydrate metabolic genes. These results suggest that a comprehensive catalog of isoforms may lead to understanding the molecular basis for environmental sex determination of Daphnia. We also infer that full-length isoform analyses are applicable to the elucidation of phenotypic plasticity in Daphnia.
Affiliation(s)
- Yasuhiko Kato
  - Department of Biotechnology, Graduate School of Engineering, Osaka University, Suita, Osaka, Japan.
  - Institute for Open and Transdisciplinary Research Initiatives (OTRI), Osaka University, Suita, Osaka, Japan.
- Joel H Nitta
  - Department of Integrated Biosciences, Graduate School of Frontier Sciences, The University of Tokyo, Tokyo, Japan
- Nikko Adhitama
  - Department of Biotechnology, Graduate School of Engineering, Osaka University, Suita, Osaka, Japan
  - Institute for Open and Transdisciplinary Research Initiatives (OTRI), Osaka University, Suita, Osaka, Japan
- Pijar Religia
  - Department of Biotechnology, Graduate School of Engineering, Osaka University, Suita, Osaka, Japan
- Atsushi Toyoda
  - Advanced Genomics Center, National Institute of Genetics, Mishima, Shizuoka, Japan
- Wataru Iwasaki
  - Department of Integrated Biosciences, Graduate School of Frontier Sciences, The University of Tokyo, Tokyo, Japan
- Hajime Watanabe
  - Department of Biotechnology, Graduate School of Engineering, Osaka University, Suita, Osaka, Japan
  - Institute for Open and Transdisciplinary Research Initiatives (OTRI), Osaka University, Suita, Osaka, Japan
7
Murali M, Saquing J, Lu S, Gao Z, Jordan B, Wakefield ZP, Fiszbein A, Cooper DR, Castaldi PJ, Korkin D, Sheynkman G. Biosurfer for systematic tracking of regulatory mechanisms leading to protein isoform diversity. bioRxiv 2024:2024.03.15.585320. [PMID: 38559226 PMCID: PMC10980011 DOI: 10.1101/2024.03.15.585320] [Indexed: 04/04/2024]
Abstract
Long-read RNA sequencing has shed light on transcriptomic complexity, but questions remain about the functionality of downstream protein products. We introduce Biosurfer, a computational approach for comparing protein isoforms, while systematically tracking the transcriptional, splicing, and translational variations that underlie differences in the sequences of the protein products. Using Biosurfer, we analyzed the differences in 32,799 pairs of GENCODE annotated protein isoforms, finding that a majority (70%) of variable N-termini are due to alternative transcription start sites, while only 9% arise from 5' UTR alternative splicing. Biosurfer's detailed tracking of nucleotide-to-residue relationships helped reveal an uncommonly tracked source of single amino acid residue changes arising from codon splits at junctions. For 17% of internal sequence changes, such split codon patterns lead to single residue differences, termed "ragged codons". Of variable C-termini, 72% involve splice- or intron retention-induced reading frameshifts. We found an unusual pattern of reading frame changes, in which the first frameshift is closely followed by a distinct second frameshift that restores the original frame, which we term a "snapback" frameshift. We analyzed the long-read RNA-seq-predicted proteome of a human cell line and found trends similar to those in our GENCODE analysis, with the exception of a higher proportion of isoforms predicted to undergo nonsense-mediated decay. Biosurfer's comprehensive characterization of long-read RNA-seq datasets should accelerate insights into the functional roles of protein isoforms, providing a mechanistic explanation of the origins of the proteomic diversity driven by alternative splicing. Biosurfer is available as a Python package at https://github.com/sheynkman-lab/biosurfer.
Affiliation(s)
- Mayank Murali
  - Broad Institute of MIT and Harvard University, Cambridge, MA, USA
- Jamie Saquing
  - Department of Molecular Physiology and Biological Physics, University of Virginia, Charlottesville, VA, USA
- Senbao Lu
  - Bioinformatics and Computational Biology Program, Worcester Polytechnic Institute, Worcester, MA, USA
  - Computer Science Department, Worcester Polytechnic Institute, Worcester, MA, USA
- Ziyang Gao
  - Bioinformatics and Computational Biology Program, Worcester Polytechnic Institute, Worcester, MA, USA
  - Computer Science Department, Worcester Polytechnic Institute, Worcester, MA, USA
- Ben Jordan
  - Department of Molecular Physiology and Biological Physics, University of Virginia, Charlottesville, VA, USA
- Zachary Peters Wakefield
  - Bioinformatics Program, Boston University, Boston, MA, USA
  - Department of Biology, Boston University, Boston, MA, USA
- Ana Fiszbein
  - Bioinformatics Program, Boston University, Boston, MA, USA
  - Department of Biology, Boston University, Boston, MA, USA
- David R. Cooper
  - Department of Molecular Physiology and Biological Physics, University of Virginia, Charlottesville, VA, USA
- Peter J. Castaldi
  - Channing Division of Network Medicine, Department of Medicine, Brigham and Women’s Hospital, Boston, MA, USA
  - Division of General Medicine and Primary Care, Department of Medicine, Brigham and Women’s Hospital, Boston, MA, USA
- Dmitry Korkin
  - Bioinformatics and Computational Biology Program, Worcester Polytechnic Institute, Worcester, MA, USA
  - Computer Science Department, Worcester Polytechnic Institute, Worcester, MA, USA
- Gloria Sheynkman
  - Department of Molecular Physiology and Biological Physics, University of Virginia, Charlottesville, VA, USA
  - Department of Biochemistry and Molecular Genetics, University of Virginia, Charlottesville, VA, USA
  - Center for Public Health Genomics, University of Virginia, Charlottesville, VA, USA
  - UVA Cancer Center, University of Virginia, Charlottesville, VA, USA
8
Xu F, Liu S, Zhao A, Shang M, Wang Q, Jiang S, Cheng Q, Chen X, Zhai X, Zhang J, Wang X, Yan J. iFLAS: positive-unlabeled learning facilitates full-length transcriptome-based identification and functional exploration of alternatively spliced isoforms in maize. New Phytol 2024; 241:2606-2620. [PMID: 38291701 DOI: 10.1111/nph.19554] [Received: 10/24/2023] [Accepted: 01/06/2024] [Indexed: 02/01/2024]
Abstract
The advent of full-length transcriptome sequencing technologies has accelerated the discovery of novel splicing isoforms. However, existing alternative splicing (AS) tools are either tailored for short-read RNA-Seq data or designed for human and animal studies. The disparities in AS patterns between plants and animals still pose a challenge to the reliable identification and functional exploration of novel isoforms in plants. Here, we developed integrated full-length alternative splicing analysis (iFLAS), a plant-optimized AS toolkit that introduced a semi-supervised machine learning method known as positive-unlabeled (PU) learning to accurately identify novel isoforms. iFLAS also enables the investigation of AS functions from various perspectives, such as differential AS, poly(A) tail length, and allele-specific AS (ASAS) analyses. By applying iFLAS to three full-length transcriptome sequencing datasets, we systematically identified and functionally characterized maize (Zea mays) AS patterns. We found intron retention not only introduces premature termination codons, resulting in lower expression levels of isoforms, but may also regulate the length of 3'UTR and poly(A) tail, thereby affecting the functional differentiation of isoforms. Moreover, we observed distinct ASAS patterns in two genes within heterosis offspring, highlighting their potential value in breeding. These results underscore the broad applicability of iFLAS in plant full-length transcriptome-based AS research.
Affiliation(s)
- Feng Xu
  - State Key Laboratory of Maize Bio-Breeding, National Maize Improvement Center, Frontiers Science Center for Molecular Design Breeding, College of Agronomy and Biotechnology, China Agricultural University, Beijing, 100094, China
- Songyu Liu
  - State Key Laboratory of Maize Bio-Breeding, National Maize Improvement Center, Frontiers Science Center for Molecular Design Breeding, College of Agronomy and Biotechnology, China Agricultural University, Beijing, 100094, China
- Anwen Zhao
  - State Key Laboratory of Maize Bio-Breeding, National Maize Improvement Center, Frontiers Science Center for Molecular Design Breeding, College of Agronomy and Biotechnology, China Agricultural University, Beijing, 100094, China
- Meiqi Shang
  - State Key Laboratory of Maize Bio-Breeding, National Maize Improvement Center, Frontiers Science Center for Molecular Design Breeding, College of Agronomy and Biotechnology, China Agricultural University, Beijing, 100094, China
- Qian Wang
  - State Key Laboratory of Maize Bio-Breeding, National Maize Improvement Center, Frontiers Science Center for Molecular Design Breeding, College of Agronomy and Biotechnology, China Agricultural University, Beijing, 100094, China
- Shuqin Jiang
  - State Key Laboratory of Maize Bio-Breeding, National Maize Improvement Center, Frontiers Science Center for Molecular Design Breeding, College of Agronomy and Biotechnology, China Agricultural University, Beijing, 100094, China
- Qian Cheng
  - State Key Laboratory of Maize Bio-Breeding, National Maize Improvement Center, Frontiers Science Center for Molecular Design Breeding, College of Agronomy and Biotechnology, China Agricultural University, Beijing, 100094, China
- Xingming Chen
  - Molbreeding Biotechnology Co., Ltd, Shijiazhuang, Hebei Province, 051430, China
- Xiaoguang Zhai
  - Molbreeding Biotechnology Co., Ltd, Shijiazhuang, Hebei Province, 051430, China
- Jianan Zhang
  - Molbreeding Biotechnology Co., Ltd, Shijiazhuang, Hebei Province, 051430, China
- Xiangfeng Wang
  - State Key Laboratory of Maize Bio-Breeding, National Maize Improvement Center, Frontiers Science Center for Molecular Design Breeding, College of Agronomy and Biotechnology, China Agricultural University, Beijing, 100094, China
- Jun Yan
  - State Key Laboratory of Maize Bio-Breeding, National Maize Improvement Center, Frontiers Science Center for Molecular Design Breeding, College of Agronomy and Biotechnology, China Agricultural University, Beijing, 100094, China
9
Boti MA, Adamopoulos PG, Vassilacopoulou D, Scorilas A. Unraveling the Concealed Transcriptomic Landscape of PTEN in Human Malignancies. Curr Genomics 2023; 24:250-262. [PMID: 38169628 PMCID: PMC10758127 DOI: 10.2174/0113892029265367231013113304] [Received: 06/22/2023] [Revised: 07/29/2023] [Accepted: 09/19/2023] [Indexed: 01/05/2024] [Open Access]
Abstract
Background Phosphatase and tensin homolog, widely known as PTEN, is a major negative regulator of the PI3K/AKT/mTOR signaling pathway, involved in the regulation of a variety of important cellular processes, including cell proliferation, growth, survival, and metabolism. Since most of the molecules involved in this biological pathway have been described as key regulators in cancer, the study of the corresponding genes at several levels is crucial. Objective Although previous studies have elucidated the physiological role of PTEN under normal conditions and its involvement in carcinogenesis and cancer progression, the transcriptional profile of PTEN has been poorly investigated. Methods In this study, instead of conducting the "gold-standard" direct RNA sequencing that fails to detect less abundant novel mRNAs due to the decreased sequencing depth, we designed and implemented a multiplexed PTEN-targeted sequencing approach that combined both short- and long-read sequencing. Results Our study has highlighted a broad spectrum of previously unknown PTEN mRNA transcripts and assessed their expression patterns in a wide range of human cancer and non-cancer cell lines, shedding light on the involvement of PTEN in cell cycle dysregulation and thus tumor development. Conclusion The identification of the described novel PTEN splice variants could have significant implications for understanding PTEN regulation and function, and provide new insights into PTEN biology, opening new avenues for monitoring PTEN-related diseases, including cancer.
Affiliation(s)
- Michaela A. Boti
  - Department of Biochemistry and Molecular Biology, Faculty of Biology, National and Kapodistrian University of Athens, Athens, Greece
- Panagiotis G. Adamopoulos
  - Department of Biochemistry and Molecular Biology, Faculty of Biology, National and Kapodistrian University of Athens, Athens, Greece
- Dido Vassilacopoulou
  - Department of Biochemistry and Molecular Biology, Faculty of Biology, National and Kapodistrian University of Athens, Athens, Greece
- Andreas Scorilas
  - Department of Biochemistry and Molecular Biology, Faculty of Biology, National and Kapodistrian University of Athens, Athens, Greece
|
10
|
Nguyen LAC, Mori M, Yasuda Y, Galipon J. Functional Consequences of Shifting Transcript Boundaries in Glucose Starvation. Mol Cell Biol 2023; 43:611-628. [PMID: 37937348 PMCID: PMC10761120 DOI: 10.1080/10985549.2023.2270406] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/11/2023] [Accepted: 10/10/2023] [Indexed: 11/09/2023]
Abstract
Glucose is a major source of carbon and essential for the survival of many organisms, ranging from yeast to human. A sudden 60-fold reduction of glucose in exponentially growing fission yeast induces transcriptome-wide changes in gene expression. This regulation is multilayered, and the boundaries of transcripts are known to vary, with functional consequences at the protein level. By combining direct RNA sequencing with 5'-CAGE and short-read sequencing, we accurately defined the 5'- and 3'-ends of transcripts that are both poly(A) tailed and 5'-capped in glucose starvation, followed by proteome analysis. Our results confirm previous experimentally validated loci with alternative isoforms and reveal several transcriptome-wide patterns. First, we show that sense-antisense gene pairs are more strongly anticorrelated when a time lag is taken into account. Second, we show that the glucose starvation response initially elicits a shortening of 3'-UTRs and poly(A) tails, followed by a shortening of the 5'-UTRs at later time points. These changes result in domain gains and losses in proteins involved in the stress response. Finally, the relatively poor overlap among differentially expressed genes (DEGs), differential transcript usage events (DTUs), and differentially detected proteins (DDPs) highlights the need for further study of post-transcriptional regulation mechanisms in glucose starvation.
Affiliation(s)
- Lan Anh Catherine Nguyen
  - Institute for Advanced Biosciences, Keio University, Yamagata, Tsuruoka, Japan
  - Systems Biology Program, Graduate School of Media and Governance, Keio University, Kanagawa, Fujisawa, Japan
- Masaru Mori
  - Institute for Advanced Biosciences, Keio University, Yamagata, Tsuruoka, Japan
  - Systems Biology Program, Graduate School of Media and Governance, Keio University, Kanagawa, Fujisawa, Japan
  - Institute of Innovation for Future Society, Nagoya University, Aichi, Nagoya, Japan
- Yuji Yasuda
  - Institute for Advanced Biosciences, Keio University, Yamagata, Tsuruoka, Japan
  - Faculty of Environment and Information Studies, Keio University, Kanagawa, Fujisawa, Japan
- Josephine Galipon
  - Institute for Advanced Biosciences, Keio University, Yamagata, Tsuruoka, Japan
  - Systems Biology Program, Graduate School of Media and Governance, Keio University, Kanagawa, Fujisawa, Japan
  - Graduate School of Science and Engineering, Yamagata University, Yamagata, Yonezawa, Japan
|
11
|
Al-Dossary O, Furtado A, KharabianMasouleh A, Alsubaie B, Al-Mssallem I, Henry RJ. Long read sequencing to reveal the full complexity of a plant transcriptome by targeting both standard and long workflows. Plant Methods 2023; 19:112. [PMID: 37865785 PMCID: PMC10589961 DOI: 10.1186/s13007-023-01091-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/17/2022] [Accepted: 10/13/2023] [Indexed: 10/23/2023]
Abstract
BACKGROUND: Long read sequencing allows the analysis of full-length transcripts in plants without the challenges of reliable transcriptome assembly. Long read sequencing of transcripts from plant genomes has often utilized sized transcript libraries. However, the value of including libraries of differing sizes has not been established. METHODS: A comprehensive transcriptome of the leaves of Jojoba (Simmondsia chinensis) was generated from two different PacBio library preparations: standard workflow (SW) and long workflow (LW). RESULTS: The importance of using both transcript groups in the analysis was demonstrated by the high proportion of unique sequences (74.6%) that were not shared between the groups. A total of 37.8% longer transcripts were only detected in the long dataset. The completeness of the combined transcriptome was indicated by the presence of 98.7% of genes predicted in the jojoba male reference genome. The high coverage of the transcriptome was further confirmed by BUSCO analysis showing the presence of 96.9% of the genes from the core viridiplantae_odb10 lineage. After Cd-Hit merging, the high-quality dataset from the two workflows contained a total of 167,866 isoforms. Most of the transcript isoforms were protein-coding sequences (71.7%) containing open reading frames (ORFs) ≥ 100 amino acids (aa). Alternative splicing and intron retention were the basis of most transcript diversity when analysed at the whole genome level and by specific analysis of the apetala2 gene families. CONCLUSION: This suggests the need to specifically target the capture of longer transcripts to provide more comprehensive genome coverage in plant transcriptome analysis and reveal the high level of alternative splicing.
Affiliation(s)
- Othman Al-Dossary
  - Queensland Alliance for Agriculture and Food Innovation, University of Queensland, Brisbane, 4072, Australia
  - College of Agriculture and Food Sciences, King Faisal University, 36362, Al Hofuf, Saudi Arabia
- Agnelo Furtado
  - Queensland Alliance for Agriculture and Food Innovation, University of Queensland, Brisbane, 4072, Australia
- Ardashir KharabianMasouleh
  - Queensland Alliance for Agriculture and Food Innovation, University of Queensland, Brisbane, 4072, Australia
- Bader Alsubaie
  - Queensland Alliance for Agriculture and Food Innovation, University of Queensland, Brisbane, 4072, Australia
  - College of Agriculture and Food Sciences, King Faisal University, 36362, Al Hofuf, Saudi Arabia
- Ibrahim Al-Mssallem
  - College of Agriculture and Food Sciences, King Faisal University, 36362, Al Hofuf, Saudi Arabia
- Robert J Henry
  - Queensland Alliance for Agriculture and Food Innovation, University of Queensland, Brisbane, 4072, Australia
  - ARC Centre of Excellence for Plant Success in Nature and Agriculture, University of Queensland, Brisbane, 4072, Australia
|
12
|
Xia Y, Jin Z, Zhang C, Ouyang L, Dong Y, Li J, Guo L, Jing B, Shi Y, Miao S, Xi R. TAGET: a toolkit for analyzing full-length transcripts from long-read sequencing. Nat Commun 2023; 14:5935. [PMID: 37741817 PMCID: PMC10518008 DOI: 10.1038/s41467-023-41649-0] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 09/11/2022] [Accepted: 09/13/2023] [Indexed: 09/25/2023]
Abstract
Single-molecule Real-time Isoform Sequencing (Iso-seq) of transcriptomes by PacBio can generate very long and accurate reads, thus providing an ideal platform for full-length transcriptome analysis. We present an integrated computational toolkit named TAGET for Iso-seq full-length transcript data analyses, including transcript alignment, annotation, gene fusion detection, and quantification analyses such as differential gene expression analysis and differential isoform usage analysis. We evaluate the performance of TAGET using a public Iso-seq dataset and newly sequenced Iso-seq datasets from tumor patients. TAGET gives significantly more precise novel splice site prediction and enables more accurate novel isoform and gene fusion discoveries, as confirmed by experimental validation and comparison with RNA-seq data. We identify and experimentally validate ECM1 as a gene with differential isoform usage, and further show that its isoform ECM1b may be a tumor-suppressor in laryngocarcinoma. Our results demonstrate that TAGET provides a valuable computational toolkit and can be applied to many full-length transcriptome studies.
Affiliation(s)
- Yuchao Xia
  - College of Science, Beijing Information Science and Technology University, 100192, Beijing, China
  - Beijing GeneX Health Co.,Ltd, 100195, Beijing, China
- Zijie Jin
  - Peking University International Cancer Institute, Health Science Center, Peking University, 100191, Beijing, China
  - School of Mathematical Sciences, Peking University, 100871, Beijing, China
- Linkun Ouyang
  - Academy for Advanced Interdisciplinary Studies, Peking University, 100871, Beijing, China
- Yuhao Dong
  - Beijing GeneX Health Co.,Ltd, 100195, Beijing, China
- Juan Li
  - Department of Biomedical Engineering, College of Future Technology, Peking University, 100871, Beijing, China
- Lvze Guo
  - Beijing GeneX Health Co.,Ltd, 100195, Beijing, China
- Biyang Jing
  - Beijing GeneX Health Co.,Ltd, 100195, Beijing, China
- Yang Shi
  - BeiGene (Beijing) Co., Ltd., Beijing, China
- Susheng Miao
  - Department of Head and Neck Surgery, Harbin Medical University Cancer Hospital, 150081, Harbin, China
- Ruibin Xi
  - School of Mathematical Sciences, Peking University, 100871, Beijing, China
  - Academy for Advanced Interdisciplinary Studies, Peking University, 100871, Beijing, China
  - Center for Statistical Science, Peking University, 100871, Beijing, China
|
13
|
Pardo-Palacios FJ, Arzalluz-Luque A, Kondratova L, Salguero P, Mestre-Tomás J, Amorín R, Estevan-Morió E, Liu T, Nanni A, McIntyre L, Tseng E, Conesa A. SQANTI3: curation of long-read transcriptomes for accurate identification of known and novel isoforms. bioRxiv 2023:2023.05.17.541248. [PMID: 37398077 PMCID: PMC10312485 DOI: 10.1101/2023.05.17.541248] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Indexed: 07/04/2023]
Abstract
The emergence of long-read RNA sequencing (lrRNA-seq) has provided an unprecedented opportunity to analyze transcriptomes at isoform resolution. However, the technology is not free from biases, and transcript models inferred from these data require quality control and curation. In this study, we introduce SQANTI3, a tool specifically designed to perform quality analysis on transcriptomes constructed using lrRNA-seq data. SQANTI3 provides an extensive naming framework to describe transcript model diversity in comparison to the reference transcriptome. Additionally, the tool incorporates a wide range of metrics to characterize various structural properties of transcript models, such as transcription start and end sites, splice junctions, and other structural features. These metrics can be utilized to filter out potential artifacts. Moreover, SQANTI3 includes a Rescue module that prevents the loss of known genes and transcripts exhibiting evidence of expression but displaying low-quality features. Lastly, SQANTI3 incorporates IsoAnnotLite, which enables functional annotation at the isoform level and facilitates functional iso-transcriptomics analyses. We demonstrate the versatility of SQANTI3 in analyzing different data types, isoform reconstruction pipelines, and sequencing platforms, and how it provides novel biological insights into isoform biology. The SQANTI3 software is available at https://github.com/ConesaLab/SQANTI3.
|
14
|
McCabe SD, Nobel AB, Love MI. ACTOR: a latent Dirichlet model to compare expressed isoform proportions to a reference panel. Biostatistics 2023; 24:388-405. [PMID: 33948626 PMCID: PMC10102900 DOI: 10.1093/biostatistics/kxab013] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/23/2020] [Revised: 03/19/2021] [Accepted: 03/23/2021] [Indexed: 11/13/2022]
Abstract
The relative proportion of RNA isoforms expressed for a given gene has been associated with disease states in cancer, retinal diseases, and neurological disorders. Examination of relative isoform proportions can help determine biological mechanisms, but such analyses often require a per-gene investigation of splicing patterns. Leveraging large public data sets produced by genomic consortia as a reference, one can compare splicing patterns in a data set of interest with those of a reference panel in which samples are divided into distinct groups, such as tissue of origin, or disease status. We propose A latent Dirichlet model to Compare expressed isoform proportions TO a Reference panel (ACTOR), a latent Dirichlet model with Dirichlet Multinomial observations to compare expressed isoform proportions in a data set to an independent reference panel. We use a variational Bayes procedure to estimate posterior distributions for the group membership of one or more samples. Using the Genotype-Tissue Expression project as a reference data set, we evaluate ACTOR on simulated and real RNA-seq data sets to determine tissue-type classifications of genes. ACTOR is publicly available as an R package at https://github.com/mccabes292/actor.
Affiliation(s)
- Sean D McCabe
  - Department of Biostatistics, University of North Carolina at Chapel Hill, 135 Dauer Drive, Chapel Hill, NC 27599-7400, USA
- Andrew B Nobel
  - Department of Statistics and Operations Research, University of North Carolina at Chapel Hill, 318 Hanes Hall, Chapel Hill, NC 27599-3260, USA
  - Department of Biostatistics, University of North Carolina at Chapel Hill, 135 Dauer Drive, Chapel Hill, NC 27599-7400, USA
- Michael I Love
  - Department of Biostatistics, University of North Carolina at Chapel Hill, 135 Dauer Drive, Chapel Hill, NC 27599-7400, USA
  - Department of Genetics, University of North Carolina at Chapel Hill, 120 Mason Farm Rd, Chapel Hill, NC 27514, USA
|
15
|
Abood A, Mesner LD, Jeffery ED, Murali M, Lehe M, Saquing J, Farber CR, Sheynkman GM. Long-read proteogenomics to connect disease-associated sQTLs to the protein isoform effectors of disease. bioRxiv 2023:2023.03.17.531557. [PMID: 36993769 PMCID: PMC10055087 DOI: 10.1101/2023.03.17.531557] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 03/31/2023]
Abstract
A major fraction of loci identified by genome-wide association studies (GWASs) lead to alterations in alternative splicing, but interpretation of how such alterations impact proteins is hindered by the technical limitations of short-read RNA-seq, which cannot directly link splicing events to full-length transcript or protein isoforms. Long-read RNA-seq represents a powerful tool to define and quantify transcript isoforms, and recently, infer protein isoform existence. Here we present a novel approach that integrates information from GWAS, splicing QTL (sQTL), and PacBio long-read RNA-seq in a disease-relevant model to infer the effects of sQTLs on the ultimate protein isoform products they encode. We demonstrate the utility of our approach using bone mineral density (BMD) GWAS data. We identified 1,863 sQTLs from the Genotype-Tissue Expression (GTEx) project in 732 protein-coding genes which colocalized with BMD associations (H4 PP ≥ 0.75). We generated deep coverage PacBio long-read RNA-seq data (N ≈ 22 million full-length reads) on human osteoblasts, identifying 68,326 protein-coding isoforms, of which 17,375 (25%) were novel. By casting the colocalized sQTLs directly onto protein isoforms, we connected 809 sQTLs to 2,029 protein isoforms from 441 genes expressed in osteoblasts. Using these data, we created one of the first proteome-scale resources defining full-length isoforms impacted by colocalized sQTLs. Overall, we found that 74 sQTLs influenced isoforms likely impacted by nonsense mediated decay (NMD) and 190 that potentially resulted in the expression of new protein isoforms. Finally, we identified colocalizing sQTLs in TPM2 for splice junctions between two mutually exclusive exons, and two different transcript termination sites, making it impossible to interpret without long-read RNA-seq data. siRNA mediated knockdown in osteoblasts showed two TPM2 isoforms with opposing effects on mineralization. We expect our approach to be widely generalizable across diverse clinical traits and accelerate system-scale analyses of protein isoform activities modulated by GWAS loci.
|
16
|
Wu S, Schmitz U. Single-cell and long-read sequencing to enhance modelling of splicing and cell-fate determination. Comput Struct Biotechnol J 2023; 21:2373-2380. [PMID: 37066125 PMCID: PMC10091034 DOI: 10.1016/j.csbj.2023.03.023] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 11/08/2022] [Revised: 03/13/2023] [Accepted: 03/13/2023] [Indexed: 04/03/2023]
Abstract
Single-cell sequencing technologies have revolutionised the life sciences and biomedical research. Single-cell sequencing provides high-resolution data on cell heterogeneity, allowing high-fidelity cell type identification, and lineage tracking. Computational algorithms and mathematical models have been developed to make sense of the data, compensate for errors and simulate the biological processes, which has led to breakthroughs in our understanding of cell differentiation, cell-fate determination and tissue cell composition. The development of long-read (a.k.a. third-generation) sequencing technologies has produced powerful tools for investigating alternative splicing, isoform expression (at the RNA level), genome assembly and the detection of complex structural variants (at the DNA level). In this review, we provide an overview of the recent advancements in single-cell and long-read sequencing technologies, with a particular focus on the computational algorithms that help in correcting, analysing, and interpreting the resulting data. Additionally, we review some mathematical models that use single-cell and long-read sequencing data to study cell-fate determination and alternative splicing, respectively. Moreover, we highlight the emerging opportunities in modelling cell-fate determination that result from the combination of single-cell and long-read sequencing technologies.
Affiliation(s)
- Siyuan Wu
  - Department of Molecular & Cell Biology, James Cook University, Townsville 4811, Queensland, Australia
  - Centre for Tropical Bioinformatics and Molecular Biology, James Cook University, Cairns 4870, Queensland, Australia
  - School of Mathematics, Monash University, Melbourne 3800, Victoria, Australia
- Ulf Schmitz
  - Department of Molecular & Cell Biology, James Cook University, Townsville 4811, Queensland, Australia
  - Centre for Tropical Bioinformatics and Molecular Biology, James Cook University, Cairns 4870, Queensland, Australia
|
17
|
Weber R, Ghoshdastider U, Spies D, Duré C, Valdivia-Francia F, Forny M, Ormiston M, Renz PF, Taborsky D, Yigit M, Bernasconi M, Yamahachi H, Sendoel A. Monitoring the 5'UTR landscape reveals isoform switches to drive translational efficiencies in cancer. Oncogene 2023; 42:638-650. [PMID: 36550360 PMCID: PMC9957725 DOI: 10.1038/s41388-022-02578-2] [Citation(s) in RCA: 20] [Impact Index Per Article: 20.0] [Received: 05/12/2022] [Revised: 12/08/2022] [Accepted: 12/12/2022] [Indexed: 12/24/2022]
Abstract
Transcriptional and translational control are key determinants of gene expression; however, the extent to which these two processes can be collectively coordinated is still poorly understood. Here, we use Nanopore long-read sequencing and cap analysis of gene expression (CAGE-seq) to document the landscape of 5' and 3' untranslated region (UTR) isoforms and transcription start sites of epidermal stem cells, wild-type keratinocytes and squamous cell carcinomas. Focusing on squamous cell carcinomas, we show that a small cohort of genes with alternative 5'UTR isoforms exhibit overall increased translational efficiencies and are enriched in ribosomal proteins and splicing factors. By combining polysome fractionations and CAGE-seq, we further characterize two of these UTR isoform genes with identical coding sequences and demonstrate that the underlying transcription start site heterogeneity frequently results in 5' terminal oligopyrimidine (TOP) and pyrimidine-rich translational element (PRTE) motif switches to drive mTORC1-dependent translation of the mRNA. Genome-wide, we show that highly translated squamous cell carcinoma transcripts switch towards increased use of 5'TOP and PRTE motifs, have generally shorter 5'UTRs and show reduced RNA secondary structure. Notably, we found that the two 5'TOP motif-containing, but not the TOP-less, RPL21 transcript isoforms strongly correlated with overall survival in human head and neck squamous cell carcinoma patients. Our findings warrant isoform-specific analyses in human cancer datasets and suggest that switching between 5'UTR isoforms is an elegant and simple way to alter protein synthesis rates, set their sensitivity to the mTORC1-dependent nutrient-sensing pathway and direct the translational potential of an mRNA by the precise 5'UTR sequence.
Affiliation(s)
- Ramona Weber
  - Institute for Regenerative Medicine (IREM), University of Zurich, Wagistrasse 12, CH-8952, Schlieren-Zurich, Switzerland
- Umesh Ghoshdastider
  - Institute for Regenerative Medicine (IREM), University of Zurich, Wagistrasse 12, CH-8952, Schlieren-Zurich, Switzerland
- Daniel Spies
  - Institute for Regenerative Medicine (IREM), University of Zurich, Wagistrasse 12, CH-8952, Schlieren-Zurich, Switzerland
- Clara Duré
  - Institute for Regenerative Medicine (IREM), University of Zurich, Wagistrasse 12, CH-8952, Schlieren-Zurich, Switzerland
  - Life Science Zurich Graduate School, Molecular Life Science Program, University of Zurich/ETH Zurich, Zurich, Switzerland
- Fabiola Valdivia-Francia
  - Institute for Regenerative Medicine (IREM), University of Zurich, Wagistrasse 12, CH-8952, Schlieren-Zurich, Switzerland
  - Life Science Zurich Graduate School, Molecular Life Science Program, University of Zurich/ETH Zurich, Zurich, Switzerland
- Merima Forny
  - Institute for Regenerative Medicine (IREM), University of Zurich, Wagistrasse 12, CH-8952, Schlieren-Zurich, Switzerland
- Mark Ormiston
  - Institute for Regenerative Medicine (IREM), University of Zurich, Wagistrasse 12, CH-8952, Schlieren-Zurich, Switzerland
- Peter F Renz
  - Institute for Regenerative Medicine (IREM), University of Zurich, Wagistrasse 12, CH-8952, Schlieren-Zurich, Switzerland
- David Taborsky
  - Institute for Regenerative Medicine (IREM), University of Zurich, Wagistrasse 12, CH-8952, Schlieren-Zurich, Switzerland
  - Life Science Zurich Graduate School, Molecular Life Science Program, University of Zurich/ETH Zurich, Zurich, Switzerland
- Merve Yigit
  - Institute for Regenerative Medicine (IREM), University of Zurich, Wagistrasse 12, CH-8952, Schlieren-Zurich, Switzerland
  - Life Science Zurich Graduate School, Molecular Life Science Program, University of Zurich/ETH Zurich, Zurich, Switzerland
- Martino Bernasconi
  - Institute for Regenerative Medicine (IREM), University of Zurich, Wagistrasse 12, CH-8952, Schlieren-Zurich, Switzerland
- Homare Yamahachi
  - Institute for Regenerative Medicine (IREM), University of Zurich, Wagistrasse 12, CH-8952, Schlieren-Zurich, Switzerland
- Ataman Sendoel
  - Institute for Regenerative Medicine (IREM), University of Zurich, Wagistrasse 12, CH-8952, Schlieren-Zurich, Switzerland
|
18
|
Prjibelski AD, Mikheenko A, Joglekar A, Smetanin A, Jarroux J, Lapidus AL, Tilgner HU. Accurate isoform discovery with IsoQuant using long reads. Nat Biotechnol 2023:10.1038/s41587-022-01565-y. [PMID: 36593406 DOI: 10.1038/s41587-022-01565-y] [Citation(s) in RCA: 39] [Impact Index Per Article: 39.0] [Received: 04/19/2022] [Accepted: 10/13/2022] [Indexed: 01/04/2023]
Abstract
Annotating newly sequenced genomes and determining alternative isoforms from long-read RNA data are complex and incompletely solved problems. Here we present IsoQuant, a computational tool that uses intron graphs to accurately reconstruct transcripts both with and without reference genome annotation. For novel transcript discovery, IsoQuant reduces the false-positive rate fivefold and 2.5-fold for Oxford Nanopore reference-based or reference-free mode, respectively. IsoQuant also improves performance for Pacific Biosciences data.
Affiliation(s)
- Andrey D Prjibelski
  - Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, St. Petersburg State University, St. Petersburg, Russia
  - Department of Computer Science, University of Helsinki, Helsinki, Finland
- Alla Mikheenko
  - Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, St. Petersburg State University, St. Petersburg, Russia
- Anoushka Joglekar
  - Tri-Institutional Computational Biology and Medicine, Weill Cornell Medicine, New York, NY, USA
  - Brain and Mind Research Institute, Weill Cornell Medicine, New York, NY, USA
  - Center for Neurogenetics, Weill Cornell Medicine, New York, NY, USA
- Julien Jarroux
  - Brain and Mind Research Institute, Weill Cornell Medicine, New York, NY, USA
  - Center for Neurogenetics, Weill Cornell Medicine, New York, NY, USA
- Alla L Lapidus
  - Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, St. Petersburg State University, St. Petersburg, Russia
- Hagen U Tilgner
  - Brain and Mind Research Institute, Weill Cornell Medicine, New York, NY, USA
  - Center for Neurogenetics, Weill Cornell Medicine, New York, NY, USA
|
19
|
Lio CT, Grabert G, Louadi Z, Fenn A, Baumbach J, Kacprowski T, List M, Tsoy O. Systematic analysis of alternative splicing in time course data using Spycone. Bioinformatics 2022; 39:6965022. [PMID: 36579860 PMCID: PMC9831059 DOI: 10.1093/bioinformatics/btac846] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 05/30/2022] [Revised: 11/16/2022] [Accepted: 12/28/2022] [Indexed: 12/30/2022]
Abstract
MOTIVATION: During disease progression or organism development, alternative splicing may lead to isoform switches (IS) that demonstrate similar temporal patterns and reflect the alternative splicing co-regulation of such genes. Tools for dynamic process analysis usually neglect alternative splicing. RESULTS: Here, we propose Spycone, a splicing-aware framework for time course data analysis. Spycone exploits a novel IS detection algorithm and offers downstream analysis such as network and gene set enrichment. We demonstrate the performance of Spycone using simulated and real-world data of SARS-CoV-2 infection. AVAILABILITY AND IMPLEMENTATION: The Spycone package is available as a PyPI package. The source code of Spycone is available under the GPLv3 license at https://github.com/yollct/spycone and the documentation at https://spycone.readthedocs.io/en/latest/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Chit Tong Lio
  - Institute for Computational Systems Biology, University of Hamburg, Notkestrasse 9, Hamburg 22607, Germany
  - Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, Freising 85354, Germany
- Gordon Grabert
  - Division Data Science in Biomedicine, Peter L. Reichertz Institute for Medical Informatics of Technische Universität Braunschweig and Hannover Medical School, Braunschweig 38106, Germany
  - Braunschweig Integrated Centre of Systems Biology (BRICS), TU Braunschweig, Braunschweig 38106, Germany
- Zakaria Louadi
  - Institute for Computational Systems Biology, University of Hamburg, Notkestrasse 9, Hamburg 22607, Germany
  - Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, Freising 85354, Germany
- Amit Fenn
  - Institute for Computational Systems Biology, University of Hamburg, Notkestrasse 9, Hamburg 22607, Germany
  - Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, Freising 85354, Germany
- Jan Baumbach
  - Institute for Computational Systems Biology, University of Hamburg, Notkestrasse 9, Hamburg 22607, Germany
  - Institute of Mathematics and Computer Science, University of Southern Denmark, Odense 5000, Denmark
- Tim Kacprowski
  - Division Data Science in Biomedicine, Peter L. Reichertz Institute for Medical Informatics of Technische Universität Braunschweig and Hannover Medical School, Braunschweig 38106, Germany
  - Braunschweig Integrated Centre of Systems Biology (BRICS), TU Braunschweig, Braunschweig 38106, Germany
- Olga Tsoy
  - To whom correspondence should be addressed.
|
20
|
Miller AR, Wijeratne S, McGrath SD, Schieffer KM, Miller KE, Lee K, Mathew M, LaHaye S, Fitch JR, Kelly BJ, White P, Mardis ER, Wilson RK, Cottrell CE, Magrini V. Pacific Biosciences Fusion and Long Isoform Pipeline for Cancer Transcriptome-Based Resolution of Isoform Complexity. J Mol Diagn 2022; 24:1292-1306. [PMID: 36191838 DOI: 10.1016/j.jmoldx.2022.09.003] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 11/04/2021] [Revised: 08/05/2022] [Accepted: 09/13/2022] [Indexed: 01/13/2023]
Abstract
Genomic profiling using short-read sequencing has utility in detecting disease-associated variation in both DNA and RNA. However, given the frequent occurrence of structural variation in cancer, molecular profiling using long-read sequencing improves the resolution of such events. For example, the Pacific Biosciences long-read RNA-sequencing (Iso-Seq) transcriptome protocol provides full-length isoform characterization, discernment of allelic phasing, and isoform discovery, and identifies expressed fusion partners. The Pacific Biosciences Fusion and Long Isoform Pipeline (PB_FLIP) incorporates a suite of RNA-sequencing software analysis tools and scripts to identify expressed fusion partners and isoforms. In addition, sequencing of a commercial reference (Spike-In RNA Variants) with known isoform complexity was performed and demonstrated high recall of the Iso-Seq and PB_FLIP workflow to benchmark our protocol and analysis performance. This study describes the utility of Iso-Seq and PB_FLIP analysis in improving deconvolution of complex structural variants and isoform detection within an institutional pediatric and adolescent/young adult translational cancer research cohort. The exemplar case studies demonstrate that Iso-Seq and PB_FLIP discover novel expressed fusion partners, resolve complex intragenic alterations, and discriminate between allele-specific expression profiles.
Affiliation(s)
- Anthony R Miller
- The Steve and Cindy Rasmussen Institute for Genomic Medicine, Abigail Wexner Research Institute at Nationwide Children's Hospital, Columbus, Ohio
- Saranga Wijeratne
- The Steve and Cindy Rasmussen Institute for Genomic Medicine, Abigail Wexner Research Institute at Nationwide Children's Hospital, Columbus, Ohio
- Sean D McGrath
- The Steve and Cindy Rasmussen Institute for Genomic Medicine, Abigail Wexner Research Institute at Nationwide Children's Hospital, Columbus, Ohio
- Kathleen M Schieffer
- The Steve and Cindy Rasmussen Institute for Genomic Medicine, Abigail Wexner Research Institute at Nationwide Children's Hospital, Columbus, Ohio
- Katherine E Miller
- The Steve and Cindy Rasmussen Institute for Genomic Medicine, Abigail Wexner Research Institute at Nationwide Children's Hospital, Columbus, Ohio; Department of Pediatrics, The Ohio State University College of Medicine, Columbus, Ohio
- Kristy Lee
- The Steve and Cindy Rasmussen Institute for Genomic Medicine, Abigail Wexner Research Institute at Nationwide Children's Hospital, Columbus, Ohio; Department of Pediatrics, The Ohio State University College of Medicine, Columbus, Ohio; Department of Pathology, The Ohio State University College of Medicine, Columbus, Ohio
- Mariam Mathew
- The Steve and Cindy Rasmussen Institute for Genomic Medicine, Abigail Wexner Research Institute at Nationwide Children's Hospital, Columbus, Ohio; Department of Pediatrics, The Ohio State University College of Medicine, Columbus, Ohio; Department of Pathology, The Ohio State University College of Medicine, Columbus, Ohio
- Stephanie LaHaye
- The Steve and Cindy Rasmussen Institute for Genomic Medicine, Abigail Wexner Research Institute at Nationwide Children's Hospital, Columbus, Ohio
- James R Fitch
- The Steve and Cindy Rasmussen Institute for Genomic Medicine, Abigail Wexner Research Institute at Nationwide Children's Hospital, Columbus, Ohio
- Benjamin J Kelly
- The Steve and Cindy Rasmussen Institute for Genomic Medicine, Abigail Wexner Research Institute at Nationwide Children's Hospital, Columbus, Ohio
- Peter White
- The Steve and Cindy Rasmussen Institute for Genomic Medicine, Abigail Wexner Research Institute at Nationwide Children's Hospital, Columbus, Ohio; Department of Pediatrics, The Ohio State University College of Medicine, Columbus, Ohio
- Elaine R Mardis
- The Steve and Cindy Rasmussen Institute for Genomic Medicine, Abigail Wexner Research Institute at Nationwide Children's Hospital, Columbus, Ohio; Department of Pediatrics, The Ohio State University College of Medicine, Columbus, Ohio; Department of Neurosurgery, The Ohio State University College of Medicine, Columbus, Ohio
- Richard K Wilson
- The Steve and Cindy Rasmussen Institute for Genomic Medicine, Abigail Wexner Research Institute at Nationwide Children's Hospital, Columbus, Ohio; Department of Pediatrics, The Ohio State University College of Medicine, Columbus, Ohio
- Catherine E Cottrell
- The Steve and Cindy Rasmussen Institute for Genomic Medicine, Abigail Wexner Research Institute at Nationwide Children's Hospital, Columbus, Ohio; Department of Pediatrics, The Ohio State University College of Medicine, Columbus, Ohio; Department of Pathology, The Ohio State University College of Medicine, Columbus, Ohio.
- Vincent Magrini
- The Steve and Cindy Rasmussen Institute for Genomic Medicine, Abigail Wexner Research Institute at Nationwide Children's Hospital, Columbus, Ohio; Department of Pediatrics, The Ohio State University College of Medicine, Columbus, Ohio
21
Castaldi PJ, Abood A, Farber CR, Sheynkman GM. Bridging the splicing gap in human genetics with long-read RNA sequencing: finding the protein isoform drivers of disease. Hum Mol Genet 2022; 31:R123-R136. [PMID: 35960994 PMCID: PMC9585682 DOI: 10.1093/hmg/ddac196] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 07/07/2022] [Revised: 08/08/2022] [Accepted: 08/09/2022] [Indexed: 02/04/2023]
Abstract
Aberrant splicing underlies many human diseases, including cancer, cardiovascular diseases and neurological disorders. Genome-wide mapping of splicing quantitative trait loci (sQTLs) has shown that genetic regulation of alternative splicing is widespread. However, identification of the corresponding isoform or protein products associated with disease-associated sQTLs is challenging with short-read RNA-seq, which cannot precisely characterize full-length transcript isoforms. Furthermore, contemporary sQTL interpretation often relies on reference transcript annotations, which are incomplete. Solutions to these issues may be found through integration of newly emerging long-read sequencing technologies. Long-read sequencing offers the capability to sequence full-length mRNA transcripts and, in some cases, to link sQTLs to transcript isoforms containing disease-relevant protein alterations. Here, we provide an overview of sQTL mapping approaches, the use of long-read sequencing to characterize sQTL effects on isoforms, the linkage of RNA isoforms to protein-level functions and comment on future directions in the field. Based on recent progress, long-read RNA sequencing promises to be part of the human disease genetics toolkit to discover and treat protein isoforms causing rare and complex diseases.
Affiliation(s)
- Peter J Castaldi
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women’s Hospital, Boston, MA 02115, USA
- Division of General Medicine and Primary Care, Department of Medicine, Brigham and Women’s Hospital, Boston, MA 02115, USA
- Abdullah Abood
- Center for Public Health Genomics, School of Medicine, University of Virginia, Charlottesville, VA 22903, USA
- Department of Biochemistry and Molecular Genetics, School of Medicine, University of Virginia, Charlottesville, VA 22903, USA
- Charles R Farber
- Center for Public Health Genomics, School of Medicine, University of Virginia, Charlottesville, VA 22903, USA
- Department of Biochemistry and Molecular Genetics, School of Medicine, University of Virginia, Charlottesville, VA 22903, USA
- Department of Public Health Sciences, School of Medicine, University of Virginia, Charlottesville, VA 22903, USA
- Gloria M Sheynkman
- Center for Public Health Genomics, School of Medicine, University of Virginia, Charlottesville, VA 22903, USA
- Department of Biochemistry and Molecular Genetics, School of Medicine, University of Virginia, Charlottesville, VA 22903, USA
- Department of Molecular Physiology and Biological Physics, University of Virginia, Charlottesville, VA 22903, USA
- UVA Comprehensive Cancer Center, University of Virginia, Charlottesville, VA 22903, USA
22
Nanni AV, Morse AM, Newman JRB, Choquette NE, Wedow JM, Liu Z, Leakey ADB, Conesa A, Ainsworth EA, McIntyre LM. Variation in leaf transcriptome responses to elevated ozone corresponds with physiological sensitivity to ozone across maize inbred lines. Genetics 2022; 221:iyac080. [PMID: 35579358 PMCID: PMC9339315 DOI: 10.1093/genetics/iyac080] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/30/2022] [Accepted: 04/27/2022] [Indexed: 11/13/2022]
Abstract
We examine the impact of sustained elevated ozone concentration on the leaf transcriptome of 5 diverse maize inbred genotypes, which vary in physiological sensitivity to ozone (B73, Mo17, Hp301, C123, and NC338), using long reads to assemble transcripts and short reads to quantify expression of these transcripts. More than 99% of the long reads, 99% of the assembled transcripts, and 97% of the short reads map to both B73 and Mo17 reference genomes. Approximately 95% of the genes with assembled transcripts belong to known B73-Mo17 syntenic loci and 94% of genes with assembled transcripts are present in all temperate lines in the nested association mapping pan-genome. While there is limited evidence for alternative splicing in response to ozone stress, there is a difference in the magnitude of differential expression among the 5 genotypes. The transcriptional response to sustained ozone stress in the ozone resistant B73 genotype (151 genes) was modest, while more than 3,300 genes were significantly differentially expressed in the more sensitive NC338 genotype. There is the potential for tandem duplication in 30% of genes with assembled transcripts, but there is no obvious association between potential tandem duplication and differential expression. Genes with a common response across the 5 genotypes (83 genes) were associated with photosynthesis, in particular photosystem I. The functional annotation of genes not differentially expressed in B73 but responsive in the other 4 genotypes (789) identifies reactive oxygen species. This suggests that B73 has a different response to long-term ozone exposure than the other 4 genotypes. The relative magnitude of the genotypic response to ozone, and the enrichment analyses are consistent regardless of whether aligning short reads to: long read assembled transcripts; the B73 reference; the Mo17 reference. We find that prolonged ozone exposure directly impacts the photosynthetic machinery of the leaf.
Affiliation(s)
- Adalena V Nanni
- Department of Molecular Genetics and Microbiology, University of Florida, Gainesville, FL 32611, USA
- Genetics Institute, University of Florida, Gainesville, FL 32611, USA
- Alison M Morse
- Department of Molecular Genetics and Microbiology, University of Florida, Gainesville, FL 32611, USA
- Genetics Institute, University of Florida, Gainesville, FL 32611, USA
- Jeremy R B Newman
- Genetics Institute, University of Florida, Gainesville, FL 32611, USA
- Department of Pathology, University of Florida, Gainesville, FL 32611, USA
- Nicole E Choquette
- Department of Plant Biology, Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
- Department of Crop Sciences, Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
- Jessica M Wedow
- Department of Plant Biology, Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
- Department of Crop Sciences, Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
- Zihao Liu
- Department of Molecular Genetics and Microbiology, University of Florida, Gainesville, FL 32611, USA
- Genetics Institute, University of Florida, Gainesville, FL 32611, USA
- Andrew D B Leakey
- Department of Plant Biology, Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
- Department of Crop Sciences, Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
- Ana Conesa
- Department of Cell and Microbial Sciences, University of Florida, Gainesville, FL 32611, USA
- Institute for Integrative Systems Biology, Spanish National Research Council, 46980 Paterna, Spain
- Elizabeth A Ainsworth
- Department of Plant Biology, Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
- Department of Crop Sciences, Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
- USDA ARS Global Change and Photosynthesis Research Unit, Urbana, IL 61801, USA
- Lauren M McIntyre
- Department of Molecular Genetics and Microbiology, University of Florida, Gainesville, FL 32611, USA
- Genetics Institute, University of Florida, Gainesville, FL 32611, USA
23
Reixachs-Solé M, Eyras E. Uncovering the impacts of alternative splicing on the proteome with current omics techniques. Wiley Interdiscip Rev RNA 2022; 13:e1707. [PMID: 34979593 PMCID: PMC9542554 DOI: 10.1002/wrna.1707] [Citation(s) in RCA: 19] [Impact Index Per Article: 9.5] [Received: 09/26/2021] [Revised: 11/27/2021] [Accepted: 11/29/2021] [Indexed: 12/15/2022]
Abstract
The high-throughput sequencing of cellular RNAs has underscored a broad effect of isoform diversification through alternative splicing on the transcriptome. Moreover, the differential production of transcript isoforms from gene loci has been recognized as a critical mechanism in cell differentiation, organismal development, and disease. Yet, the extent of the impact of alternative splicing on protein production and cellular function remains a matter of debate. Multiple experimental and computational approaches have been developed in recent years to address this question. These studies have unveiled how molecular changes at different steps in the RNA processing pathway can lead to differences in protein production and have functional effects. New and emerging experimental technologies open exciting new opportunities to develop new methods to fully establish the connection between messenger RNA expression and protein production and to further investigate how RNA variation impacts the proteome and cell function. This article is categorized under: RNA Processing > Splicing Regulation/Alternative Splicing; Translation > Regulation; RNA Evolution and Genomics > Computational Analyses of RNA.
Affiliation(s)
- Marina Reixachs-Solé
- The John Curtin School of Medical Research, Australian National University, Canberra, Australian Capital Territory, Australia
- EMBL Australia Partner Laboratory Network and the Australian National University, Canberra, Australian Capital Territory, Australia
- Eduardo Eyras
- The John Curtin School of Medical Research, Australian National University, Canberra, Australian Capital Territory, Australia
- EMBL Australia Partner Laboratory Network and the Australian National University, Canberra, Australian Capital Territory, Australia
- Catalan Institution for Research and Advanced Studies, Barcelona, Spain
- Hospital del Mar Medical Research Institute (IMIM), Barcelona, Spain
24
Zhou D, Tran Y, Abou Elela S, Scott MS. SAPFIR: A webserver for the identification of alternative protein features. BMC Bioinformatics 2022; 23:250. [PMID: 35751026 PMCID: PMC9229502 DOI: 10.1186/s12859-022-04804-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/10/2022] [Accepted: 06/20/2022] [Indexed: 11/29/2022]
Abstract
Background: Alternative splicing can increase the diversity of gene functions by generating multiple isoforms with different sequences and functions. However, the extent to which splicing events have functional consequences remains unclear and predicting the impact of splicing events on protein activity is limited to gene-specific analysis. Results: To accelerate the identification of functionally relevant alternative splicing events we created SAPFIR, a predictor of protein features associated with alternative splicing events. This webserver tool uses InterProScan to predict protein features such as functional domains, motifs and sites in the human and mouse genomes and link them to alternative splicing events. Alternative protein features are displayed as functions of the transcripts and splice sites. SAPFIR could be used to analyze proteins generated from a single gene or a group of genes and can directly identify alternative protein features in large sequence data sets. The accuracy and utility of SAPFIR were validated by its ability to rediscover previously validated alternative protein domains. In addition, our de novo analysis of public datasets using SAPFIR indicated that only a small portion of alternative protein domains was conserved between human and mouse, and that in human, genes involved in nervous system process, regulation of DNA-templated transcription and aging are more likely to produce isoforms missing functional domains due to alternative splicing. Conclusion: Overall SAPFIR represents a new tool for the rapid identification of functional alternative splicing events and enables the identification of cellular functions affected by a defined splicing program. SAPFIR is freely available at https://bioinfo-scottgroup.med.usherbrooke.ca/sapfir/, a website implemented in Python, with all major browsers supported. The source code is available at https://github.com/DelongZHOU/SAPFIR.
Supplementary Information: The online version contains supplementary material available at 10.1186/s12859-022-04804-w.
Affiliation(s)
- Delong Zhou
- Département de Microbiologie et d'infectiologie, Faculté de Médecine et des Sciences de la Santé, Université de Sherbrooke, Sherbrooke, QC, J1E 4K8, Canada
- Yvan Tran
- Département de Biochimie et Génomique Fonctionnelle, Faculté de Médecine et des Sciences de la Santé, Université de Sherbrooke, Sherbrooke, QC, J1E 4K8, Canada
- Sherif Abou Elela
- Département de Microbiologie et d'infectiologie, Faculté de Médecine et des Sciences de la Santé, Université de Sherbrooke, Sherbrooke, QC, J1E 4K8, Canada.
- Michelle S Scott
- Département de Biochimie et Génomique Fonctionnelle, Faculté de Médecine et des Sciences de la Santé, Université de Sherbrooke, Sherbrooke, QC, J1E 4K8, Canada.
25
Katsoula G, Steinberg J, Tuerlings M, Coutinho de Almeida R, Southam L, Swift D, Meulenbelt I, Wilkinson JM, Zeggini E. A molecular map of long non-coding RNA expression, isoform switching and alternative splicing in osteoarthritis. Hum Mol Genet 2022; 31:2090-2105. [PMID: 35088088 PMCID: PMC9239745 DOI: 10.1093/hmg/ddac017] [Citation(s) in RCA: 15] [Impact Index Per Article: 7.5] [Received: 09/02/2021] [Revised: 11/22/2021] [Accepted: 01/10/2022] [Indexed: 11/30/2022]
Abstract
Osteoarthritis is a prevalent joint disease and a major cause of disability worldwide with no curative therapy. Development of disease-modifying therapies requires a better understanding of the molecular mechanisms underpinning disease. A hallmark of osteoarthritis is cartilage degradation. To define molecular events characterizing osteoarthritis at the whole transcriptome level, we performed deep RNA sequencing in paired samples of low- and high-osteoarthritis grade knee cartilage derived from 124 patients undergoing total joint replacement. We detected differential expression between low- and high-osteoarthritis grade articular cartilage for 365 genes and identified a 38-gene signature in osteoarthritis cartilage by replicating our findings in an independent dataset. We also found differential expression for 25 novel long non-coding RNA genes (lncRNAs) and identified potential lncRNA interactions with RNA-binding proteins in osteoarthritis. We assessed alterations in the relative usage of individual gene transcripts and identified differential transcript usage for 82 genes, including ABI3BP, coding for an extracellular matrix protein; AKT1S1, a negative regulator of the mTOR pathway; and TRPM4, coding for a transient receptor potential channel. We further assessed genome-wide differential splicing, for the first time in osteoarthritis, and detected differential splicing for 209 genes, which were enriched for extracellular matrix, proteoglycans and integrin surface interactions terms. In the largest study of its kind in osteoarthritis, we find that isoform and splicing changes, in addition to extensive differences in both coding and non-coding sequence expression, are associated with disease and demonstrate a novel layer of genomic complexity to osteoarthritis pathogenesis.
Affiliation(s)
- Georgia Katsoula
- Technical University of Munich (TUM), School of Medicine, Munich 81675, Germany
- Institute of Translational Genomics, Helmholtz Zentrum München-German Research Center for Environmental Health, Neuherberg 85764, Germany
- Julia Steinberg
- Institute of Translational Genomics, Helmholtz Zentrum München-German Research Center for Environmental Health, Neuherberg 85764, Germany
- Daffodil Centre, University of Sydney, a joint venture with Cancer Council NSW, Sydney, NSW 1340, Australia
- Margo Tuerlings
- Department of Biomedical Data Sciences, Section Molecular Epidemiology, Leiden University Medical Center, Leiden 2333 ZC, The Netherlands
- Rodrigo Coutinho de Almeida
- Department of Biomedical Data Sciences, Section Molecular Epidemiology, Leiden University Medical Center, Leiden 2333 ZC, The Netherlands
- Lorraine Southam
- Institute of Translational Genomics, Helmholtz Zentrum München-German Research Center for Environmental Health, Neuherberg 85764, Germany
- Diane Swift
- Department of Oncology and Metabolism, University of Sheffield, Metabolic Bone Unit, Sorby Wing Northern General Hospital Sheffield, Sheffield, S5 7AU, UK
- Ingrid Meulenbelt
- Department of Biomedical Data Sciences, Section Molecular Epidemiology, Leiden University Medical Center, Leiden 2333 ZC, The Netherlands
- J Mark Wilkinson
- Department of Oncology and Metabolism, University of Sheffield, Metabolic Bone Unit, Sorby Wing Northern General Hospital Sheffield, Sheffield, S5 7AU, UK
- Eleftheria Zeggini
- Institute of Translational Genomics, Helmholtz Zentrum München-German Research Center for Environmental Health, Neuherberg 85764, Germany
- Technical University of Munich (TUM) and Klinikum Rechts der Isar, TUM School of Medicine, Munich 81675, Germany
26
Arzalluz-Luque A, Salguero P, Tarazona S, Conesa A. acorde unravels functionally interpretable networks of isoform co-usage from single cell data. Nat Commun 2022; 13:1828. [PMID: 35383181 PMCID: PMC8983708 DOI: 10.1038/s41467-022-29497-w] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 05/12/2021] [Accepted: 03/16/2022] [Indexed: 12/13/2022]
Abstract
Alternative splicing (AS) is a highly regulated post-transcriptional mechanism known to modulate isoform expression within genes and contribute to cell-type identity. However, the extent to which alternative isoforms establish co-expression networks that may be relevant in cellular function has not been explored yet. Here, we present acorde, a pipeline that successfully leverages bulk long reads and single-cell data to confidently detect alternative isoform co-expression relationships. To achieve this, we develop and validate percentile correlations, an innovative approach that overcomes data sparsity and yields accurate co-expression estimates from single-cell data. Next, acorde uses correlations to cluster co-expressed isoforms into a network, unraveling cell type-specific alternative isoform usage patterns. By selecting same-gene isoforms between these clusters, we subsequently detect and characterize genes with co-differential isoform usage (coDIU) across cell types. Finally, we predict functional elements from long read-defined isoforms and provide insight into biological processes, motifs, and domains potentially controlled by the coordination of post-transcriptional regulation. The code for acorde is available at https://github.com/ConesaLab/acorde.
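The idea of a percentile correlation can be illustrated with a small sketch. It assumes that each isoform's sparse per-cell expression is first summarized, within every cell type, by a fixed set of percentiles, and that two isoforms are then compared by the Pearson correlation of those summaries. The function names, the choice of deciles, and the toy data are illustrative assumptions, not the acorde implementation.

```python
from math import sqrt

def percentile(sorted_values, q):
    """Linearly interpolated percentile (q in [0, 100]) of a pre-sorted list."""
    if len(sorted_values) == 1:
        return float(sorted_values[0])
    pos = (len(sorted_values) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * (pos - lo)

def percentile_profile(expr, cell_types, qs=tuple(range(10, 101, 10))):
    """Summarize one isoform as per-cell-type percentiles (assumed: deciles)."""
    by_type = {}
    for value, cell_type in zip(expr, cell_types):
        by_type.setdefault(cell_type, []).append(float(value))
    profile = []
    for cell_type in sorted(by_type):  # fixed order so profiles are comparable
        values = sorted(by_type[cell_type])
        profile.extend(percentile(values, q) for q in qs)
    return profile

def pearson(x, y):
    """Pearson correlation; returns NaN if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return float("nan")
    return cov / sqrt(vx * vy)

def percentile_correlation(expr_a, expr_b, cell_types):
    """Correlate two isoforms on their percentile summaries, not raw cells."""
    return pearson(percentile_profile(expr_a, cell_types),
                   percentile_profile(expr_b, cell_types))

# Toy data (hypothetical): 4 cells of type "T" followed by 4 cells of type "B".
cell_types = ["T"] * 4 + ["B"] * 4
iso_a = [0, 0, 5, 9, 0, 0, 0, 1]   # mostly expressed in T cells
iso_b = [0, 0, 10, 18, 0, 0, 0, 2] # same pattern at twice the level
iso_c = [0, 0, 0, 1, 0, 0, 5, 9]   # mostly expressed in B cells
r_same = percentile_correlation(iso_a, iso_b, cell_types)
r_opposite = percentile_correlation(iso_a, iso_c, cell_types)
```

Because the comparison runs on within-cell-type summaries, two isoforms need not be detected in the same individual cells to be scored as co-expressed, which is one plausible way such a summary could tolerate dropout-driven zeros.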
Affiliation(s)
- Angeles Arzalluz-Luque
- Department of Applied Statistics, Operations Research and Quality, Universitat Politècnica de València, Valencia, Spain
- Institute for Integrative Systems Biology (CSIC-UV), Spanish National Research Council, Paterna, Valencia, Spain
- Pedro Salguero
- Department of Applied Statistics, Operations Research and Quality, Universitat Politècnica de València, Valencia, Spain
- Sonia Tarazona
- Department of Applied Statistics, Operations Research and Quality, Universitat Politècnica de València, Valencia, Spain.
- Ana Conesa
- Institute for Integrative Systems Biology (CSIC-UV), Spanish National Research Council, Paterna, Valencia, Spain.
- Microbiology and Cell Sciences Department, Institute for Food and Agricultural Research, University of Florida, Gainesville, FL, USA.
27
Tseng E, Underwood JG, Evans Hutzenbiler BD, Trojahn S, Kingham B, Shevchenko O, Bernberg E, Vierra M, Robbins CT, Jansen HT, Kelley JL. Long-read isoform sequencing reveals tissue-specific isoform expression between active and hibernating brown bears (Ursus arctos). G3 (Bethesda) 2022; 12:6472356. [PMID: 35100340 PMCID: PMC9210309 DOI: 10.1093/g3journal/jkab422] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 07/12/2021] [Accepted: 11/17/2021] [Indexed: 11/13/2022]
Abstract
Understanding hibernation in brown bears (Ursus arctos) can provide insight into some human diseases. During hibernation, brown bears experience periods of insulin resistance, physical inactivity, extreme bradycardia, obesity, and the absence of urine production. These states closely mimic aspects of human diseases such as type 2 diabetes, muscle atrophy, as well as renal and heart failure. The reversibility of these states from hibernation to active season enables the identification of mediators with possible therapeutic value for humans. Recent studies have identified genes and pathways that are differentially expressed between active and hibernation seasons in bears. However, little is known about the role of differential expression of gene isoforms on hibernation physiology. To identify both distinct and novel mRNA isoforms, full-length RNA-sequencing (Iso-Seq) was performed on adipose, skeletal muscle, and liver from three individual bears sampled during both active and hibernation seasons. The existing reference genome annotation was improved by combining it with the Iso-Seq data. Short-read RNA-sequencing data from six individuals were mapped to the new reference annotation to quantify differential isoform usage (DIU) between tissues and seasons. We identified differentially expressed isoforms in all three tissues, to varying degrees. Adipose had a high level of DIU with isoform switching, regardless of whether the genes were differentially expressed. Our analyses revealed that DIU, even in the absence of differential gene expression, is an important mechanism for modulating genes during hibernation. These findings demonstrate the value of isoform expression studies and will serve as the basis for deeper exploration into hibernation biology.
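The point that isoform usage can change while total gene expression stays flat is easy to see numerically. The sketch below assumes usage is measured as each isoform's fraction of the gene's total counts; the function names and the counts are hypothetical, and this is only the descriptive quantity, not the statistical test applied in the study.

```python
def isoform_fractions(counts):
    """Fraction of a gene's total counts contributed by each isoform."""
    total = sum(counts.values())
    if total == 0:
        return {iso: 0.0 for iso in counts}
    return {iso: c / total for iso, c in counts.items()}

def usage_shift(counts_a, counts_b):
    """Per-isoform change in usage fraction from condition A to condition B."""
    fa = isoform_fractions(counts_a)
    fb = isoform_fractions(counts_b)
    isoforms = sorted(set(fa) | set(fb))
    return {iso: fb.get(iso, 0.0) - fa.get(iso, 0.0) for iso in isoforms}

# Hypothetical counts for one gene with two isoforms.
active = {"iso1": 80, "iso2": 20}
hibernation = {"iso1": 20, "iso2": 80}

gene_total_active = sum(active.values())            # gene-level expression
gene_total_hibernation = sum(hibernation.values())  # unchanged between seasons
shift = usage_shift(active, hibernation)            # iso1 falls, iso2 rises
```

In this toy case a gene-level differential expression analysis would report no change, while the dominant isoform has switched between the two conditions.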
Affiliation(s)
- Brandon D Evans Hutzenbiler
- Department of Integrative Physiology and Neuroscience, Washington State University, Pullman, WA 99164, USA; School of the Environment, Washington State University, Pullman, WA 99164, USA
- Shawn Trojahn
- School of Biological Sciences, Washington State University, Pullman, WA 99164, USA
- Brewster Kingham
- Sequencing & Genotyping Center, Delaware Biotechnology Institute, University of Delaware, Newark, DE 19711, USA
- Olga Shevchenko
- Sequencing & Genotyping Center, Delaware Biotechnology Institute, University of Delaware, Newark, DE 19711, USA
- Erin Bernberg
- Sequencing & Genotyping Center, Delaware Biotechnology Institute, University of Delaware, Newark, DE 19711, USA
- Charles T Robbins
- School of the Environment, Washington State University, Pullman, WA 99164, USA; School of Biological Sciences, Washington State University, Pullman, WA 99164, USA
- Heiko T Jansen
- Department of Integrative Physiology and Neuroscience, Washington State University, Pullman, WA 99164, USA
- Joanna L Kelley
- School of Biological Sciences, Washington State University, Pullman, WA 99164, USA
28
Louadi Z, Elkjaer ML, Klug M, Lio CT, Fenn A, Illes Z, Bongiovanni D, Baumbach J, Kacprowski T, List M, Tsoy O. Functional enrichment of alternative splicing events with NEASE reveals insights into tissue identity and diseases. Genome Biol 2021; 22:327. [PMID: 34857024 PMCID: PMC8638120 DOI: 10.1186/s13059-021-02538-1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/15/2021] [Accepted: 11/10/2021] [Indexed: 01/27/2023]
Abstract
Alternative splicing (AS) is an important aspect of gene regulation. Nevertheless, its role in molecular processes and pathobiology is far from understood. A roadblock is that tools for the functional analysis of AS-set events are lacking. To mitigate this, we developed NEASE, a tool integrating pathways with structural annotations of protein-protein interactions to functionally characterize AS events. We show in four application cases how NEASE can identify pathways contributing to tissue identity and cell type development, and how it highlights splicing-related biomarkers. With a unique view on AS, NEASE generates unique and meaningful biological insights complementary to classical pathways analysis.
Affiliation(s)
- Zakaria Louadi
- Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, 85354, Freising, Germany
- Institute for Computational Systems Biology, University of Hamburg, Notkestrasse 9, 22607, Hamburg, Germany
- Maria L Elkjaer
- Department of Neurology, Odense University Hospital, Odense, Denmark
- Institute of Clinical Research, University of Southern Denmark, Odense, Denmark
- Institute of Molecular Medicine, University of Southern Denmark, Odense, Denmark
- Melissa Klug
- Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, 85354, Freising, Germany
- Department of Internal Medicine I, School of Medicine, University hospital rechts der Isar, Technical University of Munich, Munich, Germany
- German Center for Cardiovascular Research (DZHK), Partner Site Munich Heart Alliance, Munich, Germany
- Chit Tong Lio
- Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, 85354, Freising, Germany
- Institute for Computational Systems Biology, University of Hamburg, Notkestrasse 9, 22607, Hamburg, Germany
- Amit Fenn
- Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, 85354, Freising, Germany
- Institute for Computational Systems Biology, University of Hamburg, Notkestrasse 9, 22607, Hamburg, Germany
- Zsolt Illes
- Department of Neurology, Odense University Hospital, Odense, Denmark
- Institute of Clinical Research, University of Southern Denmark, Odense, Denmark
- Institute of Molecular Medicine, University of Southern Denmark, Odense, Denmark
- Dario Bongiovanni
- Department of Internal Medicine I, School of Medicine, University hospital rechts der Isar, Technical University of Munich, Munich, Germany
- German Center for Cardiovascular Research (DZHK), Partner Site Munich Heart Alliance, Munich, Germany
- Department of Cardiovascular Medicine, Humanitas Clinical and Research Center IRCCS and Humanitas University, Rozzano, Milan, Italy
- Jan Baumbach
- Institute for Computational Systems Biology, University of Hamburg, Notkestrasse 9, 22607, Hamburg, Germany
- Institute of Mathematics and Computer Science, University of Southern Denmark, Campusvej 55, 5000, Odense, Denmark
| | - Tim Kacprowski
- Division Data Science in Biomedicine, Peter L. Reichertz Institute for Medical Informatics of Technische Universität Braunschweig and Hannover Medical School, Braunschweig, Germany
- Braunschweig Integrated Centre of Systems Biology (BRICS), TU Braunschweig, Braunschweig, Germany
| | - Markus List
- Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, 85354, Freising, Germany.
| | - Olga Tsoy
- Institute for Computational Systems Biology, University of Hamburg, Notkestrasse 9, 22607, Hamburg, Germany.
|
29
|
Stephenson M, Nip KM, HafezQorani S, Gagalova KK, Yang C, Warren RL, Birol I. RNA-Scoop: interactive visualization of transcripts in single-cell transcriptomes. NAR Genom Bioinform 2021; 3:lqab105. [PMID: 34859209 PMCID: PMC8633890 DOI: 10.1093/nargab/lqab105] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/01/2021] [Revised: 08/21/2021] [Accepted: 11/26/2021] [Indexed: 11/12/2022]
Abstract
Recent advances in single-cell RNA sequencing technologies have made detection of transcripts in single cells possible. The level of resolution provided by these technologies can be used to study changes in transcript usage across cell populations and help investigate new biology. Here, we introduce RNA-Scoop, an interactive cell cluster and transcriptome visualization tool to analyze transcript usage across cell categories and clusters. The tool allows users to examine differential transcript expression across clusters and investigate how usage of specific transcript expression mechanisms varies across cell groups.
Affiliation(s)
- Maria Stephenson
  - Canada's Michael Smith Genome Sciences Centre, BC Cancer, Vancouver, BC V5Z 4S6, Canada
  - Computer Science Co-op Program, University of British Columbia, Vancouver, BC V6T 1Z4, Canada
- Ka Ming Nip
  - Canada's Michael Smith Genome Sciences Centre, BC Cancer, Vancouver, BC V5Z 4S6, Canada
  - Bioinformatics Graduate Program, University of British Columbia, Vancouver, BC V5Z 4S6, Canada
- Saber HafezQorani
  - Canada's Michael Smith Genome Sciences Centre, BC Cancer, Vancouver, BC V5Z 4S6, Canada
  - Bioinformatics Graduate Program, University of British Columbia, Vancouver, BC V5Z 4S6, Canada
- Kristina K Gagalova
  - Canada's Michael Smith Genome Sciences Centre, BC Cancer, Vancouver, BC V5Z 4S6, Canada
  - Bioinformatics Graduate Program, University of British Columbia, Vancouver, BC V5Z 4S6, Canada
- Chen Yang
  - Canada's Michael Smith Genome Sciences Centre, BC Cancer, Vancouver, BC V5Z 4S6, Canada
  - Bioinformatics Graduate Program, University of British Columbia, Vancouver, BC V5Z 4S6, Canada
- René L Warren
  - Canada's Michael Smith Genome Sciences Centre, BC Cancer, Vancouver, BC V5Z 4S6, Canada
- Inanc Birol
  - Canada's Michael Smith Genome Sciences Centre, BC Cancer, Vancouver, BC V5Z 4S6, Canada
  - Department of Medical Genetics, University of British Columbia, Vancouver, BC V6H 3N1, Canada
|
30
|
Altered cell and RNA isoform diversity in aging Down syndrome brains. Proc Natl Acad Sci U S A 2021; 118:2114326118. [PMID: 34795060 PMCID: PMC8617492 DOI: 10.1073/pnas.2114326118] [Citation(s) in RCA: 39] [Impact Index Per Article: 13.0] [Accepted: 10/08/2021] [Indexed: 12/11/2022]
Abstract
Down syndrome (DS) neurocognitive disabilities associated with trisomy 21 are known; however, gene changes within individual brain cells occurring with age are unknown. Here, we interrogated >170,000 cells from 29 aging DS and control brains using single-nucleus RNA sequencing. We observed increases in inhibitory-over-excitatory neurons, microglial activation in the youngest DS brains coinciding with overexpression of genes associated with microglial-mediated synaptic pruning, and overexpression of the chromosome 21 gene RUNX1 that may be a potential driving factor in microglial activation. Single-nucleus long-read sequencing revealed hundreds of thousands of unannotated RNA transcripts. These included diverse species for the Alzheimer’s disease gene—amyloid precursor protein—that contained intra-exonic junctions previously associated with somatic gene recombination, which was also identified in ∼8,000 other genes.

Down syndrome (DS), trisomy of human chromosome 21 (HSA21), is characterized by lifelong cognitive impairments and the development of the neuropathological hallmarks of Alzheimer’s disease (AD). The cellular and molecular modifications responsible for these effects are not understood. Here we performed single-nucleus RNA sequencing (snRNA-seq) employing both short- (Illumina) and long-read (Pacific Biosciences) sequencing technologies on a total of 29 DS and non-DS control prefrontal cortex samples. In DS, the ratio of inhibitory-to-excitatory neurons was significantly increased, which was not observed in previous reports examining sporadic AD. DS microglial transcriptomes displayed AD-related aging and activation signatures in advance of AD neuropathology, with increased microglial expression of C1q complement genes (associated with dendritic pruning) and the HSA21 transcription factor gene RUNX1. Long-read sequencing detected vast RNA isoform diversity within and among specific cell types, including numerous sequences that differed between DS and control brains. Notably, over 8,000 genes produced RNAs containing intra-exonic junctions, including amyloid precursor protein (APP) that had previously been associated with somatic gene recombination. These and related results illuminate large-scale cellular and transcriptomic alterations as features of the aging DS brain.
|
31
|
Karakulak T, Moch H, von Mering C, Kahraman A. Probing Isoform Switching Events in Various Cancer Types: Lessons From Pan-Cancer Studies. Front Mol Biosci 2021; 8:726902. [PMID: 34888349 PMCID: PMC8650491 DOI: 10.3389/fmolb.2021.726902] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 06/17/2021] [Accepted: 11/01/2021] [Indexed: 12/03/2022]
Abstract
Alternative splicing is an essential regulatory mechanism for gene expression in mammalian cells contributing to protein, cellular, and species diversity. In cancer, alternative splicing is frequently disturbed, leading to changes in the expression of alternatively spliced protein isoforms. Advances in sequencing technologies and analysis methods led to new insights into the extent and functional impact of disturbed alternative splicing events. In this review, we give a brief overview of the molecular mechanisms driving alternative splicing, highlight the function of alternative splicing in healthy tissues and describe how alternative splicing is disrupted in cancer. We summarize current available computational tools for analyzing differential transcript usage, isoform switching events, and the pathogenic impact of cancer-specific splicing events. Finally, the strategies of three recent pan-cancer studies on isoform switching events are compared. Their methodological similarities and discrepancies are highlighted and lessons learned from the comparison are listed. We hope that our assessment will lead to new and more robust methods for cancer-specific transcript detection and help to produce more accurate functional impact predictions of isoform switching events.
Affiliation(s)
- Tülay Karakulak
  - Department of Molecular Life Sciences, University of Zurich, Zurich, Switzerland
  - Department of Pathology and Molecular Pathology, University Hospital Zurich, Zurich, Switzerland
  - Swiss Informatics Institute, Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Holger Moch
  - Department of Pathology and Molecular Pathology, University Hospital Zurich, Zurich, Switzerland
  - Faculty of Medicine, University of Zurich, Zurich, Switzerland
- Christian von Mering
  - Department of Molecular Life Sciences, University of Zurich, Zurich, Switzerland
  - Swiss Informatics Institute, Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Abdullah Kahraman
  - Department of Pathology and Molecular Pathology, University Hospital Zurich, Zurich, Switzerland
  - Swiss Informatics Institute, Swiss Institute of Bioinformatics, Lausanne, Switzerland
|
32
|
Lorenzi C, Barriere S, Arnold K, Luco RF, Oldfield AJ, Ritchie W. IRFinder-S: a comprehensive suite to discover and explore intron retention. Genome Biol 2021; 22:307. [PMID: 34749764 PMCID: PMC8573998 DOI: 10.1186/s13059-021-02515-8] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Received: 06/03/2021] [Accepted: 10/12/2021] [Indexed: 12/15/2022]
Abstract
Accurate quantification and detection of intron retention levels require specialized software. Building on our previous software, we create a suite of tools called IRFinder-S, to analyze and explore intron retention events in multiple samples. Specifically, IRFinder-S allows a better identification of true intron retention events using a convolutional neural network, allows the sharing of intron retention results between labs, integrates a dynamic database to explore and contrast available samples, and provides a tested method to detect differential levels of intron retention.
Affiliation(s)
- Claudio Lorenzi
  - Institut de Génétique Humaine, Centre National de la Recherche Scientifique (CNRS), Université de Montpellier, Montpellier, France
- Sylvain Barriere
  - Institut de Génétique Humaine, Centre National de la Recherche Scientifique (CNRS), Université de Montpellier, Montpellier, France
- Katharina Arnold
  - Institut de Génétique Humaine, Centre National de la Recherche Scientifique (CNRS), Université de Montpellier, Montpellier, France
- Reini F Luco
  - Institut de Génétique Humaine, Centre National de la Recherche Scientifique (CNRS), Université de Montpellier, Montpellier, France
- Andrew J Oldfield
  - Institut de Génétique Humaine, Centre National de la Recherche Scientifique (CNRS), Université de Montpellier, Montpellier, France
- William Ritchie
  - Institut de Génétique Humaine, Centre National de la Recherche Scientifique (CNRS), Université de Montpellier, Montpellier, France
|
33
|
A global map of associations between types of protein posttranslational modifications and human genetic diseases. iScience 2021; 24:102917. [PMID: 34430807 PMCID: PMC8365368 DOI: 10.1016/j.isci.2021.102917] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 02/22/2021] [Revised: 06/27/2021] [Accepted: 07/27/2021] [Indexed: 12/14/2022]
Abstract
There are >200 types of protein posttranslational modifications (PTMs) described in eukaryotes, each with unique proteome coverage and functions. We hypothesized that some genetic diseases may be caused by the removal of a specific type of PTMs by genomic variants and the consequent deregulation of particular functions. We collected >320,000 human PTMs representing 59 types and crossed them with >4M nonsynonymous DNA variants annotated with predicted pathogenicity and disease associations. We report >1.74M PTM-variant co-occurrences that an enrichment analysis distributed into 215 pairwise associations between 18 PTM types and 148 genetic diseases. Of them, 42% were not previously described. Removal of lysine acetylation exerts the most pronounced effect, and less studied PTM types such as S-glutathionylation or S-nitrosylation show relevance. Using pathogenicity predictions, we identified PTM sites that may produce particular diseases if prevented. Our results provide evidence of a substantial impact of PTM-specific removal on the pathogenesis of genetic diseases and phenotypes.
- There is an enrichment of disease-associated nsSNVs preventing certain types of PTMs
- We report 215 pairwise associations between 18 PTM types and 148 genetic diseases
- The removal of lysine acetylation exerts the most pronounced effect
- We report a set of PTM sites that may produce particular diseases if prevented
|
34
|
Ye C, Zhao D, Ye W, Wu X, Ji G, Li QQ, Lin J. QuantifyPoly(A): reshaping alternative polyadenylation landscapes of eukaryotes with weighted density peak clustering. Brief Bioinform 2021; 22:6319934. [PMID: 34255024 DOI: 10.1093/bib/bbab268] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 04/16/2021] [Revised: 06/23/2021] [Accepted: 06/23/2021] [Indexed: 01/09/2023]
Abstract
The dynamic choice of different polyadenylation sites in a gene is referred to as alternative polyadenylation, which functions in many important biological processes. Large-scale messenger RNA 3' end sequencing has revealed that cleavage sites for polyadenylation are presented with microheterogeneity. To date, the conventional determination of polyadenylation site clusters is subjective and arbitrary, leading to inaccurate annotations. Here, we present a weighted density peak clustering method, QuantifyPoly(A), to accurately quantify genome-wide polyadenylation choices. Applying QuantifyPoly(A) on published 3' end sequencing datasets from both animals and plants, their polyadenylation profiles are reshaped into myriads of novel polyadenylation site clusters. Most of these novel polyadenylation site clusters show significantly dynamic usage across different biological samples or associate with binding sites of trans-acting factors. Upstream sequences of these clusters are enriched with polyadenylation signals UGUA, UAAA and/or AAUAAA in a species-dependent manner. Polyadenylation site clusters also exhibit species specificity, while plants ones generally show higher microheterogeneity than that of animals. QuantifyPoly(A) is broadly applicable to any types of 3' end sequencing data and species for accurate quantification and construction of the complex and dynamic polyadenylation landscape and enables us to decode alternative polyadenylation events invisible to conventional methods at a much higher resolution.
Affiliation(s)
- Congting Ye
  - Key Laboratory of the Ministry of Education for Coastal and Wetland Ecosystems, College of the Environment and Ecology, Xiamen University, Xiamen, Fujian 361102, China
- Danhui Zhao
  - Key Laboratory of the Ministry of Education for Coastal and Wetland Ecosystems, College of the Environment and Ecology, Xiamen University, Xiamen, Fujian 361102, China
- Wenbin Ye
  - Department of Automation, Xiamen University, Xiamen, Fujian 361102, China
- Xiaohui Wu
  - Department of Automation, Xiamen University, Xiamen, Fujian 361102, China
- Guoli Ji
  - Department of Automation, Xiamen University, Xiamen, Fujian 361102, China
- Qingshun Q Li
  - Key Laboratory of the Ministry of Education for Coastal and Wetland Ecosystems, College of the Environment and Ecology, Xiamen University, Xiamen, Fujian 361102, China
  - Graduate College of Biomedical Sciences, Western University of Health Sciences, Pomona, CA 91766, USA
- Juncheng Lin
  - Key Laboratory of the Ministry of Education for Coastal and Wetland Ecosystems, College of the Environment and Ecology, Xiamen University, Xiamen, Fujian 361102, China
  - FAFU-UCR Joint Center, Horticulture Biology and Metabolomics Center, Haixia Institute of Science and Technology, Fujian Agriculture and Forestry University, Fuzhou, Fujian 350002, China
|
35
|
Alternative mRNA Processing of Innate Response Pathways in Respiratory Syncytial Virus (RSV) Infection. Viruses 2021; 13:v13020218. [PMID: 33572560 PMCID: PMC7912025 DOI: 10.3390/v13020218] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 01/04/2021] [Revised: 01/27/2021] [Accepted: 01/28/2021] [Indexed: 12/14/2022]
Abstract
The innate immune response (IIR) involves rapid genomic expression of protective interferons (IFNs) and inflammatory cytokines triggered by intracellular viral replication. Although the transcriptional control of the innate pathway is known in substantial detail, little is understood about the complexity of alternative splicing (AS) and alternative polyadenylation (APA) of mRNAs underlying the cellular IIR. In this study, we applied single-molecule, real-time (SMRT) sequencing with mRNA quantitation using short-read mRNA sequencing to characterize changes in mRNA processing in the epithelial response to respiratory syncytial virus (RSV) replication. Mock or RSV-infected human small-airway epithelial cells (hSAECs) were profiled using SMRT sequencing and the curated transcriptome analyzed by structural and quality annotation of novel transcript isoforms (SQANTI). We identified 113,082 unique isoforms; 28,561 represented full splice matches, and 45% of genes expressed six or greater AS mRNA isoforms. Identification of differentially expressed AS isoforms was accomplished by mapping a short-read RNA sequencing expression matrix to the curated transcriptome, and 905 transcripts underwent differential polyadenylation site analysis enriched in protein secretion, translation, and mRNA degradation. We focused on 355 genes showing differential isoform utilization (DIU), indicating where a new AS isoform becomes a major fraction of mRNA isoforms expressed. In pathway and network enrichment analyses, we observed that DIU transcripts are substantially enriched in cell cycle control and IIR pathways. Interestingly, the RelA/IRF7 innate regulators showed substantial DIU where major transcripts included distinct isoforms with exon occlusion, intron inclusion, and alternative transcription start site utilization. We validated the presence of RelA and IRF7 AS isoforms as well as their induction by RSV using eight isoform-specific RT-PCR assays. 
These isoforms were identified in both immortalized and primary small-airway epithelial cells. We concluded that the cell cycle and IIR are differentially spliced in response to RSV. These data indicate that substantial post-transcriptional complexity regulates the antiviral response.
|