1
|
Bryce-Smith S, Burri D, Gazzara MR, Herrmann CJ, Danecka W, Fitzsimmons CM, Wan YK, Zhuang F, Fansler MM, Fernández JM, Ferret M, Gonzalez-Uriarte A, Haynes S, Herdman C, Kanitz A, Katsantoni M, Marini F, McDonnel E, Nicolet B, Poon CL, Rot G, Schärfen L, Wu PJ, Yoon Y, Barash Y, Zavolan M. Extensible benchmarking of methods that identify and quantify polyadenylation sites from RNA-seq data. RNA (NEW YORK, N.Y.) 2023; 29:1839-1855. [PMID: 37816550 PMCID: PMC10653393 DOI: 10.1261/rna.079849.123] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 09/21/2023] [Accepted: 09/21/2023] [Indexed: 10/12/2023]
Abstract
The tremendous rate with which data is generated and analysis methods emerge makes it increasingly difficult to keep track of their domain of applicability, assumptions, limitations, and consequently, of the efficacy and precision with which they solve specific tasks. Therefore, there is an increasing need for benchmarks, and for the provision of infrastructure for continuous method evaluation. APAeval is an international community effort, organized by the RNA Society in 2021, to benchmark tools for the identification and quantification of the usage of alternative polyadenylation (APA) sites from short-read, bulk RNA-sequencing (RNA-seq) data. Here, we reviewed 17 tools and benchmarked eight on their ability to perform APA identification and quantification, using a comprehensive set of RNA-seq experiments comprising real, synthetic, and matched 3'-end sequencing data. To support continuous benchmarking, we have incorporated the results into the OpenEBench online platform, which allows for continuous extension of the set of methods, metrics, and challenges. We envisage that our analyses will assist researchers in selecting the appropriate tools for their studies, while the containers and reproducible workflows could easily be deployed and extended to evaluate new methods or data sets.
Affiliation(s)
- Sam Bryce-Smith
- Department of Neuromuscular Diseases, UCL Queen Square Motor Neuron Disease Centre, UCL Queen Square Institute of Neurology, UCL, London WC1N 3BG, United Kingdom
| | - Dominik Burri
- Biozentrum, University of Basel, 4056 Basel, Switzerland
- Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
| | - Matthew R Gazzara
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA
| | - Christina J Herrmann
- Biozentrum, University of Basel, 4056 Basel, Switzerland
- Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
| | - Weronika Danecka
- Institute for Cell Biology, School of Biological Sciences, The University of Edinburgh, Edinburgh EH9 3FF, United Kingdom
| | - Christina M Fitzsimmons
- Laboratory of Cell Biology, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, Maryland 20892, USA
| | - Yuk Kei Wan
- Genome Institute of Singapore, Buona Vista, Singapore 138672
- Yong Loo Lin School of Medicine, National University of Singapore, Kent Ridge, Singapore 119228
| | - Farica Zhuang
- Department of Computer and Information Science, School of Engineering, University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA
| | - Mervin M Fansler
- Tri-Institutional Program in Computational Biology and Medicine, Weill Cornell Graduate Studies, New York, New York 10065, USA
- Cancer Biology and Genetics, Sloan-Kettering Institute, MSKCC, New York, New York 10065, USA
| | - José M Fernández
- Life Sciences Department, Barcelona Supercomputing Center, 08034 Barcelona, Spain
- Spanish National Bioinformatics Institute (INB/ELIXIR-ES), 28029 Madrid, Spain
| | - Meritxell Ferret
- Life Sciences Department, Barcelona Supercomputing Center, 08034 Barcelona, Spain
- Spanish National Bioinformatics Institute (INB/ELIXIR-ES), 28029 Madrid, Spain
| | - Asier Gonzalez-Uriarte
- Life Sciences Department, Barcelona Supercomputing Center, 08034 Barcelona, Spain
- Spanish National Bioinformatics Institute (INB/ELIXIR-ES), 28029 Madrid, Spain
| | - Samuel Haynes
- Institute for Cell Biology, School of Biological Sciences, The University of Edinburgh, Edinburgh EH9 3FF, United Kingdom
| | - Chelsea Herdman
- Department of Neurobiology, University of Utah, Salt Lake City, Utah 84132, USA
| | - Alexander Kanitz
- Biozentrum, University of Basel, 4056 Basel, Switzerland
- Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
| | - Maria Katsantoni
- Biozentrum, University of Basel, 4056 Basel, Switzerland
- Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
| | - Federico Marini
- Institute of Medical Biostatistics, Epidemiology and Informatics (IMBEI), University Medical Center of the Johannes Gutenberg-University Mainz, 55118 Mainz, Germany
| | - Euan McDonnel
- Leeds Institute for Data Analytics, School of Molecular and Cellular Biology, University of Leeds, Leeds LS2 9NL, United Kingdom
| | - Ben Nicolet
- Department of Hematopoiesis, Sanquin Research, Landsteiner Laboratory, Amsterdam UMC, University of Amsterdam, 1066 CX Amsterdam, The Netherlands
- Oncode Institute, 3521 AL Utrecht, The Netherlands
| | - Chi-Lam Poon
- Graduate School of Medical Sciences, Weill Cornell Medicine, New York, New York 10065, USA
| | - Gregor Rot
- Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
- Institute of Molecular Life Sciences, University of Zurich, 8057 Zurich, Switzerland
| | - Leonard Schärfen
- Department of Molecular Biophysics & Biochemistry, Yale University, New Haven, Connecticut 06520, USA
| | - Pin-Jou Wu
- Center for Plant Molecular Biology (ZMBP), University of Tübingen, 72076 Tübingen, Germany
| | - Yoseop Yoon
- Department of Microbiology and Molecular Genetics, School of Medicine, University of California Irvine, Irvine, California 92617, USA
| | - Yoseph Barash
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA
- Department of Computer and Information Science, School of Engineering, University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA
| | - Mihaela Zavolan
- Biozentrum, University of Basel, 4056 Basel, Switzerland
- Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
| |
|
2
|
Grlickova-Duzevik E, Reimonn TM, Michael M, Tian T, Owyoung J, McGrath-Conwell A, Neufeld P, Mueth M, Molliver DC, Ward PJ, Harrison BJ. Members of the CUGBP Elav-like family of RNA-binding proteins are expressed in distinct populations of primary sensory neurons. J Comp Neurol 2023; 531:1425-1442. [PMID: 37537886 DOI: 10.1002/cne.25520] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/01/2023] [Revised: 05/16/2023] [Accepted: 06/10/2023] [Indexed: 08/05/2023]
Abstract
Primary sensory dorsal root ganglia (DRG) neurons are diverse, with distinct populations that respond to specific stimuli. Previously, we observed that functionally distinct populations of DRG neurons express mRNA transcript variants with different 3' untranslated regions (3'UTRs). 3'UTRs harbor binding sites for interaction with RNA-binding proteins (RBPs) for transporting mRNAs to subcellular domains, modulating transcript stability, and regulating the rate of translation. In the current study, analysis of publicly available single-cell RNA-sequencing data generated from adult mice revealed that 17 3'UTR-binding RBPs were enriched in specific populations of DRG neurons. This included four members of the CUG triplet repeat (CUGBP) Elav-like family (CELF): CELF2 and CELF4 were enriched in peptidergic, CELF6 in both peptidergic and nonpeptidergic, and CELF3 in tyrosine hydroxylase-expressing neurons. Immunofluorescence studies confirmed that 60% of CELF4+ neurons are small-diameter C fibers and 33% medium-diameter myelinated (likely Aδ) fibers and showed that CELF4 is distributed to peripheral termini. Coexpression analyses using transcriptomic data and immunofluorescence revealed that CELF4 is enriched in nociceptive neurons that express GFRA3, CGRP, and the capsaicin receptor TRPV1. Reanalysis of published transcriptomic data from macaque DRG revealed a highly similar distribution of CELF members, and reanalysis of single-nucleus RNA-sequencing data derived from mouse and rat DRG after sciatic injury revealed differential expression of CELFs in specific populations of sensory neurons. We propose that CELF RBPs may regulate the fate of mRNAs in populations of nociceptors, and may play a role in pain and/or neuronal regeneration following nerve injury.
Affiliation(s)
- Eliza Grlickova-Duzevik
- Biomedical Sciences, College of Osteopathic Medicine, University of New England, Biddeford, Maine, USA
- Center for Excellence in the Neurosciences, University of New England, Biddeford, Maine, USA
| | - Thomas M Reimonn
- Program in Bioinformatics and Integrative Biology, University of Massachusetts Medical School, Worcester, Massachusetts, USA
| | - Merilla Michael
- Biomedical Sciences, College of Osteopathic Medicine, University of New England, Biddeford, Maine, USA
- Center for Excellence in the Neurosciences, University of New England, Biddeford, Maine, USA
| | - Tina Tian
- Medical Scientist Training Program, Emory University, Atlanta, Georgia, USA
- Neuroscience Graduate Program, Emory University, Atlanta, Georgia, USA
- Department of Cell Biology, Emory University School of Medicine, Atlanta, Georgia, USA
| | - Jordan Owyoung
- Department of Cell Biology, Emory University School of Medicine, Atlanta, Georgia, USA
- Genetics and Molecular Biology Graduate Program, Emory University, Atlanta, Georgia, USA
| | - Aidan McGrath-Conwell
- Center for Excellence in the Neurosciences, University of New England, Biddeford, Maine, USA
- College of Arts and Sciences, University of New England, Biddeford, Maine, USA
| | - Peter Neufeld
- Center for Excellence in the Neurosciences, University of New England, Biddeford, Maine, USA
- College of Arts and Sciences, University of New England, Biddeford, Maine, USA
| | - Madison Mueth
- Center for Excellence in the Neurosciences, University of New England, Biddeford, Maine, USA
- Graduate School of Biomedical Science and Engineering, University of Maine, Orono, Maine, USA
| | - Derek C Molliver
- Biomedical Sciences, College of Osteopathic Medicine, University of New England, Biddeford, Maine, USA
- Center for Excellence in the Neurosciences, University of New England, Biddeford, Maine, USA
| | - Patricia Jillian Ward
- Neuroscience Graduate Program, Emory University, Atlanta, Georgia, USA
- Department of Cell Biology, Emory University School of Medicine, Atlanta, Georgia, USA
| | - Benjamin J Harrison
- Biomedical Sciences, College of Osteopathic Medicine, University of New England, Biddeford, Maine, USA
- Center for Excellence in the Neurosciences, University of New England, Biddeford, Maine, USA
| |
|
3
|
Bryce-Smith S, Burri D, Gazzara MR, Herrmann CJ, Danecka W, Fitzsimmons CM, Wan YK, Zhuang F, Fansler MM, Fernández JM, Ferret M, Gonzalez-Uriarte A, Haynes S, Herdman C, Kanitz A, Katsantoni M, Marini F, McDonnel E, Nicolet B, Poon CL, Rot G, Schärfen L, Wu PJ, Yoon Y, Barash Y, Zavolan M. Extensible benchmarking of methods that identify and quantify polyadenylation sites from RNA-seq data. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.06.23.546284. [PMID: 37425672 PMCID: PMC10327023 DOI: 10.1101/2023.06.23.546284] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/11/2023]
Abstract
The tremendous rate with which data is generated and analysis methods emerge makes it increasingly difficult to keep track of their domain of applicability, assumptions, and limitations and consequently, of the efficacy and precision with which they solve specific tasks. Therefore, there is an increasing need for benchmarks, and for the provision of infrastructure for continuous method evaluation. APAeval is an international community effort, organized by the RNA Society in 2021, to benchmark tools for the identification and quantification of the usage of alternative polyadenylation (APA) sites from short-read, bulk RNA-sequencing (RNA-seq) data. Here, we reviewed 17 tools and benchmarked eight on their ability to perform APA identification and quantification, using a comprehensive set of RNA-seq experiments comprising real, synthetic, and matched 3'-end sequencing data. To support continuous benchmarking, we have incorporated the results into the OpenEBench online platform, which allows for seamless extension of the set of methods, metrics, and challenges. We envisage that our analyses will assist researchers in selecting the appropriate tools for their studies. Furthermore, the containers and reproducible workflows generated in the course of this project can be seamlessly deployed and extended in the future to evaluate new methods or datasets.
Affiliation(s)
- Sam Bryce-Smith
- UCL Queen Square Motor Neuron Disease Centre, Department of Neuromuscular Diseases, UCL Queen Square Institute of Neurology, UCL, London, UK
| | - Dominik Burri
- Biozentrum, University of Basel, Basel, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
| | - Matthew R. Gazzara
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, USA
| | - Christina J. Herrmann
- Biozentrum, University of Basel, Basel, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
| | - Weronika Danecka
- Institute for Cell Biology, School of Biological Sciences, The University of Edinburgh, Edinburgh, United Kingdom
| | - Christina M. Fitzsimmons
- Laboratory of Cell Biology, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, Maryland, USA
| | - Yuk Kei Wan
- Genome Institute of Singapore, Buona Vista, Singapore
- National University of Singapore, Kent Ridge, Singapore
| | - Farica Zhuang
- Department of Computer and Information Science, School of Engineering, University of Pennsylvania, Philadelphia, USA
| | - Mervin M. Fansler
- Tri-Institutional Program in Computational Biology and Medicine, Weill Cornell Graduate Studies, New York, NY, USA
- Cancer Biology and Genetics, Sloan-Kettering Institute, MSKCC, New York, NY, USA
| | - José M. Fernández
- Barcelona Supercomputing Center, Barcelona, Spain
- Spanish National Bioinformatics Institute (INB/ELIXIR-ES)
| | - Meritxell Ferret
- Barcelona Supercomputing Center, Barcelona, Spain
- Spanish National Bioinformatics Institute (INB/ELIXIR-ES)
| | - Asier Gonzalez-Uriarte
- Barcelona Supercomputing Center, Barcelona, Spain
- Spanish National Bioinformatics Institute (INB/ELIXIR-ES)
| | - Samuel Haynes
- Institute for Cell Biology, School of Biological Sciences, The University of Edinburgh, Edinburgh, United Kingdom
| | | | - Alexander Kanitz
- Biozentrum, University of Basel, Basel, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
| | - Maria Katsantoni
- Biozentrum, University of Basel, Basel, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
| | - Federico Marini
- Institute of Medical Biostatistics, Epidemiology and Informatics (IMBEI), University Medical Center of the Johannes Gutenberg-University Mainz, Germany
| | - Euan McDonnel
- Leeds Institute for Data Analytics, School of Molecular and Cellular Biology, University of Leeds, United Kingdom
| | - Ben Nicolet
- Department of Hematopoiesis, Sanquin Research, Landsteiner Laboratory, Amsterdam UMC, University of Amsterdam, and Oncode Institute, Amsterdam, The Netherlands
| | | | - Gregor Rot
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Institute of Molecular Life Sciences, Zurich, Switzerland
| | - Leonard Schärfen
- Department of Molecular Biophysics & Biochemistry, Yale University, New Haven CT, USA
| | - Pin-Jou Wu
- Center for Plant Molecular Biology (ZMBP), University of Tübingen, Germany
| | - Yoseop Yoon
- Department of Microbiology and Molecular Genetics, School of Medicine, University of California Irvine, Irvine, California, USA
| | - Yoseph Barash
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, USA
- Department of Computer and Information Science, School of Engineering, University of Pennsylvania, Philadelphia, USA
| | - Mihaela Zavolan
- Biozentrum, University of Basel, Basel, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
| |
|
4
|
Bai Y, Qin Y, Fan Z, Morrison RM, Nam K, Zarour HM, Koldamova R, Padiath QS, Kim S, Park HJ. scMAPA: Identification of cell-type-specific alternative polyadenylation in complex tissues. Gigascience 2022; 11:6576244. [PMID: 35488860 PMCID: PMC9055853 DOI: 10.1093/gigascience/giac033] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 08/06/2021] [Revised: 11/18/2021] [Accepted: 03/15/2022] [Indexed: 01/06/2023] Open
Abstract
Background: Alternative polyadenylation (APA) causes shortening or lengthening of the 3ʹ-untranslated region (3ʹ-UTR) of genes (APA genes) in diverse cellular processes such as cell proliferation and differentiation. To identify cell-type–specific APA genes in scRNA-Seq data, current bioinformatic methods have several limitations. First, they assume certain read coverage shapes in the scRNA-Seq data, which can be violated in multiple APA genes. Second, their identification is limited between 2 cell types and not directly applicable to the data of multiple cell types. Third, they do not control undesired source of variance, which potentially introduces noise to the cell-type–specific identification of APA genes. Findings: We developed a combination of a computational change-point algorithm and a statistical model, single-cell Multi-group identification of APA (scMAPA). To avoid the assumptions on the read coverage shape, scMAPA formulates a change-point problem after transforming the 3ʹ biased scRNA-Seq data to represent the full-length 3ʹ-UTR signal. To identify cell-type–specific APA genes while adjusting for undesired source of variation, scMAPA models APA isoforms in consideration of the cell types and the undesired source. In our novel simulation data and data from human peripheral blood mononuclear cells, scMAPA outperforms existing methods in sensitivity, robustness, and stability. In mouse brain data consisting of multiple cell types sampled from multiple regions, scMAPA identifies cell-type–specific APA genes, elucidating novel roles of APA for dividing immune cells and differentiated neuron cells and in multiple brain disorders. Conclusions: scMAPA elucidates the cell-type–specific function of APA events and sheds novel insights into the functional roles of APA events in complex tissues.
Affiliation(s)
- Yulong Bai
- Department of Human Genetics, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, PA 15261, USA
| | - Yidi Qin
- Department of Human Genetics, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, PA 15261, USA
| | - Zhenjiang Fan
- Department of Computer Science, School of Computing and Information, University of Pittsburgh, Pittsburgh, PA 15213, USA
| | - Robert M Morrison
- Department of Medicine and Division of Hematology/Oncology, University of Pittsburgh, School of Medicine, Pittsburgh, PA 15213, USA
- Department of Immunology, University of Pittsburgh, School of Medicine, Pittsburgh, PA 15213, USA
- Department of Computational and Systems Biology, University of Pittsburgh Medical Center, Pittsburgh, PA 15213, USA
| | - KyongNyon Nam
- Department of Environmental and Occupational Health, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, PA 15261, USA
| | - Hassane M Zarour
- Department of Medicine and Division of Hematology/Oncology, University of Pittsburgh, School of Medicine, Pittsburgh, PA 15213, USA
- Department of Immunology, University of Pittsburgh, School of Medicine, Pittsburgh, PA 15213, USA
| | - Radosveta Koldamova
- Department of Environmental and Occupational Health, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, PA 15261, USA
| | - Quasar Saleem Padiath
- Department of Human Genetics, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, PA 15261, USA
- Department of Neurobiology, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA
| | - Soyeon Kim
- Department of Pediatrics, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15224, USA
- Division of Pediatric Pulmonary Medicine, UPMC Children's Hospital of Pittsburgh, Pittsburgh, PA 15224, USA
| | - Hyun Jung Park
- Department of Human Genetics, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, PA 15261, USA
| |
|
5
|
Guerra-Almeida D, Tschoeke DA, da-Fonseca RN. Understanding small ORF diversity through a comprehensive transcription feature classification. DNA Res 2021; 28:6317669. [PMID: 34240112 PMCID: PMC8435553 DOI: 10.1093/dnares/dsab007] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Received: 12/22/2020] [Indexed: 11/13/2022] Open
Abstract
Small open reading frames (small ORFs/sORFs/smORFs) are potentially coding sequences smaller than 100 codons that have historically been considered junk DNA by gene prediction software and in annotation screening; however, the advent of next-generation sequencing has contributed to the deeper investigation of junk DNA regions and their transcription products, resulting in the emergence of smORFs as a new focus of interest in systems biology. Several smORF peptides were recently reported in noncanonical mRNAs as new players in numerous biological contexts; however, their relevance is still overlooked in coding potential analysis. Hence, this review proposes a smORF classification based on transcriptional features, discussing the most promising approaches to investigate smORFs based on their different characteristics. First, smORFs were divided into nonexpressed (intergenic) and expressed (genic) smORFs. Second, genic smORFs were classified as smORFs located in noncoding RNAs (ncRNAs) or canonical mRNAs. Finally, smORFs in ncRNAs were further subdivided into sequences located in small or long RNAs, whereas smORFs located in canonical mRNAs were subdivided into several specific classes depending on their localization along the gene. We hope that this review provides new insights into large-scale annotations and reinforces the role of smORFs as essential components of a hidden coding DNA world.
Affiliation(s)
- Diego Guerra-Almeida
- Institute of Biodiversity and Sustainability, Federal University of Rio de Janeiro, Rio de Janeiro, Brazil
| | - Diogo Antonio Tschoeke
- Alberto Luiz Coimbra Institute of Graduate Studies and Engineering Research (COPPE), Biomedical Engineering Program, Federal University of Rio de Janeiro, Rio de Janeiro, Brazil
| | - Rodrigo Nunes-da-Fonseca
- Institute of Biodiversity and Sustainability, Federal University of Rio de Janeiro, Rio de Janeiro, Brazil
- National Institute of Science and Technology in Molecular Entomology, Rio de Janeiro, Brazil
| |
|
6
|
Gerber S, Schratt G, Germain PL. Streamlining differential exon and 3' UTR usage with diffUTR. BMC Bioinformatics 2021; 22:189. [PMID: 33849458 PMCID: PMC8045333 DOI: 10.1186/s12859-021-04114-7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/12/2021] [Accepted: 03/30/2021] [Indexed: 12/13/2022] Open
Abstract
Background: Despite the importance of alternative poly-adenylation and 3′ UTR length for a variety of biological phenomena, there are limited means of detecting UTR changes from standard transcriptomic data. Results: We present the diffUTR Bioconductor package which streamlines and improves upon differential exon usage (DEU) analyses, and leverages existing DEU tools and alternative poly-adenylation site databases to enable differential 3′ UTR usage analysis. We demonstrate the diffUTR features and show that it is more flexible and more accurate than state-of-the-art alternatives, both in simulations and in real data. Conclusions: diffUTR enables differential 3′ UTR analysis and more generally facilitates DEU and the exploration of their results. Supplementary Information: The online version contains supplementary material available at 10.1186/s12859-021-04114-7.
Affiliation(s)
- Stefan Gerber
- Group of Computational Neurogenomics, D-HEST Institute for Neurosciences, ETH Zürich, Winterthurerstrasse 190, 8057, Zurich, Switzerland
- Lab of Systems Neuroscience, D-HEST Institute for Neurosciences, ETH Zürich, Winterthurerstrasse 190, 8057, Zurich, Switzerland
| | - Gerhard Schratt
- Lab of Systems Neuroscience, D-HEST Institute for Neurosciences, ETH Zürich, Winterthurerstrasse 190, 8057, Zurich, Switzerland
| | - Pierre-Luc Germain
- Group of Computational Neurogenomics, D-HEST Institute for Neurosciences, ETH Zürich, Winterthurerstrasse 190, 8057, Zurich, Switzerland
- Lab of Statistical Bioinformatics, DMLS, University of Zürich, Winterthurerstrasse 190, 8057, Zurich, Switzerland
- SIB Swiss Institute of Bioinformatics, Zurich, Switzerland
| |
|
7
|
Zinski AL, Carrion S, Michal JJ, Gartstein MA, Quock RM, Davis JF, Jiang Z. Genome-to-phenome research in rats: progress and perspectives. Int J Biol Sci 2021; 17:119-133. [PMID: 33390838 PMCID: PMC7757052 DOI: 10.7150/ijbs.51628] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/06/2020] [Accepted: 10/06/2020] [Indexed: 01/07/2023] Open
Abstract
Because of their relatively short lifespan (<4 years), rats have become the second most used model organism to study health and diseases in humans, who may live for up to 120 years. First-, second- and third-generation sequencing technologies and platforms have produced increasingly greater sequencing depth and more accurate reads, leading to significant advancements in the rat genome assembly during the last 20 years. In fact, whole genome sequencing (WGS) of 47 strains has been completed. This has led to the discovery of genome variants in rats, which have been widely used to detect quantitative trait loci underlying complex phenotypes based on gene, haplotype, and sweep association analyses. DNA variants can also reveal strain, chromosome and gene functional evolutions. In parallel, phenome programs have advanced significantly in rats during the last 15 years and more than 10 databases host genome and/or phenome information. In order to discover the bridges between genome and phenome, systems genetics and integrative genomics approaches have been developed. On the other hand, multiple level information transfers from genome to phenome are executed by differential usage of alternative transcriptional start (ATS) and polyadenylation (APA) sites per gene. We used our own experiments to demonstrate how alternative transcriptome analysis can lead to enrichment of phenome-related causal pathways in rats. Development of advanced genome-to-phenome assays will certainly enhance rats as models for human biomedical research.
Affiliation(s)
- Amy L. Zinski
- Department of Animal Sciences, Washington State University, Pullman, WA 99164-7620
| | - Shane Carrion
- Department of Animal Sciences, Washington State University, Pullman, WA 99164-7620
| | - Jennifer J. Michal
- Department of Animal Sciences, Washington State University, Pullman, WA 99164-7620
| | - Maria A. Gartstein
- Department of Psychology, Washington State University, Pullman, WA 99164-4820
| | - Raymond M. Quock
- Department of Psychology, Washington State University, Pullman, WA 99164-4820
| | - Jon F. Davis
- Department of Integrative Physiology and Neuroscience, Washington State University, Pullman, WA 99164-7620
| | - Zhihua Jiang
- Department of Animal Sciences, Washington State University, Pullman, WA 99164-7620
| |
|
8
|
Doulazmi M, Cros C, Dusart I, Trembleau A, Dubacq C. Alternative polyadenylation produces multiple 3' untranslated regions of odorant receptor mRNAs in mouse olfactory sensory neurons. BMC Genomics 2019; 20:577. [PMID: 31299892 PMCID: PMC6624953 DOI: 10.1186/s12864-019-5927-3] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 02/20/2019] [Accepted: 06/23/2019] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: Odorant receptor genes constitute the largest gene family in mammalian genomes and this family has been extensively studied in several species, but to date far less attention has been paid to the characterization of their mRNA 3' untranslated regions (3'UTRs). Given the increasing importance of UTRs in the understanding of RNA metabolism, and the growing interest in alternative polyadenylation especially in the nervous system, we aimed at identifying the alternative isoforms of odorant receptor mRNAs generated through 3'UTR variation. RESULTS: We implemented a dedicated pipeline using IsoSCM instead of Cufflinks to analyze RNA-Seq data from whole olfactory mucosa of adult mice and obtained an extensive description of the 3'UTR isoforms of odorant receptor mRNAs. To validate our bioinformatics approach, we exhaustively analyzed the 3'UTR isoforms produced from 2 pilot genes, using molecular approaches including northern blot and RNA ligation mediated polyadenylation test. Comparison between datasets further validated the pipeline and confirmed the alternative polyadenylation patterns of odorant receptors. Qualitative and quantitative analyses of the annotated 3' regions demonstrate that 1) Odorant receptor 3'UTRs are longer than previously described in the literature; 2) More than 77% of odorant receptor mRNAs are subject to alternative polyadenylation, hence generating at least 2 detectable 3'UTR isoforms; 3) Splicing events in 3'UTRs are restricted to a limited subset of odorant receptor genes; and 4) Comparison between male and female data shows no sex-specific differences in odorant receptor 3'UTR isoforms. CONCLUSIONS: We demonstrated for the first time that odorant receptor genes are extensively subject to alternative polyadenylation. This ground-breaking change to the landscape of 3'UTR isoforms of Olfr mRNAs opens new avenues for investigating their respective functions, especially during the differentiation of olfactory sensory neurons.
Affiliation(s)
- Mohamed Doulazmi
- CNRS, Institut de Biologie Paris Seine, Biological adaptation and ageing, B2A, Sorbonne Université, F-75005 Paris, France
| | - Cyril Cros
- CNRS, INSERM, Institut de Biologie Paris Seine, Neuroscience Paris Seine, NPS, Sorbonne Université, F-75005 Paris, France
- Present Address: Columbia University, New York, NY 10027 USA
| | - Isabelle Dusart
- CNRS, INSERM, Institut de Biologie Paris Seine, Neuroscience Paris Seine, NPS, Sorbonne Université, F-75005 Paris, France
| | - Alain Trembleau
- CNRS, INSERM, Institut de Biologie Paris Seine, Neuroscience Paris Seine, NPS, Sorbonne Université, F-75005 Paris, France
| | - Caroline Dubacq
- CNRS, INSERM, Institut de Biologie Paris Seine, Neuroscience Paris Seine, NPS, Sorbonne Université, F-75005 Paris, France
| |
|