1
Gutierrez PA, Wei J, Sun Y, Tong L. Molecular basis for the recognition of the AUUAAA polyadenylation signal by mPSF. RNA (New York, N.Y.) 2022; 28:1534-1541. [PMID: 36130077 PMCID: PMC9745836 DOI: 10.1261/rna.079322.122] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 06/19/2022] [Accepted: 08/22/2022] [Indexed: 06/15/2023]
Abstract
The polyadenylation signal (PAS) is a key sequence element for 3'-end cleavage and polyadenylation of messenger RNA precursors (pre-mRNAs). This hexanucleotide motif is recognized by the mammalian polyadenylation specificity factor (mPSF), consisting of CPSF160, WDR33, CPSF30, and Fip1 subunits. Recent studies have revealed how the AAUAAA PAS, the most frequently observed PAS, is recognized by mPSF. We report here the structure of human mPSF in complex with the AUUAAA PAS, the second most frequently identified PAS. Conformational differences are observed for the A1 and U2 nucleotides in AUUAAA compared to the A1 and A2 nucleotides in AAUAAA, while the binding modes of the remaining 4 nt are essentially identical. The 5' phosphate of U2 moves by 2.6 Å and the U2 base is placed near the six-membered ring of A2 in AAUAAA, where it makes two hydrogen bonds with zinc finger 2 (ZF2) of CPSF30, which undergoes conformational changes as well. We also attempted to determine the binding modes of two rare PAS hexamers, AAGAAA and GAUAAA, but did not observe the RNA in the cryo-electron microscopy density. The residues in CPSF30 (ZF2 and ZF3) and WDR33 that recognize PAS are disordered in these two structures.
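To make the hexamers named in this abstract concrete, the following is a minimal sketch that scans an RNA sequence for AAUAAA, AUUAAA, and the two rare variants AAGAAA and GAUAAA. It is not the authors' method (their work is structural, by cryo-electron microscopy); the function name `find_pas_hexamers` and the example sequence are hypothetical and chosen only for illustration.

```python
# Hexamers named in the abstract: the two common PAS variants and two rare ones.
PAS_HEXAMERS = ("AAUAAA", "AUUAAA", "AAGAAA", "GAUAAA")


def find_pas_hexamers(rna, hexamers=PAS_HEXAMERS):
    """Return (position, hexamer) pairs for every match, 0-based, in sequence order."""
    rna = rna.upper().replace("T", "U")  # accept DNA-style input as well
    hits = []
    for i in range(len(rna) - 5):
        window = rna[i:i + 6]
        if window in hexamers:
            hits.append((i, window))
    return hits


# Hypothetical example sequence containing one AUUAAA and one AAUAAA.
example = "GGCAUUAAAGCUCAAUAAAGG"
print(find_pas_hexamers(example))  # [(3, 'AUUAAA'), (13, 'AAUAAA')]
```

Overlapping matches are reported individually because the scan advances one nucleotide at a time.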
Affiliation(s)
- Pedro A Gutierrez
  - Department of Biological Sciences, Columbia University, New York, New York 10027, USA
- Jia Wei
  - Department of Biological Sciences, Columbia University, New York, New York 10027, USA
- Yadong Sun
  - Department of Biological Sciences, Columbia University, New York, New York 10027, USA
- Liang Tong
  - Department of Biological Sciences, Columbia University, New York, New York 10027, USA

2
Zingone A, Sinha S, Ante M, Nguyen C, Daujotyte D, Bowman ED, Sinha N, Mitchell KA, Chen Q, Yan C, Loher P, Meerzaman D, Ruppin E, Ryan BM. A comprehensive map of alternative polyadenylation in African American and European American lung cancer patients. Nat Commun 2021; 12:5605. [PMID: 34556645 PMCID: PMC8460807 DOI: 10.1038/s41467-021-25763-5] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 10/12/2020] [Accepted: 07/22/2021] [Indexed: 11/09/2022] Open
Abstract
Deciphering the post-transcriptional mechanisms (PTM) regulating gene expression is critical to understand the dynamics underlying transcriptomic regulation in cancer. Alternative polyadenylation (APA), the regulation of mRNA 3'UTR length by alternating poly(A) site usage, is a key PTM mechanism whose comprehensive analysis in cancer remains an important open challenge. Here we use a method and analysis pipeline that sequences 3'end-enriched RNA directly to overcome the saturation limitation of traditional 5'-3' based sequencing. We comprehensively map the APA landscape in lung cancer in a cohort of 98 tumor/non-involved tissues derived from European American and African American patients. We identify a global shortening of 3'UTR transcripts in lung cancer, with notable functional implications on the expression of both coding and noncoding genes. We find that APA of non-coding RNA transcripts (long non-coding RNAs and microRNAs) is a recurrent event in lung cancer and discover that the selection of alternative polyA sites is a form of non-coding RNA expression control. Our results indicate that mRNA transcripts from EAs are two times more likely than AAs to undergo APA in lung cancer. Taken together, our findings comprehensively map and identify the important functional role of alternative polyadenylation in determining transcriptomic heterogeneity in lung cancer.
Affiliation(s)
- Adriana Zingone
  - Laboratory of Human Carcinogenesis, Center for Cancer Research, National Cancer Institute, Bethesda, MD, 20892, US
- Sanju Sinha
  - Laboratory of Human Carcinogenesis, Center for Cancer Research, National Cancer Institute, Bethesda, MD, 20892, US
  - Cancer Data Science Laboratory, Center for Cancer Research, National Cancer Institute, Bethesda, MD, US
- Michael Ante
  - Lexogen GmbH, Campus Vienna Biocenter 5, 1030, Vienna, Austria
  - Ares Genetics GmbH, Karl-Farkas-Gasse 18, 1030, Vienna, Austria
- Cu Nguyen
  - Computational Genomics Research, Center for Biomedical Informatics and Information Technology (CBIIT), National Cancer Institute, 9609 Medical Center Drive, Rockville, MD, 20850, US
- Dalia Daujotyte
  - Lexogen GmbH, Campus Vienna Biocenter 5, 1030, Vienna, Austria
- Elise D Bowman
  - Laboratory of Human Carcinogenesis, Center for Cancer Research, National Cancer Institute, Bethesda, MD, 20892, US
- Neelam Sinha
  - Cancer Data Science Laboratory, Center for Cancer Research, National Cancer Institute, Bethesda, MD, US
- Khadijah A Mitchell
  - Laboratory of Human Carcinogenesis, Center for Cancer Research, National Cancer Institute, Bethesda, MD, 20892, US
- Qingrong Chen
  - Computational Genomics Research, Center for Biomedical Informatics and Information Technology (CBIIT), National Cancer Institute, 9609 Medical Center Drive, Rockville, MD, 20850, US
- Chunhua Yan
  - Computational Genomics Research, Center for Biomedical Informatics and Information Technology (CBIIT), National Cancer Institute, 9609 Medical Center Drive, Rockville, MD, 20850, US
- Phillipe Loher
  - Computational Medicine Center, Sidney Kimmel Medical College, Thomas Jefferson University, Philadelphia, PA, 19017, US
- Daoud Meerzaman
  - Computational Genomics Research, Center for Biomedical Informatics and Information Technology (CBIIT), National Cancer Institute, 9609 Medical Center Drive, Rockville, MD, 20850, US
- Eytan Ruppin
  - Cancer Data Science Laboratory, Center for Cancer Research, National Cancer Institute, Bethesda, MD, US
- Bríd M Ryan
  - Laboratory of Human Carcinogenesis, Center for Cancer Research, National Cancer Institute, Bethesda, MD, 20892, US

3
Kandhari N, Kraupner-Taylor CA, Harrison PF, Powell DR, Beilharz TH. The Detection and Bioinformatic Analysis of Alternative 3' UTR Isoforms as Potential Cancer Biomarkers. Int J Mol Sci 2021; 22:5322. [PMID: 34070203 PMCID: PMC8158509 DOI: 10.3390/ijms22105322] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 04/08/2021] [Revised: 05/06/2021] [Accepted: 05/06/2021] [Indexed: 12/17/2022] Open
Abstract
Alternative transcript cleavage and polyadenylation is linked to cancer cell transformation, proliferation and outcome. This has led researchers to develop methods to detect and bioinformatically analyse alternative polyadenylation as potential cancer biomarkers. If incorporated into standard prognostic measures such as gene expression and clinical parameters, these could advance cancer prognostic testing and possibly guide therapy. In this review, we focus on the existing methodologies, both experimental and computational, that have been applied to support the use of alternative polyadenylation as cancer biomarkers.
Affiliation(s)
- Nitika Kandhari
  - Development and Stem Cells Program, Department of Biochemistry and Molecular Biology, Monash Biomedicine Discovery Institute, Monash University, Melbourne, VIC 3800, Australia
- Calvin A. Kraupner-Taylor
  - Development and Stem Cells Program, Department of Biochemistry and Molecular Biology, Monash Biomedicine Discovery Institute, Monash University, Melbourne, VIC 3800, Australia
- Paul F. Harrison
  - Development and Stem Cells Program, Department of Biochemistry and Molecular Biology, Monash Biomedicine Discovery Institute, Monash University, Melbourne, VIC 3800, Australia
  - Monash Bioinformatics Platform, Monash University, Melbourne, VIC 3800, Australia
- David R. Powell
  - Monash Bioinformatics Platform, Monash University, Melbourne, VIC 3800, Australia
- Traude H. Beilharz
  - Development and Stem Cells Program, Department of Biochemistry and Molecular Biology, Monash Biomedicine Discovery Institute, Monash University, Melbourne, VIC 3800, Australia

4
Jensen MK, Elrod ND, Yalamanchili HK, Ji P, Lin A, Liu Z, Wagner EJ. Application and design considerations for 3'-end sequencing using click-chemistry. Methods Enzymol 2021; 655:1-23. [PMID: 34183117 DOI: 10.1016/bs.mie.2021.03.012] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 01/23/2023]
Abstract
Over the past 15 years, investigations into alternative polyadenylation (APA) and its function in cellular physiology and pathology have greatly expanded due to the emergent appreciation of its key role in driving transcriptomic diversity. This growth has necessitated the development of new technologies capable of monitoring cleavage and polyadenylation events genome-wide. Advancements in approaches include both the creation of computational tools to re-analyze RNA-seq to identify APA events as well as targeted sequencing approaches customized to focus on the 3'-end of mRNA. Here we describe a streamlined protocol for polyA-Click-seq (PAC-seq), which utilizes click-chemistry to create mRNA 3'-ends sequencing libraries. Importantly, we offer additional considerations not present in our previous protocols including the use of spike-ins, unique molecular identifier primers, and guidance for appropriate depth of PAC-seq. In conjunction with the companion chapter on PolyA-miner (Yalamanchili et al., 2021) to computationally analyze PAC-seq data, we provide a complete experimental pipeline to analyze mRNA 3'-end usage in eukaryotic cells.
Affiliation(s)
- Madeline K Jensen
  - Department of Biochemistry and Molecular Biology, The University of Texas Medical Branch at Galveston, Galveston, TX, United States
- Nathan D Elrod
  - Department of Biochemistry and Molecular Biology, The University of Texas Medical Branch at Galveston, Galveston, TX, United States
- Hari Krishna Yalamanchili
  - Department of Pediatrics, Baylor College of Medicine, Houston, TX, United States
  - Jan and Dan Duncan Neurological Research Institute, Texas Children's Hospital, Houston, TX, United States
  - USDA/ARS Children's Nutrition Research Center, Department of Pediatrics, Baylor College of Medicine, Houston, TX, United States
- Ping Ji
  - Department of Biochemistry and Molecular Biology, The University of Texas Medical Branch at Galveston, Galveston, TX, United States
- Ai Lin
  - Department of Biochemistry and Molecular Biology, The University of Texas Medical Branch at Galveston, Galveston, TX, United States
  - Department of Etiology and Carcinogenesis, National Cancer Center/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- Zhandong Liu
  - Department of Pediatrics, Baylor College of Medicine, Houston, TX, United States
  - Jan and Dan Duncan Neurological Research Institute, Texas Children's Hospital, Houston, TX, United States
- Eric J Wagner
  - Department of Biochemistry and Molecular Biology, The University of Texas Medical Branch at Galveston, Galveston, TX, United States

5
Yalamanchili HK, Alcott CE, Ji P, Wagner EJ, Zoghbi HY, Liu Z. PolyA-miner: accurate assessment of differential alternative poly-adenylation from 3'Seq data using vector projections and non-negative matrix factorization. Nucleic Acids Res 2020; 48:e69. [PMID: 32463457 PMCID: PMC7337927 DOI: 10.1093/nar/gkaa398] [Citation(s) in RCA: 22] [Impact Index Per Article: 5.5] [Received: 09/09/2019] [Revised: 04/05/2020] [Accepted: 05/04/2020] [Indexed: 12/23/2022] Open
Abstract
Almost 70% of human genes undergo alternative polyadenylation (APA) and generate mRNA transcripts with varying lengths, typically of the 3′ untranslated regions (UTR). APA plays an important role in development and cellular differentiation, and its dysregulation can cause neuropsychiatric diseases and increase cancer severity. Increasing awareness of APA’s role in human health and disease has propelled the development of several 3′ sequencing (3′Seq) techniques that allow for precise identification of APA sites. However, despite the recent data explosion, there are no robust computational tools that are precisely designed to analyze 3′Seq data. Analytical approaches that have been used to analyze these data predominantly use proximal to distal usage. With about 50% of human genes having more than two APA isoforms, current methods fail to capture the entirety of APA changes and do not account for non-proximal to non-distal changes. Addressing these key challenges, this study demonstrates PolyA-miner, an algorithm to accurately detect and assess differential alternative polyadenylation specifically from 3′Seq data. Genes are abstracted as APA matrices, and differential APA usage is inferred using iterative consensus non-negative matrix factorization (NMF) based clustering. PolyA-miner accounts for all non-proximal to non-distal APA switches using vector projections and reflects precise gene-level 3′UTR changes. It can also effectively identify novel APA sites that are otherwise undetected when using reference-based approaches. Evaluation on multiple datasets—first-generation MicroArray Quality Control (MAQC) brain and Universal Human Reference (UHR) PolyA-seq data, recent glioblastoma cell line NUDT21 knockdown Poly(A)-ClickSeq (PAC-seq) data, and our own mouse hippocampal and human stem cell-derived neuron PAC-seq data—strongly supports the value and protocol-independent applicability of PolyA-miner. 
Strikingly, in the glioblastoma cell line data, PolyA-miner identified more than twice the number of genes with APA changes than initially reported. With the emerging importance of APA in human development and disease, PolyA-miner can significantly improve data analysis and help decode the underlying APA dynamics.
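The idea of comparing whole APA usage vectors, rather than only a proximal-to-distal ratio, can be illustrated with a small sketch. This is not the PolyA-miner implementation and it omits the NMF clustering step entirely; it only shows, under simplifying assumptions, how a vector projection can expose a change among middle sites. The function names `proportions` and `projection_residual` and all read counts are hypothetical.

```python
import math


def proportions(counts):
    """Convert per-site read counts for one gene into usage proportions."""
    total = sum(counts)
    if total == 0:
        raise ValueError("gene has no reads")
    return [c / total for c in counts]


def projection_residual(control_counts, treated_counts):
    """Length of the part of the treated usage vector that is orthogonal
    to the control usage vector (0 means the usage pattern is unchanged)."""
    if len(control_counts) != len(treated_counts):
        raise ValueError("both conditions must list the same APA sites")
    c = proportions(control_counts)
    t = proportions(treated_counts)
    # Project t onto c, then measure what the projection leaves behind.
    scale = sum(a * b for a, b in zip(c, t)) / sum(a * a for a in c)
    residual = [b - scale * a for a, b in zip(c, t)]
    return math.sqrt(sum(r * r for r in residual))


# Hypothetical gene with three APA sites (proximal, middle, distal).
# Proximal usage is identical, but the middle and distal sites swap.
print(projection_residual([40, 20, 40], [40, 40, 20]))
```

In this example a proximal-only comparison would report no change, while the residual is clearly non-zero because the full vector is compared.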
Affiliation(s)
- Hari Krishna Yalamanchili
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
  - Jan and Dan Duncan Neurological Research Institute, Texas Children's Hospital, Houston, TX 77030, USA
- Callison E Alcott
  - Jan and Dan Duncan Neurological Research Institute, Texas Children's Hospital, Houston, TX 77030, USA
  - Program in Developmental Biology, Baylor College of Medicine, Houston, TX 77030, USA
  - Medical Scientist Training Program, Baylor College of Medicine, Houston, TX 77030, USA
- Ping Ji
  - Department of Biochemistry & Molecular Biology, University of Texas Medical Branch, Galveston, TX, 77555, USA
- Eric J Wagner
  - Department of Biochemistry & Molecular Biology, University of Texas Medical Branch, Galveston, TX, 77555, USA
- Huda Y Zoghbi
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
  - Jan and Dan Duncan Neurological Research Institute, Texas Children's Hospital, Houston, TX 77030, USA
  - Howard Hughes Medical Institute, Houston, TX 77030, USA
  - Department of Pediatrics, Baylor College of Medicine, Houston, TX 77030, USA
  - Department of Neuroscience, Baylor College of Medicine, Houston, TX 77030, USA
- Zhandong Liu
  - Jan and Dan Duncan Neurological Research Institute, Texas Children's Hospital, Houston, TX 77030, USA
  - Department of Pediatrics, Baylor College of Medicine, Houston, TX 77030, USA

6
Balázs Z, Tombácz D, Csabai Z, Moldován N, Snyder M, Boldogkői Z. Template-switching artifacts resemble alternative polyadenylation. BMC Genomics 2019; 20:824. [PMID: 31703623 PMCID: PMC6839120 DOI: 10.1186/s12864-019-6199-7] [Citation(s) in RCA: 26] [Impact Index Per Article: 5.2] [Received: 07/10/2019] [Accepted: 10/17/2019] [Indexed: 02/09/2023] Open
Abstract
BACKGROUND Alternative polyadenylation is commonly examined using cDNA sequencing, which is known to be affected by template-switching artifacts. However, the effects of such template-switching artifacts on alternative polyadenylation are generally disregarded, while alternative polyadenylation artifacts are attributed to internal priming. RESULTS Here, we analyzed both long-read cDNA sequencing and direct RNA sequencing data of two organisms, generated by different sequencing platforms. We developed a filtering algorithm which takes into consideration that template-switching can be a source of artifactual polyadenylation when filtering out spurious polyadenylation sites. The algorithm outperformed the conventional internal priming filters based on comparison to direct RNA sequencing data. We also showed that the polyadenylation artifacts arise in cDNA sequencing at consecutive stretches of as few as three adenines. There was no substantial difference between the lengths of poly(A) tails at the artifactual and the true transcriptional end sites even though it is expected that internal priming artifacts have shorter poly(A) tails than genuine polyadenylated reads. CONCLUSIONS Our findings suggest that template switching plays an important role in the generation of spurious polyadenylation and support the need for more rigorous filtering of artifactual polyadenylation sites in cDNA data, or that alternative polyadenylation should be annotated using native RNA sequencing.
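The observation that artifacts arise at stretches of as few as three adenines suggests a simple screening step, sketched below. This is not the authors' filtering algorithm, which the abstract does not specify; it is a basic adenine-run check of the kind used for internal-priming filters, with the run length of three taken from the abstract. The function names, the 10-base window, and the example sequence are assumptions made for illustration.

```python
import re


def has_a_run(seq, min_run=3):
    """True if seq contains at least min_run consecutive adenines."""
    return re.search("A{%d,}" % min_run, seq.upper()) is not None


def flag_suspect_sites(genome, sites, window=10, min_run=3):
    """Split 0-based candidate poly(A) site positions into (kept, suspect).

    A site is suspect when the `window` genomic bases immediately
    downstream of it contain an adenine run of at least `min_run`.
    """
    kept, suspect = [], []
    for pos in sites:
        downstream = genome[pos + 1:pos + 1 + window]
        if has_a_run(downstream, min_run):
            suspect.append(pos)
        else:
            kept.append(pos)
    return kept, suspect


# Hypothetical genomic sequence: position 7 is followed by AAAA, position 20 is not.
genome = "CCGTCAGTAAAACGTCCGTCAGTCGTCCGT"
print(flag_suspect_sites(genome, [7, 20]))  # ([20], [7])
```

Flagged sites would still need confirmation, for example against direct RNA sequencing data as the abstract recommends.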
Affiliation(s)
- Zsolt Balázs
  - Department of Medical Biology, Faculty of Medicine, University of Szeged, Szeged, Hungary
- Dóra Tombácz
  - Department of Medical Biology, Faculty of Medicine, University of Szeged, Szeged, Hungary
  - Department of Genetics, School of Medicine, Stanford University, Stanford, CA, USA
- Zsolt Csabai
  - Department of Medical Biology, Faculty of Medicine, University of Szeged, Szeged, Hungary
- Norbert Moldován
  - Department of Medical Biology, Faculty of Medicine, University of Szeged, Szeged, Hungary
- Michael Snyder
  - Department of Genetics, School of Medicine, Stanford University, Stanford, CA, USA
- Zsolt Boldogkői
  - Department of Medical Biology, Faculty of Medicine, University of Szeged, Szeged, Hungary

7
Abstract
3' untranslated regions (3' UTRs) of messenger RNAs (mRNAs) are best known to regulate mRNA-based processes, such as mRNA localization, mRNA stability, and translation. In addition, 3' UTRs can establish 3' UTR-mediated protein-protein interactions (PPIs), and thus can transmit genetic information encoded in 3' UTRs to proteins. This function has been shown to regulate diverse protein features, including protein complex formation or posttranslational modifications, but is also expected to alter protein conformations. Therefore, 3' UTR-mediated information transfer can regulate protein features that are not encoded in the amino acid sequence. This review summarizes both 3' UTR functions (the regulation of mRNA and protein-based processes) and highlights how each 3' UTR function was discovered with a focus on experimental approaches used and the concepts that were learned. This review also discusses novel approaches to study 3' UTR functions in the future by taking advantage of recent advances in technology.
Affiliation(s)
- Christine Mayr
  - Department of Cancer Biology and Genetics, Memorial Sloan Kettering Cancer Center, New York, New York 10065

8
Vickovic S, Eraslan G, Salmén F, Klughammer J, Stenbeck L, Schapiro D, Äijö T, Bonneau R, Bergenstråhle L, Navarro JF, Gould J, Griffin GK, Borg Å, Ronaghi M, Frisén J, Lundeberg J, Regev A, Ståhl PL. High-definition spatial transcriptomics for in situ tissue profiling. Nat Methods 2019; 16:987-990. [PMID: 31501547 PMCID: PMC6765407 DOI: 10.1038/s41592-019-0548-y] [Citation(s) in RCA: 602] [Impact Index Per Article: 120.4] [Received: 03/12/2019] [Accepted: 08/02/2019] [Indexed: 12/21/2022]
Abstract
Spatial and molecular characteristics determine tissue function, yet high-resolution methods to capture both concurrently are lacking. Here, we developed high-definition spatial transcriptomics, which captures RNA from histological tissue sections on a dense, spatially barcoded bead array. Each experiment recovers several hundred thousand transcript-coupled spatial barcodes at 2-μm resolution, as demonstrated in mouse brain and primary breast cancer. This opens the way to high-resolution spatial analysis of cells and tissues.
Affiliation(s)
- Sanja Vickovic
  - Klarman Cell Observatory, Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Science for Life Laboratory, Department of Gene Technology, KTH Royal Institute of Technology, Stockholm, Sweden
- Gökcen Eraslan
  - Klarman Cell Observatory, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Fredrik Salmén
  - Science for Life Laboratory, Department of Gene Technology, KTH Royal Institute of Technology, Stockholm, Sweden
- Johanna Klughammer
  - Klarman Cell Observatory, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Linnea Stenbeck
  - Science for Life Laboratory, Department of Gene Technology, KTH Royal Institute of Technology, Stockholm, Sweden
- Denis Schapiro
  - Klarman Cell Observatory, Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Laboratory of Systems Pharmacology, Harvard Medical School, Boston, MA, USA
- Tarmo Äijö
  - Center for Computational Biology, Flatiron Institute, New York, NY, USA
- Richard Bonneau
  - Center for Data Science, New York University, New York, NY, USA
  - Department of Pathology, Brigham and Women's Hospital, Boston, MA, USA
- Ludvig Bergenstråhle
  - Science for Life Laboratory, Department of Gene Technology, KTH Royal Institute of Technology, Stockholm, Sweden
- José Fernández Navarro
  - Science for Life Laboratory, Department of Gene Technology, KTH Royal Institute of Technology, Stockholm, Sweden
- Joshua Gould
  - Klarman Cell Observatory, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Gabriel K Griffin
  - Klarman Cell Observatory, Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Department of Pathology, Brigham and Women's Hospital, Boston, MA, USA
- Åke Borg
  - Division of Oncology and Pathology, Department of Clinical Sciences Lund, Lund University, Lund, Sweden
- Jonas Frisén
  - Department of Cell and Molecular Biology, Karolinska Institute, Stockholm, Sweden
- Joakim Lundeberg
  - Science for Life Laboratory, Department of Gene Technology, KTH Royal Institute of Technology, Stockholm, Sweden
  - Department of Bioengineering, Stanford University, Stanford, CA, USA
- Aviv Regev
  - Klarman Cell Observatory, Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Howard Hughes Medical Institute and Koch Institute for Integrative Cancer Research, Department of Biology, Massachusetts Institute of Technology, Cambridge, MA, USA
- Patrik L Ståhl
  - Science for Life Laboratory, Department of Gene Technology, KTH Royal Institute of Technology, Stockholm, Sweden

9
Wang R, Nambiar R, Zheng D, Tian B. PolyA_DB 3 catalogs cleavage and polyadenylation sites identified by deep sequencing in multiple genomes. Nucleic Acids Res 2018; 46:D315-D319. [PMID: 29069441 PMCID: PMC5753232 DOI: 10.1093/nar/gkx1000] [Citation(s) in RCA: 143] [Impact Index Per Article: 28.6] [Received: 09/15/2017] [Accepted: 10/12/2017] [Indexed: 12/11/2022] Open
Abstract
PolyA_DB is a database cataloging cleavage and polyadenylation sites (PASs) in several genomes. Previous versions were based mainly on expressed sequence tags (ESTs), which had a limited amount and could lead to inaccurate PAS identification due to the presence of internal A-rich sequences in transcripts. Here, we present an updated version of the database based solely on deep sequencing data. First, PASs are mapped by the 3′ region extraction and deep sequencing (3′READS) method, ensuring unequivocal PAS identification. Second, a large volume of data based on diverse biological samples increases PAS coverage by 3.5-fold over the EST-based version and provides PAS usage information. Third, strand-specific RNA-seq data are used to extend annotated 3′ ends of genes to obtain more thorough annotations of alternative polyadenylation (APA) sites. Fourth, conservation information of PAS across mammals sheds light on significance of APA sites. The database (URL: http://www.polya-db.org/v3) currently holds PASs in human, mouse, rat and chicken, and has links to the UCSC genome browser for further visualization and for integration with other genomic data.
Affiliation(s)
- Ruijia Wang
  - Department of Microbiology, Biochemistry and Molecular Genetics, Rutgers New Jersey Medical School and Rutgers Cancer Institute of New Jersey, Newark, NJ 07103, USA
- Ram Nambiar
  - Department of Computer Science, New Jersey Institute of Technology, Newark, NJ 07102, USA
- Dinghai Zheng
  - Department of Microbiology, Biochemistry and Molecular Genetics, Rutgers New Jersey Medical School and Rutgers Cancer Institute of New Jersey, Newark, NJ 07103, USA
- Bin Tian
  - Department of Microbiology, Biochemistry and Molecular Genetics, Rutgers New Jersey Medical School and Rutgers Cancer Institute of New Jersey, Newark, NJ 07103, USA

10
Zhu S, Wu X, Fu H, Ye C, Chen M, Jiang Z, Ji G. Modeling of Genome-Wide Polyadenylation Signals in Xenopus tropicalis. Front Genet 2019; 10:647. [PMID: 31333724 PMCID: PMC6616101 DOI: 10.3389/fgene.2019.00647] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 04/04/2019] [Accepted: 06/18/2019] [Indexed: 12/22/2022] Open
Abstract
Alternative polyadenylation (APA) is an important post-transcriptional modification event to process messenger RNA (mRNA) for transcriptional termination, transport, and translation. In the present study, we characterized poly(A) signals in Xenopus tropicalis using 70,918 highly confident poly(A) sites derived from 16,511 protein-coding genes to understand their roles in the regulation of embryo development and gender difference. We examined potential factors, including the gene length, the number of introns in a gene, and the intron length, that may affect the prevalence of APA. We observed 12 prominent poly(A) signal patterns, which accounted for approximately 92% of total APA sites in Xenopus tropicalis. Among them, three patterns are specific to X. tropicalis, so they are absent in other animals such as humans or mice. We catalogued APA sites based on their genomic regions and developed a bioinformatics pipeline to identify over-represented signal patterns for each class. Then the schema of cis elements for APA sites in each genomic region was proposed. More importantly, APA usage is dramatically dynamic in embryos along five developmental stages and well-coordinated with the maternal-to-zygotic transition event. We used an entropy-based method to identify developmental stage-specific APA sites and identified significant signal patterns around specific sites and constitutive sites. We found that the APA frequency in different genomic regions varies with developmental stages and that those sites located in intron or coding sequence regions contribute most to the dynamics of gene expression during developmental stages. This study deciphers the characteristics and poly(A) signal patterns for both canonical APA sites and non-canonical APA sites across different developmental stages and gender dimorphisms in X. tropicalis, providing new insights into the dynamic regulation of distal and proximal APA.
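An entropy-based test for stage specificity can be sketched as follows. The abstract does not give the exact formulation used in the study, so this assumes plain Shannon entropy over a site's usage across the five developmental stages: a site used in only one stage has entropy 0, while a site used evenly in all stages has the maximum entropy. The function names, the threshold of 1 bit, and the usage values are hypothetical.

```python
import math


def shannon_entropy(values):
    """Shannon entropy in bits of non-negative values normalised to sum to 1."""
    total = sum(values)
    if total <= 0:
        raise ValueError("values must have a positive sum")
    entropy = 0.0
    for v in values:
        if v > 0:  # zero entries contribute nothing to the entropy
            p = v / total
            entropy -= p * math.log2(p)
    return entropy


def stage_specific_sites(usage_by_site, max_entropy=1.0):
    """Return ids of sites whose usage across stages has entropy <= max_entropy."""
    return [
        site
        for site, usage in usage_by_site.items()
        if shannon_entropy(usage) <= max_entropy
    ]


# Hypothetical read counts for two poly(A) sites across five stages.
usage = {
    "site_a": [10, 0, 0, 0, 0],  # used in a single stage
    "site_b": [2, 2, 2, 2, 2],   # used evenly in every stage
}
print(stage_specific_sites(usage))  # ['site_a']
```

Sites above the threshold would be treated as constitutive in this simplified scheme.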
Affiliation(s)
- Sheng Zhu
  - Department of Automation, Xiamen University, Xiamen, China
  - National Institute for Data Science in Health and Medicine, Xiamen University, Xiamen, China
- Xiaohui Wu
  - Department of Automation, Xiamen University, Xiamen, China
  - National Institute for Data Science in Health and Medicine, Xiamen University, Xiamen, China
  - Innovation Center for Cell Signaling Network, Xiamen University, Xiamen, China
- Hongjuan Fu
  - Department of Automation, Xiamen University, Xiamen, China
- Congting Ye
  - National Institute for Data Science in Health and Medicine, Xiamen University, Xiamen, China
  - Key Laboratory of the Ministry of Education for Coastal and Wetland Ecosystems, College of the Environment and Ecology, Xiamen University, Xiamen, China
- Moliang Chen
  - Department of Automation, Xiamen University, Xiamen, China
- Zhihua Jiang
  - Department of Animal Sciences and Center for Reproductive Biology, Washington State University, Pullman, WA, United States
- Guoli Ji
  - Department of Automation, Xiamen University, Xiamen, China
  - National Institute for Data Science in Health and Medicine, Xiamen University, Xiamen, China
  - Innovation Center for Cell Signaling Network, Xiamen University, Xiamen, China

11
Genome-wide atlas of alternative polyadenylation in the forage legume red clover. Sci Rep 2018; 8:11379. [PMID: 30054540 PMCID: PMC6063945 DOI: 10.1038/s41598-018-29699-7] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 02/21/2018] [Accepted: 07/05/2018] [Indexed: 12/13/2022] Open
Abstract
Studies on prevalence and significance of alternative polyadenylation (APA) in plants have been so far limited mostly to the model plants. Here, a genome-wide analysis of APA was carried out in different tissue types in the non-model forage legume red clover (Trifolium pratense L). A profile of poly(A) sites in different tissue types was generated using so-called 'poly(A)-tag sequencing' (PATseq) approach. Our analysis revealed tissue-wise dynamics of usage of poly(A) sites located at different genomic locations. We also identified poly(A) sites and underlying genes displaying APA in different tissues. Functional categories enriched in groups of genes manifesting APA between tissue types were determined. Analysis of spatial expression of genes encoding different poly(A) factors showed significant differential expression of genes encoding orthologs of FIP1(V) and PCFS4, suggesting that these two factors may play a role in regulating spatial APA in red clover. Our analysis also revealed a high degree of conservation in diverse plant species of APA events in mRNAs encoding two key polyadenylation factors, CPSF30 and FIP1(V). Together with our previously reported study of spatial gene expression in red clover, this study will provide a comprehensive account of transcriptome dynamics in this non-model forage legume.

12
Navarro JF, Sjöstrand J, Salmén F, Lundeberg J, Ståhl PL. ST Pipeline: an automated pipeline for spatial mapping of unique transcripts. Bioinformatics 2017; 33:2591-2593. [PMID: 28398467 DOI: 10.1093/bioinformatics/btx211] [Citation(s) in RCA: 50] [Impact Index Per Article: 8.3] [Received: 12/07/2016] [Accepted: 04/06/2017] [Indexed: 01/01/2023] Open
Abstract
Motivation: In recent years we have witnessed an increase in novel RNA-seq based techniques for transcriptomics analysis. Spatial transcriptomics is a novel RNA-seq based technique that allows spatial mapping of transcripts in tissue sections. The spatial resolution adds an extra level of complexity, which requires the development of new tools and algorithms for efficient and accurate data processing. Results: Here we present a pipeline to automatically and efficiently process RNA-seq data obtained from spatial transcriptomics experiments to generate datasets for downstream analysis. Availability and implementation: The ST Pipeline is open source under an MIT license and is available at https://github.com/SpatialTranscriptomicsResearch/st_pipeline. Contact: jose.fernandez.navarro@scilifelab.se. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- José Fernández Navarro
  - Division of Gene Technology, School of Biotechnology, Royal Institute of Technology (KTH), SE-106 91 Science for Life Laboratory, Solna, Sweden
- Joel Sjöstrand
  - Division of Gene Technology, School of Biotechnology, Royal Institute of Technology (KTH), SE-106 91 Science for Life Laboratory, Solna, Sweden
- Fredrik Salmén
  - Division of Gene Technology, School of Biotechnology, Royal Institute of Technology (KTH), SE-106 91 Science for Life Laboratory, Solna, Sweden
- Joakim Lundeberg
  - Division of Gene Technology, School of Biotechnology, Royal Institute of Technology (KTH), SE-106 91 Science for Life Laboratory, Solna, Sweden
- Patrik L Ståhl
  - Division of Gene Technology, School of Biotechnology, Royal Institute of Technology (KTH), SE-106 91 Science for Life Laboratory, Solna, Sweden
  - Department of Cell and Molecular Biology, SE-171 77 Karolinska Institutet, Stockholm, Sweden
|
13
|
Sanfilippo P, Wen J, Lai EC. Landscape and evolution of tissue-specific alternative polyadenylation across Drosophila species. Genome Biol 2017; 18:229. [PMID: 29191225 PMCID: PMC5707805 DOI: 10.1186/s13059-017-1358-0] [Citation(s) in RCA: 41] [Impact Index Per Article: 5.9] [Received: 06/02/2017] [Accepted: 11/08/2017] [Indexed: 12/19/2022]
Abstract
BACKGROUND: Drosophila melanogaster has one of the best-described transcriptomes of any multicellular organism. Nevertheless, the paucity of 3'-sequencing data in this species precludes comprehensive assessment of alternative polyadenylation (APA), which is subject to broad tissue-specific control. RESULTS: Here, we generate deep 3'-sequencing data from 23 developmental stages, tissues, and cell lines of D. melanogaster, yielding a comprehensive atlas of ~62,000 polyadenylated ends. These data broadly extend the annotated transcriptome, identify ~40,000 novel 3' termini, and reveal that two-thirds of Drosophila genes are subject to APA. Furthermore, we dramatically expand the numbers of genes known to be subject to tissue-specific APA, such as 3' untranslated region (UTR) lengthening in head and 3' UTR shortening in testis, and characterize new tissue and developmental 3' UTR patterns. Our thorough 3' UTR annotations permit reassessment of post-transcriptional regulatory networks, via conserved miRNA and RNA binding protein sites. To evaluate the evolutionary conservation and divergence of APA patterns, we generate developmental and tissue-specific 3'-seq libraries from Drosophila yakuba and Drosophila virilis. We document broadly analogous tissue-specific APA trends in these species, but also observe significant alterations in 3' end usage across orthologs. We exploit the population of functionally evolving poly(A) sites to gain clear evidence that evolutionary divergence in core polyadenylation signal (PAS) and downstream sequence element (DSE) motifs drives broad alterations in 3' UTR isoform expression across the Drosophila phylogeny. CONCLUSIONS: These data provide a critical resource for the Drosophila community and offer many insights into the complex control of alternative tissue-specific 3' UTR formation and its consequences for post-transcriptional regulatory networks.
Affiliation(s)
- Piero Sanfilippo
  - Department of Developmental Biology, Sloan-Kettering Institute, New York, New York, 10065, USA
  - Louis V. Gerstner, Jr. Graduate School of Biomedical Sciences, Memorial Sloan Kettering Cancer Center, New York, New York, 10065, USA
- Jiayu Wen
  - Department of Developmental Biology, Sloan-Kettering Institute, New York, New York, 10065, USA
  - Present address: Biochemistry and Biomedical Sciences, Research School of Biology, ANU College of Science, The Australian National University, Canberra, ACT 2601, Australia
- Eric C Lai
  - Department of Developmental Biology, Sloan-Kettering Institute, New York, New York, 10065, USA
  - Louis V. Gerstner, Jr. Graduate School of Biomedical Sciences, Memorial Sloan Kettering Cancer Center, New York, New York, 10065, USA
|
14
|
Banerjee A, Vest KE, Pavlath GK, Corbett AH. Nuclear poly(A) binding protein 1 (PABPN1) and Matrin3 interact in muscle cells and regulate RNA processing. Nucleic Acids Res 2017; 45:10706-10725. [PMID: 28977530 PMCID: PMC5737383 DOI: 10.1093/nar/gkx786] [Citation(s) in RCA: 52] [Impact Index Per Article: 7.4] [Received: 06/29/2017] [Accepted: 08/27/2017] [Indexed: 01/01/2023]
Abstract
The polyadenylate binding protein 1 (PABPN1) is a ubiquitously expressed RNA binding protein vital for multiple steps in RNA metabolism. Although PABPN1 plays a critical role in the regulation of RNA processing, mutation of the gene encoding this ubiquitously expressed RNA binding protein causes a specific form of muscular dystrophy termed oculopharyngeal muscular dystrophy (OPMD). Despite the tissue-specific pathology that occurs in this disease, only recently have studies of PABPN1 begun to explore the role of this protein in skeletal muscle. We have used co-immunoprecipitation and mass spectrometry to identify proteins that interact with PABPN1 in mouse skeletal muscles. Among the interacting proteins we identified Matrin 3 (MATR3) as a novel protein interactor of PABPN1. The MATR3 gene is mutated in a form of distal myopathy and amyotrophic lateral sclerosis (ALS). We demonstrate that, like PABPN1, MATR3 is critical for myogenesis. Furthermore, MATR3 controls critical aspects of RNA processing including alternative polyadenylation and intron retention. We provide evidence that MATR3 also binds and regulates the levels of long non-coding RNA (lncRNA) Neat1 and together with PABPN1 is required for normal paraspeckle function. We demonstrate that PABPN1 and MATR3 are required for paraspeckles, as well as for adenosine to inosine (A to I) RNA editing of Ctn RNA in muscle cells. We provide a functional link between PABPN1 and MATR3 through regulation of a common lncRNA target with downstream impact on paraspeckle morphology and function. We extend our analysis to a mouse model of OPMD and demonstrate altered paraspeckle morphology in the presence of endogenous levels of alanine-expanded PABPN1. In this study, we report protein-binding partners of PABPN1, which could provide insight into novel functions of PABPN1 in skeletal muscle and identify proteins that could be sequestered with alanine-expanded PABPN1 in the nuclear aggregates found in OPMD.
Affiliation(s)
- Ayan Banerjee
  - Department of Biology, Emory University, Atlanta, GA 30322, USA
- Katherine E Vest
  - Department of Pharmacology, Emory University School of Medicine, Atlanta, GA 30322, USA
- Grace K Pavlath
  - Department of Pharmacology, Emory University School of Medicine, Atlanta, GA 30322, USA
- Anita H Corbett
  - Department of Biology, Emory University, Atlanta, GA 30322, USA
|
15
|
Alternative Polyadenylation: Methods, Findings, and Impacts. GENOMICS PROTEOMICS & BIOINFORMATICS 2017; 15:287-300. [PMID: 29031844 PMCID: PMC5673674 DOI: 10.1016/j.gpb.2017.06.001] [Citation(s) in RCA: 68] [Impact Index Per Article: 9.7] [Received: 02/24/2017] [Revised: 06/01/2017] [Accepted: 06/03/2017] [Indexed: 12/21/2022]
Abstract
Alternative polyadenylation (APA), a phenomenon in which RNA molecules with different 3' ends originate from distinct polyadenylation sites of a single gene, is emerging as a mechanism widely used to regulate gene expression. In the present review, we first summarize the methods commonly adopted in APA studies, focusing mainly on the next-generation sequencing (NGS)-based techniques specially designed for APA identification, the related bioinformatics methods, and the strategies for studying APA in single cells. We then summarize the main findings and advances made so far with these methods, including the preferences for alternative polyA (pA) sites, the biological processes involved, and the corresponding consequences. In particular, we categorize the APA changes discovered so far and discuss their potential functions under given conditions, along with the possible underlying molecular mechanisms. With more in-depth studies on extensive samples, more signatures and functions of APA will be revealed, and its diverse roles will gradually come into view.
|
16
|
Genome-wide profiling of the 3' ends of polyadenylated RNAs. Methods 2017; 126:86-94. [PMID: 28602807 DOI: 10.1016/j.ymeth.2017.06.003] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Received: 03/28/2017] [Revised: 05/29/2017] [Accepted: 06/03/2017] [Indexed: 11/24/2022]
Abstract
Alternative polyadenylation (APA) diversifies the 3' termini of a majority of mRNAs in most eukaryotes, and is consequently inferred to have substantial consequences for the utilization of post-transcriptional regulatory mechanisms. Since conventional RNA-sequencing methods do not accurately define mRNA termini, a number of protocols have been developed that permit sequencing of the 3' ends of polyadenylated transcripts (3'-seq). We present here our experimental protocol to generate 3'-seq libraries using a dT-priming approach, including extensive details on considerations that will enable successful library cloning. We pair this with a set of computational tools that allow the user to process the raw sequence data into a filtered set of clusters that represent high-confidence functional polyadenylation sites. The data are single-nucleotide resolution and quantitative, and can be used for downstream analyses of APA.
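The step of turning raw 3'-end positions into a filtered set of clusters can be illustrated with a minimal sketch. This is not the authors' tool: the function name `cluster_polya_sites`, the `max_gap` window, and the `min_reads` threshold are hypothetical choices made only for illustration, and the input is assumed to be read 3'-end coordinates from a single chromosome strand.

```python
from collections import Counter


def cluster_polya_sites(end_positions, max_gap=24, min_reads=5):
    """Group read 3'-end coordinates into candidate poly(A) site clusters.

    end_positions: iterable of integer coordinates, one per read, all from
    the same chromosome strand (an assumption of this sketch).
    max_gap: hypothetical window; neighbouring distinct positions at most
    this many nt apart are merged into one cluster.
    min_reads: hypothetical support threshold; weaker clusters are dropped.
    """
    counts = Counter(end_positions)

    # Walk the distinct positions in order and split wherever the gap to
    # the previous position exceeds max_gap.
    groups, current = [], []
    for pos in sorted(counts):
        if current and pos - current[-1] > max_gap:
            groups.append(current)
            current = []
        current.append(pos)
    if current:
        groups.append(current)

    clusters = []
    for positions in groups:
        total = sum(counts[p] for p in positions)
        if total < min_reads:
            continue  # discard low-support clusters
        # Peak = most-used position; ties resolve to the lowest coordinate.
        peak = max(positions, key=lambda p: (counts[p], -p))
        clusters.append(
            {"start": positions[0], "end": positions[-1],
             "peak": peak, "reads": total}
        )
    return clusters


if __name__ == "__main__":
    ends = [100] * 3 + [102] * 4 + [105] + [300] + [500] * 6 + [510] * 2
    for site in cluster_polya_sites(ends):
        print(site)
```

Each surviving cluster keeps a single-nucleotide peak and a read count, so the output stays both positional and quantitative; a real pipeline would additionally need to filter internal-priming artifacts, which this sketch does not attempt.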
|
17
|
Abstract
Alternative polyadenylation (APA) is an RNA-processing mechanism that generates distinct 3' termini on mRNAs and other RNA polymerase II transcripts. It is widespread across all eukaryotic species and is recognized as a major mechanism of gene regulation. APA exhibits tissue specificity and is important for cell proliferation and differentiation. In this Review, we discuss the roles of APA in diverse cellular processes, including mRNA metabolism, protein diversification and protein localization, and more generally in gene regulation. We also discuss the molecular mechanisms underlying APA, such as variation in the concentration of core processing factors and RNA-binding proteins, as well as transcription-based regulation.
|
18
|
Erson-Bensan AE, Can T. Alternative Polyadenylation: Another Foe in Cancer. Mol Cancer Res 2016; 14:507-17. [PMID: 27075335 DOI: 10.1158/1541-7786.mcr-15-0489] [Citation(s) in RCA: 35] [Impact Index Per Article: 4.4] [Received: 12/16/2015] [Accepted: 03/30/2016] [Indexed: 11/16/2022]
Abstract
Advancements in sequencing and transcriptome analysis methods have led to seminal discoveries that have begun to unravel the complexity of cancer. These studies are paving the way toward the development of improved diagnostics, prognostic predictions, and targeted treatment options. However, it is clear that pieces of the cancer puzzle are still missing. In an effort to have a more comprehensive understanding of the development and progression of cancer, we have come to appreciate the value of the noncoding regions of our genomes, partly due to the discovery of miRNAs and their significance in gene regulation. Interestingly, the miRNA-mRNA interactions are not solely dependent on variations in miRNA levels. Instead, the majority of genes harbor multiple polyadenylation signals on their 3' UTRs (untranslated regions) that can be differentially selected on the basis of the physiologic state of cells, resulting in alternative 3' UTR isoforms. Deregulation of alternative polyadenylation (APA) is of increasing interest in cancer research, because APA generates mRNA 3' UTR isoforms with potentially different stabilities, subcellular localizations, translation efficiencies, and functions. This review focuses on the link between APA and cancer and discusses the mechanisms as well as the tools available for investigating APA events in cancer. Overall, detection of deregulated APA-generated isoforms in cancer may implicate some proto-oncogene activation cases of unknown causes and may help the discovery of novel cases, thus contributing to a better understanding of molecular mechanisms of cancer. Mol Cancer Res; 14(6); 507-17. ©2016 AACR.
Affiliation(s)
- Ayse Elif Erson-Bensan
  - Department of Biological Sciences, Middle East Technical University (METU) (ODTU), Ankara, Turkey
- Tolga Can
  - Department of Computer Engineering, Middle East Technical University (METU) (ODTU), Ankara, Turkey
|
19
|
Lum KK, Cristea IM. Proteomic approaches to uncovering virus-host protein interactions during the progression of viral infection. Expert Rev Proteomics 2016; 13:325-40. [PMID: 26817613 PMCID: PMC4919574 DOI: 10.1586/14789450.2016.1147353] [Citation(s) in RCA: 66] [Impact Index Per Article: 8.3] [Received: 11/18/2015] [Accepted: 01/25/2016] [Indexed: 01/10/2023]
Abstract
The integration of proteomic methods into virology has facilitated a significant breadth of biological insight into mechanisms of virus replication, antiviral host responses and viral subversion of host defenses. Throughout the course of infection, these cellular mechanisms rely heavily on the formation of temporally and spatially regulated virus-host protein-protein interactions. Reviewed here are proteomic-based approaches that have been used to characterize this dynamic virus-host interplay. Specifically discussed are the contributions of integrative mass spectrometry, antibody-based affinity purification of protein complexes, cross-linking and protein array techniques to elucidating complex networks of virus-host protein associations during infection with a diverse range of RNA and DNA viruses. The benefits and limitations of applying proteomic methods to virology are explored, and the contribution of these approaches to important biological discoveries and to inspiring new tractable avenues for the design of antiviral therapeutics is highlighted.
Affiliation(s)
- Krystal K Lum
  - Department of Molecular Biology, Princeton University, Princeton, NJ, USA
- Ileana M Cristea
  - Department of Molecular Biology, Princeton University, Princeton, NJ, USA
|
20
|
Evolutionary analysis of selective constraints identifies ameloblastin (AMBN) as a potential candidate for amelogenesis imperfecta. BMC Evol Biol 2015. [PMID: 26223266 PMCID: PMC4518657 DOI: 10.1186/s12862-015-0431-0] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.4] [Indexed: 11/30/2022]
Abstract
Background: Ameloblastin (AMBN) is a phosphorylated, proline/glutamine-rich protein secreted during enamel formation. Previous studies have revealed that this enamel matrix protein was present early in vertebrate evolution and certainly plays important roles during enamel formation, although its precise functions remain unclear. We performed evolutionary analyses of AMBN in order to (i) identify residues and motifs important for the protein function, (ii) predict mutations responsible for genetic diseases, and (iii) understand its molecular evolution in mammals. Results: In silico searches retrieved 56 complete sequences in public databases that were aligned and analyzed computationally. We showed that AMBN is globally evolving under moderate purifying selection in mammals and contains a strong phylogenetic signal. In addition, our analyses revealed codons evolving under significant positive selection. Evidence for positive selection acting on AMBN was observed in catarrhine primates and the aye-aye. We also found that (i) an additional translation initiation site was recruited in the ancestral placental AMBN, (ii) a short exon was duplicated several times in various species including catarrhine primates, and (iii) several polyadenylation sites are present. Conclusions: AMBN possesses many positions that have been subjected to strong selective pressure for 200 million years. These positions correspond to several cleavage sites and hydroxylated, O-glycosylated, and phosphorylated residues. We predict that these conserved positions would be potentially responsible for enamel disorder if substituted. Some motifs that were previously identified as potentially important functionally were confirmed, and we found two new, highly conserved motifs, the function of which should be tested in the near future. This study illustrates the power of evolutionary analyses for characterizing the functional constraints acting on proteins with yet uncharacterized structure.
Electronic supplementary material: The online version of this article (doi:10.1186/s12862-015-0431-0) contains supplementary material, which is available to authorized users.
|
21
|
Whole transcriptome analysis with sequencing: methods, challenges and potential solutions. Cell Mol Life Sci 2015; 72:3425-39. [PMID: 26018601 DOI: 10.1007/s00018-015-1934-y] [Citation(s) in RCA: 130] [Impact Index Per Article: 14.4] [Received: 01/22/2015] [Revised: 04/25/2015] [Accepted: 05/21/2015] [Indexed: 10/23/2022]
Abstract
Whole transcriptome analysis plays an essential role in deciphering genome structure and function, identifying genetic networks underlying cellular, physiological, biochemical and biological systems and establishing molecular biomarkers that respond to diseases, pathogens and environmental challenges. Here, we review transcriptome analysis methods and technologies that have been used to conduct whole transcriptome shotgun sequencing or whole transcriptome tag/target sequencing analyses. We focus on how adaptors/linkers are added to both 5' and 3' ends of mRNA molecules for cloning or PCR amplification before sequencing. Challenges and potential solutions are also discussed. In brief, next generation sequencing platforms have accelerated the release of large amounts of gene expression data. It is now time for the genome research community to assemble whole transcriptomes of all species and collect signature targets for each gene/transcript, and thus use known genes/transcripts to determine known transcriptomes directly in the near future.
|
22
|
Bahrami-Samani E, Vo DT, de Araujo PR, Vogel C, Smith AD, Penalva LOF, Uren PJ. Computational challenges, tools, and resources for analyzing co- and post-transcriptional events in high throughput. WILEY INTERDISCIPLINARY REVIEWS. RNA 2015; 6:291-310. [PMID: 25515586 PMCID: PMC4397117 DOI: 10.1002/wrna.1274] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Received: 06/16/2014] [Revised: 10/24/2014] [Accepted: 10/29/2014] [Indexed: 11/10/2022]
Abstract
Co- and post-transcriptional regulation of gene expression is complex and multifaceted, spanning the complete RNA lifecycle from genesis to decay. High-throughput profiling of the constituent events and processes is achieved through a range of technologies that continue to expand and evolve. Fully leveraging the resulting data is nontrivial, and requires the use of computational methods and tools carefully crafted for specific data sources and often intended to probe particular biological processes. Drawing upon databases of information pre-compiled by other researchers can further elevate analyses. Within this review, we describe the major co- and post-transcriptional events in the RNA lifecycle that are amenable to high-throughput profiling. We place specific emphasis on the analysis of the resulting data, in particular the computational tools and resources available, as well as looking toward future challenges that remain to be addressed.
Affiliation(s)
- Emad Bahrami-Samani
  - Molecular and Computational Biology, Department of Biological Sciences, University of Southern California, Los Angeles, CA
- Dat T. Vo
  - Children’s Cancer Research Institute and Department of Cellular and Structural Biology, University of Texas Health Science Center, San Antonio, TX
- Patricia Rosa de Araujo
  - Children’s Cancer Research Institute and Department of Cellular and Structural Biology, University of Texas Health Science Center, San Antonio, TX
- Christine Vogel
  - Center for Genomics and Systems Biology, Department of Biology, New York University, New York, NY
- Andrew D. Smith
  - Molecular and Computational Biology, Department of Biological Sciences, University of Southern California, Los Angeles, CA
- Luiz O. F. Penalva
  - Children’s Cancer Research Institute and Department of Cellular and Structural Biology, University of Texas Health Science Center, San Antonio, TX
- Philip J. Uren
  - Molecular and Computational Biology, Department of Biological Sciences, University of Southern California, Los Angeles, CA
|
23
|
Müller S, Rycak L, Afonso-Grunz F, Winter P, Zawada AM, Damrath E, Scheider J, Schmäh J, Koch I, Kahl G, Rotter B. APADB: a database for alternative polyadenylation and microRNA regulation events. DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION 2014; 2014:bau076. [PMID: 25052703 PMCID: PMC4105710 DOI: 10.1093/database/bau076] [Citation(s) in RCA: 68] [Impact Index Per Article: 6.8] [Indexed: 01/01/2023]
Abstract
Alternative polyadenylation (APA) is a widespread mechanism that contributes to the sophisticated dynamics of gene regulation. Approximately 50% of all protein-coding human genes harbor multiple polyadenylation (PA) sites; their selective and combinatorial use gives rise to transcript variants that differ in the length of their 3′ untranslated region (3′UTR). Shortened variants escape UTR-mediated regulation by microRNAs (miRNAs), especially in cancer, where global 3′UTR shortening accelerates disease progression, dedifferentiation and proliferation. Here we present APADB, a database of vertebrate PA sites determined by 3′ end sequencing, using massive analysis of complementary DNA ends. APADB provides (A)PA sites for coding and non-coding transcripts of human, mouse and chicken genes. For human and mouse, several tissue types, including different cancer specimens, are available. APADB records the loss of predicted miRNA binding sites and visualizes next-generation sequencing reads that support each PA site in a genome browser. The database tables can either be browsed according to organism and tissue or alternatively searched for a gene of interest. APADB is the largest database of APA in human, chicken and mouse. The stored information provides experimental evidence for thousands of PA sites and APA events. APADB combines 3′ end sequencing data with prediction algorithms of miRNA binding sites, allowing prediction algorithms to be improved further. Current databases lack correct information about 3′UTR lengths, especially for chicken, and APADB provides the information needed to close this gap. Database URL: http://tools.genxpro.net/apadb/
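The idea that shortened 3′UTR variants lose miRNA binding sites can be made concrete with a small sketch. It is an illustration only and does not reflect how APADB is implemented: the function `lost_mirna_sites`, the coordinate convention (offsets in nt from the stop codon, half-open intervals), and the site names are all hypothetical.

```python
def lost_mirna_sites(pa_sites, mirna_sites):
    """Report which predicted miRNA sites each 3'UTR isoform lacks.

    pa_sites: iterable of 3'UTR lengths in nt, one per poly(A) site,
    measured from the stop codon (hypothetical convention).
    mirna_sites: dict mapping a site name to (start, end) offsets within
    the longest 3'UTR, half-open, same coordinate system.
    Returns a dict: UTR length -> sorted list of site names that extend
    past the cleavage point and are therefore absent from that isoform.
    """
    lost = {}
    for utr_len in sorted(pa_sites):
        lost[utr_len] = sorted(
            name
            for name, (start, end) in mirna_sites.items()
            if end > utr_len  # site is cut off or lies fully downstream
        )
    return lost


if __name__ == "__main__":
    # Hypothetical gene with a proximal and a distal poly(A) site.
    sites = {"miR-A": (50, 72), "miR-B": (640, 662), "miR-C": (1100, 1122)}
    print(lost_mirna_sites([300, 1200], sites))
```

In this toy case the proximal isoform (300 nt) keeps only the first site, while the distal isoform (1200 nt) keeps all three, which is the sense in which a shortened variant escapes miRNA-mediated regulation.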
Affiliation(s)
- Sören Müller
- Plant Molecular Biology, Molecular BioSciences, University of Frankfurt am Main, Marie-Curie-Street 9, D-60439 Frankfurt, Germany, GenXPro GmbH, Frankfurt Innovation Center Biotechnology, Altenhöferallee 3, D-60438 Frankfurt, Germany, Molecular Bioinformatics Group, Faculty of Computer Science and Mathematics, Cluster of Excellence Frankfurt "Macromolecular Complexes", Institute of Computer Science, Robert-Mayer-Strasse 11-15, D-60325 Frankfurt am Main, Germany, Department of Internal Medicine IV; Saarland University Medical Center, Kirrberger Strasse, D-66421 Homburg/Saar, Germany, Experimental Neurology, Department of Neurology, Goethe University Medical School, Heinrich, Hoffmann Strasse 7, D-60528 Frankfurt am Main, Germany, Institute for Ecology, Evolution and Diversity, Aquatic Ecotoxicology, University of Frankfurt am Main, Max-von-Laue-Str. 13, D-60438 Frankfurt, Germany and Department of Pediatrics, University Hospital Schleswig-Holstein, Schwanenweg 20, D-24105 Kiel, GermanyPlant Molecular Biology, Molecular BioSciences, University of Frankfurt am Main, Marie-Curie-Street 9, D-60439 Frankfurt, Germany, GenXPro GmbH, Frankfurt Innovation Center Biotechnology, Altenhöferallee 3, D-60438 Frankfurt, Germany, Molecular Bioinformatics Group, Faculty of Computer Science and Mathematics, Cluster of Excellence Frankfurt "Macromolecular Complexes", Institute of Computer Science, Robert-Mayer-Strasse 11-15, D-60325 Frankfurt am Main, Germany, Department of Internal Medicine IV; Saarland University Medical Center, Kirrberger Strasse, D-66421 Homburg/Saar, Germany, Experimental Neurology, Department of Neurology, Goethe University Medical School, Heinrich, Hoffmann Strasse 7, D-60528 Frankfurt am Main, Germany, Institute for Ecology, Evolution and Diversity, Aquatic Ecotoxicology, University of Frankfurt am Main, Max-von-Laue-Str. 13, D-60438 Frankfurt, Germany and Department of Pediatrics, University Hospital Schleswig-Holstein, Schwanenweg 20, D-24105 Kiel, Germany
| | - Lukas Rycak
- Plant Molecular Biology, Molecular BioSciences, University of Frankfurt am Main, Marie-Curie-Street 9, D-60439 Frankfurt, Germany, GenXPro GmbH, Frankfurt Innovation Center Biotechnology, Altenhöferallee 3, D-60438 Frankfurt, Germany, Molecular Bioinformatics Group, Faculty of Computer Science and Mathematics, Cluster of Excellence Frankfurt "Macromolecular Complexes", Institute of Computer Science, Robert-Mayer-Strasse 11-15, D-60325 Frankfurt am Main, Germany, Department of Internal Medicine IV; Saarland University Medical Center, Kirrberger Strasse, D-66421 Homburg/Saar, Germany, Experimental Neurology, Department of Neurology, Goethe University Medical School, Heinrich, Hoffmann Strasse 7, D-60528 Frankfurt am Main, Germany, Institute for Ecology, Evolution and Diversity, Aquatic Ecotoxicology, University of Frankfurt am Main, Max-von-Laue-Str. 13, D-60438 Frankfurt, Germany and Department of Pediatrics, University Hospital Schleswig-Holstein, Schwanenweg 20, D-24105 Kiel, Germany
| | - Fabian Afonso-Grunz
- Plant Molecular Biology, Molecular BioSciences, University of Frankfurt am Main, Marie-Curie-Street 9, D-60439 Frankfurt, Germany, GenXPro GmbH, Frankfurt Innovation Center Biotechnology, Altenhöferallee 3, D-60438 Frankfurt, Germany, Molecular Bioinformatics Group, Faculty of Computer Science and Mathematics, Cluster of Excellence Frankfurt "Macromolecular Complexes", Institute of Computer Science, Robert-Mayer-Strasse 11-15, D-60325 Frankfurt am Main, Germany, Department of Internal Medicine IV; Saarland University Medical Center, Kirrberger Strasse, D-66421 Homburg/Saar, Germany, Experimental Neurology, Department of Neurology, Goethe University Medical School, Heinrich, Hoffmann Strasse 7, D-60528 Frankfurt am Main, Germany, Institute for Ecology, Evolution and Diversity, Aquatic Ecotoxicology, University of Frankfurt am Main, Max-von-Laue-Str. 13, D-60438 Frankfurt, Germany and Department of Pediatrics, University Hospital Schleswig-Holstein, Schwanenweg 20, D-24105 Kiel, GermanyPlant Molecular Biology, Molecular BioSciences, University of Frankfurt am Main, Marie-Curie-Street 9, D-60439 Frankfurt, Germany, GenXPro GmbH, Frankfurt Innovation Center Biotechnology, Altenhöferallee 3, D-60438 Frankfurt, Germany, Molecular Bioinformatics Group, Faculty of Computer Science and Mathematics, Cluster of Excellence Frankfurt "Macromolecular Complexes", Institute of Computer Science, Robert-Mayer-Strasse 11-15, D-60325 Frankfurt am Main, Germany, Department of Internal Medicine IV; Saarland University Medical Center, Kirrberger Strasse, D-66421 Homburg/Saar, Germany, Experimental Neurology, Department of Neurology, Goethe University Medical School, Heinrich, Hoffmann Strasse 7, D-60528 Frankfurt am Main, Germany, Institute for Ecology, Evolution and Diversity, Aquatic Ecotoxicology, University of Frankfurt am Main, Max-von-Laue-Str. 13, D-60438 Frankfurt, Germany and Department of Pediatrics, University Hospital Schleswig-Holstein, Schwanenweg 20, D-24105 Kiel, Germany
| | - Peter Winter
- Plant Molecular Biology, Molecular BioSciences, University of Frankfurt am Main, Marie-Curie-Street 9, D-60439 Frankfurt, Germany, GenXPro GmbH, Frankfurt Innovation Center Biotechnology, Altenhöferallee 3, D-60438 Frankfurt, Germany, Molecular Bioinformatics Group, Faculty of Computer Science and Mathematics, Cluster of Excellence Frankfurt "Macromolecular Complexes", Institute of Computer Science, Robert-Mayer-Strasse 11-15, D-60325 Frankfurt am Main, Germany, Department of Internal Medicine IV; Saarland University Medical Center, Kirrberger Strasse, D-66421 Homburg/Saar, Germany, Experimental Neurology, Department of Neurology, Goethe University Medical School, Heinrich, Hoffmann Strasse 7, D-60528 Frankfurt am Main, Germany, Institute for Ecology, Evolution and Diversity, Aquatic Ecotoxicology, University of Frankfurt am Main, Max-von-Laue-Str. 13, D-60438 Frankfurt, Germany and Department of Pediatrics, University Hospital Schleswig-Holstein, Schwanenweg 20, D-24105 Kiel, Germany
| | - Adam M Zawada
| | - Ewa Damrath
| | - Jessica Scheider
| | - Juliane Schmäh
| | - Ina Koch
| | - Günter Kahl
| | - Björn Rotter
|
24
|
Zheng D, Tian B. RNA-binding proteins in regulation of alternative cleavage and polyadenylation. ADVANCES IN EXPERIMENTAL MEDICINE AND BIOLOGY 2014; 825:97-127. [PMID: 25201104 DOI: 10.1007/978-1-4939-1221-6_3] [Citation(s) in RCA: 41] [Impact Index Per Article: 4.1] [Indexed: 12/12/2022]
Abstract
Almost all eukaryotic pre-mRNAs are processed at the 3' end by the cleavage and polyadenylation (C/P) reaction, which preludes termination of transcription and gives rise to the poly(A) tail of mature mRNA. Genomic studies in recent years have indicated that most eukaryotic mRNA genes have multiple cleavage and polyadenylation sites (pAs), leading to alternative cleavage and polyadenylation (APA) products. APA isoforms generally differ in their 3' untranslated regions (3' UTRs), but can also have different coding sequences (CDSs). APA expands the repertoire of transcripts expressed from the genome, and is highly regulated under various physiological and pathological conditions. Growing lines of evidence have shown that RNA-binding proteins (RBPs) play important roles in regulation of APA. Some RBPs are part of the machinery for C/P; others influence pA choice through binding to adjacent regions. In this chapter, we review cis elements and trans factors involved in C/P, the significance of APA, and increasingly elucidated roles of RBPs in APA regulation. We also discuss analysis of APA using transcriptome-wide techniques as well as molecular biology approaches.
Affiliation(s)
- Dinghai Zheng
- Department of Biochemistry and Molecular Biology, University of Medicine and Dentistry of New Jersey (UMDNJ)-New Jersey Medical School, 185 South Orange Ave., Newark, NJ, 07103, USA
|
25
|
Li XQ, Du D. RNA polyadenylation sites on the genomes of microorganisms, animals, and plants. PLoS One 2013; 8:e79511. [PMID: 24260238 PMCID: PMC3832601 DOI: 10.1371/journal.pone.0079511] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Received: 06/05/2013] [Accepted: 09/29/2013] [Indexed: 01/15/2023] Open
Abstract
Pre–messenger RNA (mRNA) 3′-end cleavage and subsequent polyadenylation strongly regulate gene expression. In comparison with the upstream or downstream motifs, relatively little is known about the feature differences of polyadenylation [poly(A)] sites among major kingdoms. We suspect that the precise poly(A) sites are very selective, and we therefore mapped mRNA poly(A) sites on complete and nearly complete genomes using mRNA sequences available in the National Center for Biotechnology Information (NCBI) Nucleotide database. In this paper, we describe the mRNA nucleotide [i.e., the poly(A) tail attachment position] that is directly in attachment with the poly(A) tail and the pre-mRNA nucleotide [i.e., the poly(A) tail starting position] that corresponds to the first adenosine of the poly(A) tail in the 29 most-mapped species (2 fungi, 2 protists, 18 animals, and 7 plants). The most representative pre-mRNA dinucleotides covering these two positions were UA, CA, and GA in 17, 10, and 2 of the species, respectively. The pre-mRNA nucleotide at the poly(A) tail starting position was typically an adenosine [i.e., A-type poly(A) sites], sometimes a uridine, and occasionally a cytidine or guanosine. The order was U>C>G at the attachment position but A>>U>C≥G at the starting position. However, in comparison with the mRNA nucleotide composition (base composition), the poly(A) tail attachment position selected C over U in plants and both C and G over U in animals, in both A-type and non-A-type poly(A) sites. Animals, dicot plants, and monocot plants had clear differences in C/G ratios at the poly(A) tail attachment position of the non-A-type poly(A) sites. This study of poly(A) site evolution indicated that the two positions within poly(A) sites had distinct nucleotide compositions and were different among kingdoms.
Affiliation(s)
- Xiu-Qing Li
- Molecular Genetics Laboratory, Potato Research Centre, Agriculture and Agri-Food Canada, Fredericton, New Brunswick, Canada
| | - Donglei Du
- Quantitative Methods Research Group, Faculty of Business Administration, University of New Brunswick, Fredericton, New Brunswick, Canada
|
26
|
Elkon R, Ugalde AP, Agami R. Alternative cleavage and polyadenylation: extent, regulation and function. Nat Rev Genet 2013; 14:496-506. [PMID: 23774734 DOI: 10.1038/nrg3482] [Citation(s) in RCA: 564] [Impact Index Per Article: 51.3] [Indexed: 12/14/2022]
Abstract
The 3' end of most protein-coding genes and long non-coding RNAs is cleaved and polyadenylated. Recent discoveries have revealed that a large proportion of these genes contains more than one polyadenylation site. Therefore, alternative polyadenylation (APA) is a widespread phenomenon, generating mRNAs with alternative 3' ends. APA contributes to the complexity of the transcriptome by generating isoforms that differ either in their coding sequence or in their 3' untranslated regions (UTRs), thereby potentially regulating the function, stability, localization and translation efficiency of target RNAs. Here, we review our current understanding of the polyadenylation process and the latest progress in the identification of APA events, mechanisms that regulate poly(A) site selection, and biological processes and diseases resulting from APA.
Affiliation(s)
- Ran Elkon
- Division of Gene Regulation, The Netherlands Cancer Institute, Plesmanlaan 121, 1066 CX Amsterdam, The Netherlands
|
27
|
Molecular cloning, expression profiles and subcellular localization of cyclin B in ovary of the mud crab, Scylla paramamosain. Genes Genomics 2013. [DOI: 10.1007/s13258-013-0077-5] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Indexed: 10/27/2022]
|
28
|
Sun Y, Fu Y, Li Y, Xu A. Genome-wide alternative polyadenylation in animals: insights from high-throughput technologies. J Mol Cell Biol 2012; 4:352-61. [DOI: 10.1093/jmcb/mjs041] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.8] [Indexed: 01/14/2023] Open
|
29
|
Tian B, Graber JH. Signals for pre-mRNA cleavage and polyadenylation. WILEY INTERDISCIPLINARY REVIEWS-RNA 2011; 3:385-96. [PMID: 22012871 DOI: 10.1002/wrna.116] [Citation(s) in RCA: 159] [Impact Index Per Article: 12.2] [Indexed: 01/10/2023]
Abstract
Pre-mRNA cleavage and polyadenylation is an essential step for 3' end formation of almost all protein-coding transcripts in eukaryotes. The reaction, involving cleavage of nascent mRNA followed by addition of a polyadenylate or poly(A) tail, is controlled by cis-acting elements in the pre-mRNA surrounding the cleavage site. Experimental and bioinformatic studies in the past three decades have elucidated conserved and divergent elements across eukaryotes, from yeast to human. Here we review histories and current models of these elements in a broad range of species.
Affiliation(s)
- Bin Tian
- UMDNJ-New Jersey Medical School, Newark, NJ, USA.
|
30
|
Olafson PU, Temeyer KB, Pruett JH. Multiple transcripts encode glucose 6-phosphate dehydrogenase in the southern cattle tick, Rhipicephalus (Boophilus) microplus. EXPERIMENTAL & APPLIED ACAROLOGY 2011; 53:147-165. [PMID: 20711800 DOI: 10.1007/s10493-010-9392-6] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 04/28/2010] [Accepted: 07/16/2010] [Indexed: 05/29/2023]
Abstract
Glucose 6-phosphate dehydrogenase (G6PDH) is an enzyme that plays a critical role in the production of NADPH. Here we describe the identification of four transcripts (G6PDH-A, -B, -C, and -D) that putatively encode the enzyme in the southern cattle tick, Rhipicephalus (Boophilus) microplus. The genomic DNA that is spliced to produce G6PDH-A and -B is 8,600-9,000 bases in length and comprises 12 exons. Comparison of the R. microplus G6PDH gene structure with those available from insects and mammals revealed that the tick gene is most like that of humans. Detection of the four transcripts was evaluated by quantitative RT-PCR using template from larvae, unfed adult females and males, salivary gland tissues from 2- to 3-day-fed adult females and males, and salivary gland tissue of 4- to 5-day-fed adult females. The G6PDH-A and -C transcripts were present in all templates, and both displayed induced expression in salivary gland tissue of fed, adult females but not matched males. The G6PDH-D transcript was detected only in unfed adults and in larvae, a stage in which it was most abundant relative to the other three transcripts. The G6PDH-B transcript, while detectable in all templates, was of low copy number suggesting it is a rare transcript. Induced expression of G6PDH-A and G6PDH-C in fed females may play a role in the tolerance of oxidative stress that is induced upon feeding, and the transcript abundance in fed females may be a function of bloodmeal volume and the time adult females spend on the host relative to adult males.
Affiliation(s)
- Pia Untalan Olafson
- USDA, Agricultural Research Service, Knipling-Bushland U. S. Livestock Insects Research Laboratory, 2700 Fredericksburg Rd., Kerrville, TX 78028, USA.
|
31
|
Neilson JR, Sandberg R. Heterogeneity in mammalian RNA 3' end formation. Exp Cell Res 2010; 316:1357-64. [PMID: 20211174 DOI: 10.1016/j.yexcr.2010.02.040] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.2] [Received: 02/17/2010] [Accepted: 02/28/2010] [Indexed: 11/19/2022]
Abstract
Precisely directed cleavage and polyadenylation of mRNA is a fundamental part of eukaryotic gene expression. Yet, 3' end heterogeneity has been documented for thousands of mammalian genes, and usage of one cleavage and polyadenylation signal over another has been shown to impact gene expression in many cases. Building upon the rich biochemical and genetic understanding of the 3' end formation, recent genomic studies have begun to suggest that widespread changes in mRNA cleavage and polyadenylation may be a part of large, dynamic gene regulatory programs. In this review, we begin with a modest overview of the studies that defined the mechanisms of mammalian 3' end formation, and then discuss how recent genomic studies intersect with these more traditional approaches, showing that both will be crucial for expanding our understanding of this facet of gene regulation.
Affiliation(s)
- Joel R Neilson
- Department of Molecular Physiology and Biophysics and Dan L. Duncan Cancer Center, Baylor College of Medicine, Houston, TX 77030, USA
|
32
|
Prediction of non-canonical polyadenylation signals in human genomic sequences based on a novel algorithm using a fuzzy membership function. J Biosci Bioeng 2009; 107:569-78. [DOI: 10.1016/j.jbiosc.2009.01.001] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 11/09/2008] [Revised: 01/05/2009] [Accepted: 01/05/2009] [Indexed: 11/23/2022]
|
33
|
van Hooff SR, Koster J, Hulsen T, van Schaik BDC, Roos M, van Batenburg MF, Versteeg R, van Kampen AHC. The construction of genome-based transcriptional units. OMICS : A JOURNAL OF INTEGRATIVE BIOLOGY 2009; 13:105-14. [PMID: 19320556 DOI: 10.1089/omi.2008.0036] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/05/2023]
Abstract
Gene-oriented sequence clusters (transcriptional units) have found many applications in genomics research, including the construction of transcriptome maps and identification of splice variants. We developed a new method to construct transcriptional units that uses the genomic sequence as a template. We present and discuss our method in detail together with an evaluation of the transcriptional units for human. We constructed 33,007 and 27,792 transcriptional units for human and mouse, respectively. The sensitivity (81%) and specificity (90%) of our method compare favorably to other established methods. We evaluated the representation of experimentally validated and predicted intergenic spliced transcripts in humans and show that we correctly represent a large fraction of these cases by single transcriptional units. Our method performs well, but the evaluation of the final set of transcriptional units shows that improvements to the algorithm are still possible. However, because the precise number and types of errors are difficult to track, it is not obvious how to significantly improve the algorithm. We believe that ongoing research efforts are necessary to further improve current methods. This should include detailed documentation, comparison, and evaluation of current methods.
Affiliation(s)
- Sander R van Hooff
- Bioinformatics Laboratory, Academic Medical Center, Meibergdreef 9, Amsterdam, The Netherlands
|
34
|
Vija H, Samel M, Siigur E, Aaspõllu A, Tõnismägi K, Trummal K, Subbi J, Siigur J. VGD and MLD-motifs containing heterodimeric disintegrin viplebedin-2 from Vipera lebetina snake venom. Purification and cDNA cloning. Comp Biochem Physiol B Biochem Mol Biol 2009; 153:253-60. [PMID: 19296915 DOI: 10.1016/j.cbpb.2009.03.002] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.1] [Received: 12/03/2008] [Revised: 03/05/2009] [Accepted: 03/10/2009] [Indexed: 11/29/2022]
Abstract
We have previously demonstrated that the fibrinolytic enzyme lebetase is synthesized with a disintegrin-like domain that is cleaved posttranslationally (Siigur et al., 1996). We have now isolated a heterodimeric disintegrin, viplebedin-2, containing this disintegrin-like part from Vipera lebetina venom using size-exclusion chromatography on Sephadex G-100 sf and HPLC on a C18 column. The molecular masses of viplebedin-2 and tryptic peptides from both chains of viplebedin-2 were determined by MALDI-TOF mass spectrometry. Using a cDNA library of the venom gland of a single V. lebetina turanica snake, the viplebedin-2 coding cDNAs were cloned and sequenced. Viplebedin-2 chains are synthesized from two different genes. One chain, containing the VGD sequence in the disintegrin loop, is synthesized as a disintegrin-like part of the PII-type metalloprotease, lebetase. The other chain, containing the MLD sequence in the disintegrin loop, is synthesized from a gene without a metalloproteinase domain. Two polyadenylation signal sequences were found in the precursor cDNAs of the MLD-containing chain. Viplebedin-2 dose-dependently inhibited adhesion of platelets to immobilized collagen and inhibited collagen-induced platelet aggregation.
Affiliation(s)
- Heiki Vija
- National Institute of Chemical Physics and Biophysics, Akadeemia tee 23, Tallinn 12618, Estonia
|
35
|
Bonnet A, Iannuccelli E, Hugot K, Benne F, Bonaldo MF, Soares MB, Hatey F, Tosser-Klopp G. A pig multi-tissue normalised cDNA library: large-scale sequencing, cluster analysis and 9K micro-array resource generation. BMC Genomics 2008; 9:17. [PMID: 18194535 PMCID: PMC2257943 DOI: 10.1186/1471-2164-9-17] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Received: 08/23/2007] [Accepted: 01/14/2008] [Indexed: 11/30/2022] Open
Abstract
Background: Domestic animal breeding and product quality improvement require the control of reproduction, nutrition, health and welfare in these animals. It is thus necessary to improve our knowledge of the major physiological functions and their interactions. This would be greatly enhanced by the availability of expressed gene sequences in the databases and by cDNA arrays allowing the transcriptome analysis of any function. The objective within the AGENAE French program was to initiate a high-throughput cDNA sequencing program of a 38-tissue normalised library and generate a diverse microarray for transcriptome analysis in pig species. Results: We constructed a multi-tissue cDNA library, which was normalised and subtracted to reduce the redundancy of the clones. Expressed Sequence Tags were produced and 24449 high-quality sequences were released in EMBL database. The assembly of all the public ESTs (available through SIGENAE website) resulted in 40786 contigs and 54653 singletons. At least one Agenae sequence is present in 11969 contigs (12.5%) and in 9291 of the deeper-than-one-contigs (22.8%). Sequence analysis showed that both normalisation and subtraction processes were successful and that the initial tissue complexity was maintained in the final libraries. A 9K nylon cDNA microarray was produced and is available through CRB-GADIE. It will allow high sensitivity transcriptome analyses in pigs. Conclusion: In the present work, a pig multi-tissue cDNA library was constructed and a 9K cDNA microarray designed. It contributes to the Expressed Sequence Tags pig data, and offers a valuable tool for transcriptome analysis.
Affiliation(s)
- Agnès Bonnet
- Laboratoire de Génétique Cellulaire, INRA, UMR444, Institut National de la Recherche Agronomique, F-31326 Castanet-Tolosan, France.
|
36
|
Lee JY, Park JY, Tian B. Identification of mRNA polyadenylation sites in genomes using cDNA sequences, expressed sequence tags, and Trace. Methods Mol Biol 2008; 419:23-37. [PMID: 18369973 DOI: 10.1007/978-1-59745-033-1_2] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.9] [Indexed: 01/14/2023]
Abstract
Polyadenylation of nascent transcripts is an essential step for most mRNAs in eukaryotic cells. It is directly involved in the termination of transcription and is coupled with other steps of pre-mRNA processing. Recent studies have shown that transcript variants resulting from alternative polyadenylation are widespread for human and mouse genes, contributing to the complexity of mRNA pool in the cell. In addition to 3'-most exons, alternative polyadenylation sites (or poly(A) sites) can be located in internal exons and introns. Identification of poly(A) sites in genomes is critical for understanding the occurrence and significance of alternative polyadenylation events. Bioinformatic methods using cDNA sequences, Expressed Sequence Tags (ESTs), and Trace offer a sensitive and systematic approach to detect poly(A) sites in genomes. Various criteria can be employed to enhance the specificity of the detection, including identifying sequences derived from internal priming of mRNA and polyadenylated RNAs during degradation.
Affiliation(s)
- Ju Youn Lee
- Department of Biochemistry and Molecular Biology, New Jersey Medical School, University of Medicine and Dentistry of New Jersey, Newark, NJ, USA
|
37
|
Abstract
In recent years, genome-wide detection of alternative splicing based on Expressed Sequence Tag (EST) sequence alignments with mRNA and genomic sequences has dramatically expanded our understanding of the role of alternative splicing in functional regulation. This chapter reviews the data, methodology, and technical challenges of these genome-wide analyses of alternative splicing, and briefly surveys some of the uses to which such alternative splicing databases have been put. For example, with proper alternative splicing database schema design, it is possible to query genome-wide for alternative splicing patterns that are specific to particular tissues, disease states (e.g., cancer), gender, or developmental stages. EST alignments can be used to estimate exon inclusion or exclusion level of alternatively spliced exons and evolutionary changes for various species can be inferred from exon inclusion level. Such databases can also help automate design of probes for RT-PCR and microarrays, enabling high throughput experimental measurement of alternative splicing.
|
38
|
Feng Z, Wu CF, Zhou X, Kuang J. Alternative polyadenylation produces two major transcripts of Alix. Arch Biochem Biophys 2007; 465:328-35. [PMID: 17673164 PMCID: PMC4104816 DOI: 10.1016/j.abb.2007.06.025] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Received: 03/21/2007] [Revised: 06/21/2007] [Accepted: 06/21/2007] [Indexed: 01/23/2023]
Abstract
The mammalian adaptor protein Alix participates in multiple cellular processes. Since mouse Alix cDNA detects two distinct transcripts of approximately 3.5 and approximately 7.0 kb in various mouse tissues, it is possible that there exist isoforms of Alix protein that perform varied biological functions. In this study, we first demonstrate that four different anti-Alix monoclonal antibodies immunoblot the single Alix protein in nine different mouse tissues. We then show that the two transcripts of 3.2 and 6.4 kb are widely expressed in various human tissues and cell lines. These two transcripts are generated from the same Alix gene localizing at 3p22.3 via alternative polyadenylation, thus containing an identical open reading frame. However, the 3.2-kb transcript is much more active in translation than the 6.4-kb transcript in a randomly selected cell line. These results eliminate the possibility that the two transcript variants encode different isoforms of Alix protein and suggest that alternative polyadenylation is one of the mechanisms controlling Alix protein expression.
Affiliation(s)
| | | | - Xi Zhou
- Departments of Experimental Therapeutics, The University of Texas M. D. Anderson Cancer Center, Houston, TX 77030
| | - Jian Kuang
- Departments of Experimental Therapeutics, The University of Texas M. D. Anderson Cancer Center, Houston, TX 77030
|
39
|
Abstract
Polyadenylation of nascent transcripts is one of the key mRNA processing events in eukaryotic cells. A large number of human and mouse genes have alternative polyadenylation sites, or poly(A) sites, leading to mRNA variants with different protein products and/or 3′-untranslated regions (3′-UTRs). PolyA_DB 2 contains poly(A) sites identified for genes in several vertebrate species, including human, mouse, rat, chicken and zebrafish, using alignments between cDNA/ESTs and genome sequences. Several new features have been added to the database since its last release, including syntenic genome regions for human poly(A) sites in seven other vertebrates and cis-element information adjacent to poly(A) sites. Trace sequences are used to provide additional evidence for poly(A/T) tails in cDNA/ESTs. The updated database is intended to broaden poly(A) site coverage in vertebrate genomes, and provide means to assess the authenticity of poly(A) sites identified by bioinformatics. The URL for this database is .
Affiliation(s)
| | | | | | - Bin Tian
- To whom correspondence should be addressed. Tel: +1 973 972 3615; Fax: +1 973 972 5594;
|
40
|
Liu D, Brockman JM, Dass B, Hutchins LN, Singh P, McCarrey JR, MacDonald CC, Graber JH. Systematic variation in mRNA 3'-processing signals during mouse spermatogenesis. Nucleic Acids Res 2006; 35:234-46. [PMID: 17158511 PMCID: PMC1802579 DOI: 10.1093/nar/gkl919] [Citation(s) in RCA: 103] [Impact Index Per Article: 5.7] [Indexed: 01/16/2023] Open
Abstract
Gene expression and processing during mouse male germ cell maturation (spermatogenesis) is highly specialized. Previous reports have suggested that there is a high incidence of alternative 3′-processing in male germ cell mRNAs, including reduced usage of the canonical polyadenylation signal, AAUAAA. We used EST libraries generated from mouse testicular cells to identify 3′-processing sites used at various stages of spermatogenesis (spermatogonia, spermatocytes and round spermatids) and testicular somatic Sertoli cells. We assessed differences in 3′-processing characteristics in the testicular samples, compared to control sets of widely used 3′-processing sites. Using a new method for comparison of degenerate regulatory elements between sequence samples, we identified significant changes in the use of putative 3′-processing regulatory sequence elements in all spermatogenic cell types. In addition, we observed a trend towards truncated 3′-untranslated regions (3′-UTRs), with the most significant differences apparent in round spermatids. In contrast, Sertoli cells displayed a much smaller trend towards 3′-UTR truncation and no significant difference in 3′-processing regulatory sequences. Finally, we identified a number of genes encoding mRNAs that were specifically subject to alternative 3′-processing during meiosis and postmeiotic development. Our results highlight developmental differences in polyadenylation site choice and in the elements that likely control them during spermatogenesis.
Affiliation(s)
- Donglin Liu
- The Jackson Laboratory, 600 Main Street, Bar Harbor, ME 04609, USA
- J. Michael Brockman
- The Jackson Laboratory, 600 Main Street, Bar Harbor, ME 04609, USA
- Bioinformatics Program, Boston University, 24 Cummington Street, Boston, MA 02215, USA
- Brinda Dass
- Department of Cell Biology and Biochemistry, Texas Tech University Health Sciences Center, Lubbock, TX 79430, USA
- Priyam Singh
- The Jackson Laboratory, 600 Main Street, Bar Harbor, ME 04609, USA
- Bioinformatics Program, Boston University, 24 Cummington Street, Boston, MA 02215, USA
- John R. McCarrey
- Department of Biology, University of Texas at San Antonio, San Antonio, TX 78249, USA
- Clinton C. MacDonald
- Department of Cell Biology and Biochemistry, Texas Tech University Health Sciences Center, Lubbock, TX 79430, USA
- Joel H. Graber
- The Jackson Laboratory, 600 Main Street, Bar Harbor, ME 04609, USA
- Bioinformatics Program, Boston University, 24 Cummington Street, Boston, MA 02215, USA
- To whom correspondence should be addressed. Tel: +1 207 288 6847; Fax: +1 207 288 6073;
41
Jia J, Fu J, Zheng J, Zhou X, Huai J, Wang J, Wang M, Zhang Y, Chen X, Zhang J, Zhao J, Su Z, Lv Y, Wang G. Annotation and expression profile analysis of 2073 full-length cDNAs from stress-induced maize (Zea mays L.) seedlings. The Plant Journal: for Cell and Molecular Biology 2006; 48:710-27. [PMID: 17076806 DOI: 10.1111/j.1365-313x.2006.02905.x] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.0] [Indexed: 05/12/2023]
Abstract
Full-length cDNAs are very important for genome annotation and functional analysis of genes. The number of full-length cDNAs from maize (Zea mays L.) remains limited. Here we report the construction of a full-length enriched cDNA library from osmotically stressed maize seedlings by using the modified CAP trapper method. From this library, 2073 full-length cDNAs were collected and further analyzed by sequencing from both the 5'- and 3'-ends. A total of 1728 (83.4%) sequences did not match known maize mRNA and full-length cDNA sequences in the GenBank database and represent new full-length genes. After alignment of the 2073 full-length cDNAs with 448 maize BAC sequences, it was found that 84 full-length cDNAs could be mapped to the BACs. Of these, 43 genes (51.2%) have been correctly annotated from the BAC clones, 37 genes (44.0%) have been annotated with a different exon-intron structure from our cDNA, and four genes (4.76%) had no annotations in the TIGR database. Expression analysis of 2073 full-length maize cDNAs using a cDNA macroarray led to the identification of 79 genes upregulated by stress treatments and 329 downregulated genes. Of the 79 stress-inducible genes, 30 genes contain ABRE, DRE, MYB, MYC core sequences or other abiotic-responsive cis-acting elements in their promoters. These results suggest that these cis-acting elements and the corresponding transcription factors take part in plant responses to osmotic stress either cooperatively or independently. Additionally, the data suggest that an ethylene signaling pathway may be involved in the maize response to drought stress.
Affiliation(s)
- Jinping Jia
- State Key Laboratory of Agrobiotechnology and National Center for Maize Improvement, China Agricultural University, Beijing, China
42
Dong H, Deng Y, Chen J, Wang S, Peng S, Dai C, Fang Y, Shao J, Lou Y, Li D. An exploration of 3'-end processing signals and their tissue distribution in Oryza sativa. Gene 2006; 389:107-13. [PMID: 17187943 DOI: 10.1016/j.gene.2006.10.015] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.0] [Received: 04/05/2006] [Revised: 07/20/2006] [Accepted: 10/13/2006] [Indexed: 11/22/2022]
Abstract
The 3' untranslated regions deeply affect many properties of eukaryotic mRNA. In plants, the polyadenine control signals contained in these regions seem to be more variable than those of mammals. Three cDNA libraries derived from the leaf, endosperm and stem tissues of rice were sequenced from the 3'-end. Of the 9911 transcripts analyzed, 5723 unique transcripts were identified from the leaf sequences, 2934 from the endosperm and 1254 from the stem. Information entropy and two statistical methods were used to compile a list of rice poly(A) control signals. Based on their distribution, these signals can be roughly grouped into far-upstream element (FUE), near-upstream element (NUE), T-rich region (TRE) and downstream element (DE). The distribution of rice conserved regions is similar to the previous model from Arabidopsis and yeast, with a few differences in word constructions. Interestingly, we also found that the word distributions in sequences downstream of the cleavage site differed among rice tissues. The signal bias in downstream sequences may lead mRNA to be differently cleaved in different rice tissues.
Affiliation(s)
- Haitao Dong
- Bioinformatics and Gene Network Research Group, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou, China.
43
Legendre M, Ritchie W, Lopez F, Gautheret D. Differential repression of alternative transcripts: a screen for miRNA targets. PLoS Comput Biol 2006; 2:e43. [PMID: 16699595 PMCID: PMC1458965 DOI: 10.1371/journal.pcbi.0020043] [Citation(s) in RCA: 51] [Impact Index Per Article: 2.8] [Received: 12/20/2005] [Accepted: 03/21/2006] [Indexed: 01/10/2023]
Abstract
Alternative polyadenylation sites produce transcript isoforms with 3′ untranslated regions (UTRs) of different lengths. If a microRNA (miRNA) target is present in the UTR, then only those target-containing isoforms should be sensitive to control by a cognate miRNA. We carried out a systematic examination of 3′ UTRs containing multiple poly(A) sites and putative miRNA targets. Based on expressed sequence tag (EST) counts and EST library information, we observed that levels of isoforms containing targets for miR-1 or miR-124, two miRNAs causing downregulation of transcript levels, were reduced in tissues expressing the corresponding miRNA. This analysis was repeated for all conserved 7-mers in 3′ UTRs, resulting in a selection of 312 motifs. We show that this set is significantly enriched in known miRNA targets and mRNA-destabilizing elements, which validates our initial hypothesis. We scanned the human genome for possible cognate miRNAs and identified phylogenetically conserved precursors matching our motifs. This analysis can help identify target-miRNA couples that went undetected in previous screens, but it may also reveal targets for other types of regulatory factors.
MicroRNAs (miRNAs) are short RNA molecules that recognize specific target sequences in the 3′ region of mRNAs. These miRNAs can then specifically keep the mRNAs from being expressed, or translated into proteins. In this article, the authors ask what happens when a targeted mRNA has several forms differing by their 3′ regions. Such 3′ variations are very common. If two or more variations are present in a single mRNA, the result is two or more mRNAs with 3′ ends of different lengths. If an miRNA target is located between the two sites of variability, the shorter transcript should be target free and should escape miRNA-mediated inhibition, while longer transcripts should be inhibited. To test this hypothesis, the authors looked at mRNAs that had these variable 3′ ends. Variants containing targets for certain miRNAs appeared to be specifically underrepresented in tissues where these particular miRNAs are found. This principle was used to find other sequence patterns in 3′ regions that had a similar effect, and a list of 312 significant patterns was obtained. The authors then scanned genome sequences and identified possible cognate miRNAs for these patterns. This new knowledge will help further an understanding of how genes are controlled.
Affiliation(s)
- William Ritchie
- INSERM ERM 206, Université de la Méditerranée, Marseille, France
- Fabrice Lopez
- INSERM ERM 206, Université de la Méditerranée, Marseille, France
- Daniel Gautheret
- INSERM ERM 206, Université de la Méditerranée, Marseille, France
- * To whom correspondence should be addressed. E-mail:
44
Liu D, Graber JH. Quantitative comparison of EST libraries requires compensation for systematic biases in cDNA generation. BMC Bioinformatics 2006; 7:77. [PMID: 16503995 PMCID: PMC1431573 DOI: 10.1186/1471-2105-7-77] [Citation(s) in RCA: 30] [Impact Index Per Article: 1.7] [Received: 09/23/2005] [Accepted: 02/17/2006] [Indexed: 12/28/2022]
Abstract
Background: Publicly accessible EST libraries contain valuable information that can be utilized for studies of tissue-specific gene expression and processing of individual genes. This information is, however, confounded by multiple systematic effects arising from the procedures used to generate these libraries. Results: We used alignment of ESTs against a reference set of transcripts to estimate the size distributions of the cDNA inserts and sampled mRNA transcripts in individual EST libraries and show how these measurements can be used to inform quantitative comparisons of libraries. While significant attention has been paid to the effects of normalization and subtraction, we also find significant biases in transcript sampling introduced by the combined procedures of reverse transcription and selection of cDNA clones for sequencing. Using examples drawn from studies of mRNA 3'-processing (cleavage and polyadenylation), we demonstrate effects of the transcript sampling bias, and provide a method for identifying libraries that can be safely compared without bias. All data sets, supplemental data, and software are available at our supplemental web site [1]. Conclusion: The biases we characterize in the transcript sampling of EST libraries represent a significant and heretofore under-appreciated source of false positive candidates for tissue-, cell type-, or developmental stage-specific activity or processing of genes. Uncorrected, quantitative comparison of dissimilar EST libraries will likely result in the identification of statistically significant, but biologically meaningless changes.
Affiliation(s)
- Donglin Liu
- The Jackson Laboratory, 600 Main Street, Bar Harbor, ME 04609, USA
- Joel H Graber
- The Jackson Laboratory, 600 Main Street, Bar Harbor, ME 04609, USA
45
Soret J, Gabut M, Tazi J. SR Proteins as Potential Targets for Therapy. Alternative Splicing and Disease 2006; 44:65-87. [PMID: 17076265 DOI: 10.1007/978-3-540-34449-0_4] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.3] [Indexed: 12/20/2022]
Abstract
Serine- and arginine-rich (SR) proteins constitute a highly conserved family of pre-mRNA splicing factors that play key roles in the regulation of splice site selection, and thereby in the control of alternative splicing processes. In addition to conserved sequences at the splice junctions, splice site selection also depends upon different sets of auxiliary cis regulatory elements known as exonic and intronic splicing enhancers (ESEs and ISEs) or exonic and intronic silencers (ESSs and ISSs). Specific binding of SR proteins to their cognate splicing enhancers as well as binding of splicing repressor to silencer sequences serve to enhance or inhibit recognition of weak splice sites by the splicing machinery. Given that the vast majority of human genes contain introns and that most pre-mRNAs containing multiple exons undergo alternative splicing, mutations disrupting or creating such auxiliary elements can result in aberrant splicing events at the origin of various human diseases. In the past few years, numerous studies have reported several approaches allowing correction of such aberrant splicing events by targeting either the mutated sequences or the splicing regulators whose binding is affected by the mutation. The aim of the present review is to highlight the different means by which it is possible to modulate the activity of SR splicing factors and to bring out those holding the greatest promises for the development of therapeutic treatments.
Affiliation(s)
- Johann Soret
- Institut de Génétique Moléculaire de Montpellier, UMR 5535, IFR 122, Centre National de Recherche Scientifique, 1919, route de Mende, 34293 Montpellier, France
46
Pratt LH, Liang C, Shah M, Sun F, Wang H, Reid SP, Gingle AR, Paterson AH, Wing R, Dean R, Klein R, Nguyen HT, Ma HM, Zhao X, Morishige DT, Mullet JE, Cordonnier-Pratt MM. Sorghum expressed sequence tags identify signature genes for drought, pathogenesis, and skotomorphogenesis from a milestone set of 16,801 unique transcripts. Plant Physiology 2005; 139:869-84. [PMID: 16169961 PMCID: PMC1256002 DOI: 10.1104/pp.105.066134] [Citation(s) in RCA: 17] [Impact Index Per Article: 0.9] [Received: 05/27/2005] [Revised: 07/20/2005] [Accepted: 07/26/2005] [Indexed: 05/04/2023]
Abstract
Improved knowledge of the sorghum transcriptome will enhance basic understanding of how plants respond to stresses and serve as a source of genes of value to agriculture. Toward this goal, Sorghum bicolor L. Moench cDNA libraries were prepared from light- and dark-grown seedlings, drought-stressed plants, Colletotrichum-infected seedlings and plants, ovaries, embryos, and immature panicles. Other libraries were prepared with meristems from Sorghum propinquum (Kunth) Hitchc. that had been photoperiodically induced to flower, and with rhizomes from S. propinquum and johnsongrass (Sorghum halepense L. Pers.). A total of 117,682 expressed sequence tags (ESTs) were obtained representing both 3' and 5' sequences from about half that number of cDNA clones. A total of 16,801 unique transcripts, representing tentative UniScripts (TUs), were identified from 55,783 3' ESTs. Of these TUs, 9,032 are represented by two or more ESTs. Collectively, these libraries were predicted to contain a total of approximately 31,000 TUs. Individual libraries, however, were predicted to contain no more than about 6,000 to 9,000, with the exception of light-grown seedlings, which yielded an estimate of close to 13,000. In addition, each library exhibits about the same level of complexity with respect to both the number of TUs preferentially expressed in that library and the frequency with which two or more ESTs are found in only that library. These results indicate that the sorghum genome is expressed in a highly selective fashion in the individual organs and in response to the environmental conditions surveyed here. Close to 2,000 differentially expressed TUs were identified among the cDNA libraries examined, of which 775 were differentially expressed at a confidence level of 98%. From these 775 TUs, signature genes were identified defining drought, Colletotrichum infection, skotomorphogenesis (etiolation), ovary, immature panicle, and embryo.
Affiliation(s)
- Lee H Pratt
- Department of Plant Biology, University of Georgia, Athens, 30602, USA.
47
Brockman JM, Singh P, Liu D, Quinlan S, Salisbury J, Graber JH. PACdb: PolyA Cleavage Site and 3'-UTR Database. Bioinformatics 2005; 21:3691-3. [PMID: 16030070 DOI: 10.1093/bioinformatics/bti589] [Citation(s) in RCA: 48] [Impact Index Per Article: 2.5] [Indexed: 11/12/2022]
Abstract
The PolyA Cleavage Site and 3'-UTR Database (PACdb) is a web-accessible database that catalogs putative 3'-processing sites and 3'-UTR sequences for multiple organisms. Sites have been identified primarily via expressed sequence tag-genome alignments, enabling delineation of both the specificities and heterogeneity of 3'-processing events. Availability: By web browser or CGI: PACdb: http://harlequin.jax.org/pacdb/; AtPACdb: http://harlequin.jax.org/atpacdb/. Supplementary information: Available online at http://harlequin.jax.org/pacdb/supplemental.php.
48
Abstract
Background: Clustering the ESTs from a large dataset representing a single species is a convenient starting point for a number of investigations into gene discovery, genome evolution, expression patterns, and alternatively spliced transcripts. Several methods have been developed to accomplish this, the most widely available being UniGene, a public domain collection of gene-oriented clusters for over 45 different species created and maintained by NCBI. The goal is for each cluster to represent a unique gene, but currently it is not known how closely the overall results represent that reality. UniGene's build procedure begins with initial mRNA clusters before joining ESTs. UniGene's results for soybean indicate a significant amount of redundancy among some sequences reported to be unique mRNAs. To establish a valid non-redundant known gene set for Glycine max we applied our algorithm to the clustering of only mRNA sequences. The mRNA dataset was run through the algorithm using two different matching stringencies. The resulting cluster compositions were compared to each other and to UniGene. Clusters exhibiting differences among the three methods were analyzed by 1) nucleotide and amino acid alignment and 2) submitting authors' conclusions to determine whether members of a single cluster represented the same gene or not. Results: Of the 12 clusters that were examined closely, most contained examples of sequences that did not belong in the same cluster. However, neither the two stringencies of PECT nor UniGene had a significantly greater record of accuracy in placing paralogs into separate clusters. Conclusion: Our results reveal that, although each method produces some errors, using multiple stringencies for matching or a sequential hierarchical method of increasing stringencies can provide more reliable results and therefore allow greater confidence in the vast majority of clusters that contain only ESTs and no mRNA sequences.
Affiliation(s)
- Ronald L Frank
- Biological Sciences Department, University of Missouri-Rolla, Rolla, MO, USA
- Fikret Ercal
- Computer Science Department, University of Missouri-Rolla, Rolla, MO, USA
49
Taraszka JA, Gao X, Valentine SJ, Sowell RA, Koeniger SL, Miller DF, Kaufman TC, Clemmer DE. Proteome Profiling for Assessing Diversity: Analysis of Individual Heads of Drosophila melanogaster Using LC−Ion Mobility−MS. J Proteome Res 2005; 4:1238-47. [PMID: 16083273 DOI: 10.1021/pr050037o] [Citation(s) in RCA: 35] [Impact Index Per Article: 1.8] [Indexed: 11/28/2022]
Abstract
The proteomes of three heads of individual Drosophila melanogaster organisms have been analyzed and compared by a combination of liquid chromatography, ion mobility spectrometry, and mass spectrometry approaches. In total, 197 proteins are identified among all three individuals (an average of 120 +/- 20 proteins per individual), of which at least 101 proteins are present in all three individuals. Within all three datasets, more than 25 000 molecular ions (an average of 9000 +/- 2000 per individual) corresponding to protonated precursor ions of individual peptides have been observed. A comparison of peaks among the datasets reveals that peaks corresponding to protonated peptides that are found in all heads are more intense than those features that appear between pairs of or within only one of the individuals. Moreover, there is little variability in the relative intensities of the peaks common among all individuals. It appears that it is the lower abundance components of the proteome that play the most significant role in determining unique features of individuals.
Affiliation(s)
- John A Taraszka
- Department of Chemistry, Indiana University, Bloomington, IN 47405, USA
50
Nakao M, Barrero RA, Mukai Y, Motono C, Suwa M, Nakai K. Large-scale analysis of human alternative protein isoforms: pattern classification and correlation with subcellular localization signals. Nucleic Acids Res 2005; 33:2355-63. [PMID: 15860772 PMCID: PMC1087780 DOI: 10.1093/nar/gki520] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.1] [Received: 11/18/2004] [Revised: 12/28/2004] [Accepted: 03/29/2005] [Indexed: 01/09/2023]
Abstract
We investigated human alternative protein isoforms of >2600 genes based on full-length cDNA clones and SwissProt. We classified the isoforms and examined their co-occurrence for each gene. Further, we investigated potential relationships between these changes and differential subcellular localization. The two most abundant patterns were the one with different C-terminal regions and the one with an internal insertion, which together account for 43% of the total. Although changes of the N-terminal region are less common than those of the C-terminal region, extension of the C-terminal region is much less common than that of the N-terminal region, probably because of the difficulty of removing stop codons in one isoform. We also found that there are some frequently used combinations of co-occurrence in alternative isoforms. We interpret this as evidence that there is some structural relationship which produces a repertoire of isoformal patterns. Finally, many terminal changes are predicted to cause differential subcellular localization, especially in targeting either peroxisomes or mitochondria. Our study sheds new light on the enrichment of the human proteome through alternative splicing and related events. Our database of alternative protein isoforms is available through the internet.
Affiliation(s)
- Mitsuteru Nakao
- Human Genome Center, Institute of Medical Science, University of Tokyo, Tokyo, Japan
- Computational Biology Research Center, National Institute of Advanced Industry Science and Technology, Tokyo, Japan
- Roberto A. Barrero
- Center for Information Biology and DNA Data Bank Japan, National Institute of Genetics, Shizuoka, Japan
- Yuri Mukai
- Computational Biology Research Center, National Institute of Advanced Industry Science and Technology, Tokyo, Japan
- Chie Motono
- Computational Biology Research Center, National Institute of Advanced Industry Science and Technology, Tokyo, Japan
- Makiko Suwa
- Computational Biology Research Center, National Institute of Advanced Industry Science and Technology, Tokyo, Japan
- Kenta Nakai
- Human Genome Center, Institute of Medical Science, University of Tokyo, Tokyo, Japan