1
|
Liu M, Chen J, Zhang C, Liu S, Chao X, Yang H, Muhammad A, Zhou B, Ao W, Schinckel AP. Deciphering Estrus Expression in Gilts: The Role of Alternative Polyadenylation and LincRNAs in Reproductive Transcriptomics. Animals (Basel) 2024; 14:791. [PMID: 38473176 DOI: 10.3390/ani14050791] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/16/2024] [Revised: 02/29/2024] [Accepted: 03/01/2024] [Indexed: 03/14/2024] Open
Abstract
The fertility rate and litter size of female pigs are critically affected by the expression of estrus. The objective of this study was to elucidate the regulatory mechanisms of estrus expression by analyzing the differential expression of genes and long intergenic non-coding RNAs (lincRNA), as well as the utilization of alternative polyadenylation (APA) sites, in the vulva and vagina during the estrus and diestrus stages of Large White and indigenous Chinese Mi gilts. Our study revealed that the number of differentially expressed genes (DEG) in the vulva was less than that in the vagina, and the DEGs in the vulva were enriched in pathways such as "neural" pathways and steroid hormone responses, including the "Calcium signaling pathway" and "Oxytocin signaling pathway". The DEGs in the vagina were enriched in the "Metabolic pathways" and "VEGF signaling pathway". Furthermore, 27 and 21 differentially expressed lincRNAs (DEL), whose target genes were enriched in the "Endocrine resistance" pathway, were identified in the vulva and vagina, respectively. Additionally, we observed that 63 and 618 transcripts of the 3'-untranslated region (3'-UTR) were lengthened during estrus in the vulva and vagina, respectively. Interestingly, the genes undergoing APA events in the vulva exhibited species-specific enrichment in neural or steroid-related pathways, whereas those in the vagina were enriched in apoptosis or autophagy-related pathways. Further bioinformatic analysis of these lengthened 3'-UTRs revealed the presence of multiple miRNAs binding sites and cytoplasmic polyadenylation element (CPE) regulatory aspects. In particular, we identified more than 10 CPEs in the validated lengthened 3'-UTRs of the NFIX, PCNX4, CEP162 and ABHD2 genes using RT-qPCR. These findings demonstrated the involvement of APA and lincRNAs in the regulation of estrus expression in female pigs, providing new insights into the molecular mechanisms underlying estrus expression in pigs.
Collapse
Affiliation(s)
- Mingzheng Liu
- College of Animal Science and Technology, Nanjing Agricultural University, Nanjing 210095, China
| | - Jiahao Chen
- College of Animal Science and Technology, Nanjing Agricultural University, Nanjing 210095, China
| | - Chunlei Zhang
- College of Animal Science and Technology, Nanjing Agricultural University, Nanjing 210095, China
| | - Shuhan Liu
- College of Animal Science and Technology, Nanjing Agricultural University, Nanjing 210095, China
| | - Xiaohuan Chao
- College of Animal Science and Technology, Nanjing Agricultural University, Nanjing 210095, China
| | - Huan Yang
- College of Animal Science and Technology, Nanjing Agricultural University, Nanjing 210095, China
| | - Asim Muhammad
- College of Animal Science and Technology, Nanjing Agricultural University, Nanjing 210095, China
| | - Bo Zhou
- College of Animal Science and Technology, Nanjing Agricultural University, Nanjing 210095, China
| | - Weiping Ao
- College of Animal Science and Technology, Tarim University, Alar 843300, China
| | - Allan P Schinckel
- Department of Animal Sciences, Purdue University, West Lafayette, IN 47907-2054, USA
| |
Collapse
|
2
|
Jonnakuti VS, Wagner EJ, Maletić-Savatić M, Liu Z, Yalamanchili HK. PolyAMiner-Bulk is a deep learning-based algorithm that decodes alternative polyadenylation dynamics from bulk RNA-seq data. CELL REPORTS METHODS 2024; 4:100707. [PMID: 38325383 PMCID: PMC10921021 DOI: 10.1016/j.crmeth.2024.100707] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 02/28/2023] [Revised: 04/13/2023] [Accepted: 01/11/2024] [Indexed: 02/09/2024]
Abstract
Alternative polyadenylation (APA) is a key post-transcriptional regulatory mechanism; yet, its regulation and impact on human diseases remain understudied. Existing bulk RNA sequencing (RNA-seq)-based APA methods predominantly rely on predefined annotations, severely impacting their ability to decode novel tissue- and disease-specific APA changes. Furthermore, they only account for the most proximal and distal cleavage and polyadenylation sites (C/PASs). Deconvoluting overlapping C/PASs and the inherent noisy 3' UTR coverage in bulk RNA-seq data pose additional challenges. To overcome these limitations, we introduce PolyAMiner-Bulk, an attention-based deep learning algorithm that accurately recapitulates C/PAS sequence grammar, resolves overlapping C/PASs, captures non-proximal-to-distal APA changes, and generates visualizations to illustrate APA dynamics. Evaluation on multiple datasets strongly evinces the performance merit of PolyAMiner-Bulk, accurately identifying more APA changes compared with other methods. With the growing importance of APA and the abundance of bulk RNA-seq data, PolyAMiner-Bulk establishes a robust paradigm of APA analysis.
Collapse
Affiliation(s)
- Venkata Soumith Jonnakuti
- Department of Pediatrics, Baylor College of Medicine, Houston, TX 77030, USA; Jan and Dan Duncan Neurological Research Institute, Texas Children's Hospital, Houston, TX 77030, USA; Program in Quantitative and Computational Biology, Baylor College of Medicine, Houston, TX 77030, USA; Medical Scientist Training Program, Baylor College of Medicine, Houston, TX 77030, USA
| | - Eric J Wagner
- Department of Biochemistry and Biophysics, University of Rochester School of Medicine and Dentistry, Rochester, NY 14642, USA
| | - Mirjana Maletić-Savatić
- Department of Pediatrics, Baylor College of Medicine, Houston, TX 77030, USA; Jan and Dan Duncan Neurological Research Institute, Texas Children's Hospital, Houston, TX 77030, USA
| | - Zhandong Liu
- Department of Pediatrics, Baylor College of Medicine, Houston, TX 77030, USA; Jan and Dan Duncan Neurological Research Institute, Texas Children's Hospital, Houston, TX 77030, USA; Program in Quantitative and Computational Biology, Baylor College of Medicine, Houston, TX 77030, USA
| | - Hari Krishna Yalamanchili
- Department of Pediatrics, Baylor College of Medicine, Houston, TX 77030, USA; Jan and Dan Duncan Neurological Research Institute, Texas Children's Hospital, Houston, TX 77030, USA; USDA/ARS Children's Nutrition Research Center, Department of Pediatrics, Baylor College of Medicine, Houston, TX 77030, USA.
| |
Collapse
|
3
|
Bryce-Smith S, Burri D, Gazzara MR, Herrmann CJ, Danecka W, Fitzsimmons CM, Wan YK, Zhuang F, Fansler MM, Fernández JM, Ferret M, Gonzalez-Uriarte A, Haynes S, Herdman C, Kanitz A, Katsantoni M, Marini F, McDonnel E, Nicolet B, Poon CL, Rot G, Schärfen L, Wu PJ, Yoon Y, Barash Y, Zavolan M. Extensible benchmarking of methods that identify and quantify polyadenylation sites from RNA-seq data. RNA (NEW YORK, N.Y.) 2023; 29:1839-1855. [PMID: 37816550 PMCID: PMC10653393 DOI: 10.1261/rna.079849.123] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 09/21/2023] [Accepted: 09/21/2023] [Indexed: 10/12/2023]
Abstract
The tremendous rate with which data is generated and analysis methods emerge makes it increasingly difficult to keep track of their domain of applicability, assumptions, limitations, and consequently, of the efficacy and precision with which they solve specific tasks. Therefore, there is an increasing need for benchmarks, and for the provision of infrastructure for continuous method evaluation. APAeval is an international community effort, organized by the RNA Society in 2021, to benchmark tools for the identification and quantification of the usage of alternative polyadenylation (APA) sites from short-read, bulk RNA-sequencing (RNA-seq) data. Here, we reviewed 17 tools and benchmarked eight on their ability to perform APA identification and quantification, using a comprehensive set of RNA-seq experiments comprising real, synthetic, and matched 3'-end sequencing data. To support continuous benchmarking, we have incorporated the results into the OpenEBench online platform, which allows for continuous extension of the set of methods, metrics, and challenges. We envisage that our analyses will assist researchers in selecting the appropriate tools for their studies, while the containers and reproducible workflows could easily be deployed and extended to evaluate new methods or data sets.
Collapse
Affiliation(s)
- Sam Bryce-Smith
- Department of Neuromuscular Diseases, UCL Queen Square Motor Neuron Disease Centre, UCL Queen Square Institute of Neurology, UCL, London WC1N 3BG, United Kingdom
| | - Dominik Burri
- Biozentrum, University of Basel, 4056 Basel, Switzerland
- Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
| | - Matthew R Gazzara
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA
| | - Christina J Herrmann
- Biozentrum, University of Basel, 4056 Basel, Switzerland
- Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
| | - Weronika Danecka
- Institute for Cell Biology, School of Biological Sciences, The University of Edinburgh, Edinburgh EH9 3FF, United Kingdom
| | - Christina M Fitzsimmons
- Laboratory of Cell Biology, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, Maryland 20892, USA
| | - Yuk Kei Wan
- Genome Institute of Singapore, Buona Vista, Singapore 138672
- Yong Loo Lin School of Medicine, National University of Singapore, Kent Ridge, Singapore 119228
| | - Farica Zhuang
- Department of Computer and Information Science, School of Engineering, University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA
| | - Mervin M Fansler
- Tri-Institutional Program in Computational Biology and Medicine, Weill Cornell Graduate Studies, New York, New York 10065, USA
- Cancer Biology and Genetics, Sloan-Kettering Institute, MSKCC, New York, New York 10065, USA
| | - José M Fernández
- Life Sciences Department, Barcelona Supercomputing Center, 08034 Barcelona, Spain
- Spanish National Bioinformatics Institute (INB/ELIXIR-ES), 28029 Madrid, Spain
| | - Meritxell Ferret
- Life Sciences Department, Barcelona Supercomputing Center, 08034 Barcelona, Spain
- Spanish National Bioinformatics Institute (INB/ELIXIR-ES), 28029 Madrid, Spain
| | - Asier Gonzalez-Uriarte
- Life Sciences Department, Barcelona Supercomputing Center, 08034 Barcelona, Spain
- Spanish National Bioinformatics Institute (INB/ELIXIR-ES), 28029 Madrid, Spain
| | - Samuel Haynes
- Institute for Cell Biology, School of Biological Sciences, The University of Edinburgh, Edinburgh EH9 3FF, United Kingdom
| | - Chelsea Herdman
- Department of Neurobiology, University of Utah, Salt Lake City, Utah 84132, USA
| | - Alexander Kanitz
- Biozentrum, University of Basel, 4056 Basel, Switzerland
- Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
| | - Maria Katsantoni
- Biozentrum, University of Basel, 4056 Basel, Switzerland
- Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
| | - Federico Marini
- Institute of Medical Biostatistics, Epidemiology and Informatics (IMBEI), University Medical Center of the Johannes Gutenberg-University Mainz, 55118 Mainz, Germany
| | - Euan McDonnel
- Leeds Institute for Data Analytics, School of Molecular and Cellular Biology, University of Leeds, Leeds LS2 9NL, United Kingdom
| | - Ben Nicolet
- Department of Hematopoiesis, Sanquin Research, Landsteiner Laboratory, Amsterdam UMC, University of Amsterdam, 1066 CX Amsterdam, The Netherlands
- Oncode Institute, 3521 AL Utrecht, The Netherlands
| | - Chi-Lam Poon
- Graduate School of Medical Sciences, Weill Cornell Medicine, New York, New York 10065, USA
| | - Gregor Rot
- Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
- Institute of Molecular Life Sciences, University of Zurich, 8057 Zurich, Switzerland
| | - Leonard Schärfen
- Department of Molecular Biophysics & Biochemistry, Yale University, New Haven, Connecticut 06520, USA
| | - Pin-Jou Wu
- Center for Plant Molecular Biology (ZMBP), University of Tübingen, 72076 Tübingen, Germany
| | - Yoseop Yoon
- Department of Microbiology and Molecular Genetics, School of Medicine, University of California Irvine, Irvine, California 92617, USA
| | - Yoseph Barash
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA
- Department of Computer and Information Science, School of Engineering, University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA
| | - Mihaela Zavolan
- Biozentrum, University of Basel, 4056 Basel, Switzerland
- Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
| |
Collapse
|
4
|
Carrion SA, Michal JJ, Jiang Z. Alternative Transcripts Diversify Genome Function for Phenome Relevance to Health and Diseases. Genes (Basel) 2023; 14:2051. [PMID: 38002994 PMCID: PMC10671453 DOI: 10.3390/genes14112051] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/13/2023] [Revised: 11/06/2023] [Accepted: 11/07/2023] [Indexed: 11/26/2023] Open
Abstract
Manipulation using alternative exon splicing (AES), alternative transcription start (ATS), and alternative polyadenylation (APA) sites are key to transcript diversity underlying health and disease. All three are pervasive in organisms, present in at least 50% of human protein-coding genes. In fact, ATS and APA site use has the highest impact on protein identity, with their ability to alter which first and last exons are utilized as well as impacting stability and translation efficiency. These RNA variants have been shown to be highly specific, both in tissue type and stage, with demonstrated importance to cell proliferation, differentiation and the transition from fetal to adult cells. While alternative exon splicing has a limited effect on protein identity, its ubiquity highlights the importance of these minor alterations, which can alter other features such as localization. The three processes are also highly interwoven, with overlapping, complementary, and competing factors, RNA polymerase II and its CTD (C-terminal domain) chief among them. Their role in development means dysregulation leads to a wide variety of disorders and cancers, with some forms of disease disproportionately affected by specific mechanisms (AES, ATS, or APA). Challenges associated with the genome-wide profiling of RNA variants and their potential solutions are also discussed in this review.
Collapse
Affiliation(s)
| | | | - Zhihua Jiang
- Department of Animal Sciences and Center for Reproductive Biology, Washington State University, Pullman, WA 99164-7620, USA; (S.A.C.); (J.J.M.)
| |
Collapse
|
5
|
Bryce-Smith S, Burri D, Gazzara MR, Herrmann CJ, Danecka W, Fitzsimmons CM, Wan YK, Zhuang F, Fansler MM, Fernández JM, Ferret M, Gonzalez-Uriarte A, Haynes S, Herdman C, Kanitz A, Katsantoni M, Marini F, McDonnel E, Nicolet B, Poon CL, Rot G, Schärfen L, Wu PJ, Yoon Y, Barash Y, Zavolan M. Extensible benchmarking of methods that identify and quantify polyadenylation sites from RNA-seq data. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.06.23.546284. [PMID: 37425672 PMCID: PMC10327023 DOI: 10.1101/2023.06.23.546284] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 07/11/2023]
Abstract
The tremendous rate with which data is generated and analysis methods emerge makes it increasingly difficult to keep track of their domain of applicability, assumptions, and limitations and consequently, of the efficacy and precision with which they solve specific tasks. Therefore, there is an increasing need for benchmarks, and for the provision of infrastructure for continuous method evaluation. APAeval is an international community effort, organized by the RNA Society in 2021, to benchmark tools for the identification and quantification of the usage of alternative polyadenylation (APA) sites from short-read, bulk RNA-sequencing (RNA-seq) data. Here, we reviewed 17 tools and benchmarked eight on their ability to perform APA identification and quantification, using a comprehensive set of RNA-seq experiments comprising real, synthetic, and matched 3'-end sequencing data. To support continuous benchmarking, we have incorporated the results into the OpenEBench online platform, which allows for seamless extension of the set of methods, metrics, and challenges. We envisage that our analyses will assist researchers in selecting the appropriate tools for their studies. Furthermore, the containers and reproducible workflows generated in the course of this project can be seamlessly deployed and extended in the future to evaluate new methods or datasets.
Collapse
Affiliation(s)
- Sam Bryce-Smith
- UCL Queen Square Motor Neuron Disease Centre, Department of Neuromuscular Diseases, UCL Queen Square Institute of Neurology, UCL, London, UK
| | - Dominik Burri
- Biozentrum, University of Basel, Basel, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
| | - Matthew R. Gazzara
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, USA
| | - Christina J. Herrmann
- Biozentrum, University of Basel, Basel, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
| | - Weronika Danecka
- Institute for Cell Biology, School of Biological Sciences, The University of Edinburgh, Edinburgh, United Kingdom
| | - Christina M. Fitzsimmons
- Laboratory of Cell Biology, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, Maryland, USA
| | - Yuk Kei Wan
- Genome Institute of Singapore, Buona Vista, Singapore
- National University of Singapore, Kent Ridge, Singapore
| | - Farica Zhuang
- Department of Computer and Information Science, School of Engineering, University of Pennsylvania, Philadelphia, USA
| | - Mervin M. Fansler
- Tri-Institutional Program in Computational Biology and Medicine, Weill Cornell GraduateStudies, New York, NY, USA
- Cancer Biology and Genetics, Sloan-Kettering Institute, MSKCC, New York, NY, USA
| | - José M. Fernández
- Barcelona Supercomputing Center, Barcelona, Spain
- Spanish National Bioinformatics Institute (INB/ELIXIR-ES)
| | - Meritxell Ferret
- Barcelona Supercomputing Center, Barcelona, Spain
- Spanish National Bioinformatics Institute (INB/ELIXIR-ES)
| | - Asier Gonzalez-Uriarte
- Barcelona Supercomputing Center, Barcelona, Spain
- Spanish National Bioinformatics Institute (INB/ELIXIR-ES)
| | - Samuel Haynes
- Institute for Cell Biology, School of Biological Sciences, The University of Edinburgh, Edinburgh, United Kingdom
| | | | - Alexander Kanitz
- Biozentrum, University of Basel, Basel, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
| | - Maria Katsantoni
- Biozentrum, University of Basel, Basel, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
| | - Federico Marini
- Institute of Medical Biostatistics, Epidemiology and Informatics (IMBEI) - UniversityMedical Center of the Johannes Gutenberg, University Mainz, Germany
| | - Euan McDonnel
- Leeds Institute for Data Analytics, School of Molecular and Cellular Biology, University of Leeds, United Kingdom
| | - Ben Nicolet
- Department of Hematopoiesis, Sanquin Research, Landsteiner Laboratory, AmsterdamUMC, University of Amsterdam, and Oncode Institute, Amsterdam, The Netherlands
| | | | - Gregor Rot
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Institute of Molecular Life Sciences, Zurich, Switzerland
| | - Leonard Schärfen
- Department of Molecular Biophysics & Biochemistry, Yale University, New Haven CT, USA
| | - Pin-Jou Wu
- Center for Plant Molecular Biology (ZMBP), University of Tübingen, Germany
| | - Yoseop Yoon
- Department of Microbiology and Molecular Genetics, School of Medicine, University of California Irvine, Irvine, California, USA
| | - Yoseph Barash
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, USA
- Department of Computer and Information Science, School of Engineering, University of Pennsylvania, Philadelphia, USA
| | - Mihaela Zavolan
- Biozentrum, University of Basel, Basel, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
| |
Collapse
|
6
|
Ji G, Tang Q, Zhu S, Zhu J, Ye P, Xia S, Wu X. stAPAminer: Mining Spatial Patterns of Alternative Polyadenylation for Spatially Resolved Transcriptomic Studies. GENOMICS, PROTEOMICS & BIOINFORMATICS 2023; 21:601-618. [PMID: 36669641 PMCID: PMC10787175 DOI: 10.1016/j.gpb.2023.01.003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 02/16/2022] [Revised: 12/07/2022] [Accepted: 01/08/2023] [Indexed: 01/19/2023]
Abstract
Alternative polyadenylation (APA) contributes to transcriptome complexity and gene expression regulation and has been implicated in various cellular processes and diseases. Single-cell RNA sequencing (scRNA-seq) has enabled the profiling of APA at the single-cell level; however, the spatial information of cells is not preserved in scRNA-seq. Alternatively, spatial transcriptomics (ST) technologies provide opportunities to decipher the spatial context of the transcriptomic landscape. Pioneering studies have revealed potential spatially variable genes and/or splice isoforms; however, the pattern of APA usage in spatial contexts remains unappreciated. In this study, we developed a toolkit called stAPAminer for mining spatial patterns of APA from spatially barcoded ST data. APA sites were identified and quantified from the ST data. In particular, an imputation model based on the k-nearest neighbors algorithm was designed to recover APA signals, and then APA genes with spatial patterns of APA usage variation were identified. By analyzing well-established ST data of the mouse olfactory bulb (MOB), we presented a detailed view of spatial APA usage across morphological layers of the MOB. We compiled a comprehensive list of genes with spatial APA dynamics and obtained several major spatial expression patterns that represent spatial APA dynamics in different morphological layers. By extending this analysis to two additional replicates of the MOB ST data, we observed that the spatial APA patterns of several genes were reproducible among replicates. stAPAminer employs the power of ST to explore the transcriptional atlas of spatial APA patterns with spatial resolution. This toolkit is available at https://github.com/BMILAB/stAPAminer and https://ngdc.cncb.ac.cn/biocode/tools/BT007320.
Collapse
Affiliation(s)
- Guoli Ji
- Pasteurien College, Suzhou Medical College of Soochow University, Soochow University, Suzhou 215000, China; Department of Automation, Xiamen University, Xiamen 361005, China
| | - Qi Tang
- Pasteurien College, Suzhou Medical College of Soochow University, Soochow University, Suzhou 215000, China; Department of Automation, Xiamen University, Xiamen 361005, China
| | - Sheng Zhu
- Department of Automation, Xiamen University, Xiamen 361005, China
| | - Junyi Zhu
- Institute of Neuroscience, Soochow University, Suzhou 215000, China
| | - Pengchao Ye
- Department of Automation, Xiamen University, Xiamen 361005, China
| | - Shuting Xia
- Pasteurien College, Suzhou Medical College of Soochow University, Soochow University, Suzhou 215000, China; Institute of Neuroscience, Soochow University, Suzhou 215000, China
| | - Xiaohui Wu
- Pasteurien College, Suzhou Medical College of Soochow University, Soochow University, Suzhou 215000, China.
| |
Collapse
|
7
|
Zou Y, Guo Q, Chang Y, Zhong Y, Cheng L, Wei W. Alternative splicing affects synapses in the hippocampus of offspring after maternal fructose exposure during gestation and lactation. Chem Biol Interact 2023; 379:110518. [PMID: 37121297 DOI: 10.1016/j.cbi.2023.110518] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/10/2023] [Revised: 04/15/2023] [Accepted: 04/27/2023] [Indexed: 05/02/2023]
Abstract
Increased fructose over-intake is a global issue. Maternal fructose exposure during gestation and lactation can impair brain development in offspring. However, the effect on synapses is still unknown. For the diversification of RNA and biological functions, alternative splicing (AS) and alternative polyadenylation (APA) are essential. We constructed a maternal high-fructose diet model by administering 13% and 40% fructose water. The student's t-test analyzed the results of RT-qPCR. All other results were analyzed by one-way analysis of variance. The animal behavior experiment results revealed that conditioning and associative memory had been damaged. The proteins that form synapses were consistently low-expressed. In addition, compared with the control group, the Oxford Nanopore Technologies platform's full-length RNA-sequencing identified 298 different spliced genes (DSGs) and 51 differentially expressed alternative splicing (DEAS) genes in the 13% fructose group. 313 DSGs and 74 DEAS genes were in the 40% fructose group. Enrichment analysis based on these altered genes revealed some enlightening items and pathways. Our findings demonstrated the transcriptome mechanism underlying maternal fructose exposure during gestation and lactation and impaired synapse function during the transcripts' editing.
Collapse
Affiliation(s)
- Yuchen Zou
- Child and Adolescent Health, School of Public Health, China Medical University, No.77 Puhe Road, Shenyang North New Area, Shenyang, 110122, PR China
| | - Qing Guo
- Child and Adolescent Health, School of Public Health, China Medical University, No.77 Puhe Road, Shenyang North New Area, Shenyang, 110122, PR China
| | - Yidan Chang
- Child and Adolescent Health, School of Public Health, China Medical University, No.77 Puhe Road, Shenyang North New Area, Shenyang, 110122, PR China
| | - Yongyong Zhong
- Child and Adolescent Health, School of Public Health, China Medical University, No.77 Puhe Road, Shenyang North New Area, Shenyang, 110122, PR China
| | - Lin Cheng
- Child and Adolescent Health, School of Public Health, China Medical University, No.77 Puhe Road, Shenyang North New Area, Shenyang, 110122, PR China
| | - Wei Wei
- Child and Adolescent Health, School of Public Health, China Medical University, No.77 Puhe Road, Shenyang North New Area, Shenyang, 110122, PR China.
| |
Collapse
|
8
|
Jonnakuti VS, Wagner EJ, Maletić-Savatić M, Liu Z, Yalamanchili HK. PolyAMiner-Bulk: A Machine Learning Based Bioinformatics Algorithm to Infer and Decode Alternative Polyadenylation Dynamics from bulk RNA-seq data. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.01.23.523471. [PMID: 36747700 PMCID: PMC9900750 DOI: 10.1101/2023.01.23.523471] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Indexed: 01/26/2023]
Abstract
More than half of human genes exercise alternative polyadenylation (APA) and generate mRNA transcripts with varying 3' untranslated regions (UTR). However, current computational approaches for identifying cleavage and polyadenylation sites (C/PASs) and quantifying 3'UTR length changes from bulk RNA-seq data fail to unravel tissue- and disease-specific APA dynamics. Here, we developed a next-generation bioinformatics algorithm and application, PolyAMiner-Bulk, that utilizes an attention-based machine learning architecture and an improved vector projection-based engine to infer differential APA dynamics accurately. When applied to earlier studies, PolyAMiner-Bulk accurately identified more than twice the number of APA changes in an RBM17 knockdown bulk RNA-seq dataset compared to current generation tools. Moreover, on a separate dataset, PolyAMiner-Bulk revealed novel APA dynamics and pathways in scleroderma pathology and identified differential APA in a gene that was identified as being involved in scleroderma pathogenesis in an independent study. Lastly, we used PolyAMiner-Bulk to analyze the RNA-seq data of post-mortem prefrontal cortexes from the ROSMAP data consortium and unraveled novel APA dynamics in Alzheimer's Disease. Our method, PolyAMiner-Bulk, creates a paradigm for future alternative polyadenylation analysis from bulk RNA-seq data.
Collapse
Affiliation(s)
- Venkata Soumith Jonnakuti
- Department of Pediatrics, Baylor College of Medicine, Houston, TX, 77030, USA
- Jan and Dan Duncan Neurological Research Institute, Texas Children’s Hospital, Houston, TX, 77030, USA
- Program in Quantitative and Computational Biology, Baylor College of Medicine, Houston, TX, 77030, USA
- Medical Scientist Training Program, Baylor College of Medicine, Houston, TX, 77030, USA
| | - Eric J. Wagner
- Department of Biochemistry and Biophysics, University of Rochester School of Medicine and Dentistry, Rochester, NY 14642, USA
| | - Mirjana Maletić-Savatić
- Department of Pediatrics, Baylor College of Medicine, Houston, TX, 77030, USA
- Jan and Dan Duncan Neurological Research Institute, Texas Children’s Hospital, Houston, TX, 77030, USA
| | - Zhandong Liu
- Department of Pediatrics, Baylor College of Medicine, Houston, TX, 77030, USA
- Jan and Dan Duncan Neurological Research Institute, Texas Children’s Hospital, Houston, TX, 77030, USA
- Program in Quantitative and Computational Biology, Baylor College of Medicine, Houston, TX, 77030, USA
| | - Hari Krishna Yalamanchili
- Department of Pediatrics, Baylor College of Medicine, Houston, TX, 77030, USA
- Jan and Dan Duncan Neurological Research Institute, Texas Children’s Hospital, Houston, TX, 77030, USA
| |
Collapse
|
9
|
Meyer E, Chaung K, Dehghannasiri R, Salzman J. ReadZS detects cell type-specific and developmentally regulated RNA processing programs in single-cell RNA-seq. Genome Biol 2022; 23:226. [PMID: 36284317 PMCID: PMC9594907 DOI: 10.1186/s13059-022-02795-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/15/2022] [Accepted: 10/13/2022] [Indexed: 11/13/2022] Open
Abstract
RNA processing, including splicing and alternative polyadenylation, is crucial to gene function and regulation, but methods to detect RNA processing from single-cell RNA sequencing data are limited by reliance on pre-existing annotations, peak calling heuristics, and collapsing measurements by cell type. We introduce ReadZS, an annotation-free statistical approach to identify regulated RNA processing in single cells. ReadZS discovers cell type-specific RNA processing in human lung and conserved, developmentally regulated RNA processing in mammalian spermatogenesis-including global 3' UTR shortening in human spermatogenesis. ReadZS also discovers global 3' UTR lengthening in Arabidopsis development, highlighting the usefulness of this method in under-annotated transcriptomes.
Collapse
Affiliation(s)
- Elisabeth Meyer
- Department of Biochemistry, Stanford University, Stanford, CA, 94305, USA
- Department of Biomedical Data Science, Stanford University, Stanford, CA, 94305, USA
| | - Kaitlin Chaung
- Department of Biochemistry, Stanford University, Stanford, CA, 94305, USA
- Department of Biomedical Data Science, Stanford University, Stanford, CA, 94305, USA
| | - Roozbeh Dehghannasiri
- Department of Biochemistry, Stanford University, Stanford, CA, 94305, USA
- Department of Biomedical Data Science, Stanford University, Stanford, CA, 94305, USA
| | - Julia Salzman
- Department of Biochemistry, Stanford University, Stanford, CA, 94305, USA.
- Department of Biomedical Data Science, Stanford University, Stanford, CA, 94305, USA.
- Department of Statistics (by courtesy), Stanford University, Stanford, CA, 94305, USA.
| |
Collapse
|
10
|
Ye W, Lian Q, Ye C, Wu X. A Survey on Methods for Predicting Polyadenylation Sites from DNA Sequences, Bulk RNA-seq, and Single-cell RNA-seq. GENOMICS, PROTEOMICS & BIOINFORMATICS 2022:S1672-0229(22)00121-8. [PMID: 36167284 PMCID: PMC10372920 DOI: 10.1016/j.gpb.2022.09.005] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 04/19/2022] [Revised: 08/17/2022] [Accepted: 09/19/2022] [Indexed: 05/08/2023]
Abstract
Alternative polyadenylation (APA) plays important roles in modulating mRNA stability, translation, and subcellular localization, and contributes extensively to shaping eukaryotic transcriptome complexity and proteome diversity. Identification of poly(A) sites (pAs) on a genome-wide scale is a critical step toward understanding the underlying mechanism of APA-mediated gene regulation. A number of established computational tools have been proposed to predict pAs from diverse genomic data. Here we provided an exhaustive overview of computational approaches for predicting pAs from DNA sequences, bulk RNA sequencing (RNA-seq) data, and single-cell RNA sequencing (scRNA-seq) data. Particularly, we examined several representative tools using bulk RNA-seq and scRNA-seq data from peripheral blood mononuclear cells and put forward operable suggestions on how to assess the reliability of pAs predicted by different tools. We also proposed practical guidelines on choosing appropriate methods applicable to diverse scenarios. Moreover, we discussed in depth the challenges in improving the performance of pA prediction and benchmarking different methods. Additionally, we highlighted outstanding challenges and opportunities using new machine learning and integrative multi-omics techniques, and provided our perspective on how computational methodologies might evolve in the future for non-3' untranslated region, tissue-specific, cross-species, and single-cell pA prediction.
Collapse
Affiliation(s)
- Wenbin Ye
- Pasteurien College, Suzhou Medical College of Soochow University, Soochow University, Suzhou 215000, China
| | - Qiwei Lian
- Pasteurien College, Suzhou Medical College of Soochow University, Soochow University, Suzhou 215000, China; Department of Automation, Xiamen University, Xiamen 361005, China
| | - Congting Ye
- Key Laboratory of the Coastal and Wetland Ecosystems, Ministry of Education, College of the Environment and Ecology, Xiamen University, Xiamen 361005, China
| | - Xiaohui Wu
- Pasteurien College, Suzhou Medical College of Soochow University, Soochow University, Suzhou 215000, China.
| |
Collapse
|
11
|
scAPAmod: Profiling Alternative Polyadenylation Modalities in Single Cells from Single-Cell RNA-Seq Data. Int J Mol Sci 2022; 23:ijms23158123. [PMID: 35897701 PMCID: PMC9329739 DOI: 10.3390/ijms23158123] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/29/2022] [Revised: 07/01/2022] [Accepted: 07/21/2022] [Indexed: 11/17/2022] Open
Abstract
Alternative polyadenylation (APA) is a key layer of gene expression regulation, and APA choice is finely modulated in cells. Advances in single-cell RNA-seq (scRNA-seq) have provided unprecedented opportunities to study APA in cell populations. However, existing studies that investigated APA in single cells were either confined to a few cells or focused on profiling APA dynamics between cell types or identifying APA sites. The diversity and pattern of APA usages on a genomic scale in single cells remains unappreciated. Here, we proposed an analysis framework based on a Gaussian mixture model, scAPAmod, to identify patterns of APA usage from homogeneous or heterogeneous cell populations at the single-cell level. We systematically evaluated the performance of scAPAmod using simulated data and scRNA-seq data. The results show that scAPAmod can accurately identify different patterns of APA usages at the single-cell level. We analyzed the dynamic changes in the pattern of APA usage using scAPAmod in different cell differentiation and developmental stages during mouse spermatogenesis and found that even the same gene has different patterns of APA usages in different differentiation stages. The preference of patterns of usages of APA sites in different genomic regions was also analyzed. We found that patterns of APA usages of the same gene in 3′ UTRs (3′ untranslated region) and non-3′ UTRs are different. Moreover, we analyzed cell-type-specific APA usage patterns and changes in patterns of APA usages across cell types. Different from the conventional analysis of single-cell heterogeneity based on gene expression profiling, this study profiled the heterogeneous pattern of APA isoforms, which contributes to revealing the heterogeneity of single-cell gene expression with higher resolution.
Collapse
|
12
|
Liu M, Hao L, Yang S, Wu X. PolyAtailor: measuring poly(A) tail length from short-read and long-read sequencing data. Brief Bioinform 2022; 23:6620877. [PMID: 35769001 DOI: 10.1093/bib/bbac271] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/29/2022] [Revised: 05/23/2022] [Accepted: 06/09/2022] [Indexed: 12/18/2022] Open
Abstract
The poly(A) tail is a dynamic addition to the eukaryotic mRNA and the change in its length plays an essential role in regulating gene expression through affecting nuclear export, mRNA stability and translation. Only recently high-throughput sequencing strategies began to emerge for transcriptome-wide profiling of poly(A) tail length in diverse developmental stages and organisms. However, there is currently no easy-to-use and universal tool for measuring poly(A) tails in sequencing data from different sequencing protocols. Here we established PolyAtailor, a unified and efficient framework, for identifying and analyzing poly(A) tails from PacBio-based long reads or next generation short reads. PolyAtailor provides two core functions for measuring poly(A) tails, namely Tail_map and Tail_scan, which can be used for profiling tails with or without using a reference genome. Particularly, PolyAtailor can identify all potential tails in a read, providing users with detailed information such as tail position, tail length, tail sequence and tail type. Moreover, PolyAtailor integrates rich functions for poly(A) tail and poly(A) site analyses, such as differential poly(A) length analysis, poly(A) site identification and annotation, and statistics and visualization of base composition in tails. We compared PolyAtailor with three latest methods, FLAMAnalysis, FLEPSeq and PAIsoSeqAnalysis, using data from three sequencing protocols in HeLa samples and Arabidopsis. Results show that PolyAtailor is effective in measuring poly(A) tail length and detecting significance of differential poly(A) length, which achieves much higher sensitivity and accuracy than competing methods. PolyAtailor is available at https://github.com/BMILAB/PolyAtailor.
Collapse
Affiliation(s)
- Mengfei Liu
- Pasteurien College, Soochow University, Suzhou, Jiangsu 215000, China.,Department of Automation, Xiamen University, Xiamen, Fujian 361005, China
| | - Linlin Hao
- Pasteurien College, Soochow University, Suzhou, Jiangsu 215000, China
| | - Sien Yang
- Pasteurien College, Soochow University, Suzhou, Jiangsu 215000, China
| | - Xiaohui Wu
- Pasteurien College, Soochow University, Suzhou, Jiangsu 215000, China
| |
Collapse
|
13
|
Leveraging omic features with F3UTER enables identification of unannotated 3'UTRs for synaptic genes. Nat Commun 2022; 13:2270. [PMID: 35477703 PMCID: PMC9046390 DOI: 10.1038/s41467-022-30017-z] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/24/2021] [Accepted: 03/18/2022] [Indexed: 11/08/2022] Open
Abstract
There is growing evidence for the importance of 3' untranslated region (3'UTR) dependent regulatory processes. However, our current human 3'UTR catalogue is incomplete. Here, we develop a machine learning-based framework, leveraging both genomic and tissue-specific transcriptomic features to predict previously unannotated 3'UTRs. We identify unannotated 3'UTRs associated with 1,563 genes across 39 human tissues, with the greatest abundance found in the brain. These unannotated 3'UTRs are significantly enriched for RNA binding protein (RBP) motifs and exhibit high human lineage-specificity. We find that brain-specific unannotated 3'UTRs are enriched for the binding motifs of important neuronal RBPs such as TARDBP and RBFOX1, and their associated genes are involved in synaptic function. Our data is shared through an online resource F3UTER ( https://astx.shinyapps.io/F3UTER/ ). Overall, our data improves 3'UTR annotation and provides additional insights into the mRNA-RBP interactome in the human brain, with implications for our understanding of neurological and neurodevelopmental diseases.
Collapse
|
14
|
Zhu S, Lian Q, Ye W, Qin W, Wu Z, Ji G, Wu X. scAPAdb: a comprehensive database of alternative polyadenylation at single-cell resolution. Nucleic Acids Res 2021; 50:D365-D370. [PMID: 34508354 PMCID: PMC8728153 DOI: 10.1093/nar/gkab795] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/08/2021] [Revised: 08/26/2021] [Accepted: 09/02/2021] [Indexed: 01/08/2023] Open
Abstract
Alternative polyadenylation (APA) is a widespread regulatory mechanism of transcript diversification in eukaryotes, which is increasingly recognized as an important layer for eukaryotic gene expression. Recent studies based on single-cell RNA-seq (scRNA-seq) have revealed cell-to-cell heterogeneity in APA usage and APA dynamics across different cell types in various tissues, biological processes and diseases. However, currently available APA databases were all collected from bulk 3′-seq and/or RNA-seq data, and no existing database has provided APA information at single-cell resolution. Here, we present a user-friendly database called scAPAdb (http://www.bmibig.cn/scAPAdb), which provides a comprehensive and manually curated atlas of poly(A) sites, APA events and poly(A) signals at the single-cell level. Currently, scAPAdb collects APA information from > 360 scRNA-seq experiments, covering six species including human, mouse and several other plant species. scAPAdb also provides batch download of data, and users can query the database through a variety of keywords such as gene identifier, gene function and accession number. scAPAdb would be a valuable and extendable resource for the study of cell-to-cell heterogeneity in APA isoform usages and APA-mediated gene regulation at the single-cell level under diverse cell types, tissues and species.
Collapse
Affiliation(s)
- Sheng Zhu
- Pasteurien College, Soochow University, Suzhou, Jiangsu 215000, China.,Department of Automation, Xiamen University, Xiamen, Fujian 361005, China
| | - Qiwei Lian
- Department of Automation, Xiamen University, Xiamen, Fujian 361005, China
| | - Wenbin Ye
- Department of Automation, Xiamen University, Xiamen, Fujian 361005, China
| | - Wei Qin
- Department of Automation, Xiamen University, Xiamen, Fujian 361005, China
| | - Zhe Wu
- Department of Automation, Xiamen University, Xiamen, Fujian 361005, China
| | - Guoli Ji
- Department of Automation, Xiamen University, Xiamen, Fujian 361005, China
| | - Xiaohui Wu
- Pasteurien College, Soochow University, Suzhou, Jiangsu 215000, China
| |
Collapse
|
15
|
Kim S, Bai Y, Fan Z, Diergaarde B, Tseng GC, Park HJ. The microRNA target site landscape is a novel molecular feature associating alternative polyadenylation with immune evasion activity in breast cancer. Brief Bioinform 2021; 22:bbaa191. [PMID: 32844230 PMCID: PMC8138879 DOI: 10.1093/bib/bbaa191] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/21/2020] [Revised: 07/10/2020] [Accepted: 07/28/2020] [Indexed: 12/14/2022] Open
Abstract
Alternative polyadenylation (APA) in breast tumor samples results in the removal/addition of cis-regulatory elements such as microRNA (miRNA) target sites in the 3'-untranslated region (3'-UTRs) of genes. Although previous computational APA studies focused on a subset of genes strongly affected by APA (APA genes), we identify miRNAs of which widespread APA events collectively increase or decrease the number of target sites [probabilistic inference of microRNA target site modification through APA (PRIMATA-APA)]. Using PRIMATA-APA on the cancer genome atlas (TCGA) breast cancer data, we found that the global APA events change the number of the target sites of particular microRNAs [target sites modified miRNA (tamoMiRNA)] enriched for cancer development and treatments. We also found that when knockdown (KD) of NUDT21 in HeLa cells induces a different set of widespread 3'-UTR shortening than TCGA breast cancer data, it changes the target sites of the common tamoMiRNAs. Since the NUDT21 KD experiment previously demonstrated the tumorigenic role of APA events in a miRNA dependent fashion, this result suggests that the APA-initiated tumorigenesis is attributable to the miRNA target site changes, not the APA events themselves. Further, we found that the miRNA target site changes identify tumor cell proliferation and immune cell infiltration to the tumor microenvironment better than the miRNA expression levels or the APA events themselves. Altogether, our computational analyses provide a proof-of-concept demonstration that the miRNA target site information indicates the effect of global APA events with a potential as predictive biomarker.
Collapse
Affiliation(s)
- Soyeon Kim
- Department of Pediatrics, University of Pittsburgh Medical Center and in Division of Pulmonary Medicine, Children’s Hospital of Pittsburgh of UPMC
| | - YuLong Bai
- Department of Human Genetics in the Graduate School of Public Health, University of Pittsburgh
| | - Zhenjiang Fan
- Department of Computer Science, University of Pittsburgh
| | - Brenda Diergaarde
- Department of Human Genetics, Graduate School of Public Health, University of Pittsburgh
| | - George C Tseng
- Department of Biostatistics, Graduate School of Public Health, University of Pittsburgh
| | - Hyun Jung Park
- Department of Human Genetics, Graduate School of Public Health, University of Pittsburgh
| |
Collapse
|
16
|
Computational analysis of alternative polyadenylation from standard RNA-seq and single-cell RNA-seq data. Methods Enzymol 2021; 655:225-243. [PMID: 34183123 DOI: 10.1016/bs.mie.2021.03.015] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/23/2022]
Abstract
Alternative polyadenylation (APA) is a major mechanism of post-transcriptional regulation in various cellular processes including cell proliferation and differentiation. Since conventional APA profiling methods have not been widely adopted, global APA studies are very limited. In this chapter, we summarize current computational methods for analyzing APA in standard RNA-seq and scRNA-seq data and describe two state-of-the-art bioinformatic algorithms DaPars and scDaPars in detail. The bioinformatic pipelines for both DaPars and scDaPars are presented and the application of both algorithms are highlighted.
Collapse
|
17
|
Aptardi predicts polyadenylation sites in sample-specific transcriptomes using high-throughput RNA sequencing and DNA sequence. Nat Commun 2021; 12:1652. [PMID: 33712618 PMCID: PMC7955126 DOI: 10.1038/s41467-021-21894-x] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/12/2020] [Accepted: 02/18/2021] [Indexed: 02/01/2023] Open
Abstract
Annotation of polyadenylation sites from short-read RNA sequencing alone is a challenging computational task. Other algorithms rooted in DNA sequence predict potential polyadenylation sites; however, in vivo expression of a particular site varies based on a myriad of conditions. Here, we introduce aptardi (alternative polyadenylation transcriptome analysis from RNA-Seq data and DNA sequence information), which leverages both DNA sequence and RNA sequencing in a machine learning paradigm to predict expressed polyadenylation sites. Specifically, as input aptardi takes DNA nucleotide sequence, genome-aligned RNA-Seq data, and an initial transcriptome. The program evaluates these initial transcripts to identify expressed polyadenylation sites in the biological sample and refines transcript 3'-ends accordingly. The average precision of the aptardi model is twice that of a standard transcriptome assembler. In particular, the recall of the aptardi model (the proportion of true polyadenylation sites detected by the algorithm) is improved by over three-fold. Also, the model-trained using the Human Brain Reference RNA commercial standard-performs well when applied to RNA-sequencing samples from different tissues and different mammalian species. Finally, aptardi's input is simple to compile and its output is easily amenable to downstream analyses such as quantitation and differential expression.
Collapse
|
18
|
Jin W, Zhu Q, Yang Y, Yang W, Wang D, Yang J, Niu X, Yu D, Gong J. Animal-APAdb: a comprehensive animal alternative polyadenylation database. Nucleic Acids Res 2021; 49:D47-D54. [PMID: 32986825 PMCID: PMC7779049 DOI: 10.1093/nar/gkaa778] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/01/2020] [Revised: 08/27/2020] [Accepted: 09/08/2020] [Indexed: 12/31/2022] Open
Abstract
Alternative polyadenylation (APA) is an important post-transcriptional regulatory mechanism that recognizes different polyadenylation signals on transcripts, resulting in transcripts with different lengths of 3′ untranslated regions and thereby influencing a series of biological processes. Recent studies have highlighted the important roles of APA in human. However, APA profiles in other animals have not been fully recognized, and there is no database that provides comprehensive APA information for other animals except human. Here, by using the RNA sequencing data collected from public databases, we systematically characterized the APA profiles in 9244 samples of 18 species. In total, we identified 342 952 APA events with a median of 17 020 per species using the DaPars2 algorithm, and 315 691 APA events with a median of 17 953 per species using the QAPA algorithm in these 18 species, respectively. In addition, we predicted the polyadenylation sites (PAS) and motifs near PAS of these species. We further developed Animal-APAdb, a user-friendly database (http://gong_lab.hzau.edu.cn/Animal-APAdb/) for data searching, browsing and downloading. With comprehensive information of APA events in different tissues of different species, Animal-APAdb may greatly facilitate the exploration of animal APA patterns and novel mechanisms, gene expression regulation and APA evolution across tissues and species.
Collapse
Affiliation(s)
- Weiwei Jin
- Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan 430070, P.R. China
| | - Qizhao Zhu
- College of Animal Science and Technology, Nanjing Agricultural University, Nanjing 210095, P.R. China
| | - Yanbo Yang
- Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan 430070, P.R. China
| | - Wenqian Yang
- Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan 430070, P.R. China
| | - Dongyang Wang
- Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan 430070, P.R. China
| | - Jiajun Yang
- Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan 430070, P.R. China
| | - Xiaohui Niu
- Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan 430070, P.R. China
| | - Debing Yu
- College of Animal Science and Technology, Nanjing Agricultural University, Nanjing 210095, P.R. China
| | - Jing Gong
- Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan 430070, P.R. China.,College of Biomedicine and Health, Huazhong Agricultural University, Wuhan 430070, P.R. China
| |
Collapse
|
19
|
Processing of coding and non-coding RNAs in plant development and environmental responses. Essays Biochem 2020; 64:931-945. [DOI: 10.1042/ebc20200029] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/01/2020] [Revised: 10/21/2020] [Accepted: 10/23/2020] [Indexed: 12/14/2022]
Abstract
Abstract
Precursor RNAs undergo extensive processing to become mature RNAs. RNA transcripts are subjected to 5′ capping, 3′-end processing, splicing, and modification; they also form dynamic secondary structures during co-transcriptional and post-transcriptional processing. Like coding RNAs, non-coding RNAs (ncRNAs) undergo extensive processing. For example, secondary small interfering RNA (siRNA) transcripts undergo RNA processing, followed by further cleavage to become mature siRNAs. Transcriptome studies have revealed roles for co-transcriptional and post-transcriptional RNA processing in the regulation of gene expression and the coordination of plant development and plant–environment interactions. In this review, we present the latest progress on RNA processing in gene expression and discuss phased siRNAs (phasiRNAs), a kind of germ cell-specific secondary small RNA (sRNA), focusing on their functions in plant development and environmental responses.
Collapse
|
20
|
Ye W, Liu T, Fu H, Ye C, Ji G, Wu X. movAPA: modeling and visualization of dynamics of alternative polyadenylation across biological samples. Bioinformatics 2020; 37:2470-2472. [PMID: 33258917 DOI: 10.1093/bioinformatics/btaa997] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/23/2020] [Revised: 10/10/2020] [Accepted: 11/17/2020] [Indexed: 02/07/2023] Open
Abstract
MOTIVATION Alternative polyadenylation (APA) has been widely recognized as a widespread mechanism modulated dynamically. Studies based on 3' end sequencing and/or RNA-seq have profiled poly(A) sites in various species with diverse pipelines, yet no unified and easy-to-use toolkit is available for comprehensive APA analyses. RESULTS We developed an R package called movAPA for modeling and visualization of dynamics of alternative polyadenylation across biological samples. movAPA incorporates rich functions for preprocessing, annotation and statistical analyses of poly(A) sites, identification of poly(A) signals, profiling of APA dynamics and visualization. Particularly, seven metrics are provided for measuring the tissue-specificity or usages of APA sites across samples. Three methods are used for identifying 3' UTR shortening/lengthening events between conditions. APA site switching involving non-3' UTR polyadenylation can also be explored. Using poly(A) site data from rice and mouse sperm cells, we demonstrated the high scalability and flexibility of movAPA in profiling APA dynamics across tissues and single cells. AVAILABILITY AND IMPLEMENTATION https://github.com/BMILAB/movAPA. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
- Wenbin Ye
- Department of Automation, Xiamen University, Xiamen 361005, China
| | - Tao Liu
- Department of Automation, Xiamen University, Xiamen 361005, China
| | - Hongjuan Fu
- Department of Automation, Xiamen University, Xiamen 361005, China
| | - Congting Ye
- Key Laboratory of the Ministry of Education for Coastal and Wetland Ecosystems, College of the Environment and Ecology, Xiamen University, Xiamen 361102, China
| | - Guoli Ji
- Department of Automation, Xiamen University, Xiamen 361005, China
| | - Xiaohui Wu
- Department of Automation, Xiamen University, Xiamen 361005, China
| |
Collapse
|
21
|
Wu X, Liu T, Ye C, Ye W, Ji G. scAPAtrap: identification and quantification of alternative polyadenylation sites from single-cell RNA-seq data. Brief Bioinform 2020; 22:5952304. [PMID: 33142319 DOI: 10.1093/bib/bbaa273] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/29/2020] [Revised: 09/17/2020] [Accepted: 09/20/2020] [Indexed: 02/06/2023] Open
Abstract
Alternative polyadenylation (APA) generates diverse mRNA isoforms, which contributes to transcriptome diversity and gene expression regulation by affecting mRNA stability, translation and localization in cells. The rapid development of 3' tag-based single-cell RNA-sequencing (scRNA-seq) technologies, such as CEL-seq and 10x Genomics, has led to the emergence of computational methods for identifying APA sites and profiling APA dynamics at single-cell resolution. However, existing methods fail to detect the precise location of poly(A) sites or sites with low read coverage. Moreover, they rely on priori genome annotation and can only detect poly(A) sites located within or near annotated genes. Here we proposed a tool called scAPAtrap for detecting poly(A) sites at the whole genome level in individual cells from 3' tag-based scRNA-seq data. scAPAtrap incorporates peak identification and poly(A) read anchoring, enabling the identification of the precise location of poly(A) sites, even for sites with low read coverage. Moreover, scAPAtrap can identify poly(A) sites without using priori genome annotation, which helps locate novel poly(A) sites in previously overlooked regions and improve genome annotation. We compared scAPAtrap with two latest methods, scAPA and Sierra, using scRNA-seq data from different experimental technologies and species. Results show that scAPAtrap identified poly(A) sites with higher accuracy and sensitivity than competing methods and could be used to explore APA dynamics among cell types or the heterogeneous APA isoform expression in individual cells. scAPAtrap is available at https://github.com/BMILAB/scAPAtrap.
Collapse
Affiliation(s)
- Xiaohui Wu
- Department of Automation in Xiamen University
| | - Tao Liu
- Department of Automation in Xiamen University
| | - Congting Ye
- College of the Environment and Ecology in Xiamen University
| | - Wenbin Ye
- Department of Automation in Xiamen University
| | - Guoli Ji
- Department of Automation in Xiamen University
| |
Collapse
|
22
|
Tu M, Li Y. Profiling Alternative 3' Untranslated Regions in Sorghum using RNA-seq Data. Front Genet 2020; 11:556749. [PMID: 33193635 PMCID: PMC7649775 DOI: 10.3389/fgene.2020.556749] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/28/2020] [Accepted: 09/30/2020] [Indexed: 12/18/2022] Open
Abstract
Sorghum is an important crop widely used for food, feed, and fuel. Transcriptome-wide studies of 3′ untranslated regions (3′UTR) using regular RNA-seq remain scarce in sorghum, while transcriptomes have been characterized extensively using Illumina short-read sequencing platforms for many sorghum varieties under various conditions or developmental contexts. 3′UTR is a critical regulatory component of genes, controlling the translation, transport, and stability of messenger RNAs. In the present study, we profiled the alternative 3′UTRs at the transcriptome level in three genetically related but phenotypically contrasting lines of sorghum: Rio, BTx406, and R9188. A total of 1,197 transcripts with alternative 3′UTRs were detected using RNA-seq data. Their categorization identified 612 high-confidence alternative 3′UTRs. Importantly, the high-confidence alternative 3′UTR genes significantly overlapped with the genesets that are associated with RNA N6-methyladenosine (m6A) modification, suggesting a clear indication between alternative 3′UTR and m6A methylation in sorghum. Moreover, taking advantage of sorghum genetics, we provided evidence of genotype specificity of alternative 3′UTR usage. In summary, our work exemplifies a transcriptome-wide profiling of alternative 3′UTRs using regular RNA-seq data in non-model crops and gains insights into alternative 3′UTRs and their genotype specificity.
Collapse
Affiliation(s)
- Min Tu
- Waksman Institute of Microbiology, Rutgers, The State University of New Jersey, Piscataway, NJ, United States
| | - Yin Li
- Waksman Institute of Microbiology, Rutgers, The State University of New Jersey, Piscataway, NJ, United States
| |
Collapse
|
23
|
Ye C, Lin J, Li QQ. Discovery of alternative polyadenylation dynamics from single cell types. Comput Struct Biotechnol J 2020; 18:1012-1019. [PMID: 32382395 PMCID: PMC7200215 DOI: 10.1016/j.csbj.2020.04.009] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/20/2020] [Revised: 04/12/2020] [Accepted: 04/14/2020] [Indexed: 12/13/2022] Open
Abstract
Alternative polyadenylation (APA) occurs in the process of mRNA maturation by adding a poly(A) tail at different locations, resulting increased diversity of mRNA isoforms and contributing to the complexity of gene regulatory network. Benefit from the development of high-throughput sequencing technologies, we could now delineate APA profiles of transcriptomes at an unprecedented pace. Especially the single cell RNA sequencing (scRNA-seq) technologies provide us opportunities to interrogate biological details of diverse and rare cell types. Despite increasing evidence showing that APA is involved in the cell type-specific regulation and function, efficient and specific laboratory methods for capturing poly(A) sites at single cell resolution are underdeveloped to date. In this review, we summarize existing experimental and computational methods for the identification of APA dynamics from diverse single cell types. A future perspective is also provided.
Collapse
Affiliation(s)
- Congting Ye
- Key Laboratory of the Ministry of Education for Coastal and Wetland Ecosystems, College of the Environment and Ecology, Xiamen University, Xiamen, Fujian 361102, China
| | - Juncheng Lin
- Key Laboratory of the Ministry of Education for Coastal and Wetland Ecosystems, College of the Environment and Ecology, Xiamen University, Xiamen, Fujian 361102, China
| | - Qingshun Q. Li
- Key Laboratory of the Ministry of Education for Coastal and Wetland Ecosystems, College of the Environment and Ecology, Xiamen University, Xiamen, Fujian 361102, China
- Graduate College of Biomedical Sciences, Western University of Health Sciences, Pomona, CA 91766, USA
| |
Collapse
|