1. Jonnakuti VS, Wagner EJ, Maletić-Savatić M, Liu Z, Yalamanchili HK. PolyAMiner-Bulk is a deep learning-based algorithm that decodes alternative polyadenylation dynamics from bulk RNA-seq data. Cell Reports Methods 2024; 4:100707. [PMID: 38325383 PMCID: PMC10921021 DOI: 10.1016/j.crmeth.2024.100707] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 02/28/2023] [Revised: 04/13/2023] [Accepted: 01/11/2024] [Indexed: 02/09/2024]
Abstract
Alternative polyadenylation (APA) is a key post-transcriptional regulatory mechanism; yet, its regulation and impact on human diseases remain understudied. Existing bulk RNA sequencing (RNA-seq)-based APA methods predominantly rely on predefined annotations, severely impacting their ability to decode novel tissue- and disease-specific APA changes. Furthermore, they only account for the most proximal and distal cleavage and polyadenylation sites (C/PASs). Deconvoluting overlapping C/PASs and the inherent noisy 3' UTR coverage in bulk RNA-seq data pose additional challenges. To overcome these limitations, we introduce PolyAMiner-Bulk, an attention-based deep learning algorithm that accurately recapitulates C/PAS sequence grammar, resolves overlapping C/PASs, captures non-proximal-to-distal APA changes, and generates visualizations to illustrate APA dynamics. Evaluation on multiple datasets strongly evinces the performance merit of PolyAMiner-Bulk, accurately identifying more APA changes compared with other methods. With the growing importance of APA and the abundance of bulk RNA-seq data, PolyAMiner-Bulk establishes a robust paradigm of APA analysis.
Affiliation(s)
- Venkata Soumith Jonnakuti
  - Department of Pediatrics, Baylor College of Medicine, Houston, TX 77030, USA
  - Jan and Dan Duncan Neurological Research Institute, Texas Children's Hospital, Houston, TX 77030, USA
  - Program in Quantitative and Computational Biology, Baylor College of Medicine, Houston, TX 77030, USA
  - Medical Scientist Training Program, Baylor College of Medicine, Houston, TX 77030, USA
- Eric J Wagner
  - Department of Biochemistry and Biophysics, University of Rochester School of Medicine and Dentistry, Rochester, NY 14642, USA
- Mirjana Maletić-Savatić
  - Department of Pediatrics, Baylor College of Medicine, Houston, TX 77030, USA
  - Jan and Dan Duncan Neurological Research Institute, Texas Children's Hospital, Houston, TX 77030, USA
- Zhandong Liu
  - Department of Pediatrics, Baylor College of Medicine, Houston, TX 77030, USA
  - Jan and Dan Duncan Neurological Research Institute, Texas Children's Hospital, Houston, TX 77030, USA
  - Program in Quantitative and Computational Biology, Baylor College of Medicine, Houston, TX 77030, USA
- Hari Krishna Yalamanchili
  - Department of Pediatrics, Baylor College of Medicine, Houston, TX 77030, USA
  - Jan and Dan Duncan Neurological Research Institute, Texas Children's Hospital, Houston, TX 77030, USA
  - USDA/ARS Children's Nutrition Research Center, Department of Pediatrics, Baylor College of Medicine, Houston, TX 77030, USA
2. Wang T, Ye W, Zhang J, Li H, Zeng W, Zhu S, Ji G, Wu X, Ma L. Alternative 3'-untranslated regions regulate high-salt tolerance of Spartina alterniflora. Plant Physiology 2023; 191:2570-2587. [PMID: 36682816 PMCID: PMC10069910 DOI: 10.1093/plphys/kiad030] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 04/29/2022] [Revised: 12/06/2022] [Accepted: 12/15/2022] [Indexed: 05/15/2023]
Abstract
High-salt stress continues to challenge the growth and survival of many plants. Alternative polyadenylation (APA) produces mRNAs with different 3'-untranslated regions (3' UTRs) to regulate gene expression at the post-transcriptional level. However, the roles of alternative 3' UTRs in response to salt stress remain elusive. Here, we report the function of alternative 3' UTRs in response to high-salt stress in S. alterniflora (Spartina alterniflora), a monocotyledonous halophyte tolerant of high-salt environments. We found that high-salt stress induced global APA dynamics, and ∼42% of APA genes responded to salt stress. High-salt stress led to 3' UTR lengthening of 207 transcripts through increasing the usage of distal poly(A) sites. Transcripts with alternative 3' UTRs were mainly enriched in salt stress-related ion transporters. Alternative 3' UTRs of HIGH-AFFINITY K+ TRANSPORTER 1 (SaHKT1) increased RNA stability and protein synthesis in vivo. Regulatory AU-rich elements were identified in alternative 3' UTRs, boosting the protein level of SaHKT1. RNAi-knock-down experiments revealed that the biogenesis of 3' UTR lengthening in SaHKT1 was controlled by the poly(A) factor CLEAVAGE AND POLYADENYLATION SPECIFICITY FACTOR 30 (SaCPSF30). Over-expression of SaHKT1 with an alternative 3' UTR in rice (Oryza sativa) protoplasts increased mRNA accumulation of salt-tolerance genes in an AU-rich element-dependent manner. These results suggest that mRNA 3' UTR lengthening is a potential mechanism in response to high-salt stress. These results also reveal complex regulatory roles of alternative 3' UTRs coupling APA and regulatory elements at the post-transcriptional level in plants.
Affiliation(s)
- Taotao Wang
  - College of Forestry, Haixia Institute of Science and Technology, School of Future Technology, Fujian Agriculture and Forestry University, Fuzhou, Fujian 350002, China
- Wenbin Ye
  - Department of Automation, Xiamen University, Xiamen, Fujian 361005, China
- Jiaxiang Zhang
  - College of Forestry, Haixia Institute of Science and Technology, School of Future Technology, Fujian Agriculture and Forestry University, Fuzhou, Fujian 350002, China
  - College of Life Sciences, Fujian Agriculture and Forestry University, Fuzhou, Fujian 350002, China
- Han Li
  - College of Forestry, Haixia Institute of Science and Technology, School of Future Technology, Fujian Agriculture and Forestry University, Fuzhou, Fujian 350002, China
  - College of Life Sciences, Fujian Agriculture and Forestry University, Fuzhou, Fujian 350002, China
- Weike Zeng
  - College of Forestry, Haixia Institute of Science and Technology, School of Future Technology, Fujian Agriculture and Forestry University, Fuzhou, Fujian 350002, China
  - College of Life Sciences, Fujian Agriculture and Forestry University, Fuzhou, Fujian 350002, China
- Sheng Zhu
  - Department of Automation, Xiamen University, Xiamen, Fujian 361005, China
- Guoli Ji
  - Department of Automation, Xiamen University, Xiamen, Fujian 361005, China
- Xiaohui Wu
  - Pasteurien College, Soochow University, Suzhou, Jiangsu 215000, China
- Liuyin Ma
  - College of Forestry, Haixia Institute of Science and Technology, School of Future Technology, Fujian Agriculture and Forestry University, Fuzhou, Fujian 350002, China
3. Jonnakuti VS, Wagner EJ, Maletić-Savatić M, Liu Z, Yalamanchili HK. PolyAMiner-Bulk: A Machine Learning Based Bioinformatics Algorithm to Infer and Decode Alternative Polyadenylation Dynamics from bulk RNA-seq data. bioRxiv 2023:2023.01.23.523471. [PMID: 36747700 PMCID: PMC9900750 DOI: 10.1101/2023.01.23.523471] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 01/26/2023]
Abstract
More than half of human genes exercise alternative polyadenylation (APA) and generate mRNA transcripts with varying 3' untranslated regions (UTR). However, current computational approaches for identifying cleavage and polyadenylation sites (C/PASs) and quantifying 3'UTR length changes from bulk RNA-seq data fail to unravel tissue- and disease-specific APA dynamics. Here, we developed a next-generation bioinformatics algorithm and application, PolyAMiner-Bulk, that utilizes an attention-based machine learning architecture and an improved vector projection-based engine to infer differential APA dynamics accurately. When applied to earlier studies, PolyAMiner-Bulk accurately identified more than twice the number of APA changes in an RBM17 knockdown bulk RNA-seq dataset compared to current generation tools. Moreover, on a separate dataset, PolyAMiner-Bulk revealed novel APA dynamics and pathways in scleroderma pathology and identified differential APA in a gene that was identified as being involved in scleroderma pathogenesis in an independent study. Lastly, we used PolyAMiner-Bulk to analyze the RNA-seq data of post-mortem prefrontal cortexes from the ROSMAP data consortium and unraveled novel APA dynamics in Alzheimer's Disease. Our method, PolyAMiner-Bulk, creates a paradigm for future alternative polyadenylation analysis from bulk RNA-seq data.
Affiliation(s)
- Venkata Soumith Jonnakuti
  - Department of Pediatrics, Baylor College of Medicine, Houston, TX, 77030, USA
  - Jan and Dan Duncan Neurological Research Institute, Texas Children’s Hospital, Houston, TX, 77030, USA
  - Program in Quantitative and Computational Biology, Baylor College of Medicine, Houston, TX, 77030, USA
  - Medical Scientist Training Program, Baylor College of Medicine, Houston, TX, 77030, USA
- Eric J. Wagner
  - Department of Biochemistry and Biophysics, University of Rochester School of Medicine and Dentistry, Rochester, NY 14642, USA
- Mirjana Maletić-Savatić
  - Department of Pediatrics, Baylor College of Medicine, Houston, TX, 77030, USA
  - Jan and Dan Duncan Neurological Research Institute, Texas Children’s Hospital, Houston, TX, 77030, USA
- Zhandong Liu
  - Department of Pediatrics, Baylor College of Medicine, Houston, TX, 77030, USA
  - Jan and Dan Duncan Neurological Research Institute, Texas Children’s Hospital, Houston, TX, 77030, USA
  - Program in Quantitative and Computational Biology, Baylor College of Medicine, Houston, TX, 77030, USA
- Hari Krishna Yalamanchili
  - Department of Pediatrics, Baylor College of Medicine, Houston, TX, 77030, USA
  - Jan and Dan Duncan Neurological Research Institute, Texas Children’s Hospital, Houston, TX, 77030, USA
4. Yang J, Cao Y, Ma L. Co-Transcriptional RNA Processing in Plants: Exploring from the Perspective of Polyadenylation. Int J Mol Sci 2021; 22:3300. [PMID: 33804866 PMCID: PMC8037041 DOI: 10.3390/ijms22073300] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 03/02/2021] [Revised: 03/09/2021] [Accepted: 03/19/2021] [Indexed: 12/13/2022]
Abstract
Most protein-coding genes in eukaryotes possess at least two poly(A) sites, and alternative polyadenylation is considered a contributing factor to transcriptomic and proteomic diversity. Following transcription, a nascent RNA usually undergoes capping, splicing, cleavage, and polyadenylation, resulting in a mature messenger RNA (mRNA); however, increasing evidence suggests that transcription and RNA processing are coupled. Plants, which must produce rapid responses to environmental changes because of their limited mobility, exhibit such coupling. In this review, we summarize recent advances in our understanding of the coupling of transcription with RNA processing in plants, and we describe the possible spatial environment and important proteins involved. Moreover, we describe how liquid–liquid phase separation, mediated by the C-terminal domain of RNA polymerase II and RNA processing factors with intrinsically disordered regions, enables efficient co-transcriptional mRNA processing in plants.
5. Tu M, Li Y. Profiling Alternative 3' Untranslated Regions in Sorghum using RNA-seq Data. Front Genet 2020; 11:556749. [PMID: 33193635 PMCID: PMC7649775 DOI: 10.3389/fgene.2020.556749] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 04/28/2020] [Accepted: 09/30/2020] [Indexed: 12/18/2022]
Abstract
Sorghum is an important crop widely used for food, feed, and fuel. Transcriptome-wide studies of 3′ untranslated regions (3′UTR) using regular RNA-seq remain scarce in sorghum, while transcriptomes have been characterized extensively using Illumina short-read sequencing platforms for many sorghum varieties under various conditions or developmental contexts. 3′UTR is a critical regulatory component of genes, controlling the translation, transport, and stability of messenger RNAs. In the present study, we profiled the alternative 3′UTRs at the transcriptome level in three genetically related but phenotypically contrasting lines of sorghum: Rio, BTx406, and R9188. A total of 1,197 transcripts with alternative 3′UTRs were detected using RNA-seq data. Their categorization identified 612 high-confidence alternative 3′UTRs. Importantly, the high-confidence alternative 3′UTR genes significantly overlapped with the genesets that are associated with RNA N6-methyladenosine (m6A) modification, suggesting a clear indication between alternative 3′UTR and m6A methylation in sorghum. Moreover, taking advantage of sorghum genetics, we provided evidence of genotype specificity of alternative 3′UTR usage. In summary, our work exemplifies a transcriptome-wide profiling of alternative 3′UTRs using regular RNA-seq data in non-model crops and gains insights into alternative 3′UTRs and their genotype specificity.
Affiliation(s)
- Min Tu
  - Waksman Institute of Microbiology, Rutgers, The State University of New Jersey, Piscataway, NJ, United States
- Yin Li
  - Waksman Institute of Microbiology, Rutgers, The State University of New Jersey, Piscataway, NJ, United States
6. Yang Y, Li Y, Sancar A, Oztas O. The circadian clock shapes the Arabidopsis transcriptome by regulating alternative splicing and alternative polyadenylation. J Biol Chem 2020; 295:7608-7619. [PMID: 32303634 DOI: 10.1074/jbc.ra120.013513] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Received: 03/24/2020] [Revised: 04/10/2020] [Indexed: 01/24/2023]
Abstract
The circadian clock in plants temporally coordinates biological processes throughout the day, synchronizing gene expression with diurnal environmental changes. Circadian oscillator proteins are known to regulate the expression of clock-controlled plant genes by controlling their transcription. Here, using a high-throughput RNA-Seq approach, we examined genome-wide circadian and diurnal control of the Arabidopsis transcriptome, finding that the oscillation patterns of different transcripts of multitranscript genes can exhibit substantial differences and demonstrating that the circadian clock affects posttranscriptional regulation. In parallel, we found that two major posttranscriptional mechanisms, alternative splicing (AS; especially intron retention) and alternative polyadenylation (APA), display circadian rhythmicity resulting from oscillation in the genes involved in AS and APA. Moreover, AS-related genes exhibited rhythmic AS and APA regulation, adding another layer of complexity to circadian regulation of gene expression. We conclude that the Arabidopsis circadian clock not only controls transcription of genes but also affects their posttranscriptional regulation by influencing alternative splicing and alternative polyadenylation.
Affiliation(s)
- Yuchen Yang
  - Department of Genetics, University of North Carolina, Chapel Hill, North Carolina
- Yun Li
  - Department of Genetics, University of North Carolina, Chapel Hill, North Carolina
  - Department of Biostatistics, University of North Carolina, Chapel Hill, North Carolina
  - Department of Computer Science, University of North Carolina, Chapel Hill, North Carolina
- Aziz Sancar
  - Department of Biochemistry and Biophysics, University of North Carolina, Chapel Hill, North Carolina
- Onur Oztas
  - Department of Biochemistry and Biophysics, University of North Carolina, Chapel Hill, North Carolina
7. Bernardes WS, Menossi M. Plant 3' Regulatory Regions From mRNA-Encoding Genes and Their Uses to Modulate Expression. Frontiers in Plant Science 2020; 11:1252. [PMID: 32922424 PMCID: PMC7457121 DOI: 10.3389/fpls.2020.01252] [Citation(s) in RCA: 30] [Impact Index Per Article: 6.0] [Received: 03/25/2020] [Accepted: 07/29/2020] [Indexed: 05/08/2023]
Abstract
Molecular biotechnology has made it possible to explore the potential of plants for different purposes. The 3' regulatory regions have a great diversity of cis-regulatory elements directly involved in polyadenylation, stability, transport and mRNA translation, essential to achieve the desired levels of gene expression. A complex interaction between the cleavage and polyadenylation molecular complex and cis-elements determines the polyadenylation site, which may result in the choice of non-canonical sites and thus in alternative polyadenylation events, which are involved in the regulation of more than 80% of the genes expressed in plants. In addition, after transcription, a wide array of RNA-binding proteins interacts with cis-acting elements located mainly in the 3' untranslated region, determining the fate of mRNAs in eukaryotic cells. Although a small number of 3' regulatory regions have been identified and validated so far, many studies have shown that plant 3' regulatory regions have a higher potential to regulate gene expression in plants compared to widely used 3' regulatory regions, such as NOS and OCS from Agrobacterium tumefaciens and 35S from cauliflower mosaic virus. In this review, we discuss the role of 3' regulatory regions in gene expression, and the superior potential that plant 3' regulatory regions have compared to NOS, OCS and 35S 3' regulatory regions.
8. Zhu S, Ye W, Ye L, Fu H, Ye C, Xiao X, Ji Y, Lin W, Ji G, Wu X. PlantAPAdb: A Comprehensive Database for Alternative Polyadenylation Sites in Plants. Plant Physiology 2020; 182:228-242. [PMID: 31767692 PMCID: PMC6945835 DOI: 10.1104/pp.19.00943] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 07/31/2019] [Accepted: 11/18/2019] [Indexed: 05/23/2023]
Abstract
Alternative cleavage and polyadenylation (APA) is increasingly recognized as an important regulatory mechanism in eukaryotic gene expression and is dynamically modulated in a developmental, tissue-specific, or environmentally responsive manner. Given the functional importance of APA and the rapid accumulation of APA sites in plants, a comprehensive and easily accessible APA site database is necessary for improved understanding of APA-mediated gene expression regulation. We present a database called PlantAPAdb that catalogs the most comprehensive APA site data derived from sequences from diverse 3' sequencing protocols and biological samples in plants. Currently, PlantAPAdb contains APA sites in six species: Oryza sativa (japonica and indica), Arabidopsis (Arabidopsis thaliana), Medicago truncatula, Trifolium pratense, Phyllostachys edulis, and Chlamydomonas reinhardtii. APA sites in PlantAPAdb are available for bulk download and can be queried in a Google-like manner. PlantAPAdb provides rich information of the whole-genome APA sites, including genomic locations, heterogeneous cleavage sites, expression levels, and sample information. It also provides comprehensive poly(A) signals for APA sites in different genomic regions according to distinct profiles of cis-elements in plants. In addition, PlantAPAdb contains events of 3' untranslated region shortening/lengthening resulting from APA, which helps to understand the mechanisms underlying systematic changes in 3' untranslated region lengths. Additional information about conservation of APA sites in plants is also available, providing insights into the evolutionary polyadenylation configuration across species. As a user-friendly database, PlantAPAdb is a large and extendable resource for elucidating APA mechanisms, APA conservation, and gene expression regulation.
Affiliation(s)
- Sheng Zhu
  - Department of Automation, Xiamen University, Xiamen, Fujian 361005, China
  - National Institute for Data Science in Health and Medicine, Xiamen University, Xiamen, Fujian 3611002, China
- Wenbin Ye
  - Department of Automation, Xiamen University, Xiamen, Fujian 361005, China
  - National Institute for Data Science in Health and Medicine, Xiamen University, Xiamen, Fujian 3611002, China
- Lishan Ye
  - National Institute for Data Science in Health and Medicine, Xiamen University, Xiamen, Fujian 3611002, China
  - Xiamen Health and Medical Big Data Center, Xiamen, Fujian 361008, China
- Hongjuan Fu
  - Department of Automation, Xiamen University, Xiamen, Fujian 361005, China
  - National Institute for Data Science in Health and Medicine, Xiamen University, Xiamen, Fujian 3611002, China
- Congting Ye
  - Key Laboratory of the Ministry of Education for Coastal and Wetland Ecosystems, College of the Environment and Ecology, Xiamen University, Xiamen, Fujian 361102, China
- Xuesong Xiao
  - Department of Automation, Xiamen University, Xiamen, Fujian 361005, China
- Yuanhaowei Ji
  - School of Mathematics, Northwest University, Xi'an, Shanxi 710127, China
- Weixu Lin
  - Department of Automation, Xiamen University, Xiamen, Fujian 361005, China
- Guoli Ji
  - Department of Automation, Xiamen University, Xiamen, Fujian 361005, China
  - National Institute for Data Science in Health and Medicine, Xiamen University, Xiamen, Fujian 3611002, China
- Xiaohui Wu
  - Department of Automation, Xiamen University, Xiamen, Fujian 361005, China
  - National Institute for Data Science in Health and Medicine, Xiamen University, Xiamen, Fujian 3611002, China
9. Ye C, Zhou Q, Wu X, Ji G, Li QQ. Genome-wide alternative polyadenylation dynamics in response to biotic and abiotic stresses in rice. Ecotoxicology and Environmental Safety 2019; 183:109485. [PMID: 31376807 DOI: 10.1016/j.ecoenv.2019.109485] [Citation(s) in RCA: 32] [Impact Index Per Article: 5.3] [Received: 04/05/2019] [Revised: 07/24/2019] [Accepted: 07/26/2019] [Indexed: 05/24/2023]
Abstract
Alternative polyadenylation (APA) is an important way to regulate gene expression at the post-transcriptional level, and is extensively involved in plant stress responses. However, the systematic roles of APA regulation in response to abiotic and biotic stresses in rice at the genome scale remain unknown. To take advantage of available RNA-seq datasets, using a novel tool, APAtrap, we identified thousands of genes with significantly differential usage of polyadenylation [poly(A)] sites in response to abiotic stresses (drought, heat shock, and cadmium) and biotic stresses [bacterial blight (BB), rice blast, and rice stripe virus (RSV)]. Genes with stress-responsive APA dynamics commonly exhibited higher expression levels when their isoforms with short 3' untranslated region (3' UTR) were more abundant. The stress-responsive APA events were widely involved in crucial stress-responsive genes and pathways: e.g. APA acted as a negative regulator in heat stress tolerance; APA events were involved in DNA repair and cell wall formation under Cd stress; APA regulated chlorophyll metabolism, being associated with the pathogenesis of leaf diseases under RSV and BB challenges. Furthermore, APA events were found to be involved in glutathione metabolism and MAPK signaling pathways, mediating a crosstalk among the abiotic and biotic stress-responsive regulatory networks in rice. Analysis of large-scale datasets revealed that APA may regulate abiotic and biotic stress-responsive processes in rice. Such post-transcriptome diversities contribute to rice adaptation to various environmental challenges. Our study would supply a useful resource for further molecular-assisted breeding of multiple stress-tolerant cultivars for rice.
Affiliation(s)
- Congting Ye
  - Key Laboratory of the Ministry of Education for Coastal and Wetland Ecosystems, College of the Environment and Ecology, Xiamen University, Xiamen, Fujian, 361102, China
- Qian Zhou
  - Key Laboratory of the Ministry of Education for Coastal and Wetland Ecosystems, College of the Environment and Ecology, Xiamen University, Xiamen, Fujian, 361102, China
  - Graduate College of Biomedical Sciences, Western University of Health Sciences, Pomona, CA, 91766, USA
- Xiaohui Wu
  - Department of Automation, Xiamen University, Xiamen, Fujian, 361005, China
- Guoli Ji
  - Department of Automation, Xiamen University, Xiamen, Fujian, 361005, China
- Qingshun Quinn Li
  - Key Laboratory of the Ministry of Education for Coastal and Wetland Ecosystems, College of the Environment and Ecology, Xiamen University, Xiamen, Fujian, 361102, China
  - Graduate College of Biomedical Sciences, Western University of Health Sciences, Pomona, CA, 91766, USA
10. Chen M, Ji G, Fu H, Lin Q, Ye C, Ye W, Su Y, Wu X. A survey on identification and quantification of alternative polyadenylation sites from RNA-seq data. Brief Bioinform 2019; 21:1261-1276. [PMID: 31267126 DOI: 10.1093/bib/bbz068] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.8] [Received: 03/10/2019] [Revised: 05/03/2019] [Accepted: 05/14/2019] [Indexed: 12/13/2022]
Abstract
Alternative polyadenylation (APA) has been implicated to play an important role in post-transcriptional regulation by regulating mRNA abundance, stability, localization and translation, which contributes considerably to transcriptome diversity and gene expression regulation. RNA-seq has become a routine approach for transcriptome profiling, generating unprecedented data that could be used to identify and quantify APA site usage. A number of computational approaches for identifying APA sites and/or dynamic APA events from RNA-seq data have emerged in the literature, which provide valuable yet preliminary results that should be refined to yield credible guidelines for the scientific community. In this review, we provided a comprehensive overview of the status of currently available computational approaches. We also conducted objective benchmarking analysis using RNA-seq data sets from different species (human, mouse and Arabidopsis) and simulated data sets to present a systematic evaluation of 11 representative methods. Our benchmarking study showed that the overall performance of all tools investigated is moderate, reflecting that there is still a lot of scope to improve the prediction of APA sites or dynamic APA events from RNA-seq data. Particularly, prediction results from individual tools differ considerably, and only a limited number of predicted APA sites or genes are common among different tools. Accordingly, we attempted to give some advice on how to assess the reliability of the obtained results. We also proposed practical recommendations on the appropriate method applicable to diverse scenarios and discussed implications and future directions relevant to profiling APA from RNA-seq data.
Affiliation(s)
- Moliang Chen
  - Department of Automation, Xiamen University, Xiamen 361005, China
  - Xiamen Research Institute of National Center of Healthcare Big Data, Xiamen 361005, China
- Guoli Ji
  - Department of Automation, Xiamen University, Xiamen 361005, China
  - Xiamen Research Institute of National Center of Healthcare Big Data, Xiamen 361005, China
- Hongjuan Fu
  - Department of Automation, Xiamen University, Xiamen 361005, China
  - Xiamen Research Institute of National Center of Healthcare Big Data, Xiamen 361005, China
- Qianmin Lin
  - Xiang'an Hospital of Xiamen University, Xiamen 361005, China
- Congting Ye
  - Key Laboratory of the Ministry of Education for Coastal and Wetland Ecosystems, College of the Environment and Ecology, Xiamen University, Xiamen, Fujian 361102, China
- Wenbin Ye
  - Department of Automation, Xiamen University, Xiamen 361005, China
  - Xiamen Research Institute of National Center of Healthcare Big Data, Xiamen 361005, China
- Yaru Su
  - College of Mathematics and Computer Science, Fuzhou University, Fuzhou 350116, China
- Xiaohui Wu
  - Department of Automation, Xiamen University, Xiamen 361005, China
  - Xiamen Research Institute of National Center of Healthcare Big Data, Xiamen 361005, China
11. Ye C, Long Y, Ji G, Li QQ, Wu X. APAtrap: identification and quantification of alternative polyadenylation sites from RNA-seq data. Bioinformatics 2018; 34:1841-1849. [PMID: 29360928 DOI: 10.1093/bioinformatics/bty029] [Citation(s) in RCA: 74] [Impact Index Per Article: 12.3] [Received: 09/30/2017] [Accepted: 01/17/2018] [Indexed: 12/28/2022]
Abstract
Motivation: Alternative polyadenylation (APA) has been increasingly recognized as a crucial mechanism that contributes to transcriptome diversity and gene expression regulation. As RNA-seq has become a routine protocol for transcriptome analysis, it is of great interest to leverage such unprecedented collection of RNA-seq data by new computational methods to extract and quantify APA dynamics in these transcriptomes. However, research progress in this area has been relatively limited. Conventional methods rely on either transcript assembly to determine transcript 3' ends or annotated poly(A) sites. Moreover, they can neither identify more than two poly(A) sites in a gene nor detect dynamic APA site usage considering more than two poly(A) sites.
Results: We developed an approach called APAtrap based on the mean squared error model to identify and quantify APA sites from RNA-seq data. APAtrap is capable of identifying novel 3' UTRs and 3' UTR extensions, which contributes to locating potential poly(A) sites in previously overlooked regions and improving genome annotations. APAtrap also aims to tally all potential poly(A) sites and detect genes with differential APA site usages between conditions. Extensive comparisons of APAtrap with two other latest methods, ChangePoint and DaPars, using various RNA-seq datasets from simulation studies, human and Arabidopsis demonstrate the efficacy and flexibility of APAtrap for any organisms with an annotated genome.
Availability and implementation: Freely available for download at https://apatrap.sourceforge.io.
Contact: liqq@xmu.edu.cn or xhuister@xmu.edu.cn.
Supplementary information: Supplementary data are available at Bioinformatics online.
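The abstract above describes APAtrap as locating poly(A) sites with a mean squared error model. As a rough illustration of that general idea, and not APAtrap's actual implementation, the Python sketch below scans a per-base 3' UTR coverage vector for the single split point that minimises the pooled squared deviation of the two flanking segments from their own means. The function names, the `min_segment` parameter, and the toy coverage values are all hypothetical.

```python
from statistics import fmean


def sse(values):
    """Sum of squared deviations of values from their own mean."""
    if not values:
        return 0.0
    mean = fmean(values)
    return sum((v - mean) ** 2 for v in values)


def best_change_point(coverage, min_segment=2):
    """Return (index, mse) for the split that minimises the two-segment MSE.

    index is the first position of the downstream segment. This is a
    simplified, single-change-point illustration (an assumption of this
    sketch); real tools handle multiple sites, replicates, and noise.
    """
    n = len(coverage)
    if n < 2 * min_segment:
        raise ValueError("coverage too short for the requested segment size")
    best_idx, best_mse = None, float("inf")
    for i in range(min_segment, n - min_segment + 1):
        # Pooled squared error of the upstream and downstream segments.
        mse = (sse(coverage[:i]) + sse(coverage[i:])) / n
        if mse < best_mse:
            best_idx, best_mse = i, mse
    return best_idx, best_mse


# Toy coverage: high upstream of a candidate proximal site, low downstream.
coverage = [50, 52, 49, 51, 50, 12, 10, 11, 9, 10]
idx, mse = best_change_point(coverage)
print(idx, round(mse, 2))
```

On the toy vector the coverage drops between positions 4 and 5, so the split lands at index 5; a sharp drop like this is the kind of signal that would be read as a candidate cleavage site.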
Collapse
Affiliation(s)
- Congting Ye
- Key Laboratory of the Ministry of Education for Coastal and Wetland Ecosystems, College of the Environment and Ecology, Xiamen University, Xiamen, Fujian 361102, China
- Yuqi Long
- Department of Automation, Xiamen University, Xiamen, Fujian 361005, China
- Guoli Ji
- Department of Automation, Xiamen University, Xiamen, Fujian 361005, China
- Qingshun Quinn Li
- Key Laboratory of the Ministry of Education for Coastal and Wetland Ecosystems, College of the Environment and Ecology, Xiamen University, Xiamen, Fujian 361102, China; Graduate College of Biomedical Sciences, Western University of Health Sciences, Pomona, CA 91766, USA
- Xiaohui Wu
- Department of Automation, Xiamen University, Xiamen, Fujian 361005, China
|
12
|
Ye W, Long Y, Ji G, Su Y, Ye P, Fu H, Wu X. Cluster analysis of replicated alternative polyadenylation data using canonical correlation analysis. BMC Genomics 2019; 20:75. [PMID: 30669970 PMCID: PMC6343338 DOI: 10.1186/s12864-019-5433-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/09/2018] [Accepted: 01/03/2019] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Alternative polyadenylation (APA) has emerged as a pervasive mechanism that contributes to transcriptome complexity and the dynamics of gene regulation. The current tsunami of whole-genome poly(A) site data from various conditions, generated by 3' end sequencing, provides a valuable data source for the study of APA-related gene expression. Cluster analysis is a powerful technique for investigating the association structure among genes; however, conventional gene clustering methods are not suitable for APA-related data because they fail to consider the information of poly(A) sites (e.g., location, abundance, and number) within each gene or to measure the association among poly(A) sites between two genes.
RESULTS Here we proposed a computational framework, named PASCCA, for clustering genes from replicated or unreplicated poly(A) site data using canonical correlation analysis (CCA). PASCCA incorporates multiple layers of gene expression data from both the poly(A) site level and the gene level and takes into account the number of replicates and the variability within each experimental group. Moreover, PASCCA characterizes poly(A) sites in various ways, including abundance and relative usage, which exploits the advantages of 3' end deep sequencing in quantifying APA sites. Using both real and synthetic poly(A) site data sets, the cluster analysis demonstrates that PASCCA outperforms other widely used distance measures under five performance metrics: connectivity, the Dunn index, average distance, average distance between means, and the biological homogeneity index. We also used PASCCA to infer APA-specific gene modules from recently published poly(A) site data of rice and discovered some distinct functional gene modules. We have made PASCCA an easy-to-use R package for APA-related gene expression analyses, including the characterization of poly(A) sites, quantification of association between genes, and clustering of genes.
CONCLUSIONS By providing a better treatment of the noise inherent in repeated measurements and taking into account multiple layers of poly(A) site data, PASCCA could be a general tool for clustering and analyzing APA-specific gene expression data. PASCCA could be used to elucidate the dynamic interplay of genes and their APA sites across biological conditions using emerging 3' end sequencing data.
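As a rough illustration of how canonical correlation analysis can score the association between two genes that each have several poly(A) sites, the sketch below computes the first canonical correlation between two site-by-sample profiles. This is a generic CCA computation, not the PASCCA R package: the function name `first_canonical_correlation`, the layout (rows are samples, columns are the poly(A) sites of one gene), and the toy values are all assumptions made for this example.

```python
import numpy as np


def first_canonical_correlation(x, y):
    """Largest canonical correlation between the column spaces of x and y.

    Hypothetical helper for illustration. Rows are samples; columns are the
    poly(A) sites of one gene. Assumes more samples than sites and that the
    centered matrices have full column rank.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Center each site's profile across samples.
    x = x - x.mean(axis=0)
    y = y - y.mean(axis=0)
    # Orthonormal bases for the two column spaces.
    qx, _ = np.linalg.qr(x)
    qy, _ = np.linalg.qr(y)
    # Singular values of Qx^T Qy are the canonical correlations.
    s = np.linalg.svd(qx.T @ qy, compute_uv=False)
    return float(min(s[0], 1.0))  # clip tiny floating-point overshoot


# Toy data: 6 samples, 2 poly(A) sites per gene.
gene_a = np.array([[1, 5], [2, 3], [3, 8], [4, 1], [5, 9], [6, 2]], dtype=float)
# The first site of gene_b is a linear combination of gene_a's two sites.
gene_b = np.column_stack([
    2 * gene_a[:, 0] + gene_a[:, 1],
    [1, 0, 2, 0, 1, 3],
])
r = first_canonical_correlation(gene_a, gene_b)
print(r)
```

Because one site profile of `gene_b` is an exact linear combination of the `gene_a` profiles, the first canonical correlation here is 1 up to floating-point error. A quantity such as `1 - r` could then serve as a gene-to-gene distance for clustering, although how PASCCA actually derives its distance is not specified in the abstract.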
Affiliation(s)
- Wenbin Ye
- Department of Automation, Xiamen University, Xiamen, 361005, China; Innovation Center for Cell Biology, Xiamen University, Xiamen, 361005, China
- Yuqi Long
- Department of Automation, Xiamen University, Xiamen, 361005, China; Software Quality Testing Engineering Research Center, China Electronic Product Reliability and Environmental Testing Research Institute, Guangzhou, 510610, China
- Guoli Ji
- Department of Automation, Xiamen University, Xiamen, 361005, China; Innovation Center for Cell Biology, Xiamen University, Xiamen, 361005, China
- Yaru Su
- College of Mathematics and Computer Science, Fuzhou University, Fuzhou, 350116, China
- Pengchao Ye
- Department of Automation, Xiamen University, Xiamen, 361005, China
- Hongjuan Fu
- Department of Automation, Xiamen University, Xiamen, 361005, China
- Xiaohui Wu
- Department of Automation, Xiamen University, Xiamen, 361005, China; Innovation Center for Cell Biology, Xiamen University, Xiamen, 361005, China
|
13
|
Zhang Y, Carrion SA, Zhang Y, Zhang X, Zinski AL, Michal JJ, Jiang Z. Alternative polyadenylation analysis in animals and plants: newly developed strategies for profiling, processing and validation. Int J Biol Sci 2018; 14:1709-1714. [PMID: 30416385 PMCID: PMC6216028 DOI: 10.7150/ijbs.27168] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 05/08/2018] [Accepted: 08/05/2018] [Indexed: 12/25/2022] Open
Abstract
Alternative polyadenylation is an essential RNA processing event that contributes significantly to the regulation of transcriptome diversity and functional dynamics in both animals and plants. Here we review newly developed next-generation sequencing methods for genome-wide profiling of alternative polyadenylation (APA) sites, bioinformatics pipelines for data processing, and both wet and dry laboratory approaches for APA validation. The library construction methods LITE-Seq (Low-Input 3'-Terminal sequencing) and PAC-seq (PolyA Click sequencing) tag polyA+ cDNA, while BAT-seq (BArcoded, three-prime specific sequencing) and PAPERCLIP (Poly(A) binding Protein-mediated mRNA 3'End Retrieval by CrossLinking ImmunoPrecipitation) enrich polyA+ RNA. Interestingly, only WTTS-seq (Whole Transcriptome Termini Site sequencing) targets both polyA+ RNA and polyA+ cDNA. A variety of bioinformatics pipelines are well established for read quality control, mapping, clustering, characterization and pathway analysis. The RHAPA (RNase H alternative polyadenylation assay) and 3'RACE-seq (3' rapid amplification of cDNA end sequencing) methods directly validate APA sites, while WTSS-seq (whole transcriptome start site sequencing), RNA-seq (RNA sequencing) and public APA databases can serve as indirect validation methods. We hope that these tools, pipelines and resources trigger huge waves of interest in the research community to investigate APA events underlying physiological, pathological and psychological changes and thus understand the information transfer events from genome to phenome relevant to economically important traits in both animals and plants.
Affiliation(s)
- Zhihua Jiang
- Department of Animal Sciences and Center for Reproductive Biology, Washington State University, Pullman, WA 99164-7620
|
14
|
Hong L, Ye C, Lin J, Fu H, Wu X, Li QQ. Alternative polyadenylation is involved in auxin-based plant growth and development. THE PLANT JOURNAL: FOR CELL AND MOLECULAR BIOLOGY 2018; 93:246-258. [PMID: 29155478 DOI: 10.1111/tpj.13771] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.7] [Received: 05/21/2017] [Revised: 10/24/2017] [Accepted: 10/31/2017] [Indexed: 05/24/2023]
Abstract
Auxin is widely involved in plant growth and development. However, the molecular mechanism by which auxin acts is unclear. In particular, the effect of auxin on post-transcriptional regulation of pre-mRNA is mostly unknown. Using a poly(A) tag (PAT) sequencing approach, we revealed mRNA alternative polyadenylation (APA) profiles after auxin treatment. We showed that hundreds of poly(A) site clusters (PACs) are affected by auxin at the transcriptome level: auxin reduces PAC distribution in the 5' untranslated region (UTR) but increases it in the 3' UTR. APA site usage frequencies of 42 genes were switched by auxin, suggesting that auxin affects the choice of poly(A) sites. Furthermore, poly(A) signal selection was altered after auxin treatment. For example, a mutant of the poly(A) signal binding protein CPSF30 showed altered sensitivity to auxin treatment, indicating interactions between auxin and the poly(A) signal recognition machinery. We also found that auxin activity on lateral root development is likely mediated by altered expression of ARF7, ARF19 and IAA14 through poly(A) site switches. Our results shed light on the molecular mechanisms of auxin responses in relation to mRNA polyadenylation.
Affiliation(s)
- Liwei Hong
- Key Laboratory of the Ministry of Education for Coastal and Wetland Ecosystems, College of the Environment and Ecology, Xiamen University, Xiamen, Fujian, 361102, China
- Congting Ye
- Key Laboratory of the Ministry of Education for Coastal and Wetland Ecosystems, College of the Environment and Ecology, Xiamen University, Xiamen, Fujian, 361102, China
- Juncheng Lin
- Key Laboratory of the Ministry of Education for Coastal and Wetland Ecosystems, College of the Environment and Ecology, Xiamen University, Xiamen, Fujian, 361102, China
- Graduate College of Biomedical Sciences, Western University of Health Sciences, Pomona, CA, 91766, USA
- Haihui Fu
- Key Laboratory of the Ministry of Education for Coastal and Wetland Ecosystems, College of the Environment and Ecology, Xiamen University, Xiamen, Fujian, 361102, China
- Xiaohui Wu
- Department of Automation, Xiamen University, Xiamen, Fujian, 361005, China
- Qingshun Q Li
- Key Laboratory of the Ministry of Education for Coastal and Wetland Ecosystems, College of the Environment and Ecology, Xiamen University, Xiamen, Fujian, 361102, China
- Graduate College of Biomedical Sciences, Western University of Health Sciences, Pomona, CA, 91766, USA
|
15
|
Fu H, Yang D, Su W, Ma L, Shen Y, Ji G, Ye X, Wu X, Li QQ. Genome-wide dynamics of alternative polyadenylation in rice. Genome Res 2016; 26:1753-1760. [PMID: 27733415 PMCID: PMC5131826 DOI: 10.1101/gr.210757.116] [Citation(s) in RCA: 56] [Impact Index Per Article: 6.2] [Received: 05/31/2016] [Accepted: 10/06/2016] [Indexed: 12/02/2022]
Abstract
Alternative polyadenylation (APA), in which a transcript uses one of several poly(A) sites to define its 3' end, is a common regulatory mechanism in eukaryotic gene expression. However, the potential of APA in determining crop agronomic traits remains elusive. This study systematically tallied poly(A) sites of 14 different rice tissues and developmental stages using the poly(A) tag sequencing (PAT-seq) approach. The results indicate significant involvement of APA in developmental and quantitative trait loci (QTL) gene expression. About 48% of all expressed genes use APA to generate transcriptomic and proteomic diversity. Some genes switch APA sites, allowing differentially expressed genes to use alternate 3' UTRs. Interestingly, APA in mature pollen is distinct: differential expression levels of a set of poly(A) factors and different distributions of APA sites are found, indicating a unique regulation of mRNA 3' end formation during gametophyte development. Equally interesting, statistical analyses showed that QTLs tend to use APA for regulation of gene expression of many agronomic traits, suggesting a potentially important role of APA in rice production. These results provide thus far the most comprehensive and high-resolution resource for advanced analysis of APA in crops and shed light on how APA is associated with trait formation in eukaryotes.
Affiliation(s)
- Haihui Fu
- Key Laboratory of the Ministry of Education for Coastal and Wetland Ecosystems, College of the Environment and Ecology, Xiamen University, Xiamen, Fujian, China, 361102
- Dewei Yang
- Rice Research Institute, Fujian Academy of Agricultural Sciences, Fuzhou, Fujian, China, 350018
- Wenyue Su
- Key Laboratory of the Ministry of Education for Coastal and Wetland Ecosystems, College of the Environment and Ecology, Xiamen University, Xiamen, Fujian, China, 361102
- Liuyin Ma
- Key Laboratory of the Ministry of Education for Coastal and Wetland Ecosystems, College of the Environment and Ecology, Xiamen University, Xiamen, Fujian, China, 361102
- Yingjia Shen
- Key Laboratory of the Ministry of Education for Coastal and Wetland Ecosystems, College of the Environment and Ecology, Xiamen University, Xiamen, Fujian, China, 361102
- Guoli Ji
- Department of Automation, Xiamen University, Xiamen, Fujian, China, 361005
- Xinfu Ye
- Rice Research Institute, Fujian Academy of Agricultural Sciences, Fuzhou, Fujian, China, 350018
- Xiaohui Wu
- Department of Automation, Xiamen University, Xiamen, Fujian, China, 361005
- Qingshun Q Li
- Key Laboratory of the Ministry of Education for Coastal and Wetland Ecosystems, College of the Environment and Ecology, Xiamen University, Xiamen, Fujian, China, 361102
- Rice Research Institute, Fujian Academy of Agricultural Sciences, Fuzhou, Fujian, China, 350018
- Graduate College of Biomedical Sciences, Western University of Health Sciences, Pomona, California 91766, USA
|