1
Chen B, Chen X, Hu R, Li H, Wang M, Zhou L, Chen H, Wang J, Zhang H, Zhou X, Zhang H. Alternative polyadenylation regulates the translation of metabolic and inflammation-related proteins in adipose tissue of gestational diabetes mellitus. Comput Struct Biotechnol J 2024; 23:1298-1310. [PMID: 38560280 PMCID: PMC10978812 DOI: 10.1016/j.csbj.2024.03.013] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/23/2023] [Revised: 02/25/2024] [Accepted: 03/14/2024] [Indexed: 04/04/2024]
Abstract
In gestational diabetes mellitus (GDM), adipose tissue undergoes metabolic disturbances and chronic low-grade inflammation. Alternative polyadenylation (APA) is a post-transcriptional modification mechanism that generates mRNA with variable lengths of 3' untranslated regions (3'UTR), and it is associated with inflammation and metabolism. However, the role of APA in GDM adipose tissue has not been well characterized. In this study, we conducted transcriptomic and proteomic sequencing on subcutaneous and omental adipose tissues from both control and GDM patients. Using Dapars, a novel APA quantitative algorithm, we delineated the APA landscape of adipose tissue, revealing significant 3'UTR elongation of mRNAs in the GDM group. Omental adipose tissue exhibited a significant correlation between elongated 3'UTRs and reduced translation levels of genes related to metabolism and inflammation. Validation experiments in THP-1 derived macrophages (TDMs) demonstrated the impact of APA on translation levels by overexpressing long and short 3'UTR isoforms of a representative gene LRRC25. Additionally, LRRC25 was validated to suppress proinflammatory polarization in TDMs. Further exploration revealed two underexpressed APA trans-acting factors, CSTF3 and PPP1CB, in GDM omental adipose tissue. In conclusion, this study provides preliminary insights into the APA landscape of GDM adipose tissue. Reduced APA regulation in GDM omental adipose tissue may contribute to metabolic disorders and inflammation by downregulating gene translation levels. These findings advance our understanding of the molecular mechanisms underlying GDM-associated adipose tissue changes.
Affiliation(s)
- Bingnan Chen
- Department of Obstetrics and Gynecology, The First Affiliated Hospital of Chongqing Medical University, Chongqing, China
- State Key Laboratory of Maternal and Fetal Medicine of Chongqing Municipality, Chongqing Medical University, Chongqing, China
- The Chongqing Key Laboratory of Translational Medicine in Major Metabolic Diseases, The First Affiliated Hospital of Chongqing Medical University, Chongqing, China
- Xuyang Chen
- Department of Obstetrics and Gynecology, The First Affiliated Hospital of Chongqing Medical University, Chongqing, China
- State Key Laboratory of Maternal and Fetal Medicine of Chongqing Municipality, Chongqing Medical University, Chongqing, China
- Ruohan Hu
- Biomedical Sciences, University of Otago, Dunedin, New Zealand
- Hongli Li
- Department of Obstetrics and Gynecology, The First Affiliated Hospital of Chongqing Medical University, Chongqing, China
- State Key Laboratory of Maternal and Fetal Medicine of Chongqing Municipality, Chongqing Medical University, Chongqing, China
- Min Wang
- Department of Obstetrics and Gynecology, The First Affiliated Hospital of Chongqing Medical University, Chongqing, China
- State Key Laboratory of Maternal and Fetal Medicine of Chongqing Municipality, Chongqing Medical University, Chongqing, China
- The Chongqing Key Laboratory of Translational Medicine in Major Metabolic Diseases, The First Affiliated Hospital of Chongqing Medical University, Chongqing, China
- Linwei Zhou
- Department of Obstetrics and Gynecology, The First Affiliated Hospital of Chongqing Medical University, Chongqing, China
- State Key Laboratory of Maternal and Fetal Medicine of Chongqing Municipality, Chongqing Medical University, Chongqing, China
- Hao Chen
- Department of Obstetrics and Gynecology, The First Affiliated Hospital of Chongqing Medical University, Chongqing, China
- State Key Laboratory of Maternal and Fetal Medicine of Chongqing Municipality, Chongqing Medical University, Chongqing, China
- Jianqi Wang
- Department of Obstetrics and Gynecology, The First Affiliated Hospital of Chongqing Medical University, Chongqing, China
- State Key Laboratory of Maternal and Fetal Medicine of Chongqing Municipality, Chongqing Medical University, Chongqing, China
- Hanwen Zhang
- Department of Obstetrics and Gynecology, The First Affiliated Hospital of Chongqing Medical University, Chongqing, China
- State Key Laboratory of Maternal and Fetal Medicine of Chongqing Municipality, Chongqing Medical University, Chongqing, China
- Xiaobo Zhou
- Department of Obstetrics and Gynecology, The First Affiliated Hospital of Chongqing Medical University, Chongqing, China
- State Key Laboratory of Maternal and Fetal Medicine of Chongqing Municipality, Chongqing Medical University, Chongqing, China
- Hua Zhang
- Department of Obstetrics and Gynecology, The First Affiliated Hospital of Chongqing Medical University, Chongqing, China
- State Key Laboratory of Maternal and Fetal Medicine of Chongqing Municipality, Chongqing Medical University, Chongqing, China
2
Ye W, Lian Q, Ye C, Wu X. A Survey on Methods for Predicting Polyadenylation Sites from DNA Sequences, Bulk RNA-seq, and Single-cell RNA-seq. GENOMICS, PROTEOMICS & BIOINFORMATICS 2022:S1672-0229(22)00121-8. [PMID: 36167284 PMCID: PMC10372920 DOI: 10.1016/j.gpb.2022.09.005] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 04/19/2022] [Revised: 08/17/2022] [Accepted: 09/19/2022] [Indexed: 05/08/2023]
Abstract
Alternative polyadenylation (APA) plays important roles in modulating mRNA stability, translation, and subcellular localization, and contributes extensively to shaping eukaryotic transcriptome complexity and proteome diversity. Identification of poly(A) sites (pAs) on a genome-wide scale is a critical step toward understanding the underlying mechanism of APA-mediated gene regulation. A number of established computational tools have been proposed to predict pAs from diverse genomic data. Here we provided an exhaustive overview of computational approaches for predicting pAs from DNA sequences, bulk RNA sequencing (RNA-seq) data, and single-cell RNA sequencing (scRNA-seq) data. Particularly, we examined several representative tools using bulk RNA-seq and scRNA-seq data from peripheral blood mononuclear cells and put forward operable suggestions on how to assess the reliability of pAs predicted by different tools. We also proposed practical guidelines on choosing appropriate methods applicable to diverse scenarios. Moreover, we discussed in depth the challenges in improving the performance of pA prediction and benchmarking different methods. Additionally, we highlighted outstanding challenges and opportunities using new machine learning and integrative multi-omics techniques, and provided our perspective on how computational methodologies might evolve in the future for non-3' untranslated region, tissue-specific, cross-species, and single-cell pA prediction.
Affiliation(s)
- Wenbin Ye
- Pasteurien College, Suzhou Medical College of Soochow University, Soochow University, Suzhou 215000, China
- Qiwei Lian
- Pasteurien College, Suzhou Medical College of Soochow University, Soochow University, Suzhou 215000, China; Department of Automation, Xiamen University, Xiamen 361005, China
- Congting Ye
- Key Laboratory of the Coastal and Wetland Ecosystems, Ministry of Education, College of the Environment and Ecology, Xiamen University, Xiamen 361005, China
- Xiaohui Wu
- Pasteurien College, Suzhou Medical College of Soochow University, Soochow University, Suzhou 215000, China.
3
Shkurin A, Hughes TR. Known sequence features can explain half of all human gene ends. NAR Genom Bioinform 2021; 3:lqab042. [PMID: 34104882 PMCID: PMC8176999 DOI: 10.1093/nargab/lqab042] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/18/2021] [Revised: 04/14/2021] [Accepted: 05/10/2021] [Indexed: 11/15/2022]
Abstract
Cleavage and polyadenylation (CPA) sites define eukaryotic gene ends. CPA sites are associated with five key sequence recognition elements: the upstream UGUA, the polyadenylation signal (PAS), and U-rich sequences; the CA/UA dinucleotide where cleavage occurs; and GU-rich downstream elements (DSEs). Currently, it is not clear whether these sequences are sufficient to delineate CPA sites. Additionally, numerous other sequences and factors have been described, often in the context of promoting alternative CPA sites and preventing cryptic CPA site usage. Here, we dissect the contributions of individual sequence features to CPA using standard discriminative models. We show that models comprised only of the five primary CPA sequence features give highest probability scores to constitutive CPA sites at the ends of coding genes, relative to the entire pre-mRNA sequence, for 41% of all human genes. U1-hybridizing sequences provide a small boost in performance. The addition of all known RBP RNA binding motifs to the model, however, increases this figure to 49%, and suggests an involvement of both known and suspected CPA regulators as well as potential new factors in delineating constitutive CPA sites. To our knowledge, this high effectiveness of established features to predict human gene ends has not previously been documented.
Affiliation(s)
- Aleksei Shkurin
- Department of Molecular Genetics, University of Toronto, Toronto, ON M5S 1A8, Canada
- Terrence Donnelly Centre for Cellular & Biomolecular Research, Toronto, ON M5S 3E1, Canada
- Timothy R Hughes
- Department of Molecular Genetics, University of Toronto, Toronto, ON M5S 1A8, Canada
- Terrence Donnelly Centre for Cellular & Biomolecular Research, Toronto, ON M5S 3E1, Canada
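The five sequence elements named in the Shkurin and Hughes abstract (upstream UGUA, the PAS, U-rich sequence, the CA/UA cleavage dinucleotide, and a GU-rich downstream element) can be illustrated with a simple presence check around a candidate cleavage position. This is only a sketch, not the discriminative model the authors describe: the function names, the window sizes, the PAS pattern (AAUAAA or AUUAAA), and the thresholds for "U-rich" and "GU-rich" are all assumptions chosen for illustration.

```python
import re


def has_gu_rich(seq, width=8, min_gu=7):
    """True if any `width`-nt window holds at least `min_gu` G or U bases (assumed threshold)."""
    return any(
        sum(base in "GU" for base in seq[i:i + width]) >= min_gu
        for i in range(len(seq) - width + 1)
    )


def cpa_features(rna, cleavage):
    """Report which of the five CPA elements appear near `cleavage`.

    `cleavage` is the 0-based index of the second base of the cleavage
    dinucleotide (the A of CA/UA). All window sizes are illustrative assumptions.
    """
    rna = rna.upper().replace("T", "U")
    upstream = rna[max(0, cleavage - 60):cleavage]
    # Assumed PAS search window: 10 to 40 nt upstream of the cleavage position.
    pas_window = rna[max(0, cleavage - 40):max(0, cleavage - 9)]
    downstream = rna[cleavage + 1:cleavage + 41]
    return {
        "UGUA": "UGUA" in upstream,
        "PAS": re.search(r"A[AU]UAAA", pas_window) is not None,
        "U_rich": re.search(r"U{4,}", upstream) is not None,
        "cleavage_dinucleotide": cleavage >= 1
        and rna[cleavage - 1:cleavage + 1] in ("CA", "UA"),
        "GU_rich_DSE": has_gu_rich(downstream),
    }


# Synthetic sequence built so that every element is present.
prefix = "CC" + "UGUA" + "CC" + "UUUUU" + "CC" + "AAUAAA" + "C" * 18
example = prefix + "CA" + "CC" + "GUGUGUGUGU" + "CC"
features = cpa_features(example, cleavage=len(prefix) + 1)
print(features)
```

A presence check like this only flags candidate sites; the abstract's point is that a trained model scoring these features ranks the true gene end highest for 41% of human genes, rising to 49% when RBP motifs are added.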
4
Poly(A)-DG: A deep-learning-based domain generalization method to identify cross-species Poly(A) signal without prior knowledge from target species. PLoS Comput Biol 2020; 16:e1008297. [PMID: 33151940 PMCID: PMC7671507 DOI: 10.1371/journal.pcbi.1008297] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 03/25/2020] [Revised: 11/17/2020] [Accepted: 08/30/2020] [Indexed: 11/19/2022]
Abstract
In eukaryotes, polyadenylation (poly(A)) is an essential process during mRNA maturation. Identifying the cis-determinants of the poly(A) signal (PAS) on the DNA sequence is key to understanding the mechanism of translation regulation and mRNA metabolism. Although machine learning methods have been widely used to identify PAS computationally, their need for large amounts of annotated data hinders the application of existing methods to species without experimental PAS data. Cross-species PAS identification, which makes it possible to predict PAS in species absent from the training data, is therefore a promising direction. In this work, we propose a novel deep learning method named Poly(A)-DG for cross-species PAS identification. Poly(A)-DG consists of a Convolutional Neural Network-Multilayer Perceptron (CNN-MLP) network and a domain generalization technique. It learns PAS patterns from the training species and identifies PAS in target species without re-training. To test our method, we use four species, build cross-species training sets from two of them, and evaluate performance on the remaining ones. Moreover, we test our method under insufficient-data and imbalanced-data conditions and demonstrate that Poly(A)-DG not only outperforms state-of-the-art methods but also maintains relatively high accuracy with a smaller or imbalanced training set.
5
Tu M, Li Y. Profiling Alternative 3' Untranslated Regions in Sorghum using RNA-seq Data. Front Genet 2020; 11:556749. [PMID: 33193635 PMCID: PMC7649775 DOI: 10.3389/fgene.2020.556749] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/28/2020] [Accepted: 09/30/2020] [Indexed: 12/18/2022]
Abstract
Sorghum is an important crop widely used for food, feed, and fuel. Transcriptome-wide studies of 3′ untranslated regions (3′UTR) using regular RNA-seq remain scarce in sorghum, even though transcriptomes have been characterized extensively using Illumina short-read sequencing platforms for many sorghum varieties under various conditions or developmental contexts. The 3′UTR is a critical regulatory component of genes, controlling the translation, transport, and stability of messenger RNAs. In the present study, we profiled the alternative 3′UTRs at the transcriptome level in three genetically related but phenotypically contrasting lines of sorghum: Rio, BTx406, and R9188. A total of 1,197 transcripts with alternative 3′UTRs were detected using RNA-seq data. Their categorization identified 612 high-confidence alternative 3′UTRs. Importantly, the high-confidence alternative 3′UTR genes significantly overlapped with the gene sets that are associated with RNA N6-methyladenosine (m6A) modification, suggesting a clear association between alternative 3′UTR usage and m6A methylation in sorghum. Moreover, taking advantage of sorghum genetics, we provided evidence of genotype specificity of alternative 3′UTR usage. In summary, our work exemplifies transcriptome-wide profiling of alternative 3′UTRs using regular RNA-seq data in non-model crops and provides insights into alternative 3′UTRs and their genotype specificity.
Affiliation(s)
- Min Tu
- Waksman Institute of Microbiology, Rutgers, The State University of New Jersey, Piscataway, NJ, United States
- Yin Li
- Waksman Institute of Microbiology, Rutgers, The State University of New Jersey, Piscataway, NJ, United States
6
Arefeen A, Xiao X, Jiang T. DeepPASTA: deep neural network based polyadenylation site analysis. Bioinformatics 2020; 35:4577-4585. [PMID: 31081512 DOI: 10.1093/bioinformatics/btz283] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Received: 11/21/2018] [Revised: 03/22/2019] [Accepted: 04/16/2019] [Indexed: 12/12/2022]
Abstract
MOTIVATION: Alternative polyadenylation (polyA) sites near the 3' end of a pre-mRNA create multiple mRNA transcripts with different 3' untranslated regions (3' UTRs). The sequence elements of a 3' UTR are essential for many biological activities such as mRNA stability, sub-cellular localization, protein translation, protein binding and translation efficiency. Moreover, numerous studies in the literature have reported the correlation between diseases and the shortening (or lengthening) of 3' UTRs. As alternative polyA sites are common in mammalian genes, several machine learning tools have been published for predicting polyA sites from sequence data. These tools either consider limited sequence features or use relatively old algorithms for polyA site prediction. Moreover, none of the previous tools consider RNA secondary structures as a feature to predict polyA sites.
RESULTS: In this paper, we propose a new deep learning model, called DeepPASTA, for predicting polyA sites from both sequence and RNA secondary structure data. The model is then extended to predict tissue-specific polyA sites. Moreover, the tool can predict the most dominant (i.e. frequently used) polyA site of a gene in a specific tissue and relative dominance when two polyA sites of the same gene are given. Our extensive experiments demonstrate that DeepPASTA significantly outperforms the existing tools for polyA site prediction and tissue-specific relative and absolute dominant polyA site prediction.
AVAILABILITY AND IMPLEMENTATION: https://github.com/arefeen/DeepPASTA.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Ashraful Arefeen
- Department of Computer Science and Engineering, University of California, Riverside, CA 92521, USA
- Xinshu Xiao
- Department of Integrative Biology and Physiology, University of California, Los Angeles, CA 90095, USA
- Tao Jiang
- Department of Computer Science and Engineering, University of California, Riverside, CA 92521, USA
- Institute of Integrative Genome Biology, University of California, Riverside, CA 92521, USA
- Bioinformatics Division, BNRIST, Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China
7
Leung MKK, Delong A, Frey BJ. Inference of the human polyadenylation code. Bioinformatics 2019; 34:2889-2898. [PMID: 29648582 PMCID: PMC6129302 DOI: 10.1093/bioinformatics/bty211] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Received: 06/27/2017] [Accepted: 04/09/2018] [Indexed: 01/02/2023]
Abstract
Motivation: Processing of transcripts at the 3′-end involves cleavage at a polyadenylation site followed by the addition of a poly(A)-tail. By selecting which site is cleaved, the process of alternative polyadenylation enables genes to produce transcript isoforms with different 3′-ends. To facilitate the identification and treatment of disease-causing mutations that affect polyadenylation and to understand the sequence determinants underlying this regulatory process, a computational model that can accurately predict polyadenylation patterns from genomic features is desirable.
Results: Previous works have focused on identifying candidate polyadenylation sites and classifying tissue-specific sites. By training on how multiple sites in genes are competitively selected for polyadenylation from 3′-end sequencing data, we developed a deep learning model that can predict the tissue-specific strength of a polyadenylation site in the 3′ untranslated region of the human genome given only its genomic sequence. We demonstrate the model’s broad utility on multiple tasks, without any application-specific training. The model can be used to predict which polyadenylation site is more likely to be selected in genes with multiple sites. It can be used to scan the 3′ untranslated region to find candidate polyadenylation sites. It can be used to classify the pathogenicity of variants near annotated polyadenylation sites in ClinVar. It can also be used to anticipate the effect of antisense oligonucleotide experiments to redirect polyadenylation. We provide analysis on how different features affect the model’s predictive performance and a method to identify sensitive regions of the genome at single-base resolution that can affect polyadenylation regulation.
Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Michael K K Leung
- Department of Electrical and Computer Engineering, University of Toronto, Toronto, Canada
- Deep Genomics, MaRS Centre, Toronto, Canada
- Andrew Delong
- Department of Electrical and Computer Engineering, University of Toronto, Toronto, Canada
- Deep Genomics, MaRS Centre, Toronto, Canada
- Brendan J Frey
- Department of Electrical and Computer Engineering, University of Toronto, Toronto, Canada
- Deep Genomics, MaRS Centre, Toronto, Canada
- Banting and Best Department of Medical Research, University of Toronto, Toronto, Canada
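The Leung et al. abstract describes sites in a gene being competitively selected, with the model assigning each site a strength. One common way to turn per-site strengths into relative selection probabilities is a softmax, sketched below. That the competition takes this exact form is an assumption here (the abstract does not say), and the function name and the strength values are hypothetical.

```python
import math


def selection_probabilities(strengths):
    """Softmax over per-site strength scores (assumed form of the competition)."""
    peak = max(strengths)
    # Subtract the maximum before exponentiating for numerical stability.
    weights = [math.exp(s - peak) for s in strengths]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical strengths for three polyadenylation sites in one gene.
probs = selection_probabilities([2.0, 0.5, -1.0])
print(probs)
```

Under this sketch, the site with the highest strength is the one most likely to be selected, and raising one site's strength lowers the probabilities of its competitors.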
8
Majerciak V, Yang W, Zheng J, Zhu J, Zheng ZM. A Genome-Wide Epstein-Barr Virus Polyadenylation Map and Its Antisense RNA to EBNA. J Virol 2019; 93:e01593-18. [PMID: 30355690 PMCID: PMC6321932 DOI: 10.1128/jvi.01593-18] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 09/11/2018] [Accepted: 10/17/2018] [Indexed: 12/14/2022]
Abstract
Epstein-Barr virus (EBV) is a ubiquitous human pathogen associated with Burkitt's lymphoma and nasopharyngeal carcinoma. Although the EBV genome harbors more than a hundred genes, a full transcription map with EBV polyadenylation profiles remains unknown. To elucidate the 3' ends of all EBV transcripts genome-wide, we performed the first comprehensive analysis of viral polyadenylation sites (pA sites) using our previously reported polyadenylation sequencing (PA-seq) technology. We identified that EBV utilizes a total of 62 pA sites in JSC-1, 60 in Raji, and 53 in Akata cells for the expression of EBV genes from both plus and minus DNA strands; 42 of these pA sites are commonly used in all three cell lines. The majority of identified pA sites were mapped to the intergenic regions downstream of previously annotated EBV open reading frames (ORFs) and viral promoters. pA sites lacking an association with any known EBV genes were also identified, mostly for the minus DNA strand within the EBNA locus, a major locus responsible for maintenance of viral latency and cell transformation. The expression of these novel antisense transcripts to EBNA was verified by 3' rapid amplification of cDNA ends (RACE) and Northern blot analyses in several EBV-positive (EBV+) cell lines. In contrast to EBNA RNA expressed during latency, expression of EBNA-antisense transcripts, which is restricted in latent cells, can be significantly induced by viral lytic infection, suggesting potential regulation of viral gene expression by EBNA-antisense transcription during lytic EBV infection. Our data provide the first evidence that EBV has an unrecognized mechanism that regulates EBV reactivation from latency.
IMPORTANCE: Epstein-Barr virus represents an important human pathogen with an etiological role in the development of several cancers. By elucidation of a genome-wide polyadenylation landscape of EBV in JSC-1, Raji, and Akata cells, we have redefined the EBV transcriptome and mapped individual polymerase II (Pol II) transcripts of viral genes to each one of the mapped pA sites at single-nucleotide resolution as well as the depth of expression. By unveiling a new class of viral lytic RNA transcripts antisense to latent EBNAs, we provide a novel mechanism of how EBV might control the expression of viral latent genes and lytic infection. Thus, this report takes another step closer to understanding EBV gene structure and expression and paves a new path for antiviral approaches.
Affiliation(s)
- Vladimir Majerciak
- Tumor Virus RNA Biology Section, RNA Biology Laboratory, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Frederick, Maryland, USA
- Wenjing Yang
- Systems Biology Center, National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, Maryland, USA
- Jing Zheng
- Systems Biology Center, National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, Maryland, USA
- Jun Zhu
- Systems Biology Center, National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, Maryland, USA
- Zhi-Ming Zheng
- Tumor Virus RNA Biology Section, RNA Biology Laboratory, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Frederick, Maryland, USA
9
Ha KCH, Blencowe BJ, Morris Q. QAPA: a new method for the systematic analysis of alternative polyadenylation from RNA-seq data. Genome Biol 2018; 19:45. [PMID: 29592814 PMCID: PMC5874996 DOI: 10.1186/s13059-018-1414-4] [Citation(s) in RCA: 115] [Impact Index Per Article: 19.2] [Received: 06/30/2017] [Accepted: 02/28/2018] [Indexed: 12/21/2022]
Abstract
Alternative polyadenylation (APA) affects most mammalian genes. The genome-wide investigation of APA has been hampered by an inability to reliably profile it using conventional RNA-seq. We describe 'Quantification of APA' (QAPA), a method that infers APA from conventional RNA-seq data. QAPA is faster and more sensitive than other methods. Application of QAPA reveals discrete, temporally coordinated APA programs during neurogenesis and that there is little overlap between genes regulated by alternative splicing and those by APA. Modeling of these data uncovers an APA sequence code. QAPA thus enables the discovery and characterization of programs of regulated APA using conventional RNA-seq.
Affiliation(s)
- Kevin C H Ha
- Department of Molecular Genetics, University of Toronto, 1 King's College Circle, Toronto, ON, M5A 1A8, Canada
- Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, 160 College Street, Toronto, ON, M5S 3E1, Canada
- Benjamin J Blencowe
- Department of Molecular Genetics, University of Toronto, 1 King's College Circle, Toronto, ON, M5A 1A8, Canada
- Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, 160 College Street, Toronto, ON, M5S 3E1, Canada
- Quaid Morris
- Department of Molecular Genetics, University of Toronto, 1 King's College Circle, Toronto, ON, M5A 1A8, Canada
- Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, 160 College Street, Toronto, ON, M5S 3E1, Canada
- Department of Computer Science, University of Toronto, 10 King's College Road, Toronto, ON, M5S 3G4, Canada
- Vector Institute, 661 University Avenue, Toronto, ON, M5G 1M1, Canada
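The QAPA abstract describes inferring APA from conventional RNA-seq. A generic way to summarize APA once per-isoform expression estimates exist is the relative usage of each 3'UTR isoform within a gene, sketched below. This is an illustrative calculation only, not QAPA's interface: the function name, the isoform labels, and the expression values are hypothetical.

```python
from typing import Dict


def polya_usage(isoform_expression: Dict[str, float]) -> Dict[str, float]:
    """Percent usage of each 3'UTR isoform relative to the gene's total expression."""
    total = sum(isoform_expression.values())
    if total == 0:
        # No expression for this gene: report zero usage instead of dividing by zero.
        return {name: 0.0 for name in isoform_expression}
    return {name: 100.0 * value / total for name, value in isoform_expression.items()}


# Hypothetical expression estimates (e.g. TPM) for two isoforms of one gene.
usage = polya_usage({"proximal": 30.0, "distal": 10.0})
print(usage)
```

Comparing these percentages between conditions gives a simple readout of 3'UTR shortening (rising proximal usage) or lengthening (rising distal usage).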
10
Alternative Polyadenylation: Methods, Findings, and Impacts. GENOMICS PROTEOMICS & BIOINFORMATICS 2017; 15:287-300. [PMID: 29031844 PMCID: PMC5673674 DOI: 10.1016/j.gpb.2017.06.001] [Citation(s) in RCA: 68] [Impact Index Per Article: 9.7] [Received: 02/24/2017] [Revised: 06/01/2017] [Accepted: 06/03/2017] [Indexed: 12/21/2022]
Abstract
Alternative polyadenylation (APA), a phenomenon in which RNA molecules with different 3' ends originate from distinct polyadenylation sites of a single gene, is emerging as a mechanism widely used to regulate gene expression. In the present review, we first summarize the methods commonly adopted in APA studies, focusing mainly on next-generation sequencing (NGS)-based techniques specially designed for APA identification, the related bioinformatics methods, and the strategies for APA study in single cells. We then summarize the main findings and advances made so far with these methods, including the preferences of alternative polyA (pA) sites, the biological processes involved, and the corresponding consequences. In particular, we categorize the APA changes discovered so far and discuss their potential functions under given conditions, along with the possible underlying molecular mechanisms. With more in-depth studies on extensive samples, more signatures and functions of APA will be revealed, and its diverse roles will gradually come into view.
11
Hu W, Li S, Park JY, Boppana S, Ni T, Li M, Zhu J, Tian B, Xie Z, Xiang M. Dynamic landscape of alternative polyadenylation during retinal development. Cell Mol Life Sci 2016; 74:1721-1739. [PMID: 27990575 DOI: 10.1007/s00018-016-2429-1] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.9] [Received: 05/13/2016] [Revised: 11/24/2016] [Accepted: 12/01/2016] [Indexed: 10/20/2022]
Abstract
The development of the central nervous system (CNS) is a complex process that must be exquisitely controlled at multiple levels to ensure the production of appropriate types and quantity of neurons. RNA alternative polyadenylation (APA) contributes to transcriptome diversity and gene regulation, and has recently been shown to be widespread in the CNS. However, the previous studies have been primarily focused on the tissue specificity of APA and developmental APA change of whole model organisms; a systematic survey of APA usage is lacking during CNS development. Here, we conducted global analysis of APA during mouse retinal development, and identified stage-specific polyadenylation (pA) sites that are enriched for genes critical for retinal development and visual perception. Moreover, we demonstrated 3'UTR (untranslated region) lengthening and increased usage of intronic pA sites over development that would result in gaining many different RBP (RNA-binding protein) and miRNA target sites. Furthermore, we showed that a considerable number of polyadenylated lncRNAs are co-expressed with protein-coding genes involved in retinal development and functions. Together, our data indicate that APA is highly and dynamically regulated during retinal development and maturation, suggesting that APA may serve as a crucial mechanism of gene regulation underlying the delicate process of CNS development.
Affiliation(s)
- Wenyan Hu
- State Key Laboratory of Ophthalmology, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, 500040, China
- Shengguo Li
- Center for Advanced Biotechnology and Medicine and Department of Pediatrics, Rutgers University-Robert Wood Johnson Medical School, 679 Hoes Lane West, Piscataway, NJ, 08854, USA
- Ji Yeon Park
- Department of Microbiology, Biochemistry and Molecular Genetics, Rutgers New Jersey Medical School, Newark, NJ, 07101, USA
- Sridhar Boppana
- Center for Advanced Biotechnology and Medicine and Department of Pediatrics, Rutgers University-Robert Wood Johnson Medical School, 679 Hoes Lane West, Piscataway, NJ, 08854, USA
- Ting Ni
- State Key Laboratory of Genetic Engineering and MOE Key Laboratory of Contemporary Anthropology, Collaborative Innovation Center of Genetics and Development, School of Life Sciences, Fudan University, Shanghai, 200438, China
- Miaoxin Li
- Department of Medical Genetics, Center for Genome Research, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, China
- Jun Zhu
- Systems Biology Center, National Heart, Lung and Blood Institute, National Institutes of Health, Bethesda, MD, 20892, USA
- Bin Tian
- Department of Microbiology, Biochemistry and Molecular Genetics, Rutgers New Jersey Medical School, Newark, NJ, 07101, USA
- Zhi Xie
- State Key Laboratory of Ophthalmology, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, 500040, China
- Mengqing Xiang
- State Key Laboratory of Ophthalmology, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, 500040, China
- Center for Advanced Biotechnology and Medicine and Department of Pediatrics, Rutgers University-Robert Wood Johnson Medical School, 679 Hoes Lane West, Piscataway, NJ, 08854, USA
12
Weng L, Li Y, Xie X, Shi Y. Poly(A) code analyses reveal key determinants for tissue-specific mRNA alternative polyadenylation. RNA (NEW YORK, N.Y.) 2016; 22:813-21. [PMID: 27095026 PMCID: PMC4878608 DOI: 10.1261/rna.055681.115] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Received: 12/17/2015] [Accepted: 02/22/2016] [Indexed: 05/23/2023]
Abstract
mRNA alternative polyadenylation (APA) is a critical mechanism for post-transcriptional gene regulation and is often regulated in a tissue- and/or developmental stage-specific manner. An ultimate goal for the APA field has been to be able to computationally predict APA profiles under different physiological or pathological conditions. As a first step toward this goal, we have assembled a poly(A) code for predicting tissue-specific poly(A) sites (PASs). Based on a compendium of over 600 features that have known or potential roles in PAS selection, we have generated and refined a machine-learning algorithm using multiple high-throughput sequencing-based data sets of tissue-specific and constitutive PASs. This code can predict tissue-specific PASs with >85% accuracy. Importantly, by analyzing the prediction performance based on different RNA features, we found that PAS context, including the distance between alternative PASs and the relative position of a PAS within the gene, is a key feature for determining the susceptibility of a PAS to tissue-specific regulation. Our poly(A) code provides a useful tool for not only predicting tissue-specific APA regulation, but also for studying its underlying molecular mechanisms.
Affiliation(s)
- Lingjie Weng
- Department of Microbiology and Molecular Genetics, School of Medicine, University of California, Irvine, Irvine, California 92697, USA; Institute for Genomics and Bioinformatics, University of California, Irvine, Irvine, California 92697, USA; Department of Computer Science, University of California, Irvine, Irvine, California 92697, USA
| | - Yi Li
- Institute for Genomics and Bioinformatics, University of California, Irvine, Irvine, California 92697, USA; Department of Computer Science, University of California, Irvine, Irvine, California 92697, USA
| | - Xiaohui Xie
- Institute for Genomics and Bioinformatics, University of California, Irvine, Irvine, California 92697, USA; Department of Computer Science, University of California, Irvine, Irvine, California 92697, USA
| | - Yongsheng Shi
- Department of Microbiology and Molecular Genetics, School of Medicine, University of California, Irvine, Irvine, California 92697, USA
| |
|
13
|
Ni T, Majerciak V, Zheng ZM, Zhu J. PA-seq for Global Identification of RNA Polyadenylation Sites of Kaposi's Sarcoma-Associated Herpesvirus Transcripts. Curr Protoc Microbiol 2016; 41:14E.7.1-14E.7.18. [PMID: 27153384 DOI: 10.1002/cpmc.1] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 12/18/2022]
Abstract
Kaposi's sarcoma-associated herpesvirus (KSHV) is a human oncovirus linked to the development of several malignancies in immunocompromised patients. Like other herpesviruses, KSHV has a large DNA genome encoding more than 100 distinct gene products. Despite being transcribed and processed by cellular machinery, the structure and organization of KSHV genes in the virus genome differ from what is observed in cellular genes from the human genome. A typical feature of KSHV expression is the production of polycistronic transcripts initiated from different promoters but sharing the same polyadenylation site (pA site). This represents a challenge in determination of the 3' end of individual viral transcripts. Such information is critical for generation of a virus transcriptional map for genetic studies. Here we present PA-seq, a high-throughput method for genome-wide analysis of pA sites of KSHV transcripts in B lymphocytes with latent or lytic KSHV infection. Besides identification of all viral pA sites, PA-seq also provides quantitative information about the levels of viral transcripts associated with each pA site, making it possible to determine the relative expression levels of viral genes at various stages of infection. Due to the indiscriminate nature of PA-seq, the pA sites of host transcripts are also concurrently mapped in the testing samples. Therefore, this technology can simultaneously estimate the expression changes of host genes and RNA polyadenylation upon KSHV infection. © 2016 by John Wiley & Sons, Inc.
Affiliation(s)
- Ting Ni
- Ministry of Education (MOE) Key Laboratory of Contemporary Anthropology and State Key Laboratory of Genetic Engineering, Collaborative Innovation Center of Genetics and Development, School of Life Sciences, Fudan University, Shanghai, People's Republic of China. These authors should be considered co-first authors.
| | - Vladimir Majerciak
- Tumor Virus RNA Biology Section, Gene Regulation and Chromosome Biology Laboratory, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Frederick, Maryland. These authors should be considered co-first authors.
| | - Zhi-Ming Zheng
- Tumor Virus RNA Biology Section, Gene Regulation and Chromosome Biology Laboratory, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Frederick, Maryland
| | - Jun Zhu
- Systems Biology Center, National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, Maryland. Corresponding author.
| |
|
14
|
Testis-specific products of the Drosophila melanogaster sbr gene, encoding nuclear export factor 1, are necessary for male fertility. Gene 2015; 577:153-60. [PMID: 26621383 DOI: 10.1016/j.gene.2015.11.030] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Received: 07/31/2015] [Revised: 11/18/2015] [Accepted: 11/21/2015] [Indexed: 01/08/2023]
Abstract
The evolutionarily conserved nuclear export factor 1 (NXF1) provides mRNA export from the nucleus to the cytoplasm. We described several testis-specific transcripts of the Drosophila melanogaster nxf1 gene, designated “sbr” in this species, via different PCR approaches and CAGE-seq analysis. Characteristically, most of them have truncated 3′UTRs compared with those in other organs. In addition to regular transcripts, there are shorter transcripts that begin in intron 3 of the sbr gene. These short, 5′-truncated testis-specific transcripts vary in terms of transcription start site and their ability to exclude or retain the last 237 nucleotides of intron 3 in their 5′UTR. Using an anti-SBR antibody against the C-terminal portion of this protein, we detected the major SBR protein (74 kDa) in all analyzed organs of the fly as well as a new smaller protein (60 kDa) found only in the testes. This protein corresponds to the detected sbr transcripts that start in intron 3, based on its molecular mass. We investigated the sbr12 allele of the sbr gene, which is lethal in homozygous females and causes dominant sterility in heterozygous males. Sequencing of the sbr12 gene allele revealed a 30-bp deletion in exon 9 without a frame shift. Western blot analysis with an SBR-specific antibody revealed two bands of the expected size in the testes of heterozygous males. Thus, a mutant protein is present along with the normal protein in the testes of lethal allele-bearing flies, and the described shorter testis-specific variant of SBR may account for male sterility.
|
15
|
Yu Y, Fuscoe JC, Zhao C, Guo C, Jia M, Qing T, Bannon DI, Lancashire L, Bao W, Du T, Luo H, Su Z, Jones WD, Moland CL, Branham WS, Qian F, Ning B, Li Y, Hong H, Guo L, Mei N, Shi T, Wang KY, Wolfinger RD, Nikolsky Y, Walker SJ, Duerksen-Hughes P, Mason CE, Tong W, Thierry-Mieg J, Thierry-Mieg D, Shi L, Wang C. A rat RNA-Seq transcriptomic BodyMap across 11 organs and 4 developmental stages. Nat Commun 2014; 5:3230. [PMID: 24510058 PMCID: PMC3926002 DOI: 10.1038/ncomms4230] [Citation(s) in RCA: 265] [Impact Index Per Article: 29.4] [Received: 06/07/2013] [Accepted: 01/10/2014] [Indexed: 02/07/2023] Open
Abstract
The rat has been used extensively as a model for evaluating chemical toxicities and for understanding drug mechanisms. However, its transcriptome across multiple organs, or developmental stages, has not yet been reported. Here we show, as part of the SEQC consortium efforts, a comprehensive rat transcriptomic BodyMap created by performing RNA-Seq on 320 samples from 11 organs of both sexes of juvenile, adolescent, adult and aged Fischer 344 rats. We catalogue the expression profiles of 40,064 genes, 65,167 transcripts, 31,909 alternatively spliced transcript variants and 2,367 non-coding genes/non-coding RNAs (ncRNAs) annotated in AceView. We find that organ-enriched, differentially expressed genes reflect the known organ-specific biological activities. A large number of transcripts show organ-specific, age-dependent or sex-specific differential expression patterns. We create a web-based, open-access rat BodyMap database of expression profiles with crosslinks to other widely used databases, anticipating that it will serve as a primary resource for biomedical research using the rat model. Gene expression is highly variable between tissues, and changes during development and with age. Here, the authors provide a comprehensive RNA-Seq analysis of the rat transcriptome, spanning eleven organs, four developmental stages and both sexes.
Affiliation(s)
- Ying Yu
- Center for Pharmacogenomics, State Key Laboratory of Genetic Engineering and MOE Key Laboratory of Contemporary Anthropology, Schools of Life Sciences and Pharmacy, Fudan University, Shanghai 201203, China
| | - James C Fuscoe
- National Center for Toxicological Research, Food and Drug Administration, Jefferson, Arkansas 92079, USA
| | - Chen Zhao
- Center for Pharmacogenomics, State Key Laboratory of Genetic Engineering and MOE Key Laboratory of Contemporary Anthropology, Schools of Life Sciences and Pharmacy, Fudan University, Shanghai 201203, China
| | - Chao Guo
- Functional Genomics Core, Beckman Research Institute, City of Hope, Duarte, California 91010, USA
| | - Meiwen Jia
- Center for Pharmacogenomics, State Key Laboratory of Genetic Engineering and MOE Key Laboratory of Contemporary Anthropology, Schools of Life Sciences and Pharmacy, Fudan University, Shanghai 201203, China
| | - Tao Qing
- Center for Pharmacogenomics, State Key Laboratory of Genetic Engineering and MOE Key Laboratory of Contemporary Anthropology, Schools of Life Sciences and Pharmacy, Fudan University, Shanghai 201203, China
| | - Desmond I Bannon
- Army Institute of Public Health, U.S. Army Public Health Command, Aberdeen Proving Ground, Maryland 21010, USA
| | - Lee Lancashire
- Computation Biology and Bioinformatics, IP & Science, Thomson Reuters, London EC1N 8JS, UK
| | - Wenjun Bao
- SAS Institute Inc., Cary, North Carolina 27513, USA
| | - Tingting Du
- Center for Pharmacogenomics, State Key Laboratory of Genetic Engineering and MOE Key Laboratory of Contemporary Anthropology, Schools of Life Sciences and Pharmacy, Fudan University, Shanghai 201203, China
| | - Heng Luo
- Center for Pharmacogenomics, State Key Laboratory of Genetic Engineering and MOE Key Laboratory of Contemporary Anthropology, Schools of Life Sciences and Pharmacy, Fudan University, Shanghai 201203, China
| | - Zhenqiang Su
- National Center for Toxicological Research, Food and Drug Administration, Jefferson, Arkansas 92079, USA
| | - Carrie L Moland
- National Center for Toxicological Research, Food and Drug Administration, Jefferson, Arkansas 92079, USA
| | - William S Branham
- National Center for Toxicological Research, Food and Drug Administration, Jefferson, Arkansas 92079, USA
| | - Feng Qian
- National Center for Toxicological Research, Food and Drug Administration, Jefferson, Arkansas 92079, USA
| | - Baitang Ning
- National Center for Toxicological Research, Food and Drug Administration, Jefferson, Arkansas 92079, USA
| | - Yan Li
- National Center for Toxicological Research, Food and Drug Administration, Jefferson, Arkansas 92079, USA
| | - Huixiao Hong
- National Center for Toxicological Research, Food and Drug Administration, Jefferson, Arkansas 92079, USA
| | - Lei Guo
- National Center for Toxicological Research, Food and Drug Administration, Jefferson, Arkansas 92079, USA
| | - Nan Mei
- National Center for Toxicological Research, Food and Drug Administration, Jefferson, Arkansas 92079, USA
| | - Tieliu Shi
- The Center for Bioinformatics and The Institute of Biomedical Sciences, College of Life Science, Shanghai 200241, China
| | - Kevin Y Wang
- Johns Hopkins University School of Medicine, Baltimore, Maryland 21205, USA
| | - Yuri Nikolsky
- Computation Biology and Bioinformatics, IP & Science, Thomson Reuters, London EC1N 8JS, UK
| | - Stephen J Walker
- Wake Forest Institute for Regenerative Medicine, Wake Forest University Health Sciences, Winston-Salem, North Carolina 27157, USA
| | - Penelope Duerksen-Hughes
- Department of Basic Sciences, School of Medicine, Loma Linda University, Loma Linda, California 92350, USA
| | - Christopher E Mason
- Department of Physiology & Biophysics and the Institute for Computational Biomedicine, Cornell University, New York, New York 10021, USA
| | - Weida Tong
- National Center for Toxicological Research, Food and Drug Administration, Jefferson, Arkansas 92079, USA
| | - Jean Thierry-Mieg
- National Center for Biotechnology Information, National Institutes of Health, Bethesda, Maryland 20894, USA
| | - Danielle Thierry-Mieg
- National Center for Biotechnology Information, National Institutes of Health, Bethesda, Maryland 20894, USA
| | - Leming Shi
- [1] Center for Pharmacogenomics, State Key Laboratory of Genetic Engineering and MOE Key Laboratory of Contemporary Anthropology, Schools of Life Sciences and Pharmacy, Fudan University, Shanghai 201203, China [2] National Center for Toxicological Research, Food and Drug Administration, Jefferson, Arkansas 92079, USA [3] Fudan-Zhangjiang Center for Clinical Genomics and Zhangjiang Center for Translational Medicine, Shanghai 201203, China
| | - Charles Wang
- Center for Genomics and Division of Microbiology & Molecular Genetics, School of Medicine, Loma Linda University, Loma Linda, California 92350, USA
| |
|
16
|
de Klerk E, 't Hoen PAC. Alternative mRNA transcription, processing, and translation: insights from RNA sequencing. Trends Genet 2015; 31:128-39. [PMID: 25648499 DOI: 10.1016/j.tig.2015.01.001] [Citation(s) in RCA: 226] [Impact Index Per Article: 25.1] [Received: 11/01/2014] [Revised: 12/22/2014] [Accepted: 01/05/2015] [Indexed: 12/13/2022]
Abstract
The human transcriptome comprises >80,000 protein-coding transcripts and the estimated number of proteins synthesized from these transcripts is in the range of 250,000 to 1 million. These transcripts and proteins are encoded by less than 20,000 genes, suggesting extensive regulation at the transcriptional, post-transcriptional, and translational level. Here we review how RNA sequencing (RNA-seq) technologies have increased our understanding of the mechanisms that give rise to alternative transcripts and their alternative translation. We highlight four different regulatory processes: alternative transcription initiation, alternative splicing, alternative polyadenylation, and alternative translation initiation. We discuss their transcriptome-wide distribution, their impact on protein expression, their biological relevance, and the possible molecular mechanisms leading to their alternative regulation. We conclude with a discussion of the coordination and the interdependence of these four regulatory layers.
Affiliation(s)
- Eleonora de Klerk
- Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands
| | - Peter A C 't Hoen
- Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands.
| |
|
17
|
Birol I, Raymond A, Chiu R, Nip KM, Jackman SD, Kreitzman M, Docking TR, Ennis CA, Robertson AG, Karsan A. Kleat: cleavage site analysis of transcriptomes. Pac Symp Biocomput 2015:347-358. [PMID: 25592595 PMCID: PMC4350765] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/04/2023]
Abstract
In eukaryotic cells, alternative cleavage of 3' untranslated regions (UTRs) can affect transcript stability, transport and translation. For polyadenylated (poly(A)) transcripts, cleavage sites can be characterized with short-read sequencing using specialized library construction methods. However, for large-scale cohort studies as well as for clinical sequencing applications, it is desirable to characterize such events using RNA-seq data, as the latter are already widely applied to identify other relevant information, such as mutations, alternative splicing and chimeric transcripts. Here we describe KLEAT, an analysis tool that uses de novo assembly of RNA-seq data to characterize cleavage sites on 3' UTRs. We demonstrate the performance of KLEAT on three cell line RNA-seq libraries constructed and sequenced by the ENCODE project, and assembled using Trans-ABySS. Validating the KLEAT predictions with matched ENCODE RNA-seq and RNA-PET libraries, we show that the tool has over 90% positive predictive value when there are at least three RNA-seq reads supporting a poly(A) tail and requiring at least three RNA-PET reads mapping within 100 nucleotides as validation. We also compare the performance of KLEAT with other popular RNA-seq analysis pipelines that reconstruct 3' UTR ends, and show that it performs favourably, based on an ROC-like curve.
Affiliation(s)
- Inanç Birol
- Canada's Michael Smith Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, BC, V5Z 4S6, Canada.
| |
|
18
|
An improved poly(A) motifs recognition method based on decision level fusion. Comput Biol Chem 2014; 54:49-56. [PMID: 25594576 DOI: 10.1016/j.compbiolchem.2014.12.001] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 07/16/2014] [Revised: 11/27/2014] [Accepted: 12/27/2014] [Indexed: 01/07/2023]
Abstract
Polyadenylation is the process of adding a poly(A) tail to mRNA 3' ends. Identification of motifs controlling polyadenylation plays an essential role in improving genome annotation accuracy and in better understanding the mechanisms governing gene regulation. Bioinformatics methods for poly(A) motif recognition have demonstrated that information extracted from the sequences surrounding candidate motifs can effectively differentiate true motifs from false ones. However, these methods depend on either domain features or string kernels. To date, no method has combined information from different sources. Here, we proposed an improved poly(A) motif recognition method that combines different sources based on decision level fusion. First, two novel prediction methods based on the support vector machine (SVM) were proposed: one uses domain-specific features and principal component analysis (PCA) to eliminate redundancy (PCA-SVM); the other is based on the Oligo string kernel (Oligo-SVM). Then we proposed a novel machine-learning method for poly(A) motif prediction by combining four poly(A) motif recognition methods: two state-of-the-art methods (Random Forest (RF) and HMM-SVM) and the two newly proposed methods (PCA-SVM and Oligo-SVM). A decision level information fusion method was employed to combine the decision values of the different classifiers by applying Dempster-Shafer (DS) evidence theory. We evaluated our method on a comprehensive poly(A) dataset that consists of 14,740 samples covering 12 variants of poly(A) motifs and 2750 samples containing none of these motifs. Our method achieved an accuracy of up to 86.13%. Compared with the four classifiers, our evidence theory based method reduces the average error rate by about 30%, 27%, 26% and 16%, respectively. The experimental results suggest that the proposed method is more effective for poly(A) motif recognition.
|
19
|
Zhang S, Han J, Zhong D, Liu R, Zheng J. Genome-wide identification and predictive modeling of lincRNAs polyadenylation in cancer genome. Comput Biol Chem 2014; 52:1-8. [DOI: 10.1016/j.compbiolchem.2014.07.001] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 02/25/2014] [Revised: 06/10/2014] [Accepted: 07/22/2014] [Indexed: 02/07/2023]
|
20
|
Müller S, Rycak L, Afonso-Grunz F, Winter P, Zawada AM, Damrath E, Scheider J, Schmäh J, Koch I, Kahl G, Rotter B. APADB: a database for alternative polyadenylation and microRNA regulation events. Database (Oxford) 2014; 2014:bau076. [PMID: 25052703 PMCID: PMC4105710 DOI: 10.1093/database/bau076] [Citation(s) in RCA: 68] [Impact Index Per Article: 6.8] [Indexed: 01/01/2023]
Abstract
Alternative polyadenylation (APA) is a widespread mechanism that contributes to the sophisticated dynamics of gene regulation. Approximately 50% of all protein-coding human genes harbor multiple polyadenylation (PA) sites; their selective and combinatorial use gives rise to transcript variants that differ in the length of their 3′ untranslated region (3′UTR). Shortened variants escape UTR-mediated regulation by microRNAs (miRNAs), especially in cancer, where global 3′UTR shortening accelerates disease progression, dedifferentiation and proliferation. Here we present APADB, a database of vertebrate PA sites determined by 3′ end sequencing, using massive analysis of complementary DNA ends. APADB provides (A)PA sites for coding and non-coding transcripts of human, mouse and chicken genes. For human and mouse, several tissue types, including different cancer specimens, are available. APADB records the loss of predicted miRNA binding sites and visualizes next-generation sequencing reads that support each PA site in a genome browser. The database tables can either be browsed according to organism and tissue or alternatively searched for a gene of interest. APADB is the largest database of APA in human, chicken and mouse. The stored information provides experimental evidence for thousands of PA sites and APA events. APADB combines 3′ end sequencing data with prediction algorithms of miRNA binding sites, allowing further improvement of these prediction algorithms. Current databases lack correct information about 3′UTR lengths, especially for chicken, and APADB provides the information necessary to close this gap. Database URL: http://tools.genxpro.net/apadb/
Affiliation(s)
- Sören Müller
- Plant Molecular Biology, Molecular BioSciences, University of Frankfurt am Main, Marie-Curie-Street 9, D-60439 Frankfurt, Germany, GenXPro GmbH, Frankfurt Innovation Center Biotechnology, Altenhöferallee 3, D-60438 Frankfurt, Germany, Molecular Bioinformatics Group, Faculty of Computer Science and Mathematics, Cluster of Excellence Frankfurt "Macromolecular Complexes", Institute of Computer Science, Robert-Mayer-Strasse 11-15, D-60325 Frankfurt am Main, Germany, Department of Internal Medicine IV; Saarland University Medical Center, Kirrberger Strasse, D-66421 Homburg/Saar, Germany, Experimental Neurology, Department of Neurology, Goethe University Medical School, Heinrich, Hoffmann Strasse 7, D-60528 Frankfurt am Main, Germany, Institute for Ecology, Evolution and Diversity, Aquatic Ecotoxicology, University of Frankfurt am Main, Max-von-Laue-Str. 13, D-60438 Frankfurt, Germany and Department of Pediatrics, University Hospital Schleswig-Holstein, Schwanenweg 20, D-24105 Kiel, Germany
| | - Lukas Rycak
- Plant Molecular Biology, Molecular BioSciences, University of Frankfurt am Main, Marie-Curie-Street 9, D-60439 Frankfurt, Germany, GenXPro GmbH, Frankfurt Innovation Center Biotechnology, Altenhöferallee 3, D-60438 Frankfurt, Germany, Molecular Bioinformatics Group, Faculty of Computer Science and Mathematics, Cluster of Excellence Frankfurt "Macromolecular Complexes", Institute of Computer Science, Robert-Mayer-Strasse 11-15, D-60325 Frankfurt am Main, Germany, Department of Internal Medicine IV; Saarland University Medical Center, Kirrberger Strasse, D-66421 Homburg/Saar, Germany, Experimental Neurology, Department of Neurology, Goethe University Medical School, Heinrich, Hoffmann Strasse 7, D-60528 Frankfurt am Main, Germany, Institute for Ecology, Evolution and Diversity, Aquatic Ecotoxicology, University of Frankfurt am Main, Max-von-Laue-Str. 13, D-60438 Frankfurt, Germany and Department of Pediatrics, University Hospital Schleswig-Holstein, Schwanenweg 20, D-24105 Kiel, Germany
| | - Fabian Afonso-Grunz
- Plant Molecular Biology, Molecular BioSciences, University of Frankfurt am Main, Marie-Curie-Street 9, D-60439 Frankfurt, Germany, GenXPro GmbH, Frankfurt Innovation Center Biotechnology, Altenhöferallee 3, D-60438 Frankfurt, Germany, Molecular Bioinformatics Group, Faculty of Computer Science and Mathematics, Cluster of Excellence Frankfurt "Macromolecular Complexes", Institute of Computer Science, Robert-Mayer-Strasse 11-15, D-60325 Frankfurt am Main, Germany, Department of Internal Medicine IV; Saarland University Medical Center, Kirrberger Strasse, D-66421 Homburg/Saar, Germany, Experimental Neurology, Department of Neurology, Goethe University Medical School, Heinrich, Hoffmann Strasse 7, D-60528 Frankfurt am Main, Germany, Institute for Ecology, Evolution and Diversity, Aquatic Ecotoxicology, University of Frankfurt am Main, Max-von-Laue-Str. 13, D-60438 Frankfurt, Germany and Department of Pediatrics, University Hospital Schleswig-Holstein, Schwanenweg 20, D-24105 Kiel, Germany
| | - Peter Winter
- Plant Molecular Biology, Molecular BioSciences, University of Frankfurt am Main, Marie-Curie-Street 9, D-60439 Frankfurt, Germany, GenXPro GmbH, Frankfurt Innovation Center Biotechnology, Altenhöferallee 3, D-60438 Frankfurt, Germany, Molecular Bioinformatics Group, Faculty of Computer Science and Mathematics, Cluster of Excellence Frankfurt "Macromolecular Complexes", Institute of Computer Science, Robert-Mayer-Strasse 11-15, D-60325 Frankfurt am Main, Germany, Department of Internal Medicine IV; Saarland University Medical Center, Kirrberger Strasse, D-66421 Homburg/Saar, Germany, Experimental Neurology, Department of Neurology, Goethe University Medical School, Heinrich, Hoffmann Strasse 7, D-60528 Frankfurt am Main, Germany, Institute for Ecology, Evolution and Diversity, Aquatic Ecotoxicology, University of Frankfurt am Main, Max-von-Laue-Str. 13, D-60438 Frankfurt, Germany and Department of Pediatrics, University Hospital Schleswig-Holstein, Schwanenweg 20, D-24105 Kiel, Germany
| | - Adam M Zawada
- Plant Molecular Biology, Molecular BioSciences, University of Frankfurt am Main, Marie-Curie-Street 9, D-60439 Frankfurt, Germany, GenXPro GmbH, Frankfurt Innovation Center Biotechnology, Altenhöferallee 3, D-60438 Frankfurt, Germany, Molecular Bioinformatics Group, Faculty of Computer Science and Mathematics, Cluster of Excellence Frankfurt "Macromolecular Complexes", Institute of Computer Science, Robert-Mayer-Strasse 11-15, D-60325 Frankfurt am Main, Germany, Department of Internal Medicine IV; Saarland University Medical Center, Kirrberger Strasse, D-66421 Homburg/Saar, Germany, Experimental Neurology, Department of Neurology, Goethe University Medical School, Heinrich, Hoffmann Strasse 7, D-60528 Frankfurt am Main, Germany, Institute for Ecology, Evolution and Diversity, Aquatic Ecotoxicology, University of Frankfurt am Main, Max-von-Laue-Str. 13, D-60438 Frankfurt, Germany and Department of Pediatrics, University Hospital Schleswig-Holstein, Schwanenweg 20, D-24105 Kiel, Germany
| | - Ewa Damrath
- Plant Molecular Biology, Molecular BioSciences, University of Frankfurt am Main, Marie-Curie-Street 9, D-60439 Frankfurt, Germany, GenXPro GmbH, Frankfurt Innovation Center Biotechnology, Altenhöferallee 3, D-60438 Frankfurt, Germany, Molecular Bioinformatics Group, Faculty of Computer Science and Mathematics, Cluster of Excellence Frankfurt "Macromolecular Complexes", Institute of Computer Science, Robert-Mayer-Strasse 11-15, D-60325 Frankfurt am Main, Germany, Department of Internal Medicine IV; Saarland University Medical Center, Kirrberger Strasse, D-66421 Homburg/Saar, Germany, Experimental Neurology, Department of Neurology, Goethe University Medical School, Heinrich, Hoffmann Strasse 7, D-60528 Frankfurt am Main, Germany, Institute for Ecology, Evolution and Diversity, Aquatic Ecotoxicology, University of Frankfurt am Main, Max-von-Laue-Str. 13, D-60438 Frankfurt, Germany and Department of Pediatrics, University Hospital Schleswig-Holstein, Schwanenweg 20, D-24105 Kiel, Germany
| | - Jessica Scheider
- Plant Molecular Biology, Molecular BioSciences, University of Frankfurt am Main, Marie-Curie-Street 9, D-60439 Frankfurt, Germany, GenXPro GmbH, Frankfurt Innovation Center Biotechnology, Altenhöferallee 3, D-60438 Frankfurt, Germany, Molecular Bioinformatics Group, Faculty of Computer Science and Mathematics, Cluster of Excellence Frankfurt "Macromolecular Complexes", Institute of Computer Science, Robert-Mayer-Strasse 11-15, D-60325 Frankfurt am Main, Germany, Department of Internal Medicine IV; Saarland University Medical Center, Kirrberger Strasse, D-66421 Homburg/Saar, Germany, Experimental Neurology, Department of Neurology, Goethe University Medical School, Heinrich, Hoffmann Strasse 7, D-60528 Frankfurt am Main, Germany, Institute for Ecology, Evolution and Diversity, Aquatic Ecotoxicology, University of Frankfurt am Main, Max-von-Laue-Str. 13, D-60438 Frankfurt, Germany and Department of Pediatrics, University Hospital Schleswig-Holstein, Schwanenweg 20, D-24105 Kiel, Germany
| | - Juliane Schmäh
| | - Ina Koch
| | - Günter Kahl
| | - Björn Rotter
| |
|
21
|
Ji G, Guan J, Zeng Y, Li QQ, Wu X. Genome-wide identification and predictive modeling of polyadenylation sites in eukaryotes. Brief Bioinform 2014; 16:304-13. [DOI: 10.1093/bib/bbu011] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.0] [Indexed: 11/13/2022]
|
22
|
Majerciak V, Ni T, Yang W, Meng B, Zhu J, Zheng ZM. A viral genome landscape of RNA polyadenylation from KSHV latent to lytic infection. PLoS Pathog 2013; 9:e1003749. [PMID: 24244170 PMCID: PMC3828183 DOI: 10.1371/journal.ppat.1003749] [Citation(s) in RCA: 44] [Impact Index Per Article: 4.0] [Received: 04/26/2013] [Accepted: 09/20/2013] [Indexed: 11/30/2022]
Abstract
RNA polyadenylation (pA) is one of the major steps in the regulation of gene expression at the posttranscriptional level. In this report, a genome landscape of pA sites of viral transcripts in B lymphocytes with Kaposi sarcoma-associated herpesvirus (KSHV) infection was constructed using a modified PA-seq strategy. We identified 67 unique pA sites, of which 55 could be assigned to the expression of the ∼90 annotated KSHV genes. Among the assigned pA sites, twenty are for expression of individual single genes and the rest are for multiple genes (average 2.7 genes per pA site) in cluster-gene loci of the genome. A few novel viral pA sites that could not be assigned to any known KSHV genes are often positioned in the antisense strand to ORF8, ORF21, ORF34, K8 and ORF50, and their associated antisense mRNAs to ORF21, ORF34 and K8 could be verified by 3′RACE. The usage of each mapped pA site correlates with its peak size: the larger (broad and wide) the peak, the greater the usage and thus the higher the expression of the pA site-associated gene(s). Similar to mammalian transcripts, KSHV RNA polyadenylation employs two major poly(A) signals, AAUAAA and AUUAAA, and is regulated by conservation of cis-elements flanking the mapped pA sites. Moreover, we found two or more alternative pA sites downstream of the ORF54, K2 (vIL6), K9 (vIRF1), K10.5 (vIRF3), K11 (vIRF2), K12 (Kaposin A), T1.5, and PAN genes and experimentally validated the alternative polyadenylation for the expression of KSHV ORF54, K11, and T1.5 transcripts. Together, our data provide not only a comprehensive pA site landscape for understanding KSHV genome structure and gene expression, but also the first evidence of alternative polyadenylation as another layer of posttranscriptional regulation in viral gene expression.
A genome-wide polyadenylation landscape in the expression of human herpesviruses has not been reported. In this study, we provide the first genome landscape of viral RNA polyadenylation sites in B cells from KSHV latent to lytic infection by using a modified PA-seq protocol, selectively validated by 3′ RACE. We found that the KSHV genome contains 67 active pA sites for the expression of its ∼90 genes and a few antisense transcripts. Among the mapped pA sites, a large fraction are for the expression of cluster genes and the production of bicistronic or polycistronic transcripts from the KSHV genome, and only one-third are used for the expression of single genes. We found that the size of individual PA peaks is positively correlated with the usage of the corresponding pA site, which is determined by the number of reads within the PA peak from latent to lytic KSHV infection, and that the strength of cis-elements surrounding a KSHV pA site determines the expression level of viral genes. Lastly, we identified and experimentally validated alternative polyadenylation of KSHV ORF54, T1.5, and K11 during viral lytic infection. To our knowledge, this is the first report on alternative polyadenylation events in KSHV infection.
Affiliation(s)
- Vladimir Majerciak
- Tumor Virus RNA Biology Section, Gene Regulation and Chromosome Biology Laboratory, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, Maryland, United States of America
| | - Ting Ni
- DNA Sequencing and Genomics Core, National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, Maryland, United States of America
| | - Wenjing Yang
- DNA Sequencing and Genomics Core, National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, Maryland, United States of America
| | - Bowen Meng
- Tumor Virus RNA Biology Section, Gene Regulation and Chromosome Biology Laboratory, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, Maryland, United States of America
| | - Jun Zhu
- DNA Sequencing and Genomics Core, National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, Maryland, United States of America
- * E-mail: (JZ); (ZMZ)
| | - Zhi-Ming Zheng
- Tumor Virus RNA Biology Section, Gene Regulation and Chromosome Biology Laboratory, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, Maryland, United States of America
- * E-mail: (JZ); (ZMZ)
| |
|