1
|
Ye W, Lian Q, Ye C, Wu X. A Survey on Methods for Predicting Polyadenylation Sites from DNA Sequences, Bulk RNA-seq, and Single-cell RNA-seq. GENOMICS, PROTEOMICS & BIOINFORMATICS 2022:S1672-0229(22)00121-8. [PMID: 36167284 PMCID: PMC10372920 DOI: 10.1016/j.gpb.2022.09.005] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 04/19/2022] [Revised: 08/17/2022] [Accepted: 09/19/2022] [Indexed: 05/08/2023]
Abstract
Alternative polyadenylation (APA) plays important roles in modulating mRNA stability, translation, and subcellular localization, and contributes extensively to shaping eukaryotic transcriptome complexity and proteome diversity. Identification of poly(A) sites (pAs) on a genome-wide scale is a critical step toward understanding the underlying mechanism of APA-mediated gene regulation. A number of established computational tools have been proposed to predict pAs from diverse genomic data. Here we provided an exhaustive overview of computational approaches for predicting pAs from DNA sequences, bulk RNA sequencing (RNA-seq) data, and single-cell RNA sequencing (scRNA-seq) data. Particularly, we examined several representative tools using bulk RNA-seq and scRNA-seq data from peripheral blood mononuclear cells and put forward operable suggestions on how to assess the reliability of pAs predicted by different tools. We also proposed practical guidelines on choosing appropriate methods applicable to diverse scenarios. Moreover, we discussed in depth the challenges in improving the performance of pA prediction and benchmarking different methods. Additionally, we highlighted outstanding challenges and opportunities using new machine learning and integrative multi-omics techniques, and provided our perspective on how computational methodologies might evolve in the future for non-3' untranslated region, tissue-specific, cross-species, and single-cell pA prediction.
Affiliation(s)
- Wenbin Ye
  - Pasteurien College, Suzhou Medical College of Soochow University, Soochow University, Suzhou 215000, China
- Qiwei Lian
  - Pasteurien College, Suzhou Medical College of Soochow University, Soochow University, Suzhou 215000, China; Department of Automation, Xiamen University, Xiamen 361005, China
- Congting Ye
  - Key Laboratory of the Coastal and Wetland Ecosystems, Ministry of Education, College of the Environment and Ecology, Xiamen University, Xiamen 361005, China
- Xiaohui Wu
  - Pasteurien College, Suzhou Medical College of Soochow University, Soochow University, Suzhou 215000, China
|
2
|
Wenzel M, Johnston C, Müller B, Pettitt J, Connolly B. Resolution of polycistronic RNA by SL2 trans-splicing is a widely conserved nematode trait. RNA (NEW YORK, N.Y.) 2020; 26:1891-1904. [PMID: 32887788 PMCID: PMC7668243 DOI: 10.1261/rna.076414.120] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 05/18/2020] [Accepted: 08/26/2020] [Indexed: 06/11/2023]
Abstract
Spliced leader trans-splicing is essential for the processing and translation of polycistronic RNAs generated by eukaryotic operons. In C. elegans, a specialized spliced leader, SL2, provides the 5' end for uncapped pre-mRNAs derived from polycistronic RNAs. Studies of other nematodes suggested that SL2-type trans-splicing is a relatively recent innovation, confined to Rhabditina, the clade containing C. elegans and its close relatives. Here we conduct a survey of transcriptome-wide spliced leader trans-splicing in Trichinella spiralis, a distant relative of C. elegans with a particularly diverse repertoire of 15 spliced leaders. By systematically comparing the genomic context of trans-splicing events for each spliced leader, we identified a subset of T. spiralis spliced leaders that are specifically used to process polycistronic RNAs; these are the first examples of SL2-type spliced leaders outside of Rhabditina. These T. spiralis spliced leader RNAs possess a perfectly conserved stem-loop motif previously shown to be essential for SL2-type trans-splicing in C. elegans. We show that genes trans-spliced to these SL2-type spliced leaders are organized in operonic fashion, with short intercistronic distances. A subset of T. spiralis operons show conservation of synteny with C. elegans operons. Our work substantially revises our understanding of nematode spliced leader trans-splicing, showing that SL2 trans-splicing is a major mechanism for nematode polycistronic RNA processing, which may have evolved prior to the radiation of the Nematoda. This work has important implications for the improvement of genome annotation pipelines in nematodes and other eukaryotes with operonic gene organization.
Affiliation(s)
- Marius Wenzel
  - Centre of Genome-Enabled Biology and Medicine, University of Aberdeen, Aberdeen AB24 3RY, United Kingdom
- Christopher Johnston
  - School of Medicine, Medical Sciences and Nutrition, University of Aberdeen, Institute of Medical Sciences, Foresterhill, Aberdeen AB25 2ZD, United Kingdom
- Berndt Müller
  - School of Medicine, Medical Sciences and Nutrition, University of Aberdeen, Institute of Medical Sciences, Foresterhill, Aberdeen AB25 2ZD, United Kingdom
- Jonathan Pettitt
  - School of Medicine, Medical Sciences and Nutrition, University of Aberdeen, Institute of Medical Sciences, Foresterhill, Aberdeen AB25 2ZD, United Kingdom
- Bernadette Connolly
  - School of Medicine, Medical Sciences and Nutrition, University of Aberdeen, Institute of Medical Sciences, Foresterhill, Aberdeen AB25 2ZD, United Kingdom
|
3
|
Zhu S, Wu X, Fu H, Ye C, Chen M, Jiang Z, Ji G. Modeling of Genome-Wide Polyadenylation Signals in Xenopus tropicalis. Front Genet 2019; 10:647. [PMID: 31333724 PMCID: PMC6616101 DOI: 10.3389/fgene.2019.00647] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 04/04/2019] [Accepted: 06/18/2019] [Indexed: 12/22/2022] Open
Abstract
Alternative polyadenylation (APA) is an important post-transcriptional modification event to process messenger RNA (mRNA) for transcriptional termination, transport, and translation. In the present study, we characterized poly(A) signals in Xenopus tropicalis using 70,918 highly confident poly(A) sites derived from 16,511 protein-coding genes to understand their roles in the regulation of embryo development and gender difference. We examined potential factors, including the gene length, the number of introns in a gene, and the intron length, that may affect the prevalence of APA. We observed 12 prominent poly(A) signal patterns, which accounted for approximately 92% of total APA sites in Xenopus tropicalis. Among them, three patterns are specific to X. tropicalis, so they are absent in other animals such as humans or mice. We catalogued APA sites based on their genomic regions and developed a bioinformatics pipeline to identify over-represented signal patterns for each class. Then the schema of cis elements for APA sites in each genomic region was proposed. More importantly, APA usage is dramatically dynamic in embryos along five developmental stages and well-coordinated with the maternal-to-zygotic transition event. We used an entropy-based method to identify developmental stage-specific APA sites and identified significant signal patterns around specific sites and constitutive sites. We found that the APA frequency in different genomic regions varies with developmental stages and that those sites located in intron or coding sequence regions contribute most to the dynamics of gene expression during developmental stages. This study deciphers the characteristics and poly(A) signal patterns for both canonical APA sites and non-canonical APA sites across different developmental stages and gender dimorphisms in X. tropicalis, providing new insights into the dynamic regulation of distal and proximal APA.
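The entropy-based idea mentioned in this abstract can be illustrated with a minimal sketch. It assumes Shannon entropy over a site's read counts across stages and a hypothetical cutoff; the function names, the threshold, and the example counts are illustrative assumptions, not the authors' published formula or data.

```python
import math


def usage_entropy(stage_counts):
    """Shannon entropy (bits) of one poly(A) site's usage across stages.

    Low entropy: reads concentrate in few stages (stage-specific).
    High entropy: reads spread evenly (constitutive).
    """
    total = sum(stage_counts)
    if total == 0:
        raise ValueError("site has no supporting reads")
    probs = [c / total for c in stage_counts if c > 0]
    return -sum(p * math.log2(p) for p in probs)


def is_stage_specific(stage_counts, max_entropy=1.0):
    """Hypothetical rule: call a site stage-specific below a cutoff."""
    return usage_entropy(stage_counts) <= max_entropy


# Illustrative (made-up) counts for five developmental stages.
print(is_stage_specific([40, 1, 0, 0, 0]))   # concentrated in one stage
print(is_stage_specific([10, 9, 11, 10, 10]))  # spread across stages
```

With five stages the entropy ranges from 0 bits (all reads in one stage) to log2(5) bits (perfectly even usage), so any cutoff has to be chosen relative to that range.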
Affiliation(s)
- Sheng Zhu
  - Department of Automation, Xiamen University, Xiamen, China; National Institute for Data Science in Health and Medicine, Xiamen University, Xiamen, China
- Xiaohui Wu
  - Department of Automation, Xiamen University, Xiamen, China; National Institute for Data Science in Health and Medicine, Xiamen University, Xiamen, China; Innovation Center for Cell Signaling Network, Xiamen University, Xiamen, China
- Hongjuan Fu
  - Department of Automation, Xiamen University, Xiamen, China
- Congting Ye
  - National Institute for Data Science in Health and Medicine, Xiamen University, Xiamen, China; Key Laboratory of the Ministry of Education for Coastal and Wetland Ecosystems, College of the Environment and Ecology, Xiamen University, Xiamen, China
- Moliang Chen
  - Department of Automation, Xiamen University, Xiamen, China
- Zhihua Jiang
  - Department of Animal Sciences and Center for Reproductive Biology, Washington State University, Pullman, WA, United States
- Guoli Ji
  - Department of Automation, Xiamen University, Xiamen, China; National Institute for Data Science in Health and Medicine, Xiamen University, Xiamen, China; Innovation Center for Cell Signaling Network, Xiamen University, Xiamen, China
|
4
|
Beadell AV, Haag ES. Evolutionary Dynamics of GLD-1-mRNA complexes in Caenorhabditis nematodes. Genome Biol Evol 2014; 7:314-35. [PMID: 25502909 PMCID: PMC4316625 DOI: 10.1093/gbe/evu272] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Accepted: 12/04/2014] [Indexed: 12/17/2022] Open
Abstract
Given the large number of RNA-binding proteins and regulatory RNAs within genomes, posttranscriptional regulation may be an underappreciated aspect of cis-regulatory evolution. Here, we focus on nematode germ cells, which are known to rely heavily upon translational control to regulate meiosis and gametogenesis. GLD-1 belongs to the STAR-domain family of RNA-binding proteins, conserved throughout eukaryotes, and functions in Caenorhabditis elegans as a germline-specific translational repressor. A phylogenetic analysis across opisthokonts shows that GLD-1 is most closely related to Drosophila How and deuterostome Quaking, both implicated in alternative splicing. We identify messenger RNAs associated with C. briggsae GLD-1 on a genome-wide scale and provide evidence that many participate in aspects of germline development. By comparing our results with published C. elegans GLD-1 targets, we detect nearly 100 that are conserved between the two species. We also detected several hundred Cbr-GLD-1 targets whose homologs have not been reported to be associated with C. elegans GLD-1 in either of two independent studies. Low expression in C. elegans may explain the failure to detect most of them, but a highly expressed subset are strong candidates for Cbr-GLD-1-specific targets. We examine GLD-1-binding motifs among targets conserved in C. elegans and C. briggsae and find that most, but not all, display evidence of shared ancestral binding sites. Our work illustrates both the conservative and the dynamic character of evolution at the posttranslational level of gene regulation, even between congeners.
Affiliation(s)
- Alana V Beadell
  - Program in Behavior, Evolution, Ecology, and Systematics, University of Maryland, College Park; present address: Department of Organismal Biology and Anatomy, University of Chicago, Chicago, IL
- Eric S Haag
  - Program in Behavior, Evolution, Ecology, and Systematics, University of Maryland, College Park; Department of Biology, University of Maryland, College Park
|
5
|
Ji G, Guan J, Zeng Y, Li QQ, Wu X. Genome-wide identification and predictive modeling of polyadenylation sites in eukaryotes. Brief Bioinform 2014; 16:304-13. [DOI: 10.1093/bib/bbu011] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.0] [Indexed: 11/13/2022] Open
|
6
|
Hafez D, Ni T, Mukherjee S, Zhu J, Ohler U. Genome-wide identification and predictive modeling of tissue-specific alternative polyadenylation. Bioinformatics 2013; 29:i108-16. [PMID: 23812974 PMCID: PMC3694680 DOI: 10.1093/bioinformatics/btt233] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.0] [Indexed: 12/20/2022] Open
Abstract
Motivation: Pre-mRNA cleavage and polyadenylation are essential steps for 3′-end maturation and subsequent stability and degradation of mRNAs. This process is highly controlled by cis-regulatory elements surrounding the cleavage/polyadenylation sites (polyA sites), which are frequently constrained by sequence content and position. More than 50% of human transcripts have multiple functional polyA sites, and the specific use of alternative polyA sites (APA) results in isoforms with variable 3′-untranslated regions, thus potentially affecting gene regulation. Elucidating the regulatory mechanisms underlying differential polyA preferences in multiple cell types has been hindered by the lack of both suitable data on the precise location of cleavage sites and appropriate tests for determining APAs with significant differences across multiple libraries. Results: We applied a tailored paired-end RNA-seq protocol to specifically probe the position of polyA sites in three human adult tissue types. We specified a linear-effects regression model to identify tissue-specific biases indicating regulated APA; the significance of differences between tissue types was assessed by an appropriately designed permutation test. This combination allowed us to identify highly specific subsets of APA events in the individual tissue types. Predictive models successfully classified constitutive polyA sites from a biologically relevant background (auROC = 99.6%), as well as tissue-specific regulated sets from each other. We found that the main cis-regulatory elements described for polyadenylation are a strong, and highly informative, hallmark for constitutive sites only. Tissue-specific regulated sites were found to contain other regulatory motifs, with the canonical polyadenylation signal being nearly absent at brain-specific polyA sites. Together, our results contribute to the understanding of the diversity of post-transcriptional gene regulation.
Availability: Raw data are deposited on SRA, accession numbers: brain SRX208132, kidney SRX208087 and liver SRX208134. Processed datasets as well as model code are published on our website: http://www.genome.duke.edu/labs/ohler/research/UTR/ Contact: uwe.ohler@duke.edu
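The general logic of a permutation test, as invoked in this abstract, can be sketched as follows. This is a generic exact two-group test on the difference of means, not the authors' specific design (which is tied to their regression model); the function name and the example usage fractions are hypothetical.

```python
from itertools import combinations


def exact_permutation_pvalue(group_a, group_b):
    """Two-sided exact permutation p-value for a difference in means.

    Every way of relabelling the pooled values into groups of the
    original sizes is enumerated, so this only suits small samples.
    """
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)

    def mean(values):
        return sum(values) / len(values)

    observed = abs(mean(group_a) - mean(group_b))
    extreme = 0
    total = 0
    for idx in combinations(range(len(pooled)), n_a):
        chosen = set(idx)
        perm_a = [pooled[i] for i in idx]
        perm_b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        # Small tolerance guards against floating-point ties.
        if abs(mean(perm_a) - mean(perm_b)) >= observed - 1e-12:
            extreme += 1
    return extreme / total


# Made-up usage fractions of one polyA site in replicate libraries.
brain = [0.80, 0.75, 0.82]
liver = [0.30, 0.35, 0.28]
print(exact_permutation_pvalue(brain, liver))
```

With three replicates per group there are only 20 relabellings, so the smallest attainable two-sided p-value is 2/20; larger designs would normally sample random permutations instead of enumerating them.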
Affiliation(s)
- Dina Hafez
  - Department of Computer Science, Duke University, Durham, NC 27708, USA
|
7
|
Han J, Liu Z, Zhong D, Wang T. A hybrid model for the prediction of mRNA polyadenylation signals. ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. ANNUAL INTERNATIONAL CONFERENCE 2013; 2013:3511-4. [PMID: 24110486 DOI: 10.1109/embc.2013.6610299] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 11/11/2022]
Abstract
mRNA polyadenylation is the cellular process that adds adenosine tails to mature mRNAs. Malfunction of polyadenylation has been implicated in several human diseases. In this paper, we proposed a novel feature extraction approach which employs the K-gram nucleotide acid pattern, the position weight matrix (PWM) and the increment of diversity (ID) to represent the original features. Principal Component Analysis (PCA) was then applied to transform the original features into a new feature space, where the low-dimensional features were used to train the real-coded genetic neural network model. In the experiments, our proposed algorithm (GA-BP) achieved an accuracy of about 82.98%, a specificity of 82.95% and a sensitivity of 83.01% on the dataset constructed by Kalkatawi. The results demonstrate that GA-BP is a promising algorithm for the prediction of mRNA polyadenylation signals.
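As an illustration of the K-gram representation named in this abstract, the sketch below turns a sequence into a fixed-length vector of k-mer frequencies. The function name, the default k, and the handling of ambiguous bases are assumptions for illustration; the paper's exact feature definition is not given here.

```python
from itertools import product


def kgram_features(seq, k=2):
    """Return relative frequencies of all 4**k k-mers, in ACGT order.

    Windows containing characters other than A, C, G, T are skipped.
    """
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:
            counts[kmer] += 1
            total += 1
    return [counts[m] / total if total else 0.0 for m in kmers]


features = kgram_features("AATAAAGCTTGCA", k=2)
print(len(features))  # one entry per possible dinucleotide
```

Vectors like this grow as 4**k, which is one reason a dimensionality reduction step such as PCA is commonly applied before training a classifier.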
|
8
|
Rehfeld A, Plass M, Krogh A, Friis-Hansen L. Alterations in polyadenylation and its implications for endocrine disease. Front Endocrinol (Lausanne) 2013; 4:53. [PMID: 23658553 PMCID: PMC3647115 DOI: 10.3389/fendo.2013.00053] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.5] [Received: 01/10/2013] [Accepted: 04/22/2013] [Indexed: 12/17/2022] Open
Abstract
INTRODUCTION: Polyadenylation is the process in which the pre-mRNA is cleaved at the poly(A) site and a poly(A) tail is added, a process necessary for normal mRNA formation. Genes with multiple poly(A) sites can undergo alternative polyadenylation (APA), producing distinct mRNA isoforms with different 3' untranslated regions (3' UTRs) and in some cases different coding regions. Two thirds of all human genes undergo APA. The efficiency of the polyadenylation process regulates gene expression and APA plays an important part in post-transcriptional regulation, as the 3' UTR contains various cis-elements associated with post-transcriptional regulation, such as target sites for micro-RNAs and RNA-binding proteins. IMPLICATIONS OF ALTERATIONS IN POLYADENYLATION FOR ENDOCRINE DISEASE: Alterations in polyadenylation have been found to be causative of neonatal diabetes and IPEX (immune dysfunction, polyendocrinopathy, enteropathy, X-linked) and to be associated with type I and II diabetes, pre-eclampsia, fragile X-associated premature ovarian insufficiency, ectopic Cushing syndrome, and many cancer diseases, including several types of endocrine tumor diseases. PERSPECTIVES: Recent developments in high-throughput sequencing have made it possible to characterize polyadenylation genome-wide. Antisense elements inhibiting or enhancing specific poly(A) site usage can induce desired alterations in polyadenylation, and thus hold the promise of new therapeutic approaches. SUMMARY: This review gives a detailed description of alterations in polyadenylation in endocrine disease, an overview of the current literature on polyadenylation and summarizes the clinical implications of the current state of research in this field.
Affiliation(s)
- Anders Rehfeld
  - Genomic Medicine, Rigshospitalet, Copenhagen University Hospital, Copenhagen, Denmark
- Mireya Plass
  - Department of Biology, The Bioinformatics Centre, University of Copenhagen, Copenhagen, Denmark
- Anders Krogh
  - Department of Biology, The Bioinformatics Centre, University of Copenhagen, Copenhagen, Denmark
- Lennart Friis-Hansen
  - Genomic Medicine, Rigshospitalet, Copenhagen University Hospital, Copenhagen, Denmark
  - *Correspondence: Lennart Friis-Hansen, Genomic Medicine, Rigshospitalet, Copenhagen University Hospital, 4113, Blegdamsvej 9, DK2100 Copenhagen, Denmark. e-mail:
|
9
|
Wu X, Ji G, Zeng Y. In silico prediction of mRNA poly(A) sites in Chlamydomonas reinhardtii. Mol Genet Genomics 2012; 287:895-907. [PMID: 23108961 DOI: 10.1007/s00438-012-0725-5] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Received: 05/31/2012] [Accepted: 10/20/2012] [Indexed: 12/31/2022]
Abstract
Accurately predicting polyadenylation [poly(A)] sites is important for defining the end of genes and understanding gene regulation mechanisms. Alternative polyadenylation (APA) has been demonstrated to play an important role in transcriptome diversity and regulating gene expression. To accurately predict poly(A) and APA sites in Chlamydomonas reinhardtii, a green alga that can be used to produce renewable energy, we proposed a novel model that integrated five methods for representing the features of these sites with a combined classifier. We presented a new grouping method based on pattern assembly to classify the poly(A) sites into four groups. We used five methods, involving the predicted RNA secondary structure, the term frequency-inverse document frequency weight, first-order Markov chain, pentamer ratio and a position weight matrix, to generate the feature space. We then developed a heuristic method to form the combined classifier by weighting multiple classifiers to predict poly(A) sites in each group. The high specificity and sensitivity of this model were demonstrated by testing the four groups of poly(A) sites and the intronic APA sites. The average prediction performance was approximately 8% higher than that of a previous prediction model. For the group without any conserved patterns, the prediction accuracy was 9% higher than with the previous technique. However, the prediction efficiency for this group was still significantly lower than that for the other groups, indicating the importance of identifying additional signal patterns for poly(A) site prediction. We also predicted the alternative poly(A) sites in introns with good accuracy. This prediction model was designed to be easily expanded with new classifiers or new features. Therefore, this model is applicable to new data or other species.
Our model will be useful both in genome annotation, because it predicts the end of a mature transcript, and in genetic engineering, because it enables researchers to eliminate undesirable poly(A) sites.
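Of the five feature methods listed in this abstract, the position weight matrix is the easiest to show compactly. The sketch below builds a log-odds PWM from aligned example sequences and scores a candidate; the uniform background, the pseudocount, and the training hexamers are assumptions for illustration and are not taken from the paper.

```python
import math


def build_pwm(sequences, pseudocount=1.0, background=0.25):
    """Build a log2-odds position weight matrix from equal-length sequences.

    Returns one dict per position mapping each base to its log-odds
    score against a uniform background (an assumption of this sketch).
    """
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("all sequences must have the same length")
    denom = len(sequences) + 4 * pseudocount
    pwm = []
    for pos in range(length):
        column = [s[pos].upper() for s in sequences]
        pwm.append({
            base: math.log2(((column.count(base) + pseudocount) / denom) / background)
            for base in "ACGT"
        })
    return pwm


def score(pwm, seq):
    """Sum the per-position log-odds scores for a sequence of PWM length."""
    if len(seq) != len(pwm):
        raise ValueError("sequence length must match the PWM")
    return sum(col[base] for col, base in zip(pwm, seq.upper()))


# Hypothetical training motifs; real models use many aligned sites.
pwm = build_pwm(["AATAAA", "AATAAA", "ATTAAA"])
print(score(pwm, "AATAAA") > score(pwm, "GGGCCC"))
```

A score like this is typically one input among several; the abstract describes weighting multiple classifiers built on different feature types rather than relying on a single PWM.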
Affiliation(s)
- Xiaohui Wu
  - Department of Automation, Xiamen University, Xiamen 361000, China
|
10
|
Tian B, Graber JH. Signals for pre-mRNA cleavage and polyadenylation. WILEY INTERDISCIPLINARY REVIEWS-RNA 2011; 3:385-96. [PMID: 22012871 DOI: 10.1002/wrna.116] [Citation(s) in RCA: 159] [Impact Index Per Article: 12.2] [Indexed: 01/10/2023]
Abstract
Pre-mRNA cleavage and polyadenylation is an essential step for 3' end formation of almost all protein-coding transcripts in eukaryotes. The reaction, involving cleavage of nascent mRNA followed by addition of a polyadenylate or poly(A) tail, is controlled by cis-acting elements in the pre-mRNA surrounding the cleavage site. Experimental and bioinformatic studies in the past three decades have elucidated conserved and divergent elements across eukaryotes, from yeast to human. Here we review histories and current models of these elements in a broad range of species.
Affiliation(s)
- Bin Tian
  - UMDNJ-New Jersey Medical School, Newark, NJ, USA
|
11
|
Abstract
Originally discovered in C. elegans, microRNAs (miRNAs) are small RNAs that regulate fundamental cellular processes in diverse organisms. MiRNAs are encoded within the genome and are initially transcribed as primary transcripts that can be several kilobases in length. Primary transcripts are successively cleaved by two RNase III enzymes, Drosha in the nucleus and Dicer in the cytoplasm, to produce ∼70 nucleotide (nt) long precursor miRNAs and 22 nt long mature miRNAs, respectively. Mature miRNAs regulate gene expression post-transcriptionally by imperfectly binding target mRNAs in association with the multiprotein RNA induced silencing complex (RISC). The conserved sequence, expression pattern, and function of some miRNAs across distinct species as well as the importance of specific miRNAs in many biological pathways have led to an explosion in the study of miRNA biogenesis, miRNA target identification, and miRNA target regulation. Many advances in our understanding of miRNA biology have come from studies in the powerful model organism C. elegans. This chapter reviews the current methods used in C. elegans to study miRNA biogenesis, small RNA populations, miRNA-protein complexes, and miRNA target regulation.
Affiliation(s)
- Shih-Peng Chan
  - Department of Molecular, Cellular and Developmental Biology, Yale University, New Haven, Connecticut, USA
- Frank J Slack
  - Department of Molecular, Cellular and Developmental Biology, Yale University, New Haven, Connecticut, USA
- Amy E Pasquinelli
  - Department of Biology, University of California San Diego, La Jolla, California, USA
|
12
|
|
13
|
Krepp J, Gelmedin V, Hawdon JM. Characterisation of hookworm heat shock factor binding protein (HSB-1) during heat shock and larval activation. Int J Parasitol 2010; 41:533-43. [PMID: 21172351 DOI: 10.1016/j.ijpara.2010.12.001] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Received: 11/03/2010] [Revised: 12/07/2010] [Accepted: 12/09/2010] [Indexed: 11/30/2022]
Abstract
When hookworm infective L3s infect their mammalian host, they undergo a temperature shift from that of the ambient environment to that of their endothermic host. Additionally, L3s living in the environment can be exposed to temperature extremes associated with weather fluctuations. The heat shock response (HSR) is a conserved response to heat shock and other stress that involves the expression of protective heat shock proteins (HSPs). The HSR is controlled by heat shock factor-1 (HSF-1), a conserved transcription factor that binds to a heat shock element in the promoter of HSPs, causing their expression. HSF-1 is negatively regulated in part by a HSF binding protein (HSB-1) that binds to and removes HSF-1 trimers bound to HSP gene promoters, resulting in attenuation of the HSR. Herein we describe an HSB-1 orthologue, Ac-HSB-1, from the hookworm Ancylostoma caninum. The Ac-hsb-1 cDNA encodes a 79 amino acid protein that is 71% identical to the Caenorhabditis elegans HSB-1, and is predicted to share the characteristic coiled-coil structural motif comprised of two interacting alpha helices. Recombinant Ac-HSB-1 immunoprecipitated Ce-HSF-1 expressed in mammalian cells that had been heat shocked for 1 h at 42°C, but not from cells incubated at 37°C, indicating that HSB-1 only bound to the active DNA binding form of HSF-1. Expression of Ac-hsb-1 transcripts decreased following 1 h of heat shock, but increased when L3s were incubated at 37°C for 1 h. Activation of hookworm L3s induces a five- to sixfold increase in Ac-hsb-1 expression that peaks at 12 h, coincident with L3 feeding, but that subsequently decreases to two- to threefold above control at 24 h. Recombinant Ac-HSB-1 immunoprecipitates greater amounts of 70 and 40 kDa proteins from extracts of activated L3s than from non-activated L3s.
We propose that an increase in Ac-hsb-1 levels early in activation allows feeding to resume, but that a subsequent decrease in expression permits a HSR that protects non-developing L3s at host-like temperatures. Further investigations of the HSR will clarify the role of HSB-1 and HSF-1 in hookworm infection.
Affiliation(s)
- Joseph Krepp
  - Department of Microbiology, Immunology, and Tropical Medicine, The George Washington University Medical Center, 2300 Eye St. NW, Washington, DC 20037, USA
|
14
|
Akhtar MN, Bukhari SA, Fazal Z, Qamar R, Shahmuradov IA. POLYAR, a new computer program for prediction of poly(A) sites in human sequences. BMC Genomics 2010; 11:646. [PMID: 21092114 PMCID: PMC3053588 DOI: 10.1186/1471-2164-11-646] [Citation(s) in RCA: 49] [Impact Index Per Article: 3.5] [Received: 03/24/2010] [Accepted: 11/19/2010] [Indexed: 01/12/2023] Open
Abstract
BACKGROUND: mRNA polyadenylation is an essential step of pre-mRNA processing in eukaryotes. Accurate prediction of the pre-mRNA 3'-end cleavage/polyadenylation sites is important for defining the gene boundaries and understanding gene expression mechanisms. RESULTS: A total of 28,761 mapped human poly(A) sites were classified into three classes according to whether they contain different known forms of the polyadenylation signal (PAS) or none of them (PAS-strong, PAS-weak and PAS-less, respectively), and a new computer program, POLYAR, was developed for the prediction of poly(A) sites of each class. In comparison with polya_svm (to date the most accurate computer program for prediction of poly(A) sites) in searching for PAS-strong poly(A) sites in human sequences, POLYAR had a significantly higher prediction sensitivity (80.8% versus 65.7%) and specificity (66.4% versus 51.7%). However, when a similar search was conducted for PAS-weak and PAS-less poly(A) sites, both programs had a very low prediction accuracy, which indicates that our knowledge about factors involved in the determination of the poly(A) sites is not sufficient to identify such polyadenylation regions. CONCLUSIONS: We present a new classification of polyadenylation sites into three classes and a novel computer program, POLYAR, for prediction of poly(A) sites/regions of each class. In tests, POLYAR shows high accuracy in predicting PAS-strong poly(A) sites; its efficiency in searching for PAS-weak and PAS-less poly(A) sites is not very high but is comparable to that of other available programs. These findings suggest that additional characteristics of such poly(A) sites remain to be elucidated. A stand-alone version of POLYAR is available for download at http://cub.comsats.edu.pk/polyapredict.htm.
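The three-class scheme described in this abstract can be illustrated with a minimal sketch that scans the region upstream of a cleavage site for known PAS hexamers. The hexamer sets, the 40-nt window, and the function name are assumptions chosen for illustration; the abstract does not state POLYAR's actual definitions, and this is not POLYAR's prediction algorithm.

```python
# Assumed hexamer sets (DNA alphabet); not taken from the POLYAR paper.
STRONG_PAS = {"AATAAA", "ATTAAA"}
WEAK_PAS = {"AGTAAA", "TATAAA", "CATAAA", "GATAAA", "AATATA", "AATACA"}


def classify_site(seq, site, window=40):
    """Label a poly(A) site by the PAS hexamers found upstream of it.

    seq: DNA sequence; site: 0-based index of the cleavage position;
    window: number of upstream nucleotides scanned (an assumption).
    """
    start = max(0, site - window)
    upstream = seq[start:site].upper()
    hexamers = {upstream[i:i + 6] for i in range(len(upstream) - 5)}
    if hexamers & STRONG_PAS:
        return "PAS-strong"
    if hexamers & WEAK_PAS:
        return "PAS-weak"
    return "PAS-less"


example = "GGGCCC" + "AATAAA" + "GC" * 10
print(classify_site(example, len(example)))
```

Checking the strong set first means a site carrying both a canonical and a variant hexamer is labelled PAS-strong, which is one possible tie-breaking choice among several.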
Affiliation(s)
- Malik Nadeem Akhtar
  - Department of Biosciences, COMSATS Institute of Information Technology, Islamabad, Pakistan
|
15
|
Ji G, Wu X, Shen Y, Huang J, Quinn Li Q. A classification-based prediction model of messenger RNA polyadenylation sites. J Theor Biol 2010; 265:287-96. [DOI: 10.1016/j.jtbi.2010.05.015] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.8] [Received: 12/13/2009] [Revised: 03/21/2010] [Accepted: 05/13/2010] [Indexed: 12/30/2022]
|
16
|
Merritt C, Seydoux G. The Puf RNA-binding proteins FBF-1 and FBF-2 inhibit the expression of synaptonemal complex proteins in germline stem cells. Development 2010; 137:1787-98. [PMID: 20431119 DOI: 10.1242/dev.050799] [Citation(s) in RCA: 45] [Impact Index Per Article: 3.2] [Indexed: 11/20/2022]
Abstract
FBF-1 and FBF-2 (collectively FBF) are two nearly identical Puf-domain RNA-binding proteins that regulate the switch from mitosis to meiosis in the C. elegans germline. In germline stem cells, FBF prevents premature meiotic entry by inhibiting the expression of meiotic regulators, such as the RNA-binding protein GLD-1. Here, we demonstrate that FBF also directly inhibits the expression of structural components of meiotic chromosomes. HIM-3, HTP-1, HTP-2, SYP-2 and SYP-3 are components of the synaptonemal complex (SC) that forms between homologous chromosomes during meiotic prophase. In wild-type germlines, the five SC proteins are expressed shortly before meiotic entry. This pattern depends on FBF binding sites in the 3' UTRs of the SC mRNAs. In the absence of FBF or the FBF binding sites, SC proteins are expressed precociously in germline stem cells and their precursors. SC proteins aggregate and SC formation fails at meiotic entry. Precocious SC protein expression is observed even when meiotic entry is delayed in fbf mutants by reducing GLD-1. We propose that parallel regulation by FBF ensures that in wild-type gonads, meiotic entry is coordinated with just-in-time synthesis of synaptonemal proteins.
Affiliation(s)
- Christopher Merritt
- Department of Molecular Biology and Genetics, Howard Hughes Medical Institute, Center for Cell Dynamics, Johns Hopkins School of Medicine, Baltimore, MD 21205, USA
|
17
|
Messenger RNA Polyadenylation Site Recognition in Green Alga Chlamydomonas Reinhardtii. ADVANCES IN NEURAL NETWORKS - ISNN 2010 2010. [DOI: 10.1007/978-3-642-13278-0_3] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 02/10/2023]
|
18
|
Broitman-Maduro G, Owraghi M, Hung WWK, Kuntz S, Sternberg PW, Maduro MF. The NK-2 class homeodomain factor CEH-51 and the T-box factor TBX-35 have overlapping function in C. elegans mesoderm development. Development 2009; 136:2735-46. [PMID: 19605496 DOI: 10.1242/dev.038307] [Citation(s) in RCA: 41] [Impact Index Per Article: 2.7] [Indexed: 11/20/2022]
Abstract
The C. elegans MS blastomere, born at the 7-cell stage of embryogenesis, generates primarily mesodermal cell types, including pharynx cells, body muscles and coelomocytes. A presumptive null mutation in the T-box factor gene tbx-35, a target of the MED-1 and MED-2 divergent GATA factors, was previously found to result in a profound decrease in the production of MS-derived tissues, although the tbx-35(-) embryonic arrest phenotype was variable. We report here that the NK-2 class homeobox gene ceh-51 is a direct target of TBX-35 and at least one other factor, and that CEH-51 and TBX-35 share functions. Embryos homozygous for a ceh-51 null mutation arrest as larvae with pharynx and muscle defects, although these tissues appear to be specified correctly. Loss of tbx-35 and ceh-51 together results in a synergistic phenotype resembling loss of med-1 and med-2. Overexpression of ceh-51 causes embryonic arrest and generation of ectopic body muscle and coelomocytes. Our data show that TBX-35 and CEH-51 have overlapping function in MS lineage development. As T-box regulators and NK-2 homeodomain factors are both important for heart development in Drosophila and vertebrates, our results suggest that these regulators function in a similar manner in C. elegans to specify a major precursor of mesoderm.
|
19
|
Prediction of non-canonical polyadenylation signals in human genomic sequences based on a novel algorithm using a fuzzy membership function. J Biosci Bioeng 2009; 107:569-78. [DOI: 10.1016/j.jbiosc.2009.01.001] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 11/09/2008] [Revised: 01/05/2009] [Accepted: 01/05/2009] [Indexed: 11/23/2022]
|
20
|
Interaction of hookworm 14-3-3 with the forkhead transcription factor DAF-16 requires intact Akt phosphorylation sites. Parasit Vectors 2009; 2:21. [PMID: 19393088 PMCID: PMC2683825 DOI: 10.1186/1756-3305-2-21] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Received: 03/31/2009] [Accepted: 04/24/2009] [Indexed: 01/29/2023]
Abstract
Background Third-stage infective larvae (L3) of hookworms are in an obligatory state of developmental arrest that ends upon entering the definitive host, where they receive a signal that re-activates development. Recovery from the developmentally arrested dauer stage of Caenorhabditis elegans is analogous to the resumption of development during hookworm infection. Insulin-like signaling (ILS) mediates recovery from arrest in C. elegans and activation of hookworm dauer L3. In C. elegans, phosphorylation of the forkhead transcription factor DAF-16 in response to ILS creates binding sites for the 14-3-3 protein Ce-FTT-2, which translocates DAF-16 out of the nucleus, resulting in resumption of reproductive development. Results To determine if hookworm 14-3-3 proteins play a similar role in L3 activation, hookworm FTT-2 was identified and tested for its ability to interact with A. caninum DAF-16 in vitro. The Ac-FTT-2 amino acid sequence was 91% identical to the Ce-FTT-2, and was most closely related to FTT-2 from other nematodes. Ac-FTT-2 was expressed in HEK 293T cells, and was recognized by an antibody against human 14-3-3β isoform. Reciprocal co-immunoprecipitations using anti-epitope tag antibodies indicated that Ac-FTT-2 interacts with Ac-DAF-16 when co-expressed in serum-stimulated HEK 293T cells. This interaction requires intact Akt consensus phosphorylation sites at serine107 and threonine312, but not serine381. Ac-FTT-2 was undetectable by Western blot in excretory/secretory products from serum-stimulated (activated) L3 or adult A. caninum. Conclusion The results indicate that Ac-FTT-2 interacts with DAF-16 in a phosphorylation-site dependent manner, and suggest that Ac-FTT-2 mediates activation of L3 by binding Ac-DAF-16 during hookworm infection.
|
21
|
Gao X, Frank D, Hawdon JM. Molecular cloning and DNA binding characterization of DAF-16 orthologs from Ancylostoma hookworms. Int J Parasitol 2008; 39:407-15. [PMID: 18930062 DOI: 10.1016/j.ijpara.2008.09.005] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.4] [Received: 05/19/2008] [Revised: 09/19/2008] [Accepted: 09/20/2008] [Indexed: 11/30/2022]
Abstract
Infective hookworm L3s encounter a host-specific signal during infection that re-initiates a suspended developmental pathway, resulting in development to the adult stage. This resumption of development in the host is analogous to recovery of developmentally arrested Caenorhabditis elegans dauer larvae in response to favorable environmental signals. Dauer recovery in C. elegans dauers and hookworm L3s is mediated by insulin-like signaling (ILS). A key output of ILS in C. elegans is the forkhead transcription factor DAF-16, which controls the expression of genes required for maintenance of the dauer stage. The similarity between recovery pathways of L3s and dauers suggests that DAF-16 functions similarly in hookworm L3 activation. To test this, orthologs of Ce-DAF-16 were isolated from the hookworms Ancylostoma caninum and Ancylostoma ceylanicum. The protein sequences of hookworm DAF-16 DNA binding domains were identical, and shared 94% identity with the b and c isoforms of Ce-DAF-16. Ac-DAF-16 expressed in HEK293 kidney cells bound strongly to the conserved DAF family binding element (DBE), but not to a random DNA sequence. Ac-DAF-16 was able to drive transcription of a reporter gene located downstream of six copies of the DBE in NIH3T3 cells under starved conditions. Addition of serum caused a decrease in reporter gene expression, indicating that DAF-16 is negatively regulated by growth factor stimulation. These data confirm the presence of DAF-16 orthologs in hookworms, and demonstrate that Ac-DAF-16 binds to and drives transcription from a conserved DAF-16 family DNA binding element.
Affiliation(s)
- Xin Gao
- Department of Microbiology and Tropical Medicine, The George Washington University Medical Center, Washington, DC 20037, USA
|
22
|
Stumpf CR, Kimble J, Wickens M. A Caenorhabditis elegans PUF protein family with distinct RNA binding specificity. RNA (NEW YORK, N.Y.) 2008; 14:1550-7. [PMID: 18579869 PMCID: PMC2491472 DOI: 10.1261/rna.1095908] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.4] [Indexed: 05/05/2023]
Abstract
PUF proteins comprise a highly conserved family of sequence-specific RNA binding proteins that regulate target mRNAs via binding directly to their 3'UTRs. The Caenorhabditis elegans genome encodes several PUF proteins, which cluster into four groups based on sequence similarity; all share amino acids that interact with the RNA in the cocrystal of human Pumilio with RNA. Members of the FBF and the PUF-8/9 groups bind different but related RNA sequences. We focus here on the binding specificity of representatives of a third cluster, comprising PUF-5, -6, and -7. We performed in vivo selection experiments using the yeast three-hybrid system to identify RNA sequences that bind PUF-5 and PUF-6, and we confirmed binding to optimal sites in vitro. The consensus sequences derived from the screens are similar for PUF-5 and PUF-6 but differ from those of the FBF or PUF-8/-9 groups. Similarly, neither PUF-5 nor PUF-6 bind the recognition sites preferred by the other clusters. Mutagenesis studies confirmed the unique RNA specificity of PUF-5/-6. Using the PUF-5 consensus derived from our experiments, we searched a database of C. elegans 3'UTRs to identify potential targets of PUF-5, several of which indeed bind PUF-5. Therefore the consensus has predictive value and provides a route to finding genuine targets of these proteins.
Affiliation(s)
- Craig R Stumpf
- Program in Cellular and Molecular Biology, University of Wisconsin-Madison, Madison, Wisconsin 53706, USA
|
23
|
Abstract
As the number of sequenced genomes increases, the ability to deduce genome function becomes increasingly salient. For many genome sequences, the only annotation that will be available for the foreseeable future will be based on computational predictions and comparisons with functional elements in related species. Here we discuss computational approaches for automated genome-wide annotation of functional elements in mammalian genomes. These include methods for ab initio and comparative gene-structure predictions. Gene features such as intron splice sites, 3' untranslated regions, promoters, and cis-regulatory elements are discussed, as is a novel method for predicting DNaseI hypersensitive sites. Recent methodologies for predicting noncoding RNA genes, including microRNA genes and their targets, are also reviewed.
Affiliation(s)
- Steven J M Jones
- Genome Sciences Centre, British Columbia Cancer Research Center, Vancouver, British Columbia, V5Z 1L3, Canada.
|
24
|
Zamorano A, López-Camarillo C, Orozco E, Weber C, Guillen N, Marchat LA. In silico analysis of EST and genomic sequences allowed the prediction of cis-regulatory elements for Entamoeba histolytica mRNA polyadenylation. Comput Biol Chem 2008; 32:256-63. [PMID: 18514032 DOI: 10.1016/j.compbiolchem.2008.03.019] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.2] [Received: 09/12/2007] [Revised: 03/24/2008] [Accepted: 03/24/2008] [Indexed: 10/22/2022]
Abstract
In most eukaryotic cells, the poly(A) tail at the 3'-end of messenger RNA (mRNA) is essential for nuclear export, translatability, stability and transcription termination. Poly(A) tail formation involves multi-protein complexes that interact with specific sequences in 3'-untranslated region (3'-UTR) of precursor mRNA (pre-mRNA). Here we have performed a computational analysis of a large EST and genomic sequences collection from Entamoeba histolytica, the protozoan parasite responsible for human amoebiasis, to identify conserved elements that could be involved in pre-mRNA polyadenylation. Results evidenced the presence of an AU-rich domain corresponding to the consensus UA(A/U)UU polyadenylation signal or variants, the cleavage and polyadenylation site that is generally denoted by U residue and flanked by two U-rich tracts, and a novel A-rich element. This predicted array was validated through the analysis of genomic sequences and predicted mRNA folding of genes with known polyadenylation site. The molecular organization of pre-mRNA 3'-UTR cis-regulatory elements appears to be roughly conserved through evolutionary scale, whereas the polyadenylation signal seems to be species-specific in protozoan parasites and the novel A-rich element is unique for the primitive eukaryote E. histolytica. To our knowledge, this paper is the first work about the identification of potential pre-mRNA 3'-UTR cis-regulatory sequences through in silico analysis of large sets of cDNA and genomic sequences in a protozoan parasite.
Affiliation(s)
- Absalom Zamorano
- ENMH-IPN, Programa Institucional de Biomedicina Molecular, Guillermo Massieu Heguera #239, Ticoman, CP 07320, México, D.F., Mexico
|
25
|
Liu F, Xu W, Tan L, Xue Y, Sun C, Su Z. Case study for identification of potentially indel-caused alternative expression isoforms in the rice subspecies japonica and indica by integrative genome analysis. Genomics 2007; 91:186-94. [PMID: 18037265 DOI: 10.1016/j.ygeno.2007.10.001] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Received: 07/25/2007] [Revised: 09/27/2007] [Accepted: 10/03/2007] [Indexed: 11/30/2022]
Abstract
Alternative splicing (AS) is one of the most significant components of the functional complexity of the eukaryote genome, increasing protein diversity, creating isoforms, and affecting mRNA stability. Recently, whole genome sequences and large microarray data sets have become available, making data integration feasible and allowing the study of the possible regulatory mechanism of AS in rice (Oryza sativa) by erecting and testing hypotheses before doing bench studies. We have developed a new strategy and have identified 215 rice genes with alternative expression isoforms related to insertion and deletion (indel) between subspecies indica and subspecies japonica. We did a case study for alternative expression isoforms of the rice peroxidase gene LOC_Os06g48030 to investigate possible mechanisms by which indels caused alternative splicing between the indica and the japonica varieties by mining of array data together with validation by RT-PCR and genome sequencing analysis. Multiple poly(A) signals were detected in the specific indel region for LOC_Os06g48030. We present a new methodology to promote more discoveries of potentially indel-caused AS genes in rice, which may serve as the foundation for research into the regulatory mechanism of alternative expression isoforms between subspecies.
Affiliation(s)
- Fengxia Liu
- State Key Laboratory of Plant Physiology and Biochemistry, College of Biological Sciences, China Agricultural University, Beijing 100094, China
|
26
|
Mangone M, Macmenamin P, Zegar C, Piano F, Gunsalus KC. UTRome.org: a platform for 3'UTR biology in C. elegans. Nucleic Acids Res 2007; 36:D57-62. [PMID: 17986455 PMCID: PMC2238901 DOI: 10.1093/nar/gkm946] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.0] [Indexed: 01/15/2023]
Abstract
Three-prime untranslated regions (3′UTRs) are widely recognized as important post-transcriptional regulatory regions of mRNAs. RNA-binding proteins and small non-coding RNAs such as microRNAs (miRNAs) bind to functional elements within 3′UTRs to influence mRNA stability, translation and localization. These interactions play many important roles in development, metabolism and disease. However, even in the most well-annotated metazoan genomes, 3′UTRs and their functional elements are not well defined. Comprehensive and accurate genome-wide annotation of 3′UTRs and their functional elements is thus critical. We have developed an open-access database, available at http://www.UTRome.org, to provide a rich and comprehensive resource for 3′UTR biology in the well-characterized, experimentally tractable model system Caenorhabditis elegans. UTRome.org combines data from public repositories and a large-scale effort we are undertaking to characterize 3′UTRs and their functional elements in C. elegans, including 3′UTR sequences, graphical displays, predicted and validated functional elements, secondary structure predictions and detailed data from our cloning pipeline. UTRome.org will grow substantially over time to encompass individual 3′UTR isoforms for the majority of genes, new and revised functional elements, and in vivo data on 3′UTR function as they become available. The UTRome database thus represents a powerful tool to better understand the biology of 3′UTRs.
Affiliation(s)
- Marco Mangone
- Department of Biology and Center for Genomics and Systems Biology, New York University, 100 Washington Square East, New York, NY 10003, USA
|
27
|
Graber JH, Salisbury J, Hutchins LN, Blumenthal T. C. elegans sequences that control trans-splicing and operon pre-mRNA processing. RNA (NEW YORK, N.Y.) 2007; 13:1409-26. [PMID: 17630324 PMCID: PMC1950753 DOI: 10.1261/rna.596707] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.6] [Received: 04/12/2007] [Accepted: 05/17/2007] [Indexed: 05/04/2023]
Abstract
Many mRNAs in Caenorhabditis elegans are generated through a trans-splicing reaction that adds one of two classes of spliced leader RNA to an independently transcribed pre-mRNA. SL1 leaders are spliced mostly to pre-mRNAs from genes with outrons, intron-like sequences at the 5'-ends of the pre-mRNAs. In contrast, SL2 leaders are nearly exclusively trans-spliced to genes that occur downstream in polycistronic pre-mRNAs produced from operons. Operon pre-mRNA processing requires separation into individual transcripts, which is accomplished by 3'-processing of upstream genes and spliced leader trans-splicing to the downstream genes. We used a novel computational analysis, based on nonnegative matrix factorization, to identify and characterize significant differences in the cis-acting sequence elements that differentiate various types of functional site, including internal versus terminal 3'-processing sites, and SL1 versus SL2 trans-splicing sites. We describe several key elements, including the U-rich (Ur) element that couples 3'-processing with SL2 trans-splicing, and a novel outron (Ou) element that occurs upstream of SL1 trans-splicing sites. Finally, we present models of the distinct classes of trans-splicing reaction, including SL1 trans-splicing at the outron, SL2 trans-splicing in standard operons, competitive SL1-SL2 trans-splicing in operons with large intergenic separation, and SL1 trans-splicing in SL1-type operons, which have no intergenic separation.
|
28
|
Moucadel V, Lopez F, Ara T, Benech P, Gautheret D. Beyond the 3' end: experimental validation of extended transcript isoforms. Nucleic Acids Res 2007; 35:1947-57. [PMID: 17339231 PMCID: PMC1874610 DOI: 10.1093/nar/gkm062] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.1] [Indexed: 11/12/2022]
Abstract
High throughput EST and full-length cDNA sequencing have revealed extensive variations at the 3' ends of mammalian transcripts. Whether all of these changes are biologically meaningful has been the subject of controversy, as such results may reflect in part transcription or polyadenylation leakage. We selected here a set of tandem poly(A) sites predicted from EST/cDNA sequence analysis that (i) are conserved between human and mouse, (ii) produce alternative 3' isoforms with unusual size features and (iii) are not documented in current genome databases, and we submitted these sites to experimental validation in mouse tissues. Out of 86 tested poly(A) sites from 44 genes, 84 were individually confirmed using a specially devised RT-PCR strategy. We then focused on validating the exon structure between distant tandem poly(A) sites separated by over 3 kb, and between stop codons and alternative poly(A) sites located at 4.5 kb or more, using a long-distance RT-PCR strategy. In most cases, long transcripts spanning the whole poly(A)-poly(A) or stop-poly(A) distance were detected, confirming that tandem sites were part of the same transcription unit. Given the apparent conservation of these long alternative 3' ends, different regulatory functions can be foreseen, depending on the location where transcription starts.
Affiliation(s)
- Daniel Gautheret
- *To whom correspondence should be addressed. 33 (0)1 69 15 46 32; 33 (0)1 69 15 46 29
|
29
|
Hayes GD, Frand AR, Ruvkun G. The mir-84 and let-7 paralogous microRNA genes of Caenorhabditis elegans direct the cessation of molting via the conserved nuclear hormone receptors NHR-23 and NHR-25. Development 2006; 133:4631-41. [PMID: 17065234 DOI: 10.1242/dev.02655] [Citation(s) in RCA: 88] [Impact Index Per Article: 4.9] [Indexed: 11/20/2022]
Abstract
The let-7 microRNA (miRNA) gene of Caenorhabditis elegans controls the timing of developmental events. let-7 is conserved throughout bilaterian phylogeny and has multiple paralogs. Here, we show that the paralog mir-84 acts synergistically with let-7 to promote terminal differentiation of the hypodermis and the cessation of molting in C. elegans. Loss of mir-84 exacerbates phenotypes caused by mutations in let-7, whereas increased expression of mir-84 suppresses a let-7 null allele. Adults with reduced levels of mir-84 and let-7 express genes characteristic of larval molting as they initiate a supernumerary molt. mir-84 and let-7 promote exit from the molting cycle by regulating targets in the heterochronic pathway and also nhr-23 and nhr-25, genes encoding conserved nuclear hormone receptors essential for larval molting. The synergistic action of miRNA paralogs in development may be a general feature of the diversified miRNA gene family.
Affiliation(s)
- Gabriel D Hayes
- Department of Genetics, Harvard Medical School and Department of Molecular Biology, Massachusetts General Hospital, Boston, MA 02114, USA
|
30
|
Hajarnavis A, Durbin R. A conserved sequence motif in 3' untranslated regions of ribosomal protein mRNAs in nematodes. RNA (NEW YORK, N.Y.) 2006; 12:1786-9. [PMID: 16917125 PMCID: PMC1581986 DOI: 10.1261/rna.51306] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Indexed: 05/11/2023]
Abstract
The 3' untranslated regions (3' UTR) of eukaryotic genes can contain motifs involved in regulation of gene expression or localization at the post-transcriptional level. This study concerns the identification of novel, conserved elements in 3' UTRs of many ribosomal protein mRNAs in Caenorhabditis elegans and Caenorhabditis briggsae. Analysis of the region around the polyadenylation signal in many ribosomal protein mRNAs indicates the conservation of a sequence motif UUGUU occurring both before and immediately after the polyadenylation signal. Building a statistical model of this motif and searching a database of C. elegans 3' UTRs reveals that this motif is also present in the 3' UTR of some genes involved in translation and ribosome maturation, among others. We suggest that this signal may be involved in translation or other message-level regulation of ribosomal genes in C. elegans.
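The kind of search described in this abstract, looking for UUGUU on either side of the polyadenylation signal, can be illustrated with a minimal Python sketch. It is not the authors' statistical model: it assumes the canonical AAUAAA signal, uses a hypothetical flank size (`WINDOW`), and relies on exact string matching only. The function names are invented for illustration.

```python
import re

PAS = "AAUAAA"   # canonical polyadenylation signal (assumed here for illustration)
MOTIF = "UUGUU"  # motif named in the abstract
WINDOW = 20      # hypothetical flank size in nucleotides, not taken from the paper


def find_all(seq, pattern):
    """Return start positions of all matches, including overlapping ones."""
    return [m.start() for m in re.finditer(f"(?={re.escape(pattern)})", seq)]


def motif_near_pas(utr, window=WINDOW):
    """For each PAS hit, list motif start positions in the flanks on both sides."""
    utr = utr.upper().replace("T", "U")  # accept DNA or RNA input
    motif_hits = find_all(utr, MOTIF)
    results = []
    for pas_start in find_all(utr, PAS):
        pas_end = pas_start + len(PAS)
        # Motif lies entirely within the window before the signal.
        upstream = [p for p in motif_hits
                    if pas_start - window <= p and p + len(MOTIF) <= pas_start]
        # Motif lies entirely within the window after the signal.
        downstream = [p for p in motif_hits
                      if pas_end <= p and p + len(MOTIF) <= pas_end + window]
        results.append({"pas": pas_start,
                        "upstream": upstream,
                        "downstream": downstream})
    return results


if __name__ == "__main__":
    print(motif_near_pas("CCUUGUUCCAAUAAAUUGUUCC"))
```

Exact matching is the simplest possible choice; a position weight matrix or similar statistical model would be needed to capture motif variants.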
|
31
|
Abstract
mRNA polyadenylation is responsible for the 3' end formation of most mRNAs in eukaryotic cells and is linked to termination of transcription. Prediction of mRNA polyadenylation sites [poly(A) sites] can help identify genes, define gene boundaries, and elucidate regulatory mechanisms. Current methods for poly(A) site prediction achieve moderate sensitivity and specificity. Here, we present a method using support vector machine for poly(A) site prediction. Using 15 cis-regulatory elements that are over-represented in various regions surrounding poly(A) sites, this method achieves higher sensitivity and similar specificity when compared with polyadq, a common tool for poly(A) site prediction. In addition, we found that while the polyadenylation signal AAUAAA and U-rich elements are primary determinants for poly(A) site prediction, other elements contribute to both sensitivity and specificity of the prediction, indicating a combinatorial mechanism involving multiple elements when choosing poly(A) sites in human cells.
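A classifier of the kind described here needs each candidate site turned into a numeric feature vector. The Python sketch below shows one plausible way to do that, by counting cis-element occurrences in regions around the site. The region boundaries and the element list are hypothetical placeholders: the abstract does not list the 15 elements or their regions, and the sketch stops before any SVM training.

```python
# Hypothetical regions as (start, end) offsets relative to the cleavage site;
# these boundaries are illustrative and not taken from the paper.
REGIONS = {
    "far_up": (-100, -40),
    "near_up": (-40, 0),
    "near_down": (0, 40),
    "far_down": (40, 100),
}

# Illustrative elements only; the 15 elements used in the paper are not
# listed in the abstract.
ELEMENTS = ["AAUAAA", "UGUA", "UUUU", "UGUG"]


def count_overlapping(seq, kmer):
    """Count occurrences of kmer in seq, allowing overlaps."""
    return sum(1 for i in range(len(seq) - len(kmer) + 1)
               if seq.startswith(kmer, i))


def feature_vector(seq, site):
    """Build one count per (region, element) pair for a candidate site.

    seq: DNA or RNA string; site: 0-based index of the cleavage position.
    Regions running past the sequence ends are truncated.
    """
    seq = seq.upper().replace("T", "U")
    features = []
    for start, end in REGIONS.values():
        lo = max(0, site + start)
        hi = max(0, min(len(seq), site + end))
        region = seq[lo:hi]
        for kmer in ELEMENTS:
            features.append(count_overlapping(region, kmer))
    return features


if __name__ == "__main__":
    example = "C" * 75 + "AAUAAA" + "C" * 19 + "C" * 10 + "UUUUU" + "C" * 85
    print(feature_vector(example, site=100))
```

Vectors built this way for known sites and for background positions could then be passed to any standard SVM implementation. Raw counts are used here for simplicity; position-specific scores would be a natural refinement.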
Affiliation(s)
- Yiming Cheng
- Department of Mathematical Sciences, New Jersey Institute of Technology Newark, NJ 07102, USA
|
32
|
Retelska D, Iseli C, Bucher P, Jongeneel CV, Naef F. Similarities and differences of polyadenylation signals in human and fly. BMC Genomics 2006; 7:176. [PMID: 16836751 PMCID: PMC1574307 DOI: 10.1186/1471-2164-7-176] [Citation(s) in RCA: 42] [Impact Index Per Article: 2.3] [Received: 12/22/2005] [Accepted: 07/12/2006] [Indexed: 02/02/2023]
Abstract
BACKGROUND Cleavage of messenger RNA (mRNA) precursors is an essential step in mRNA maturation. The signal recognized by the cleavage enzyme complex has been characterized as an A rich region upstream of the cleavage site containing a motif with consensus AAUAAA, followed by a U or UG rich region downstream of the cleavage site. RESULTS We studied these signals using exhaustive databases of cleavage sites obtained from aligning raw expressed sequence tags (EST) sequences to genomic sequences in Homo sapiens and Drosophila melanogaster. These data show that the polyadenylation signal is highly conserved in human and fly. In addition, de novo motif searches generated a refined description of the U-rich downstream sequence (DSE) element, which shows more divergence between the two species. These refined motifs are applied, within a Hidden Markov Model (HMM) framework, to predict mRNA cleavage sites. CONCLUSION We demonstrate that the DSE is a specific motif in both human and Drosophila. These findings shed light on the sequence correlates of a highly conserved biological process, and improve in silico prediction of 3' mRNA cleavage and polyadenylation sites.
Affiliation(s)
- Dorota Retelska
- Swiss Institute of Bioinformatics, Batiment Genopode, UNIL, 1015 Lausanne, Switzerland
- Swiss Institute for Experimental Cancer Research (ISREC), Ecole Polytechnique Fédérale de Lausanne (EPFL), AAB-021, CH-1015 Lausanne, Switzerland
- Ludwig Institute for Cancer Research, Batiment Genopode, UNIL, 1015 Lausanne, Switzerland
- Christian Iseli
- Swiss Institute of Bioinformatics, Batiment Genopode, UNIL, 1015 Lausanne, Switzerland
- Ludwig Institute for Cancer Research, Batiment Genopode, UNIL, 1015 Lausanne, Switzerland
- Philipp Bucher
- Swiss Institute of Bioinformatics, Batiment Genopode, UNIL, 1015 Lausanne, Switzerland
- Swiss Institute for Experimental Cancer Research (ISREC), Ecole Polytechnique Fédérale de Lausanne (EPFL), AAB-021, CH-1015 Lausanne, Switzerland
- C Victor Jongeneel
- Swiss Institute of Bioinformatics, Batiment Genopode, UNIL, 1015 Lausanne, Switzerland
- Ludwig Institute for Cancer Research, Batiment Genopode, UNIL, 1015 Lausanne, Switzerland
- Felix Naef
- Swiss Institute of Bioinformatics, Batiment Genopode, UNIL, 1015 Lausanne, Switzerland
- Swiss Institute for Experimental Cancer Research (ISREC), Ecole Polytechnique Fédérale de Lausanne (EPFL), AAB-021, CH-1015 Lausanne, Switzerland
|
33
|
Le Texier V, Riethoven JJ, Kumanduri V, Gopalakrishnan C, Lopez F, Gautheret D, Thanaraj TA. AltTrans: transcript pattern variants annotated for both alternative splicing and alternative polyadenylation. BMC Bioinformatics 2006; 7:169. [PMID: 16556303 PMCID: PMC1435940 DOI: 10.1186/1471-2105-7-169] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.4] [Received: 12/14/2005] [Accepted: 03/23/2006] [Indexed: 11/10/2022]
Abstract
BACKGROUND The three major mechanisms that regulate transcript formation involve the selection of alternative sites for transcription start (TS), splicing, and polyadenylation. Currently there are efforts that collect data & annotation individually for each of these variants. It is important to take an integrated view of these data sets and to derive a data set of alternate transcripts along with consolidated annotation. We have been developing in the past computational pipelines that generate value-added data at genome-scale on individual variant types; these include AltSplice on splicing and AltPAS on polyadenylation. We now extend these pipelines and integrate the resultant data sets to facilitate an integrated view of the contributions from splicing and polyadenylation in the formation of transcript variants. DESCRIPTION The AltSplice pipeline examines gene-transcript alignments and delineates alternative splice events and splice patterns; this pipeline is extended as AltTrans to delineate isoform transcript patterns for each of which both introns/exons and 'terminating' polyA site are delineated; EST/mRNA sequences that qualify the transcript pattern confirm both the underlying splicing and polyadenylation. The AltPAS pipeline examines gene-transcript alignments and delineates all potential polyA sites irrespective of underlying splicing patterns. Resultant polyA sites from both AltTrans and AltPAS are merged. The generated database reports data on alternative splicing, alternative polyadenylation and the resultant alternate transcript patterns; the basal data is annotated for various biological features. The data (named as integrated AltTrans data) generated for both the organisms of human and mouse is made available through the Alternate Transcript Diversity web site at http://www.ebi.ac.uk/atd/. CONCLUSION The reported data set presents alternate transcript patterns that are annotated for both alternative splicing and alternative polyadenylation. Results based on current transcriptome data indicate that the contribution of alternative splicing is larger than that of alternative polyadenylation.
Affiliation(s)
- Vincent Le Texier
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Jean-Jack Riethoven
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- 18 Crispin Close, Haverhill, Suffolk, CB9 9PT, UK
- Vasudev Kumanduri
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Chellappa Gopalakrishnan
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Fabrice Lopez
- INSERM ERM206, Université de la Méditerranée, Luminy case 928 – 13 288 Marseille Cedex 09, France
- Daniel Gautheret
- INSERM ERM206, Université de la Méditerranée, Luminy case 928 – 13 288 Marseille Cedex 09, France
- Thangavel Alphonse Thanaraj
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- 4 Copperfields, Saffron Walden, Essex, CB11 4FG, UK
|
34
|
Schwarz EM, Antoshechkin I, Bastiani C, Bieri T, Blasiar D, Canaran P, Chan J, Chen N, Chen WJ, Davis P, Fiedler TJ, Girard L, Harris TW, Kenny EE, Kishore R, Lawson D, Lee R, Müller HM, Nakamura C, Ozersky P, Petcherski A, Rogers A, Spooner W, Tuli MA, Van Auken K, Wang D, Durbin R, Spieth J, Stein LD, Sternberg PW. WormBase: better software, richer content. Nucleic Acids Res 2006; 34:D475-8. [PMID: 16381915 PMCID: PMC1347424 DOI: 10.1093/nar/gkj061] [Citation(s) in RCA: 65] [Impact Index Per Article: 3.6] [Indexed: 11/12/2022]
Abstract
WormBase (http://wormbase.org), the public database for genomics and biology of Caenorhabditis elegans, has been restructured for stronger performance and expanded for richer biological content. Performance was improved by accelerating the loading of central data pages such as the omnibus Gene page, by rationalizing internal data structures and software for greater portability, and by making the Genome Browser highly customizable in how it views and exports genomic subsequences. Arbitrarily complex, user-specified queries are now possible through Textpresso (for all available literature) and through WormMart (for most genomic data). Biological content was enriched by reconciling all available cDNA and expressed sequence tag data with gene predictions, clarifying single nucleotide polymorphism and RNAi sites, and summarizing known functions for most genes studied in this organism.
Affiliation(s)
- Erich M Schwarz
- Division of Biology, 156-29 California Institute of Technology, Pasadena, CA, 91125, USA.
35
Loke JC, Stahlberg EA, Strenski DG, Haas BJ, Wood PC, Li QQ. Compilation of mRNA polyadenylation signals in Arabidopsis revealed a new signal element and potential secondary structures. Plant Physiol 2005; 138:1457-68. [PMID: 15965016 PMCID: PMC1176417 DOI: 10.1104/pp.105.060541] [Citation(s) in RCA: 153] [Impact Index Per Article: 8.1] [Indexed: 05/03/2023]
Abstract
Using a novel program, SignalSleuth, and a database containing authenticated polyadenylation [poly(A)] sites, we analyzed the composition of mRNA poly(A) signals in Arabidopsis (Arabidopsis thaliana) and reevaluated previously described cis-elements within the 3'-untranslated regions (3'-UTRs), including near upstream elements and far upstream elements. As predicted, high-consensus signal patterns are absent. The AAUAAA signal topped the near upstream element patterns and was found within the predicted location in only approximately 10% of 3'-UTRs. More importantly, we identified a new set of poly(A) signals, named cleavage elements, flanking both sides of the cleavage site. These cis-elements were not previously revealed by conventional mutagenesis and are contemplated as a cluster of signals for cleavage site recognition. Moreover, a single-nucleotide profile scan of the 3'-UTRs unveiled a distinct arrangement of alternate stretches of U and A nucleotides, which led to a prediction of the formation of secondary structures. Using an RNA secondary structure prediction program, mFold, we identified three main types of secondary structures in the sequences analyzed. Surprisingly, these observed secondary structures were all interrupted in previously constructed mutations in these regions. These results will enable us to revise the current model of plant poly(A) signals and to develop tools to predict 3'-ends for gene annotation.
Affiliation(s)
- Johnny C Loke
- Department of Botany, Miami University, Oxford, Ohio 45056, USA
36
Brown RH, Gross SS, Brent MR. Begin at the beginning: predicting genes with 5' UTRs. Genome Res 2005; 15:742-7. [PMID: 15867435 PMCID: PMC1088303 DOI: 10.1101/gr.3696205] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.4] [Received: 01/13/2005] [Accepted: 02/14/2005] [Indexed: 02/03/2023]
Abstract
The retrainable, comparative gene predictor N-SCAN integrates multigenome modeling and 5' untranslated region (5' UTR) modeling. In this article, we evaluate N-SCAN's transcription-start site (TSS) and first exon predictions both computationally and experimentally. The computational results indicate that N-SCAN is more accurate than any of the other tools we tested at predicting the TSS and the complete first exon. It is the only one of these tools that can predict complete gene structures together with 5' UTRs. Experimental evaluation shows that N-SCAN can be used to validate novel UTR introns in human gene predictions that do not overlap any RefSeq gene and even to correct RefSeq mRNAs by adding validated UTR exons that are missing from RefSeq.
Affiliation(s)
- Randall H Brown
- Laboratory for Computational Genomics, Washington University, St. Louis, MO 63130, USA
37
Abstract
Messenger RNA polyadenylation is one of the key post-transcriptional events in eukaryotic cells. A large number of genes in mammalian species can undergo alternative polyadenylation, which leads to mRNAs with variable 3' ends. As the 3' end of mRNAs often contains cis elements important for mRNA stability, mRNA localization and translation, the implications of the regulation of polyadenylation can be multifold. Alternative polyadenylation is controlled by cis elements and trans factors, and is believed to occur in a tissue- or disease-specific manner. Although many databases are devoted to other aspects of mRNA metabolism, such as transcriptional initiation and splicing, systematic information on polyadenylation, including alternative polyadenylation and its regulation, is noticeably lacking. Here, we present a database named polyA_DB, through which we strive to provide several types of information regarding polyadenylation in mammalian species: (i) polyadenylation sites and their locations with respect to the genomic structure of genes; (ii) cis elements surrounding polyadenylation sites; (iii) comparison of polyadenylation configuration between orthologous genes; and (iv) tissue/organ information for alternative polyadenylation sites. Currently, polyA_DB contains 45,565 polyadenylation sites for 25,097 human and mouse genes, representing the most comprehensive polyadenylation database to date. The database is accessible via the website (http://polya.umdnj.edu/polyadb).
Affiliation(s)
- Haibo Zhang
- Center for Computational Biology and Bioengineering, New Jersey Institute of Technology, Newark, NJ 07102, USA
38
Qiu S, Adema CM, Lane T. A computational study of off-target effects of RNA interference. Nucleic Acids Res 2005; 33:1834-47. [PMID: 15800213 PMCID: PMC1072799 DOI: 10.1093/nar/gki324] [Citation(s) in RCA: 178] [Impact Index Per Article: 9.4] [Received: 12/13/2004] [Revised: 02/19/2005] [Accepted: 03/07/2005] [Indexed: 01/26/2023]
Abstract
RNA interference (RNAi) is an intracellular mechanism for post-transcriptional gene silencing that is frequently used to study gene function. RNAi is initiated by short interfering RNA (siRNA) of approximately 21 nt in length, either generated from double-stranded RNA (dsRNA) by the enzyme Dicer or introduced experimentally. Following association with an RNAi silencing complex, siRNA targets mRNA transcripts that share sequence identity with it for destruction. A phenotype resulting from this knockdown of expression may inform about the function of the targeted gene. However, 'off-target effects' compromise the specificity of RNAi if sequence identity between siRNA and random mRNA transcripts causes RNAi to knock down expression of non-targeted genes. The complete off-target effects must be investigated systematically on each gene in a genome by adjusting a group of parameters, which is too expensive to conduct experimentally and motivates a study in silico. This computational study examined the potential for off-target effects of RNAi, employing the genome and transcriptome sequence data of Homo sapiens, Caenorhabditis elegans and Schizosaccharomyces pombe. The chance for RNAi off-target effects proved considerable, ranging from 5 to 80% for each of the organisms, when using as parameter the exact identity between any possible siRNA sequences (arbitrary length ranging from 17 to 28 nt) derived from a dsRNA (range 100-400 nt) representing the coding sequences of target genes and all other siRNAs within the genome. Remarkably, high sequence specificity and low probability for off-target reactivity were optimally balanced for siRNA of 21 nt, the length observed mostly in vivo. The chance for off-target RNAi increased (although not always significantly) with greater length of the initial dsRNA sequence, inclusion into the analysis of available untranslated region sequences and allowing for mismatches between siRNA and target sequences. siRNA sequences from within 100 nt of the 5' termini of coding sequences had low chances for off-target reactivity. This may be owing to coding constraints for signal peptide-encoding regions of genes relative to regions that encode mature proteins. Off-target distribution varied along the chromosomes of C. elegans, apparently owing to the use of more unique sequences in gene-dense regions. Finally, biological and thermodynamic descriptors of effective siRNA reduced the number of potential siRNAs compared with those identified by sequence identity alone, but off-target RNAi remained likely, with an off-target error rate of approximately 10%. These results also suggest a direction for future in vivo studies that could help both in calibrating true off-target rates in living organisms and in contributing evidence toward the debate of whether siRNA efficacy is correlated with, or independent of, the target molecule. In summary, off-target effects present a real but not prohibitive concern that should be considered for RNAi experiments.
Affiliation(s)
- Shibin Qiu
- Department of Computer Science, University of New Mexico, Albuquerque, NM 87131, USA
- Department of Biology, University of New Mexico, Albuquerque, NM 87131, USA
- Coen M. Adema
- Department of Biology, University of New Mexico, Albuquerque, NM 87131, USA
- Terran Lane
- To whom correspondence should be addressed at Department of Computer Science, University of New Mexico, Farris Engineering Building Room 325, Albuquerque, NM 87131-1386, USA. Tel: +1 505 277 9609; Fax: +1 505 277 9627
39
Porter MY, Turmaine M, Mole SE. Identification and characterization of Caenorhabditis elegans palmitoyl protein thioesterase1. J Neurosci Res 2005; 79:836-48. [PMID: 15672447 DOI: 10.1002/jnr.20403] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.1] [Indexed: 11/08/2022]
Abstract
Infantile neuronal ceroid lipofuscinosis (INCL; Batten disease) is a severe neurodegenerative disorder of childhood characterized by the accumulation of autofluorescent storage material in lysosomes. It is caused by mutation of the CLN1/PPT1 gene, which encodes the lysosomal enzyme palmitoyl protein thioesterase-1 (PPT1), but the mechanism of disease pathogenesis and substrates for the enzyme are unknown. Caenorhabditis elegans is a simple nematode worm, with a fully sequenced genome, which is easy to maintain and manipulate. It has a completely mapped cell lineage and nervous system and has already provided clues about the pathogenesis of several human neuronal and lysosomal storage disorders. We have identified and characterized a PPT1 homologue in C. elegans. We found that, although this gene was not essential for the animal's survival, its mutation resulted in a mild developmental and reproductive phenotype, affected the number and size of mitochondria, and resulted in an abnormality in mitochondrial morphology, possibly suggestive of a role for this organelle in INCL pathogenesis. This strain, deleted for ppt-1, potentially provides a model system for the study of PPT1 and the pathogenesis of INCL.
Affiliation(s)
- Morwenna Y Porter
- Department of Paediatrics and Child Health, Royal Free and University College Medical School, University College London, London, United Kingdom