51
|
Retelska D, Iseli C, Bucher P, Jongeneel CV, Naef F. Similarities and differences of polyadenylation signals in human and fly. BMC Genomics 2006; 7:176. [PMID: 16836751 PMCID: PMC1574307 DOI: 10.1186/1471-2164-7-176] [Citation(s) in RCA: 42] [Impact Index Per Article: 2.3] [Received: 12/22/2005] [Accepted: 07/12/2006] [Indexed: 02/02/2023]
Abstract
BACKGROUND Cleavage of messenger RNA (mRNA) precursors is an essential step in mRNA maturation. The signal recognized by the cleavage enzyme complex has been characterized as an A rich region upstream of the cleavage site containing a motif with consensus AAUAAA, followed by a U or UG rich region downstream of the cleavage site. RESULTS We studied these signals using exhaustive databases of cleavage sites obtained from aligning raw expressed sequence tags (EST) sequences to genomic sequences in Homo sapiens and Drosophila melanogaster. These data show that the polyadenylation signal is highly conserved in human and fly. In addition, de novo motif searches generated a refined description of the U-rich downstream sequence (DSE) element, which shows more divergence between the two species. These refined motifs are applied, within a Hidden Markov Model (HMM) framework, to predict mRNA cleavage sites. CONCLUSION We demonstrate that the DSE is a specific motif in both human and Drosophila. These findings shed light on the sequence correlates of a highly conserved biological process, and improve in silico prediction of 3' mRNA cleavage and polyadenylation sites.
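The signal layout described in this abstract (an AAUAAA hexamer upstream of the cleavage site, followed by a U- or UG-rich region downstream) can be illustrated with a minimal Python sketch. It is not the authors' HMM method. The function names and the window sizes (40 nt upstream, 30 nt downstream) are assumptions chosen for illustration only.

```python
def find_signal(seq, cleavage_pos, upstream_window=40, motif="AAUAAA"):
    """Return 0-based start positions of `motif` found within
    `upstream_window` nt upstream of `cleavage_pos`.

    The window size is an illustrative assumption, not a value from the paper.
    """
    # Accept DNA or RNA input by normalising T to U.
    seq = seq.upper().replace("T", "U")
    start = max(0, cleavage_pos - upstream_window)
    region = seq[start:cleavage_pos]
    hits = []
    i = region.find(motif)
    while i != -1:
        hits.append(start + i)
        i = region.find(motif, i + 1)
    return hits


def u_fraction_downstream(seq, cleavage_pos, downstream_window=30):
    """Fraction of U residues in the window just downstream of the cleavage site."""
    seq = seq.upper().replace("T", "U")
    region = seq[cleavage_pos:cleavage_pos + downstream_window]
    if not region:
        return 0.0
    return region.count("U") / len(region)


# Toy sequence: hexamer at position 4, cleavage site at position 25.
toy = "GGCC" + "AATAAA" + "C" * 15 + "TGTTTTGTTT"
signal_hits = find_signal(toy, 25)
downstream_u = u_fraction_downstream(toy, 25)
```

A simple exact-match scan like this misses hexamer variants and says nothing about motif specificity. The abstract's approach of de novo motif search combined with an HMM is what addresses those limits.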
Affiliation(s)
- Dorota Retelska
- Swiss Institute of Bioinformatics, Batiment Genopode, UNIL, 1015 Lausanne, Switzerland
- Swiss Institute for Experimental Cancer Research (ISREC), Ecole Polytechnique Fédérale de Lausanne (EPFL), AAB-021, CH-1015 Lausanne, Switzerland
- Ludwig Institute for Cancer Research, Batiment Genopode, UNIL, 1015 Lausanne, Switzerland
- Christian Iseli
- Swiss Institute of Bioinformatics, Batiment Genopode, UNIL, 1015 Lausanne, Switzerland
- Ludwig Institute for Cancer Research, Batiment Genopode, UNIL, 1015 Lausanne, Switzerland
- Philipp Bucher
- Swiss Institute of Bioinformatics, Batiment Genopode, UNIL, 1015 Lausanne, Switzerland
- Swiss Institute for Experimental Cancer Research (ISREC), Ecole Polytechnique Fédérale de Lausanne (EPFL), AAB-021, CH-1015 Lausanne, Switzerland
- C Victor Jongeneel
- Swiss Institute of Bioinformatics, Batiment Genopode, UNIL, 1015 Lausanne, Switzerland
- Ludwig Institute for Cancer Research, Batiment Genopode, UNIL, 1015 Lausanne, Switzerland
- Felix Naef
- Swiss Institute of Bioinformatics, Batiment Genopode, UNIL, 1015 Lausanne, Switzerland
- Swiss Institute for Experimental Cancer Research (ISREC), Ecole Polytechnique Fédérale de Lausanne (EPFL), AAB-021, CH-1015 Lausanne, Switzerland
|
52
|
White J, Pacey-Miller T, Crawford A, Cordeiro G, Barbary D, Bundock P, Henry R. Abundant transcripts of malting barley identified by serial analysis of gene expression (SAGE). Plant Biotechnol J 2006; 4:289-301. [PMID: 17147635 DOI: 10.1111/j.1467-7652.2006.00181.x] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.5] [Indexed: 05/09/2023]
Abstract
Serial analysis of gene expression (SAGE) was applied to the major cereal crop barley (Hordeum vulgare) to characterize the transcriptional profile of grain during the malting process. Seven SAGE libraries were generated from seed at different time points during malting, in addition to one library from dry mature seed. A total of 155,206 LongSAGE tags, representing 41,909 unique sequences, was generated. This study reports an in-depth analysis of the most abundant transcripts from each of eight specific time points in a malting barley time course. The 100 most abundant tags from each library were analysed to identify the putative functional role of highly abundant transcripts. The largest functional groups included transcripts coding for stress response and cell defence, ribosomal proteins and storage proteins. The most abundant tag represented B22EL8, a barley metallothionein, which showed significant up-regulation across the malting time course. Considerable changes in the abundance profiles of some of the highly abundant tags occurred at 24 h post-steeping, indicating that it may be an important time point for gene expression changes associated with barley seed germination.
Affiliation(s)
- Jessica White
- Grain Foods CRC, Centre for Plant Conservation Genetics, Southern Cross University, PO Box 157, Lismore, NSW 2480, Australia
|
53
|
Chen JM, Férec C, Cooper DN. A systematic analysis of disease-associated variants in the 3' regulatory regions of human protein-coding genes I: general principles and overview. Hum Genet 2006; 120:1-21. [PMID: 16645853 DOI: 10.1007/s00439-006-0180-7] [Citation(s) in RCA: 105] [Impact Index Per Article: 5.8] [Received: 02/09/2006] [Accepted: 03/26/2006] [Indexed: 10/24/2022]
Abstract
The 3' regulatory regions (3' RRs) of human genes play an important role in regulating mRNA 3' end formation, stability/degradation, nuclear export, subcellular localization and translation and are consequently rich in regulatory elements. Although 3' RRs contain only approximately 0.2% of known disease-associated mutations, this is likely to represent a rather conservative estimate of their actual prevalence. In an attempt to catalogue 3' RR-mediated disease and also to gain a greater understanding of the functional role of regulatory elements within 3' RRs, we have performed a systematic analysis of disease-associated 3' RR variants; 121 3' RR variants in 94 human genes were collated. These included 17 mutations in the upstream core polyadenylation signal sequence (UCPAS), 81 in the upstream sequence (USS) between the translational termination codon and the UCPAS, 6 in the left arm of the 'spacer' sequence (LAS) between the UCPAS and the pre-mRNA cleavage site (CS), 3 in the right arm of the 'spacer' sequence (RAS) or downstream core polyadenylation signal sequence (DCPAS) and 7 in the downstream sequence (DSS) of the 3'-flanking region, with 7 further mutations being treated as isolated examples. All the UCPAS mutations and the rather unusual cases of the DMPK, SCA8, FCMD and GLA mutations exert a significant effect on the mRNA phenotype and are usually associated with monogenic disease. By contrast, most of the remaining variants are polymorphisms that exert a comparatively minor influence on mRNA expression, but which may nevertheless predispose to or otherwise modify complex clinical phenotypes. Considerable efforts have been made to validate/elucidate the mechanisms through which the 3' untranslated region (3' UTR) variants affect gene expression. It is hoped that the integrative approach employed here in the study of naturally occurring variants of actual or potential pathological significance will serve to complement ongoing efforts to identify all functional regulatory elements in the human genome.
|
54
|
Le Texier V, Riethoven JJ, Kumanduri V, Gopalakrishnan C, Lopez F, Gautheret D, Thanaraj TA. AltTrans: transcript pattern variants annotated for both alternative splicing and alternative polyadenylation. BMC Bioinformatics 2006; 7:169. [PMID: 16556303 PMCID: PMC1435940 DOI: 10.1186/1471-2105-7-169] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.4] [Received: 12/14/2005] [Accepted: 03/23/2006] [Indexed: 11/10/2022]
Abstract
BACKGROUND The three major mechanisms that regulate transcript formation involve the selection of alternative sites for transcription start (TS), splicing, and polyadenylation. Current efforts collect data and annotation individually for each of these variants. It is important to take an integrated view of these data sets and to derive a data set of alternate transcripts along with consolidated annotation. We have previously developed computational pipelines that generate value-added data at genome scale on individual variant types; these include AltSplice on splicing and AltPAS on polyadenylation. We now extend these pipelines and integrate the resultant data sets to facilitate an integrated view of the contributions from splicing and polyadenylation in the formation of transcript variants. DESCRIPTION The AltSplice pipeline examines gene-transcript alignments and delineates alternative splice events and splice patterns; this pipeline is extended as AltTrans to delineate isoform transcript patterns, for each of which both the introns/exons and the 'terminating' polyA site are delineated; EST/mRNA sequences that qualify the transcript pattern confirm both the underlying splicing and polyadenylation. The AltPAS pipeline examines gene-transcript alignments and delineates all potential polyA sites irrespective of underlying splicing patterns. Resultant polyA sites from both AltTrans and AltPAS are merged. The generated database reports data on alternative splicing, alternative polyadenylation and the resultant alternate transcript patterns; the basal data is annotated for various biological features. The data (named integrated AltTrans data) generated for both human and mouse is made available through the Alternate Transcript Diversity web site at http://www.ebi.ac.uk/atd/. CONCLUSION The reported data set presents alternate transcript patterns that are annotated for both alternative splicing and alternative polyadenylation. Results based on current transcriptome data indicate that the contribution of alternative splicing is larger than that of alternative polyadenylation.
Affiliation(s)
- Vincent Le Texier
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Jean-Jack Riethoven
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- 18 Crispin Close, Haverhill, Suffolk, CB9 9PT, UK
- Vasudev Kumanduri
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Chellappa Gopalakrishnan
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Fabrice Lopez
- INSERM ERM206, Université de la Méditerranée, Luminy case 928 – 13 288 Marseille Cedex 09, France
- Daniel Gautheret
- INSERM ERM206, Université de la Méditerranée, Luminy case 928 – 13 288 Marseille Cedex 09, France
- Thangavel Alphonse Thanaraj
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- 4 Copperfields, Saffron Walden, Essex, CB11 4FG, UK
|
55
|
Lakshman DK, Liu C, Mishra PK, Tavantzis S. Characterization of the arom gene in Rhizoctonia solani, and transcription patterns under stable and induced hypovirulence conditions. Curr Genet 2006; 49:166-77. [PMID: 16479402 DOI: 10.1007/s00294-005-0005-6] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.6] [Received: 02/22/2005] [Revised: 05/19/2005] [Accepted: 06/11/2005] [Indexed: 10/25/2022]
Abstract
The quinate pathway is induced by quinate in the wild-type virulent Rhizoctonia solani isolate Rhs 1AP but is constitutive in the hypovirulent, M2 dsRNA-containing isolate Rhs 1A1. Constitutive expression of the quinate pathway results in downregulation of the shikimate pathway, which includes the pentafunctional arom gene in Rhs 1A1. The arom gene has 5,323 bp including five introns as opposed to a single intron found in arom in ascomycetes. A 199-bp upstream sequence has a GC box, no TATAA box, but two GTATTAGA repeats. The largest arom transcript is 5,108 nucleotides long, excluding the poly(A) tail. It contains an open reading frame of 4,857 bases, coding for a putative 1,618-residue pentafunctional AROM protein. A Kozak sequence (GCGCCATGG) is present between +127 and +135. The 5'-end of the arom mRNA includes two nucleotides (UA) that are not found in the genomic sequence, and are probably added post-transcriptionally. Size and sequence heterogeneity were observed at both 5'- and 3'-end of the mRNA. Northern blot and suppression subtractive hybridization analyses showed that presence of a low amount of quinate, inducer of the quinate pathway, resulted in increased levels of arom mRNA, consistent with the compensation effect observed in ascomycetes.
Affiliation(s)
- Dilip K Lakshman
- Department of Biological Sciences, University of Maine, Orono, ME 04469-5735, USA
|
56
|
|
57
|
Coemans B, Matsumura H, Terauchi R, Remy S, Swennen R, Sági L. SuperSAGE combined with PCR walking allows global gene expression profiling of banana (Musa acuminata), a non-model organism. Theor Appl Genet 2005; 111:1118-26. [PMID: 16133315 DOI: 10.1007/s00122-005-0039-7] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.5] [Received: 03/21/2005] [Accepted: 07/04/2005] [Indexed: 05/04/2023]
Abstract
Super-serial analysis of gene expression (SuperSAGE) was used to characterize, for the first time, the global gene expression pattern in banana (Musa acuminata). A total of 10,196 tags were generated from leaf tissue, representing 5,292 expressed genes. Forty-nine tags of the top 100 most abundantly expressed transcripts were annotated by homology to cDNA or EST sequences. Typically for leaf tissue, analysis of the transcript profiles showed that the majority of the abundant transcripts are involved in energy production, mainly photosynthesis. However, the most abundant tag was derived from a type 3 metallothionein transcript, which accounted for nearly 3% of total transcripts analysed. Furthermore, the 26-bp long SuperSAGE tags were applied in 3'-rapid amplification of cDNA ends (3'RACE) for the identification of unknown tags. In combination with thermal asymmetric interlaced PCR (TAIL-PCR), this allowed the recovery of a full gene sequence of a novel NADPH:protochlorophyllide oxidoreductase, the key enzyme in chlorophyll biosynthesis. SuperSAGE in conjunction with 3'RACE and TAIL-PCR will be a powerful tool for transcriptomics of non-model, but otherwise important organisms.
Affiliation(s)
- Bert Coemans
- Laboratory of Tropical Crop Improvement, Katholieke Universiteit Leuven, Kasteelpark Arenberg 13, 3001 Leuven, Belgium.
|
58
|
van Ruissen F, Ruijter JM, Schaaf GJ, Asgharnegad L, Zwijnenburg DA, Kool M, Baas F. Evaluation of the similarity of gene expression data estimated with SAGE and Affymetrix GeneChips. BMC Genomics 2005; 6:91. [PMID: 15955238 PMCID: PMC1186021 DOI: 10.1186/1471-2164-6-91] [Citation(s) in RCA: 51] [Impact Index Per Article: 2.7] [Received: 10/28/2004] [Accepted: 06/14/2005] [Indexed: 12/20/2022]
Abstract
Background Serial Analysis of Gene Expression (SAGE) and microarrays have found a widespread application, but much ambiguity exists regarding the evaluation of these technologies. Cross-platform utilization of gene expression data from the SAGE and microarray technology could reduce the need for duplicate experiments and facilitate a more extensive exchange of data within the research community. This requires a measure for the correspondence of the different gene expression platforms. To date, a number of cross-platform evaluations (including a few studies using SAGE and Affymetrix GeneChips) have been conducted showing a variable, but overall low, concordance. This study evaluates these overall measures and introduces the between-ratio difference as a concordance measure per gene. Results In this study, gene expression measurements of Unigene clusters represented by both Affymetrix GeneChips HG-U133A and SAGE were compared using two independent RNA samples. After matching of the data sets the final comparison contains a small data set of 1094 unique Unigene clusters, which is unbiased with respect to expression level. Different overall correlation approaches, like Up/Down classification, contingency tables and correlation coefficients were used to compare both platforms. In addition, we introduce a novel approach to compare two platforms based on the calculation of differences between expression ratios observed in each platform for each individual transcript. This approach results in a concordance measure per gene (with statistical probability value), as opposed to the commonly used overall concordance measures between platforms. Conclusion We can conclude that intra-platform correlations are generally good, but that overall agreement between the two platforms is modest. This might be due to the binomially distributed sampling variation in SAGE tag counts, SAGE annotation errors and the intensity variation between probe sets of a single gene in Affymetrix GeneChips. We cannot identify or advise which platform performs better since both have their (dis)advantages. Therefore it is strongly recommended to perform follow-up studies of interesting genes using additional techniques. The newly introduced between-ratio difference is a filtering-independent measure for between-platform concordance. Moreover, the between-ratio difference per gene can be used to detect transcripts with similar regulation on both platforms.
Affiliation(s)
- Fred van Ruissen
- Department of Neurogenetics, Academic Medical Center, University of Amsterdam, Meibergdreef 9, 1105 AZ Amsterdam, The Netherlands
- Jan M Ruijter
- Department of Anatomy and Embryology, Academic Medical Center, University of Amsterdam, Meibergdreef 9, 1105 AZ Amsterdam, The Netherlands
- Gerben J Schaaf
- Department of Neurogenetics, Academic Medical Center, University of Amsterdam, Meibergdreef 9, 1105 AZ Amsterdam, The Netherlands
- Department of Human Genetics, Academic Medical Center, University of Amsterdam, Meibergdreef 9, 1105 AZ Amsterdam, The Netherlands
- Lida Asgharnegad
- Department of Neurogenetics, Academic Medical Center, University of Amsterdam, Meibergdreef 9, 1105 AZ Amsterdam, The Netherlands
- Department of Human Genetics, Academic Medical Center, University of Amsterdam, Meibergdreef 9, 1105 AZ Amsterdam, The Netherlands
- Danny A Zwijnenburg
- Department of Neurogenetics, Academic Medical Center, University of Amsterdam, Meibergdreef 9, 1105 AZ Amsterdam, The Netherlands
- Department of Human Genetics, Academic Medical Center, University of Amsterdam, Meibergdreef 9, 1105 AZ Amsterdam, The Netherlands
- Marcel Kool
- Department of Neurogenetics, Academic Medical Center, University of Amsterdam, Meibergdreef 9, 1105 AZ Amsterdam, The Netherlands
- Department of Human Genetics, Academic Medical Center, University of Amsterdam, Meibergdreef 9, 1105 AZ Amsterdam, The Netherlands
- Frank Baas
- Department of Neurogenetics, Academic Medical Center, University of Amsterdam, Meibergdreef 9, 1105 AZ Amsterdam, The Netherlands
|
59
|
Rozman D, Seliskar M, Cotman M, Fink M. Pre-cholesterol precursors in gametogenesis. Mol Cell Endocrinol 2005; 234:47-56. [PMID: 15836952 DOI: 10.1016/j.mce.2004.11.009] [Citation(s) in RCA: 18] [Impact Index Per Article: 0.9] [Received: 07/19/2004] [Accepted: 11/09/2004] [Indexed: 10/25/2022]
Abstract
Meiosis activating sterols (MAS) are biologically active post-lanosterol intermediates of cholesterol biosynthesis that are synthesized primarily in the gonads, including the sperm. MAS reinitiate the meiosis of oocytes in vitro, while in vivo they seem to contribute to oocyte quality and the progression of meiosis. The mRNAs for the MAS-producing enzyme lanosterol 14alpha-demethylase (CYP51) arise by alternative poly(A) signal selection. Only signals with low cleavage activity are used in the testis. Translation of mammalian CYP51s starts at one of the tandem in-frame ATGs. The CYP51 protein of the bull is shorter than the human protein due to the usage of a more downstream translation start site. CYP51 proteins are post-translationally modified by glycosylations in the Golgi and on acrosomal membranes of the sperm. A green fluorescent protein-based ex vivo system has been developed to aid the study of the intracellular transport of the MAS-producing CYP51. The influence of the post-translational modifications on MAS-synthesizing capacity is under investigation.
Affiliation(s)
- Damjana Rozman
- Medical Centre for Molecular Biology, Institute of Biochemistry, Faculty of Medicine, University of Ljubljana, Vrazov trg 2, SI-1000 Ljubljana, Slovenia.
|
60
|
Abstract
Messenger RNA polyadenylation is one of the key post-transcriptional events in eukaryotic cells. A large number of genes in mammalian species can undergo alternative polyadenylation, which leads to mRNAs with variable 3' ends. As the 3' end of mRNAs often contains cis elements important for mRNA stability, mRNA localization and translation, the implications of the regulation of polyadenylation can be multifold. Alternative polyadenylation is controlled by cis elements and trans factors, and is believed to occur in a tissue- or disease-specific manner. Given the availability of many databases devoted to other aspects of mRNA metabolism, such as transcriptional initiation and splicing, systematic information on polyadenylation, including alternative polyadenylation and its regulation, is noticeably lacking. Here, we present a database named polyA_DB, through which we strive to provide several types of information regarding polyadenylation in mammalian species: (i) polyadenylation sites and their locations with respect to the genomic structure of genes; (ii) cis elements surrounding polyadenylation sites; (iii) comparison of polyadenylation configuration between orthologous genes; and (iv) tissue/organ information for alternative polyadenylation sites. Currently, polyA_DB contains 45,565 polyadenylation sites for 25,097 human and mouse genes, representing the most comprehensive polyadenylation database to date. The database is accessible via the website (http://polya.umdnj.edu/polyadb).
Affiliation(s)
- Haibo Zhang
- Center for Computational Biology and Bioengineering, New Jersey Institute of Technology, Newark, NJ 07102, USA
|
61
|
Ibrahim AFM, Hedley PE, Cardle L, Kruger W, Marshall DF, Muehlbauer GJ, Waugh R. A comparative analysis of transcript abundance using SAGE and Affymetrix arrays. Funct Integr Genomics 2005; 5:163-74. [PMID: 15714318 DOI: 10.1007/s10142-005-0135-4] [Citation(s) in RCA: 34] [Impact Index Per Article: 1.8] [Received: 10/14/2004] [Revised: 12/13/2004] [Accepted: 12/22/2004] [Indexed: 12/18/2022]
Abstract
A number of methods are currently used for gene expression profiling. They differ in scale, economy and sensitivity. We present the results of a direct comparison between serial analysis of gene expression (SAGE) and the Barley1 Affymetrix GeneChip. Both technology platforms were used to obtain quantitative measurements of transcript abundance using identical RNA samples and assessed for their ability to quantify differential gene expression. For SAGE, a total of 82,122 tags were generated from two independent libraries representing whole developing barley caryopsis and dissected embryos. The Barley1 GeneChip contains 22,791 probe sets. Results obtained from both methods are generally comparable, indicating that both will lead to similar conclusions regarding transcript levels and differential gene expression. However, excluding singletons, 24.4% of the unique SAGE tags had no corresponding probe set on the Barley1 array indicating that a broader snapshot of gene expression was obtained by SAGE. Discrepancies were observed for a number of "genes" and these are discussed.
Affiliation(s)
- Adel F M Ibrahim
- Genome Dynamics, Scottish Crop Research Institute, Invergowrie, Dundee, UK.
|
62
|
Tian B, Hu J, Zhang H, Lutz CS. A large-scale analysis of mRNA polyadenylation of human and mouse genes. Nucleic Acids Res 2005; 33:201-12. [PMID: 15647503 PMCID: PMC546146 DOI: 10.1093/nar/gki158] [Citation(s) in RCA: 707] [Impact Index Per Article: 37.2] [Indexed: 01/30/2023]
Abstract
mRNA polyadenylation is a critical cellular process in eukaryotes. It involves 3′ end cleavage of nascent mRNAs and addition of the poly(A) tail, which plays important roles in many aspects of the cellular metabolism of mRNA. The process is controlled by various cis-acting elements surrounding the cleavage site, and their binding factors. In this study, we surveyed genome regions containing cleavage sites [herein called poly(A) sites], for 13 942 human and 11 155 mouse genes. We found that a great proportion of human and mouse genes have alternative polyadenylation (∼54 and 32%, respectively). The conservation of alternative polyadenylation type or polyadenylation configuration between human and mouse orthologs is statistically significant, indicating that alternative polyadenylation is widely employed by these two species to produce alternative gene transcripts. Genes belonging to several functional groups, indicated by their Gene Ontology annotations, are biased with respect to polyadenylation configuration. Many poly(A) sites harbor multiple cleavage sites (51.25% human and 46.97% mouse sites), leading to heterogeneous 3′ end formation for transcripts. This implies that the cleavage process of polyadenylation is largely imprecise. Different types of poly(A) sites, with regard to their relative locations in a gene, are found to have distinct nucleotide composition in surrounding genomic regions. This large-scale study provides important insights into the mechanism of polyadenylation in mammalian species and represents a genomic view of the regulation of gene expression by alternative polyadenylation.
Affiliation(s)
- Bin Tian
- Department of Biochemistry and Molecular Biology, New Jersey Medical School UMDNJ, Newark, NJ 07101, USA.
|
63
|
Quéré R, Manchon L, Lejeune M, Clément O, Pierrat F, Bonafoux B, Commes T, Piquemal D, Marti J. Mining SAGE data allows large-scale, sensitive screening of antisense transcript expression. Nucleic Acids Res 2004; 32:e163. [PMID: 15561998 PMCID: PMC534641 DOI: 10.1093/nar/gnh161] [Citation(s) in RCA: 28] [Impact Index Per Article: 1.4] [Indexed: 11/13/2022]
Abstract
As a growing number of complementary transcripts, likely to exert various regulatory functions, are being found in eukaryotes, high-throughput analytical methods are needed to investigate their expression in multiple biological samples. Serial Analysis of Gene Expression (SAGE), based on the enumeration of directionally reliable short cDNA sequences (tags), is capable of revealing antisense transcripts. We initially detected them by observing tags that mapped on to the reverse complement of known mRNAs. The presence of such tags in individual SAGE libraries suggested that SAGE datasets contain latent information on antisense transcripts. We raised a collection of virtual tags for mining these data. Tag pairs were assembled by searching for complementarities between 24-nt long sequences centered on the potential SAGE-anchoring sites of well-annotated human expressed sequences. An analysis of their presence in a large collection of published SAGE libraries revealed transcripts expressed at high levels from both strands of two adjacent, oppositely oriented, transcription units. In other cases, the respective transcripts of such cis-oriented genes displayed a mutually exclusive expression pattern or were co-expressed in a small number of libraries. Other tag pairs revealed overlapping transcripts of trans-encoded unique genes. Finally, we isolated a group of tags shared by multiple transcripts. Most of them mapped on to retroelements, essentially represented in humans by Alu sequences inserted in opposite orientations in the 3'UTR of otherwise different mRNAs. Registering these tags in separate files makes possible computational searches focused on unique sense-antisense pairs. The method developed in the present work shows that SAGE datasets constitute a major resource for rapidly investigating with high sensitivity the expression of antisense transcripts, so that a single tag may be detected in one library when screening a large number of biological samples.
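The first step mentioned in this abstract, observing tags that map to the reverse complement of a known mRNA, can be sketched in Python as follows. This is an illustrative simplification, not the authors' virtual-tag pipeline. The function names are hypothetical, and the anchoring-site logic and the 24-nt pairing step are deliberately left out.

```python
# Translation table for complementing DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def tag_orientation(tag, mrna):
    """Classify a tag as 'sense', 'antisense', or None relative to an mRNA.

    A tag found on both strands is reported as 'sense' here; this tie-break
    is an assumption made for the sketch.
    """
    tag = tag.upper()
    mrna = mrna.upper()
    if tag in mrna:
        return "sense"
    if tag in reverse_complement(mrna):
        return "antisense"
    return None


# Toy mRNA (cDNA notation) and two tags derived from it.
toy_mrna = "ATGGCCATTGTAATGGGCCGC"
sense_tag = "CATTGTAATG"
antisense_tag = reverse_complement(sense_tag)
```

Exact substring matching ignores sequencing errors and tags shared by several transcripts, which the abstract handles by registering such tags separately.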
Affiliation(s)
- Ronan Quéré
- Institut de Génétique Humaine, UPR CNRS 1142, 141 rue de la Cardonille, 34396 Montpellier, France
|
64
|
Hoarau JJ, Cesari M, Caillens H, Cadet F, Pabion M. HLA DQA1 genes generate multiple transcripts by alternative splicing and polyadenylation of the 3' untranslated region. Tissue Antigens 2004; 63:58-71. [PMID: 14651525 DOI: 10.1111/j.1399-0039.2004.00140.x] [Citation(s) in RCA: 17] [Impact Index Per Article: 0.9] [Indexed: 11/29/2022]
Abstract
Regulation of human leucocyte antigen (HLA) class II gene expression is an important field in immunology, because these molecules play a crucial role in the function of the immune system. HLA DQ gene expression is a complex phenomenon regulated at both transcriptional and post-transcriptional levels. In this study, we have investigated the post-transcriptional mechanisms accounting for allele-dependent length polymorphism of DQA1 mRNA. We have first sequenced the genomic DNA encoding the 3' untranslated region (UTR) of DQA1 *0101, *0102, *0103, *0201, *0301, *0401, and *0501 alleles. We have identified two competing splicing sites: a unique splice donor site AG/GTA located 20 nucleotides downstream from the stop codon, associated with two splice acceptor sequences, approximately 165 and approximately 370 nucleotides downstream. In addition, three polyadenylation signals have been identified, respectively, at approximately 475, approximately 795, and approximately 855 nucleotides downstream from the stop codon. Subsequently, we have analyzed mRNAs derived from DQA1 alleles in homozygous B lymphoblastoid cell lines by reverse transcriptase-polymerase chain reaction. We show that allele-dependent length polymorphism of the DQA1 mRNA 3' UTR results from a combination of differential splicing and alternative polyadenylation. Four mRNA isoforms (two splice variants cleaved at two distinct polyadenylation sites) were detected in DQA1 *0101, *0102, and *0103 homozygous cell lines, and six mRNA species (three splice variants cleaved at two polyadenylation signals) were generated by the other four alleles. Possible advantages for cells of generating multiple, previously undetected transcripts are discussed.
Affiliation(s)
- J-J Hoarau
- Laboratoire de Biochimie et Génétique Moléculaire, Université de la Réunion, La Réunion, France
|
65
|
Druker R, Bruxner TJ, Lehrbach NJ, Whitelaw E. Complex patterns of transcription at the insertion site of a retrotransposon in the mouse. Nucleic Acids Res 2004; 32:5800-8. [PMID: 15520464 PMCID: PMC528799 DOI: 10.1093/nar/gkh914] [Citation(s) in RCA: 93] [Impact Index Per Article: 4.7] [Indexed: 11/13/2022] Open
Abstract
Here we report that transcriptional effects of the insertion of a retrotransposon can occur simultaneously both upstream and downstream of the insertion site. We have identified an intra-cisternal A particle (IAP) retrotransposon in intron 6 of a gene that we have named Cabp (CDK5 activator binding protein). The presence of the IAP is associated with an aberrant transcript initiating from a cryptic promoter in the IAP, reading out into the adjacent Cabp gene sequence. The expression of this transcript is highly variable among isogenic mice within the C57BL/6J strain and so Cabp(IAP) can be classified as a metastable epiallele. As expected, the presence or absence of the transcript correlates with differential DNA methylation of the 5' LTR of the IAP. More surprisingly, in mice where the retrotransposon is unmethylated and presumably transcriptionally active, we find a number of short Cabp transcripts which initiate at the normal 5' end of the gene but terminate prematurely, just 5' of the retrotransposon. This is the first report of a retrotransposon having both upstream and downstream effects on transcription at the site of insertion and it suggests that alternative polyadenylation may sometimes be caused by a downstream convergent transcription unit.
Affiliation(s)
- Riki Druker
- School of Molecular and Microbial Biosciences, Biochemistry Building G08, University of Sydney, NSW 2006, Australia
|
66
|
Coyne KJ, Burkholder JM, Feldman RA, Hutchins DA, Cary SC. Modified serial analysis of gene expression method for construction of gene expression profiles of microbial eukaryotic species. Appl Environ Microbiol 2004; 70:5298-304. [PMID: 15345413 PMCID: PMC520878 DOI: 10.1128/aem.70.9.5298-5304.2004] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.1] [Received: 02/16/2004] [Accepted: 05/13/2004] [Indexed: 11/20/2022] Open
Abstract
Serial analysis of gene expression (SAGE) is a powerful approach for the identification of differentially expressed genes, providing comprehensive and quantitative gene expression profiles in the form of short tag sequences. Each tag represents a unique transcript, and the relative frequencies of tags in the SAGE library are equal to the relative proportions of the transcripts they represent. One of the major obstacles in the preparation of SAGE libraries from microorganisms is the requirement for large amounts of starting material (i.e., mRNA). Here, we present a novel approach for the construction of SAGE libraries from small quantities of total RNA by using Y linkers to selectively amplify 3' cDNA fragments. To validate this method, we constructed comprehensive gene expression profiles of the toxic dinoflagellate Pfiesteria shumwayae. SAGE libraries were constructed from an actively toxic fish-fed culture of P. shumwayae and from a recently toxic alga-fed culture. P. shumwayae-specific gene transcripts were identified by comparison of tag sequences in the two libraries. Representative tags with frequencies ranging from 0.026 to 3.3% of the total number of tags in the libraries were chosen for further analysis. Expression of each transcript was confirmed in separate control cultures of toxic P. shumwayae. The modified SAGE method described here produces gene expression profiles that appear to be both comprehensive and quantitative, and it is directly applicable to the study of gene expression in other environmentally relevant microbial species.
Affiliation(s)
- Kathryn J Coyne
- Graduate College of Marine Studies, University of Delaware, 700 Pilottown Rd., Lewes, DE 19958, USA
|
67
|
Hashimoto SI, Suzuki Y, Kasai Y, Morohoshi K, Yamada T, Sese J, Morishita S, Sugano S, Matsushima K. 5′-end SAGE for the analysis of transcriptional start sites. Nat Biotechnol 2004; 22:1146-9. [PMID: 15300261 DOI: 10.1038/nbt998] [Citation(s) in RCA: 78] [Impact Index Per Article: 3.9] [Received: 01/20/2004] [Accepted: 06/07/2004] [Indexed: 11/09/2022]
Abstract
Identification of the mRNA start site is essential in establishing the full-length cDNA sequence of a gene and analyzing its promoter region, which regulates gene expression. Here we describe the development of a 5'-end serial analysis of gene expression (5' SAGE) that can be used to globally identify transcriptional start sites and the frequency of individual mRNAs. Of the 25,684 5' SAGE tags in the HEK293 human cell library, 19,893 matched to the human genome. Among 15,448 tags in one locus of the genome, 85.8%-96.1% of the 5' SAGE tags were assigned within -500 to +200 nt of mRNA start sites using the RefSeq, UniGene and DBTSS databases. This technique should facilitate 5'-end transcriptome analysis in a variety of cells and tissues.
|
68
|
Noble CG, Walker PA, Calder LJ, Taylor IA. Rna14-Rna15 assembly mediates the RNA-binding capability of Saccharomyces cerevisiae cleavage factor IA. Nucleic Acids Res 2004; 32:3364-75. [PMID: 15215336 PMCID: PMC443540 DOI: 10.1093/nar/gkh664] [Citation(s) in RCA: 35] [Impact Index Per Article: 1.8] [Indexed: 11/13/2022] Open
Abstract
The Rna14-Rna15 complex is a core component of the cleavage factor IA (CFIA) RNA-processing complex from Saccharomyces cerevisiae. To understand the assembly and RNA-binding properties, we have isolated and characterized the Rna14-Rna15 complex using a combination of biochemical and biophysical methods. Analysis of the purified complex, using transmission electron microscopy, reveals that the two proteins assemble into a kinked rod-shaped structure and that these rods are able to further self-associate. Analytical ultracentrifugation reveals that Rna14 mediates this association and facilitates assembly of an A2B2 tetramer (M_r 230,000), where relatively compact Rna14-Rna15 heterodimers are in rapid equilibrium with tetramers that have a more extended shape. The Rna14-Rna15 complex, unlike the individual components, binds to an RNA oligonucleotide derived from the 3'-untranslated region of the S. cerevisiae GAL7 gene. Based on these structural and thermodynamic data, we propose that CFIA assembly regulates RNA-binding activity.
Affiliation(s)
- Christian G Noble
- Division of Protein Structure, National Institute for Medical Research, The Ridgeway, Mill Hill, London NW7 1AA, UK
|
69
|
Cerutti JM, Delcelo R, Amadei MJ, Nakabashi C, Maciel RMB, Peterson B, Shoemaker J, Riggins GJ. A preoperative diagnostic test that distinguishes benign from malignant thyroid carcinoma based on gene expression. J Clin Invest 2004; 113:1234-42. [PMID: 15085203 PMCID: PMC385398 DOI: 10.1172/jci19617] [Citation(s) in RCA: 142] [Impact Index Per Article: 7.1] [Received: 07/28/2003] [Accepted: 02/17/2004] [Indexed: 01/16/2023] Open
Abstract
Accurate diagnosis of thyroid tumors is challenging. A particular problem is distinguishing between follicular thyroid carcinoma (FTC) and benign follicular thyroid adenoma (FTA), where histology of fine-needle aspirates is not conclusive. It is often necessary to remove healthy thyroid to rule out carcinoma. In order to find markers to improve diagnosis, we quantified gene transcript expression from FTC, FTA, and normal thyroid, revealing 73 differentially expressed transcripts (P ≤ 0.0001). Using an independent set of 23 FTCs, FTAs, and matched normal thyroids, 17 genes with large expression differences were tested by real-time RT-PCR. Four genes (DDIT3, ARG2, ITM1, and C1orf24) differed between the two classes FTC and FTA, and a linear combination of expression levels distinguished FTC from FTA with an estimated predictive accuracy of 0.83. Furthermore, immunohistochemistry for DDIT3 and ARG2 showed consistent staining for carcinoma in an independent set of 59 follicular tumors (estimated concordance, 0.76; 95% confidence interval, [0.59, 0.93]). A simple test based on a combination of these markers might improve preoperative diagnosis of thyroid nodules, allowing better treatment decisions and reducing long-term health costs.
Affiliation(s)
- Janete M Cerutti
- Laboratory of Molecular Endocrinology, Division of Endocrinology, Department of Medicine, Federal University of São Paulo, Brazil
|
70
|
Pauws E, Veenboer GJM, Smit JWA, de Vijlder JJM, Morreau H, Ris-Stalpers C. Genes differentially expressed in thyroid carcinoma identified by comparison of SAGE expression profiles. FASEB J 2004; 18:560-1. [PMID: 14715705 DOI: 10.1096/fj.03-0101fje] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.0] [Indexed: 01/13/2023]
Abstract
To identify transcripts that distinguish malignant from benign thyroid disease, serial analysis of gene expression (SAGE) profiles of papillary thyroid carcinoma and of normal thyroid are compared. Of the 21,000 tags analyzed, 204 tags are differentially expressed with statistical significance in the tumor. Thyroid tumor specificity of these transcripts is determined in silico using the tissue preferential expression (TPE) algorithm. TPE values demonstrate that 42 tags of the 204 are thyroid tumor specific. BC013035, a cDNA encoding a novel protein, is up-regulated from 0 to 24 tags in the thyroid tumor SAGE library. In a tissue panel of 30 thyroid tumors and 12 controls, it has an expression pattern similar to thyroid peroxidase, indicating possible involvement of BC013035 in thyroid differentiation. A tag coding for extracellular matrix protein 1 (ECM1) is absent in the normal thyroid SAGE library and present 55 times in the tumor. ECM1, a protein recently associated with angiogenesis and expressed in metastatic breast carcinoma, is up-regulated in 50% of all thyroid carcinomas and absent in normal controls and follicular adenoma. In conclusion, SAGE analysis and subsequent determination of TPE values facilitate the rapid distinction of genes specifically expressed in cancer tissues.
Affiliation(s)
- Erwin Pauws
- Laboratory of Pediatric Endocrinology, Academic Medical Center, Amsterdam, The Netherlands
|
71
|
Tuteja R, Tuteja N. Serial Analysis of Gene Expression: Applications in Human Studies. J Biomed Biotechnol 2004; 2004:113-120. [PMID: 15240922 PMCID: PMC548805 DOI: 10.1155/s1110724304308119] [Citation(s) in RCA: 17] [Impact Index Per Article: 0.9] [Indexed: 12/22/2022] Open
Abstract
Serial analysis of gene expression (SAGE) is a powerful tool that provides a quantitative and comprehensive expression profile of genes in a given cell population. It works by isolating short fragments of genetic information from the expressed genes that are present in the cell being studied. These short sequences, called SAGE tags, are linked together for efficient sequencing. The frequency of each SAGE tag in the cloned multimers directly reflects the transcript abundance. Therefore, SAGE results in an accurate picture of gene expression at both the qualitative and the quantitative levels. It does not require a hybridization probe for each transcript and allows new genes to be discovered. This technique has been applied widely in human studies, and various SAGE tags/SAGE libraries have been generated from different cells/tissues such as dendritic cells, lung fibroblast cells, oocytes, thyroid tissue, B-cell lymphoma, cultured keratinocytes, muscles, brain tissues, sciatic nerve, cultured Schwann cells, cord blood-derived mast cells, retina, macula, retinal pigment epithelial cells, skin cells, and so forth. In this review we present updated information on the applications of SAGE technology, mainly to human studies.
Affiliation(s)
- Renu Tuteja
- International Centre for Genetic Engineering and Biotechnology, Aruna Asaf Ali Marg, New Delhi 110067, India
- Narendra Tuteja
- International Centre for Genetic Engineering and Biotechnology, Aruna Asaf Ali Marg, New Delhi 110067, India
|
72
|
von Ahsen N, Oellerich M. The intronic prothrombin 19911A>G polymorphism influences splicing efficiency and modulates effects of the 20210G>A polymorphism on mRNA amount and expression in a stable reporter gene assay system. Blood 2003; 103:586-93. [PMID: 14504098 DOI: 10.1182/blood-2003-02-0419] [Citation(s) in RCA: 46] [Impact Index Per Article: 2.2] [Indexed: 11/20/2022] Open
Abstract
The common prothrombin gene cleavage site mutation 20210G>A is associated with elevated prothrombin levels and thrombosis. The pathomechanism of the 20210G>A mutation was explained by increased mRNA formation and/or more efficient translation. Human studies also showed an influence of the intronic 19911A>G polymorphism on prothrombin activity. We established HepG2 cell lines stably transfected with prothrombin mini-genes containing the last 2 prothrombin exons, the last intron, 3' untranslated region (UTR), and flanking sequence. The highest mRNA expression and protein activity resulted from the mutant haplotype 19911A-20210A. Haplotypes with wild-type cleavage site (19911A-20210G, 19911G-20210G) also differed significantly as a consequence of the intronic 19911 mutation; the 19911G-20210G haplotype showed lower expression than the 19911A-20210G haplotype, whereas previous clinical studies have reported elevated prothrombin activity with the 19911G-20210G haplotype. The cleavage site pattern was homogeneous with 20210A, which may cause a favorable intracellular processing, and heterogeneous with 20210G. In an independent assay for splicing efficiency, 19911G showed about 30% higher efficiency than 19911A. We conclude that the intronic 19911A>G single nucleotide polymorphism is itself functional and changes splicing efficiency by altering a known functional pentamer motif. Further studies are needed to define the value of additional prothrombin 19911 genotyping for thrombophilia screening, especially in cases heterozygous for 20210G>A.
Affiliation(s)
- Nicolas von Ahsen
- Georg-August-University, Department of Clinical Chemistry, Robert-Koch-Str 40, 37075 Göttingen, Germany.
|
73
|
Fukumura R, Takahashi H, Saito T, Tsutsumi Y, Fujimori A, Sato S, Tatsumi K, Araki R, Abe M. A sensitive transcriptome analysis method that can detect unknown transcripts. Nucleic Acids Res 2003; 31:e94. [PMID: 12907746 PMCID: PMC169986 DOI: 10.1093/nar/gng094] [Citation(s) in RCA: 72] [Impact Index Per Article: 3.4] [Indexed: 11/14/2022] Open
Abstract
We have developed an AFLP-based gene expression profiling method called 'high coverage expression profiling' (HiCEP) analysis. By making improvements to the selective PCR technique we have reduced the rate of false positive peaks to approximately 4% and consequently the number of peaks, including overlapping peaks, has been markedly decreased. As a result we can determine the relationship between peaks and original transcripts unequivocally. This will make it practical to prepare a database of all peaks, allowing gene assignment without having to isolate individual peaks. This precise selection also enables us to easily clone peaks of interest and predict the corresponding gene for each peak in some species. The procedure is highly reproducible and sensitive enough to detect even a 1.2-fold difference in gene expression. Most importantly, the low false positive rate enables us to analyze gene expression with wide coverage by means of four instead of six nucleotide recognition site restriction enzymes for fingerprinting mRNAs. Therefore, the method detects 70-80% of all transcripts, including non-coding transcripts, unknown and known genes. Moreover, the method requires no sequence information and so is applicable even to eukaryotes for which there is no genome information available.
Affiliation(s)
- Ryutaro Fukumura
- Transcriptome Profiling Group, National Institute of Radiological Sciences, Anagawa 4-9-1, Inage-ku, Chiba-shi, Chiba 263-8555, Japan
|
74
|
Abstract
An essential step in Serial Analysis of Gene Expression (SAGE) is tag mapping, which refers to the unambiguous determination of the gene represented by a SAGE tag. Current resources for tag mapping are incomplete, and thus do not allow assessment of the efficacy of SAGE in transcript identification. A method of tag mapping is described here and applied to the Drosophila melanogaster and Caenorhabditis elegans genomes, which permits detailed SAGE assessment and provides tag-mapping resources that were unavailable previously for these organisms. In our method, a conceptual transcriptome is constructed using genomic sequence and annotation by extending predicted coding regions to include UTRs on the basis of EST and cDNA alignments, UTR length distributions, and polyadenylation signals. Analysis of extracted tags suggests that, using the standard SAGE procedure, expression of 8% of D. melanogaster and 15% of C. elegans genes cannot be detected unambiguously by SAGE due to shared sequence or lack of NlaIII anchoring enzyme sites. Both increasing tag length by 2-3 bp and using Sau3A instead of NlaIII as the anchoring enzyme increase the potential for transcript detection. This work identifies and quantifies genes not amenable to SAGE analysis, in addition to providing tag-to-gene mappings for two model organisms.
Affiliation(s)
- Erin D Pleasance
- Canada's Michael Smith Genome Sciences Centre, British Columbia Cancer Agency, Vancouver V5Z 4E6, Canada
|
75
|
Unneberg P, Wennborg A, Larsson M. Transcript identification by analysis of short sequence tags--influence of tag length, restriction site and transcript database. Nucleic Acids Res 2003; 31:2217-26. [PMID: 12682372 PMCID: PMC153741 DOI: 10.1093/nar/gkg313] [Citation(s) in RCA: 16] [Impact Index Per Article: 0.8] [Indexed: 11/14/2022] Open
Abstract
There exist a number of gene expression profiling techniques that utilize restriction enzymes for generation of short expressed sequence tags. We have studied how the choice of restriction enzyme influences various characteristics of tags generated in an experiment. We have also investigated various aspects of in silico transcript identification that these profiling methods rely on. First, analysis of 14,248 mRNA sequences derived from the RefSeq transcript database showed that 1-30% of the sequences lack a given restriction enzyme recognition site. Moreover, 1-5% of the transcripts have recognition sites located less than 10 bases from the poly(A) tail. The uniqueness of 10 bp tags lies in the range 90-95%, which increases only slightly with longer tags, due to the existence of closely related transcripts. Furthermore, 3-30% of upstream 10 bp tags are identical to 3' tags, introducing a risk of misclassification if upstream tags are present in a sample. Second, we found that a sequence length of 16-17 bp, including the recognition site, is sufficient for unique transcript identification by BLAST-based sequence alignment to the UniGene Human non-redundant database. Third, we constructed a tag-to-gene mapping for UniGene and compared it to an existing mapping database. The mappings agreed to 79-83%, where the selection of representative sequences in the UniGene clusters is the main cause of the disagreement. The results of this study may serve to improve the interpretation of sequence-based expression studies and the design of hybridization arrays, by identifying short tags that have a high reliability and separating them from tags that carry an inherent ambiguity in their capacity to discriminate between genes. To this end, supplementary information in the form of a web companion to this paper is located at http://biobase.biotech.kth.se/tagseq.
Affiliation(s)
- Per Unneberg
- Department of Biotechnology, Royal Institute of Technology (KTH), Roslagsvägen 30B, S-106 91 Stockholm, Sweden.
|
76
|
Jongeneel CV, Iseli C, Stevenson BJ, Riggins GJ, Lal A, Mackay A, Harris RA, O'Hare MJ, Neville AM, Simpson AJG, Strausberg RL. Comprehensive sampling of gene expression in human cell lines with massively parallel signature sequencing. Proc Natl Acad Sci U S A 2003; 100:4702-5. [PMID: 12671075 PMCID: PMC153619 DOI: 10.1073/pnas.0831040100] [Citation(s) in RCA: 85] [Impact Index Per Article: 4.0] [Indexed: 11/18/2022] Open
Abstract
Whereas information is rapidly accumulating about the structure and position of genes encoded in the human genome, less is known about the complexity and relative abundance of their expression in individual human cells and tissues. Here, we describe the characteristics of the transcriptomes of two cultured cell lines, HB4a (normal breast epithelium) and HCT-116 (colon adenocarcinoma), using massively parallel signature sequencing (MPSS). We generated in excess of 10^7 short signature sequences per cell line, thus providing a comprehensive snapshot of gene expression, within the technical limitations of the method. The number of genes expressed at one copy per cell or more in either of the lines was estimated to be between 10,000 and 15,000. The vast majority of the transcripts found in these cells can be mapped to known genes and their polyadenylation variants. Among the genes that could be identified from their signature sequences, approximately 8,500 were expressed by both cell lines, whereas 6,000 showed cellular specificity. Taking into account sequence tags that map uniquely to the genome but not to known transcripts, overall the data are consistent with an upper limit of 17,000 for the total number of genes expressed at more than one copy per cell in one or both of the two cell lines examined.
Affiliation(s)
- C Victor Jongeneel
- Office of Information Technology, Ludwig Institute for Cancer Research, and Swiss Institute of Bioinformatics, 1066 Epalinges, Switzerland.
|
77
|
Chen J, Sun M, Lee S, Zhou G, Rowley JD, Wang SM. Identifying novel transcripts and novel genes in the human genome by using novel SAGE tags. Proc Natl Acad Sci U S A 2002; 99:12257-62. [PMID: 12213963 PMCID: PMC129432 DOI: 10.1073/pnas.192436499] [Citation(s) in RCA: 123] [Impact Index Per Article: 5.6] [Accepted: 07/23/2002] [Indexed: 11/18/2022] Open
Abstract
The number of genes in the human genome is still a controversial issue. Whereas most of the genes in the human genome are said to have been physically or computationally identified, many short cDNA sequences identified as tags by use of serial analysis of gene expression (SAGE) do not match these genes. By performing experimental verification of more than 1,000 SAGE tags and analyzing 4,285,923 SAGE tags of human origin in the current SAGE database, we examined the nature of the unmatched SAGE tags. Our study shows that most of the unmatched SAGE tags are truly novel SAGE tags that originated from novel transcripts not yet identified in the human genome, including alternatively spliced transcripts from known genes and potential novel genes. Our study indicates that by using novel SAGE tags as probes, we should be able to identify efficiently many novel transcripts/novel genes in the human genome that are difficult to identify by conventional methods.
Affiliation(s)
- Jianjun Chen
- Department of Medicine, University of Chicago, 5841 South Maryland, MC2115, Chicago, IL 60637, USA
|
78
|
Iseli C, Stevenson BJ, de Souza SJ, Samaia HB, Camargo AA, Buetow KH, Strausberg RL, Simpson AJG, Bucher P, Jongeneel CV. Long-range heterogeneity at the 3' ends of human mRNAs. Genome Res 2002; 12:1068-74. [PMID: 12097343 PMCID: PMC186619 DOI: 10.1101/gr.62002] [Citation(s) in RCA: 62] [Impact Index Per Article: 2.8] [Indexed: 11/24/2022]
Abstract
The publication of a draft of the human genome and of large collections of transcribed sequences has made it possible to study the complex relationship between the transcriptome and the genome. In the work presented here, we have focused on mapping mRNA 3' ends onto the genome by use of the raw data generated by the expressed sequence tag (EST) sequencing projects. We find that at least half of the human genes encode multiple transcripts whose polyadenylation is driven by multiple signals. The corresponding transcript 3' ends are spread over distances in the kilobase range. This finding has profound implications for our understanding of gene expression regulation and of the diversity of human transcripts, for the design of cDNA microarray probes, and for the interpretation of gene expression profiling experiments.
Affiliation(s)
- Christian Iseli
- Office of Information Technology, Ludwig Institute for Cancer Research, Switzerland
|
79
|
Zhang XD, Callahan FE, Jenkins JN, Ma DP, Karaca M, Saha S, Creech RG. A novel root-specific gene, MIC-3, with increased expression in nematode-resistant cotton (Gossypium hirsutum L.) after root-knot nematode infection. Biochim Biophys Acta 2002; 1576:214-8. [PMID: 12031505 DOI: 10.1016/s0167-4781(02)00309-3] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.5] [Indexed: 10/27/2022]
Abstract
A full-length cDNA, MIC-3, has been identified from a lambda ZAPII cDNA library constructed from the mRNA of nematode-resistant cotton (Gossypium hirsutum L.) roots after infection with root-knot nematode (Meloidogyne incognita). The putative open reading frame of MIC-3 encoded a protein of 141 amino acids with a calculated molecular mass of 15.3 kDa. Seven alternative polyadenylation sites have been identified for the MIC-3 transcripts, and the major transcripts are the longest ones. The MIC-3 gene contains a single intron within its coding region and belongs to a novel, multi-gene family containing up to six members. Expression of MIC-3 is root localized and specifically enhanced in the nematode induced, immature galls of resistant cotton line M-249, suggesting that MIC-3 may play a critical role in the resistance response to root-knot nematode.
Affiliation(s)
- Xiang-Dong Zhang
- Department of Biochemistry and Molecular Biology, Box 9650, Mississippi State University, Mississippi State, MS 39762, USA
|
80
|
van Ruissen F, Jansen BJH, de Jongh GJ, Zeeuwen PLJM, Schalkwijk J. A partial transcriptome of human epidermis. Genomics 2002; 79:671-8. [PMID: 11991716 DOI: 10.1006/geno.2002.6756] [Citation(s) in RCA: 30] [Impact Index Per Article: 1.4] [Indexed: 11/22/2022]
Abstract
Serial analysis of gene expression (SAGE) is a powerful technique for global expression profiling without prior knowledge of the genes of interest. We carried out SAGE analysis of purified keratinocytes derived from human skin biopsy specimens, resulting in a partial transcriptome of human epidermis. We identified 7645 unique SAGE tags with quantitative information from 15,131 collected SAGE tags obtained from approximately 3 × 10^6 epidermal cells. This catalog contains a large number of genes that were not previously known to be expressed by human epidermis. Comparison with the databases of all known human SAGE tags allowed us to identify a number of keratinocyte-specific tags that putatively correspond to formerly unknown genes. Surprisingly, human epidermal keratinocytes in vivo show relatively low expression levels of genes typically associated with epidermal differentiation, whereas the expression levels of housekeeping genes are considerably higher than in cultured keratinocytes. This study provides a first step toward a transcriptome of human epidermis and, as such, harbors a wealth of information to identify genes involved in skin function, and candidate genes for genetic skin disorders.
Affiliation(s)
- Fred van Ruissen
- Neurozintuigen Laboratory, Academic Medical Center, 1100 D Amsterdam, The Netherlands.
|
81
|
Abstract
Polyadenylation is the process by which most eukaryotic mRNAs form their 3' ends. It was long held that polyadenylation required the sequence AAUAAA and that 90% of mRNAs had AAUAAA within 30 nucleotides of the site of poly(A) addition. More recent studies, aided by computer analysis of sequences made available in GenBank and expressed sequence tag (EST) databases, have suggested that the actual incidence of AAUAAA is much lower, perhaps as low as 50-60%. Reproductive biologists have long recognized that a large number of mRNAs in male germ cells of mammals lack AAUAAA but are otherwise normally polyadenylated. Recent research in our laboratory has uncovered a new form of an essential polyadenylation protein, tauCstF-64, that is most highly expressed in male germ cells, and to a smaller extent in the brain, and which we propose plays a significant role in AAUAAA-independent mRNA polyadenylation in germ cells.
Affiliation(s)
- Clinton C MacDonald
- Department of Cell Biology & Biochemistry and Southwest Cancer Center at University Medical Center, Texas Tech University Health Sciences Center, 3601 4th Street, Lubbock 79430, USA.
|
82
|
Beaudoing E, Gautheret D. Identification of alternate polyadenylation sites and analysis of their tissue distribution using EST data. Genome Res 2001; 11:1520-6. [PMID: 11544195 PMCID: PMC311108 DOI: 10.1101/gr.190501] [Citation(s) in RCA: 141] [Impact Index Per Article: 6.1] [Indexed: 10/14/2022]
Abstract
Alternate polyadenylation affects a large fraction of higher eucaryote mRNAs, producing mature transcripts with 3' ends of variable length. This variation is poorly represented in the current transcript catalogs derived from whole genome sequences, mostly because such posttranscriptional events are not detectable directly at the DNA level. Alternate polyadenylation of an mRNA is better understood by comparison to EST databases. Comparing ESTs to mRNAs, however, is a difficult task subject to the pitfalls of internal priming, presence of intron sequences, repeated elements, chimeric ESTs, or matches with ESTs from paralogous genes. We present here a computer program that addresses these problems and displays EST matches to a query mRNA sequence to predict alternate polyadenylation and to suggest library-specific forms. The output highlights effective polyadenylation signals and possible sources of artifacts, such as A-rich stretches in the mRNA sequences, and allows for a direct visualization of EST libraries using color codes. Statistical biases in the distribution of alternative mRNA forms among EST libraries were systematically sought. About 1450 human and 200 mouse mRNAs displayed such biases, suggesting in each case a tissue- or disease-specific regulation of polyadenylation.
Affiliation(s)
- E Beaudoing
- Centre d'Immunologie de Marseille-Luminy, Institut National de la Santé et de la Recherche Médicale, Centre National de la Recherche Scientifique, Marseille Cedex 09, France
|
83
|
Current Awareness on Comparative and Functional Genomics. Comp Funct Genomics 2001. [PMCID: PMC2447222 DOI: 10.1002/cfg.60] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/02/2022] Open
|