1
Affiliation(s)
- Preethi H Gunaratne
- Human Genome Sequencing Center, FACMG Department of Pathology, Baylor College of Medicine, Texas Children's Hospital, Houston, TX 77030, USA.
2
Smandi S, Guerfali FZ, Farhat M, Ben-Aissa K, Laouini D, Guizani-Tabbane L, Dellagi K, Benkahla A. Methodology optimizing SAGE library tag-to-gene mapping: application to Leishmania. BMC Res Notes 2012; 5:74. [PMID: 22283878 PMCID: PMC3292834 DOI: 10.1186/1756-0500-5-74] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 02/23/2011] [Accepted: 01/27/2012] [Indexed: 11/10/2022]
Abstract
BACKGROUND Leishmaniases are widespread parasitic diseases with an urgent need for more active and less toxic drugs and for effective vaccines. Understanding the biology of the parasite, especially in the context of host-parasite interaction, is a crucial step towards such improvements in therapy and control. Several experimental approaches, including SAGE (serial analysis of gene expression), have been developed to investigate the organisation and plasticity of the parasite transcriptome. The usual SAGE tag-to-gene mapping techniques are inadequate because almost all tags are normally located in the 3'-UTR outside the CDS, whereas most information available for Leishmania transcripts is restricted to the CDS predictions. The aim of this work is to optimize a tag-to-gene mapping technique for SAGE libraries and to show how this development improves the understanding of the Leishmania transcriptome. FINDINGS The in silico method implemented herein was based on mapping the tags to the Leishmania genome using BLAST, then mapping the tags to their gene using a data-driven probability distribution. This optimized tag-to-gene mapping improved the knowledge of Leishmania genome structure and transcription. It allowed the analysis of the expression of a maximal number of Leishmania genes, the delimitation of the 3' UTR of 478 genes and the identification of biological processes that are differentially modulated during the promastigote-to-amastigote differentiation. CONCLUSION The developed method optimizes the assignment of SAGE tags in Trypanosomatidae genomes as well as in any genome having polycistronic transcription and small intergenic regions.
Affiliation(s)
- Sondos Smandi
- Laboratoire d'Immuno-Pathologie, Vaccinologie et Génétique Moléculaire (LIVGM), WHO Collaborating Center for Research and Training in Leishmaniasis, Institut Pasteur de Tunis, 13 place Pasteur BP74 1002, Tunis, Tunisia.
3
Xiang Y, Wang Y, Luo Y, Zhang B, Xin J, Zheng D. Molecular biocompatibility evaluation of poly(d,l-lactic acid)-modified biomaterials based on long serial analysis of gene expression. Colloids Surf B Biointerfaces 2011; 85:248-61. [DOI: 10.1016/j.colsurfb.2011.02.036] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 09/01/2010] [Revised: 02/23/2011] [Accepted: 02/25/2011] [Indexed: 10/18/2022]
4
Wu Q, Kim YC, Lu J, Xuan Z, Chen J, Zheng Y, Zhou T, Zhang MQ, Wu CI, Wang SM. Poly A- transcripts expressed in HeLa cells. PLoS One 2008; 3:e2803. [PMID: 18665230 PMCID: PMC2481391 DOI: 10.1371/journal.pone.0002803] [Citation(s) in RCA: 73] [Impact Index Per Article: 4.6] [Received: 03/21/2008] [Accepted: 07/04/2008] [Indexed: 12/20/2022]
Abstract
BACKGROUND Transcripts expressed in eukaryotes are classified as poly A+ transcripts or poly A- transcripts based on the presence or absence of the 3' poly A tail. Most transcripts identified so far are poly A+ transcripts, whereas the poly A- transcripts remain largely unknown. METHODOLOGY/PRINCIPAL FINDINGS We developed the TRD (Total RNA Detection) system for transcript identification. The system detects the transcripts through the following steps: 1) depleting the abundant ribosomal and small-size transcripts; 2) synthesizing cDNA without regard to the status of the 3' poly A tail; 3) applying the 454 sequencing technology for massive 3' EST collection from the cDNA; and 4) determining the genome origins of the detected transcripts by mapping the sequences to the human genome reference sequences. Using this system, we characterized the cytoplasmic transcripts from HeLa cells. Of the 13,467 distinct 3' ESTs analyzed, 24% are poly A-, 36% are poly A+, and 40% are bimorphic with poly A+ features but without the 3' poly A tail. Most of the poly A- 3' ESTs do not match known transcript sequences; they have a similar distribution pattern in the genome as the poly A+ and bimorphic 3' ESTs, and their mapped intergenic regions are evolutionarily conserved. Experiments confirmed the authenticity of the detected poly A- transcripts. CONCLUSION/SIGNIFICANCE Our study provides the first large-scale sequence evidence for the presence of poly A- transcripts in eukaryotes. The abundance of the poly A- transcripts highlights the need for comprehensive identification of these transcripts for decoding the transcriptome, annotating the genome and studying biological relevance of the poly A- transcripts.
Affiliation(s)
- Qingfa Wu
- Center for Functional Genomics, Division of Medical Genetics, Department of Medicine, ENH Research Institute, Northwestern University Feinberg School of Medicine, Chicago, Illinois, United States of America
- Yeong C. Kim
- Center for Functional Genomics, Division of Medical Genetics, Department of Medicine, ENH Research Institute, Northwestern University Feinberg School of Medicine, Chicago, Illinois, United States of America
- Jian Lu
- Department of Ecology and Evolution, University of Chicago, Chicago, Illinois, United States of America
- Zhenyu Xuan
- Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, United States of America
- Jun Chen
- Center for Functional Genomics, Division of Medical Genetics, Department of Medicine, ENH Research Institute, Northwestern University Feinberg School of Medicine, Chicago, Illinois, United States of America
- Yonglan Zheng
- Center for Functional Genomics, Division of Medical Genetics, Department of Medicine, ENH Research Institute, Northwestern University Feinberg School of Medicine, Chicago, Illinois, United States of America
- Tom Zhou
- Center for Functional Genomics, Division of Medical Genetics, Department of Medicine, ENH Research Institute, Northwestern University Feinberg School of Medicine, Chicago, Illinois, United States of America
- Michael Q. Zhang
- Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, United States of America
- Chung-I Wu
- Department of Ecology and Evolution, University of Chicago, Chicago, Illinois, United States of America
- San Ming Wang
- Center for Functional Genomics, Division of Medical Genetics, Department of Medicine, ENH Research Institute, Northwestern University Feinberg School of Medicine, Chicago, Illinois, United States of America
- Robert H. Lurie Comprehensive Cancer Center, Northwestern University Feinberg School of Medicine, Chicago, Illinois, United States of America
5
Zhu J, He F, Wang J, Yu J. Modeling transcriptome based on transcript-sampling data. PLoS One 2008; 3:e1659. [PMID: 18286206 PMCID: PMC2243018 DOI: 10.1371/journal.pone.0001659] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.1] [Received: 12/10/2007] [Accepted: 01/21/2008] [Indexed: 01/10/2023]
Abstract
BACKGROUND Newly evolved multiplex sequencing technology has brought transcriptome sequencing to an unprecedented depth. Millions of transcript tags can now be acquired in a single experiment through parallelization. The significant increase in throughput and reduction in cost required us to address some fundamental questions: How many transcript tags do we have to sequence for a given transcriptome? How can we estimate the total number of unique transcripts for different cell types (transcriptome diversity) and the distribution of their copy numbers (transcriptome dynamics)? What is the probability that a transcript with a given expression level is detected at a certain sampling depth? METHODOLOGY/PRINCIPAL FINDINGS We developed a statistical model to evaluate these parameters based on transcriptome-sampling data. Three mixture models were evaluated for their potential to model the sampling frequencies. We demonstrated that the relative abundances of all transcripts in a transcriptome follow the generalized inverse Gaussian distribution. The widely known beta and gamma distributions failed to capture the singular characteristics of the relative abundance distribution, i.e., highly skewed toward zero and with a long tail. An estimator of transcriptome diversity and an analytical form of the sampling growth curve were proposed in a coherent framework. Experimental data fitted this model very well, and Monte Carlo simulations based on this model replicated sampling experiments with remarkable precision. CONCLUSIONS Taking the human embryonic stem cell as a prototype, we demonstrated that sequencing tens of thousands of transcript tags in an ordinary EST/SAGE experiment was far from sufficient. In order to fully characterize a human transcriptome, millions of transcript tags had to be sequenced. This model lays a statistical basis for transcriptome-sampling experiments and in essence can be used in all sampling-based data.
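The sampling-depth question raised in this abstract can be illustrated with a much simpler model than the one the authors fit. The sketch below assumes every sequenced tag is an independent draw and that a transcript has a fixed relative abundance p, so the chance of seeing it at least once in N tags is 1 - (1 - p)^N. It does not implement the generalized inverse Gaussian mixture described in the abstract; the function names and the example abundance of 1e-5 are illustrative assumptions.

```python
import math


def detection_probability(relative_abundance: float, depth: int) -> float:
    """P(at least one tag from a transcript among `depth` sequenced tags).

    Assumes independent draws (a simplification, not the paper's model).
    """
    # 1 - (1 - p)^N, computed via log1p/expm1 to stay accurate for tiny p.
    return -math.expm1(depth * math.log1p(-relative_abundance))


def required_depth(relative_abundance: float, target: float) -> int:
    """Smallest number of tags whose detection probability reaches `target`."""
    return math.ceil(math.log1p(-target) / math.log1p(-relative_abundance))


# Hypothetical rare transcript: 1 in 100,000 of all mRNA molecules.
p = 1e-5
print(detection_probability(p, 10_000))  # tens of thousands of tags
print(required_depth(p, 0.95))           # depth needed for 95% detection
```

Under these assumptions a transcript at an abundance of 1e-5 is seen in fewer than one in ten experiments of 10,000 tags, and roughly 300,000 tags are needed for a 95% chance of detection, which matches the direction of the abstract's conclusion that tens of thousands of tags are far from sufficient.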
Affiliation(s)
- Jiang Zhu
- Chinese Academy of Sciences (CAS) Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China
- Graduate University of Chinese Academy of Sciences, Beijing, China
- Fuhong He
- Chinese Academy of Sciences (CAS) Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China
- Graduate University of Chinese Academy of Sciences, Beijing, China
- Jing Wang
- Chinese Academy of Sciences (CAS) Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China
- * To whom correspondence should be addressed (JW, JY).
- Jun Yu
- Chinese Academy of Sciences (CAS) Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China
- * To whom correspondence should be addressed (JW, JY).
6
Abstract
Many serial analysis of gene expression (SAGE) tags can be matched to multiple genes, leading to difficulty in SAGE data interpretation and analysis. As only a subset of genes in the human genome are transcribed in a certain type of tissue/cell, we used microarray expression data from different tissue types to define contexts of gene expression and to annotate SAGE tags collected from the same or similar tissue sources. To predict the original transcript contributing a nonspecific SAGE tag collected from a particular tissue, we ranked the corresponding genes by their expression levels determined by microarray. We developed a tissue-specific SAGE tag annotation database based on microarray data collected from 73 normal human tissues and 18 cancer tissues and cell lines. The database can be queried online at: http://www.basic.northwestern.edu/SAGE/. The accuracy of this database was confirmed by experimental data.
Affiliation(s)
- Xijin Ge
- Evanston Northwestern Healthcare Research Institute, Evanston, IL, USA
7
Wang SM. Understanding SAGE data. Trends Genet 2006; 23:42-50. [PMID: 17109989 DOI: 10.1016/j.tig.2006.11.001] [Citation(s) in RCA: 43] [Impact Index Per Article: 2.4] [Received: 04/12/2006] [Revised: 10/05/2006] [Accepted: 11/01/2006] [Indexed: 02/08/2023]
Abstract
Serial analysis of gene expression (SAGE) is a method for identifying and quantifying transcripts from eukaryotic genomes. Since its invention, SAGE has been widely applied to analyzing gene expression in many biological and medical studies. Vast amounts of SAGE data have been collected and more than a thousand SAGE-related studies have been published since the mid-1990s. The principle of SAGE has been developed to address specific issues such as determination of normal gene structure and identification of abnormal genome structural changes. This review focuses on the general features of SAGE data, including the specificity of SAGE tags with respect to their original transcripts, the quantitative nature of SAGE data for differentially expressed genes, the reproducibility, the comparability of SAGE with microarray and the future potential of SAGE. Understanding these basic features should aid the proper interpretation of SAGE data to address biological and medical questions.
Affiliation(s)
- San Ming Wang
- Center for Functional Genomics, ENH Research Institute, Robert H. Lurie Comprehensive Cancer Center, Northwestern University, 1001 University Place, Evanston, IL 60201, USA.
8
Accurate and unambiguous tag-to-gene mapping in serial analysis of gene expression. BMC Bioinformatics 2006; 7:487. [PMID: 17083742 PMCID: PMC1637119 DOI: 10.1186/1471-2105-7-487] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.4] [Received: 07/11/2006] [Accepted: 11/04/2006] [Indexed: 12/24/2022]
Abstract
BACKGROUND In this study, we present a robust and reliable computational method for tag-to-gene assignment in serial analysis of gene expression (SAGE). The method relies on current genome information and annotation, incorporation of several new features, and key improvements over alternative methods, all of which are important to determine gene expression levels more accurately. The method provides a complete annotation of potential virtual SAGE tags within a genome, along with an estimation of their confidence for experimental observation that ranks tags that present multiple matches in the genome. RESULTS We applied this method to the Saccharomyces cerevisiae genome, producing the most thorough and accurate annotation of potential virtual SAGE tags that is available today for this organism. The usefulness of this method is exemplified by the significant reduction of ambiguous cases in existing experimental SAGE data. In addition, we report new insights from the analysis of existing SAGE data. First, we found that experimental SAGE tags mapping onto introns, intron-exon boundaries, and non-coding RNA elements are observed in all available SAGE data. Second, a significant fraction of experimental SAGE tags was found to map onto genomic regions currently annotated as intergenic. Third, a significant number of existing experimental SAGE tags for yeast has been derived from truncated cDNAs, which are synthesized through oligo-d(T) priming to internal poly-(A) regions during reverse transcription. CONCLUSION We conclude that an accurate and unambiguous tag mapping process is essential to increase the quality and the amount of information that can be extracted from SAGE experiments. This is supported by the results obtained here and also by the large impact that the erroneous interpretation of these data could have on downstream applications.
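To make the idea of a "virtual SAGE tag" concrete, the following sketch derives one tag per transcript and then lists tags shared by more than one gene. It is not the method of this paper: it assumes the common SAGE convention of NlaIII (recognition site CATG) as anchoring enzyme and a 10-base tag taken immediately downstream of the 3'-most site, it ignores introns, intergenic regions and confidence ranking, and the sequences and gene names are invented for illustration.

```python
from collections import defaultdict
from typing import Optional

ANCHOR_SITE = "CATG"  # NlaIII recognition site; assumed anchoring enzyme
TAG_LENGTH = 10       # assumed short-SAGE tag length


def virtual_tag(transcript: str) -> Optional[str]:
    """Return the tag following the 3'-most anchoring site, or None."""
    site_start = transcript.rfind(ANCHOR_SITE)
    if site_start == -1:
        return None  # no anchoring site: transcript yields no tag
    tag_start = site_start + len(ANCHOR_SITE)
    tag = transcript[tag_start:tag_start + TAG_LENGTH]
    # Discard tags truncated by the end of the sequence.
    return tag if len(tag) == TAG_LENGTH else None


def build_tag_index(transcripts: dict[str, str]) -> dict[str, list[str]]:
    """Map each virtual tag to the genes whose transcript produces it."""
    index = defaultdict(list)
    for gene, sequence in transcripts.items():
        tag = virtual_tag(sequence.upper())
        if tag is not None:
            index[tag].append(gene)
    return dict(index)


# Toy transcripts (hypothetical sequences, 5' to 3').
transcripts = {
    "geneA": "GGGCATGAAACCCGGGTTTAA",
    "geneB": "TTCATGTTTTCATGAAACCCGGGTCC",
    "geneC": "ACGTACGT",
    "geneD": "AACATGTTTTGGGGCCAA",
}
tag_index = build_tag_index(transcripts)
ambiguous = {tag: genes for tag, genes in tag_index.items() if len(genes) > 1}
print(ambiguous)
```

In this toy set, geneA and geneB produce the same tag, which is the kind of multiple match that a tag-to-gene mapping method has to rank or resolve.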
9
Wang SM. Applying the SAGE technique to study the effects of electromagnetic field on biological systems. Proteomics 2006; 6:4765-8. [PMID: 16897688 DOI: 10.1002/pmic.200500881] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Indexed: 11/06/2022]
Abstract
Identification of genes alternatively expressed in electromagnetic field (EMF)-exposed cells could provide direct evidence for biological effects of EMF. As there are a few indications so far for certain genes to be influenced by EMF, genome-wide scans of the transcriptome appear to be necessary. Among the several technologies used for genome-wide gene expression analysis, serial analysis of gene expression (SAGE) is one promising method, which seems particularly applicable for EMF research. This review provides a brief description of the features of gene expression, illustrates the basic principle of SAGE, and discusses the advantages and limitations of SAGE as well as examples of application. This information should help investigators determine if the SAGE technique is an optimal method for evaluating the biological effects of EMF.
Affiliation(s)
- San Ming Wang
- Center for Functional Genomics, ENH Research Institute, Department of Medicine, Northwestern University, Evanston, IL 60201, USA.
10
Friedland DR, Popper P, Eernisse R, Ringger B, Cioffi JA. Differential expression of cytoskeletal genes in the cochlear nucleus. Anat Rec A Discov Mol Cell Evol Biol 2006; 288:447-65. [PMID: 16550590 PMCID: PMC2570442 DOI: 10.1002/ar.a.20303] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Indexed: 12/14/2022]
Abstract
The relationship between structure and function is clearly illustrated by emerging evidence demonstrating the role of the neuronal cytoskeleton in physiological processes. For example, alterations in axonal caliber, a feature of the cytoskeleton, have been shown to affect reflex arc latencies and are prominent features of several neuropathological disorders. Even in the nonpathologic situation, however, axonal diameter may be a crucial element for the normal function of specialized auditory neurons. To investigate this relationship, we used serial analysis of gene expression and microarray analyses to characterize the expression of cytoskeletal genes in the central auditory system. These data, confirmed by real-time RT-PCR, identified differential expression of intermediate neurofilament transcripts (i.e., Nefh, Nef3, and Nfl) among the subdivisions of the cochlear nucleus. In situ hybridization was used to identify specific classes of neurons within the cochlear nucleus expressing these neurofilament genes. Robust neurofilament expression was seen in bushy cells and cochlear nerve root neurons, suggesting an association between cytoskeletal structure and rapid conduction velocities. Gene expression data were also identified for other classes of cytoskeletal and structural genes important in neuronal function. These results may help to explain some causes of hearing loss in hereditary neuropathies and provide an anatomic basis for understanding normal neuronal function in the central auditory system.
Affiliation(s)
- David R Friedland
- Department of Otolaryngology and Communication Sciences, Medical College of Wisconsin, Milwaukee, Wisconsin 53226, USA.
11
Ge X, Jung YC, Wu Q, Kibbe WA, Wang SM. Annotating nonspecific SAGE tags with microarray data. Genomics 2006; 87:173-80. [PMID: 16314072 DOI: 10.1016/j.ygeno.2005.08.014] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Received: 03/21/2005] [Revised: 08/09/2005] [Accepted: 08/27/2005] [Indexed: 11/26/2022]
Abstract
SAGE (serial analysis of gene expression) detects transcripts by extracting short tags from the transcripts. Because of the limited length, many SAGE tags are shared by transcripts from different genes. Relying on sequence information in the general gene expression database has limited power to solve this problem due to the highly heterogeneous nature of the deposited sequences. Considering that the complexity of gene expression at a single tissue level should be much simpler than that in the general expression database, we reasoned that by restricting gene expression to tissue level, the accuracy of gene annotation for the nonspecific SAGE tags should be significantly improved. To test the idea, we developed a tissue-specific SAGE annotation database based on microarray data. This database contains microarray expression information represented as UniGene clusters for 73 normal human tissues and 18 cancer tissues and cell lines. The nonspecific SAGE tag is first matched to the database by the same tissue type used by both SAGE and microarray analysis; then the multiple UniGene clusters assigned to the nonspecific SAGE tag are searched in the database under the matched tissue type. The UniGene cluster presented solely or at higher expression levels in the database is annotated to represent the specific gene for the nonspecific SAGE tags. The accuracy of gene annotation by this database was largely confirmed by experimental data. Our study shows that microarray data provide a useful source for annotating the nonspecific SAGE tags.
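The annotation rule described in this abstract (match the tissue, then keep the candidate cluster that is present alone or at the highest expression level) can be sketched in a few lines. This is only an illustration of the stated logic, not the authors' database or code: the table contents, cluster identifiers, tissue names and the function name `annotate_tag` are all hypothetical.

```python
from typing import Optional

# Hypothetical microarray table: tissue -> {cluster ID -> expression level}.
MICROARRAY = {
    "liver": {"clusterA": 850.0, "clusterB": 40.0},
    "brain": {"clusterB": 610.0, "clusterC": 95.0},
}


def annotate_tag(
    candidates: list[str],
    tissue: str,
    microarray: dict[str, dict[str, float]] = MICROARRAY,
) -> Optional[str]:
    """Pick the candidate cluster most likely to explain a nonspecific tag.

    Returns the cluster with the highest expression in the matched tissue,
    or None when the tissue or all candidates are absent from the table.
    """
    tissue_profile = microarray.get(tissue)
    if tissue_profile is None:
        return None  # no microarray data for this tissue type
    expressed = [c for c in candidates if c in tissue_profile]
    if not expressed:
        return None  # none of the candidate clusters is detected here
    return max(expressed, key=lambda cluster: tissue_profile[cluster])


# A tag shared by two clusters resolves differently depending on the tissue.
print(annotate_tag(["clusterA", "clusterB"], "liver"))
print(annotate_tag(["clusterA", "clusterB"], "brain"))
```

Returning `None` rather than guessing keeps unresolved tags visible, which seems closer to the abstract's description than forcing an assignment when no expression evidence exists.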
Affiliation(s)
- Xijin Ge
- Center for Functional Genomics, ENH Research Institute, Northwestern University, Chicago, IL 60611, USA
12
Poroyko V, Hejlek LG, Spollen WG, Springer GK, Nguyen HT, Sharp RE, Bohnert HJ. The maize root transcriptome by serial analysis of gene expression. Plant Physiol 2005; 138:1700-10. [PMID: 15965024 PMCID: PMC1176439 DOI: 10.1104/pp.104.057638] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.1] [Indexed: 05/03/2023]
Abstract
Serial Analysis of Gene Expression was used to define number and relative abundance of transcripts in the root tip of well-watered maize seedlings (Zea mays cv FR697). In total, 161,320 tags represented a minimum of 14,850 genes, based on at least two tags detected per transcript. The root transcriptome has been sampled to an estimated copy number of approximately five transcripts per cell. An extrapolation from the data and testing of single-tag identifiers by reverse transcription-PCR indicated that the maize root transcriptome should amount to at least 22,000 expressed genes. Frequency ranged from low copy number (2-5, 68.8%) to highly abundant transcripts (100 to >1,200; 1%). Quantitative reverse transcription-PCR for selected transcripts indicated high correlation with tag frequency. Computational analysis compared this set with known maize transcripts and other root transcriptome models. Among the 14,850 tags, 7,010 (47%) were found for which no maize cDNA or gene model existed. Comparing the maize root transcriptome with that in other plants indicated that highly expressed transcripts differed substantially; less than 5% of the most abundant transcripts were shared between maize and Arabidopsis (Arabidopsis thaliana). Transcript categories highlight functions of the maize root tip. Significant variation in abundance characterizes transcripts derived from isoforms of individual enzymes in biochemical pathways.
Affiliation(s)
- V Poroyko
- Department of Plant Biology, University of Illinois, Urbana, Illinois 61801, USA
13
Ibrahim AFM, Hedley PE, Cardle L, Kruger W, Marshall DF, Muehlbauer GJ, Waugh R. A comparative analysis of transcript abundance using SAGE and Affymetrix arrays. Funct Integr Genomics 2005; 5:163-74. [PMID: 15714318 DOI: 10.1007/s10142-005-0135-4] [Citation(s) in RCA: 34] [Impact Index Per Article: 1.8] [Received: 10/14/2004] [Revised: 12/13/2004] [Accepted: 12/22/2004] [Indexed: 12/18/2022]
Abstract
A number of methods are currently used for gene expression profiling. They differ in scale, economy and sensitivity. We present the results of a direct comparison between serial analysis of gene expression (SAGE) and the Barley1 Affymetrix GeneChip. Both technology platforms were used to obtain quantitative measurements of transcript abundance using identical RNA samples and assessed for their ability to quantify differential gene expression. For SAGE, a total of 82,122 tags were generated from two independent libraries representing whole developing barley caryopsis and dissected embryos. The Barley1 GeneChip contains 22,791 probe sets. Results obtained from both methods are generally comparable, indicating that both will lead to similar conclusions regarding transcript levels and differential gene expression. However, excluding singletons, 24.4% of the unique SAGE tags had no corresponding probe set on the Barley1 array indicating that a broader snapshot of gene expression was obtained by SAGE. Discrepancies were observed for a number of "genes" and these are discussed.
Affiliation(s)
- Adel F M Ibrahim
- Genome Dynamics, Scottish Crop Research Institute, Invergowrie, Dundee, UK.
14
Fujishima N, Hirokawa M, Aiba N, Ichikawa Y, Fujishima M, Komatsuda A, Suzuki Y, Kawabata Y, Miura I, Sawada KI. Gene expression profiling of human erythroid progenitors by micro-serial analysis of gene expression. Int J Hematol 2004; 80:239-45. [PMID: 15540898 DOI: 10.1532/ijh97.04053] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.6] [Indexed: 11/20/2022]
Abstract
We compared the expression profiles of highly purified human CD34+ cells and erythroid progenitor cells by micro-serial analysis of gene expression (microSAGE). Human CD34+ cells were purified from granulocyte colony-stimulating factor-mobilized blood stem cells, and erythroid progenitors were obtained by cultivating these cells in the presence of stem cell factor, interleukin 3, and erythropoietin. Our 10,202 SAGE tags allowed us to identify 1354 different transcripts appearing more than once. Erythroid progenitor cells showed increased expression of LRBA, EEF1A1, HSPCA, PILRB, RANBP1, NACA, and SMURF. Overexpression of HSPCA was confirmed by real-time polymerase chain reaction analysis. MicroSAGE revealed an unexpected preferential expression of several genes in erythroid progenitor cells in addition to the known functional genes, including hemoglobins. Our results provide reference data for future studies of gene expression in various hematopoietic disorders, including myelodysplastic syndrome and leukemia.
Affiliation(s)
- Naohito Fujishima
- Department of Internal Medicine III, Akita University School of Medicine, Akita, Japan
15
Abstract
This review deals with the methods of identifying genes that have been activated by inner or outer impulses. The activation and subsequent expression of a gene can be detected by its transcription into a corresponding messenger ribonucleic acid (mRNA). Principles of the methods for identification of individual activated genes, as well as groups of activated genes are described, the former methods being mostly based on subtractive hybridization and serial analysis of gene expression (SAGE), the latter on microarrays. Examples of gene activation by the hormone 17beta-estradiol (E2) are given.
Affiliation(s)
- Sten Z Cekan
- Karolinska Institute, Department of Woman and Child Health, Division of Reproductive Endocrinology, Karolinska University Hospital, Building L5, 17176 Stockholm, Sweden.
16
17
Sun M, Zhou G, Lee S, Chen J, Shi RZ, Wang SM. SAGE is far more sensitive than EST for detecting low-abundance transcripts. BMC Genomics 2004; 5:1. [PMID: 14704093 PMCID: PMC317289 DOI: 10.1186/1471-2164-5-1] [Citation(s) in RCA: 58] [Impact Index Per Article: 2.9] [Received: 09/03/2003] [Accepted: 01/05/2004] [Indexed: 11/10/2022]
Abstract
BACKGROUND Isolation of low-abundance transcripts expressed in a genome remains a serious challenge in transcriptome studies. The sensitivity of the methods used for analysis has a direct impact on the efficiency of the detection. We compared the EST method and the SAGE method to determine which one is more sensitive, and by how much, for the detection of low-abundance transcripts. RESULTS Using the same low-abundance transcripts detected by both methods as the targeted sequences, we observed that the SAGE method is 26 times more sensitive than the EST method for the detection of low-abundance transcripts. CONCLUSIONS The SAGE method is more efficient than the EST method in detecting low-abundance transcripts.
Affiliation(s)
- Miao Sun
- Department of Medicine, University of Chicago, 5841 S. Maryland Avenue, MC2115, Chicago, Illinois 60637, USA
- Guolin Zhou
- Department of Medicine, University of Chicago, 5841 S. Maryland Avenue, MC2115, Chicago, Illinois 60637, USA
- Sanggyu Lee
- Department of Medicine, University of Chicago, 5841 S. Maryland Avenue, MC2115, Chicago, Illinois 60637, USA
- Jianjun Chen
- Department of Medicine, University of Chicago, 5841 S. Maryland Avenue, MC2115, Chicago, Illinois 60637, USA
- Run Zhang Shi
- Department of Medicine, University of Chicago, 5841 S. Maryland Avenue, MC2115, Chicago, Illinois 60637, USA
- San Ming Wang
- Department of Medicine, University of Chicago, 5841 S. Maryland Avenue, MC2115, Chicago, Illinois 60637, USA
- ENH Research Institute, Northwestern University, 1001 University Place, Evanston, IL 60201
18
Fizames C, Muños S, Cazettes C, Nacry P, Boucherez J, Gaymard F, Piquemal D, Delorme V, Commes T, Doumas P, Cooke R, Marti J, Sentenac H, Gojon A. The Arabidopsis root transcriptome by serial analysis of gene expression. Gene identification using the genome sequence. Plant Physiol 2004; 134:67-80. [PMID: 14730065 PMCID: PMC316288 DOI: 10.1104/pp.103.030536] [Citation(s) in RCA: 44] [Impact Index Per Article: 2.2] [Received: 07/18/2003] [Revised: 09/07/2003] [Accepted: 10/22/2003] [Indexed: 05/18/2023]
Abstract
Large-scale identification of genes expressed in roots of the model plant Arabidopsis was performed by serial analysis of gene expression (SAGE), on a total of 144,083 sequenced tags, representing at least 15,964 different mRNAs. For tag to gene assignment, we developed a computational approach based on 26,620 genes annotated from the complete sequence of the genome. The procedure selected warrants the identification of the genes corresponding to the majority of the tags found experimentally, with a high level of reliability, and provides a reference database for SAGE studies in Arabidopsis. This new resource allowed us to characterize the expression of more than 3,000 genes, for which there is no expressed sequence tag (EST) or cDNA in the databases. Moreover, 85% of the tags were specific for one gene. To illustrate this advantage of SAGE for functional genomics, we show that our data allow an unambiguous analysis of most of the individual genes belonging to 12 different ion transporter multigene families. These results indicate that, compared with EST-based tag to gene assignment, the use of the annotated genome sequence greatly improves gene identification in SAGE studies. However, more than 6,000 different tags remained with no gene match, suggesting that a significant proportion of transcripts present in the roots originate from yet unknown or wrongly annotated genes. The root transcriptome characterized in this study markedly differs from those obtained in other organs, and provides a unique resource for investigating the functional specificities of the root system. As an example of the use of SAGE for transcript profiling in Arabidopsis, we report here the identification of 270 genes differentially expressed between roots of plants grown either with NO3- or NH4NO3 as N source.
Affiliation(s)
- Cécile Fizames
- Biochimie et Physiologie Moléculaire des Plantes, Unité Mixte de Recherche 5004, Agro-M/Centre National de la Recherche Scientifique/Institut National de la Recherche Agronomique/UM2, Place Viala, 34060 Montpellier 1, France
|
19
|
Hayden PS, El-Meanawy A, Schelling JR, Sedor JR. DNA expression analysis: serial analysis of gene expression, microarrays and kidney disease. Curr Opin Nephrol Hypertens 2003; 12:407-14. [PMID: 12815337 DOI: 10.1097/00041552-200307000-00009] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.6] [Indexed: 11/27/2022]
Abstract
PURPOSE OF REVIEW Expression profiling using serial analysis of gene expression and microarray technologies allows global description of expressed genes present in biological systems. Although relatively new technologies, each having been developed in the mid-1990s, both have become established and widely used tools for identification of gene networks and gene function. RECENT FINDINGS This review highlights DNA expression analyses published in 2002, emphasizing primarily serial analysis of gene expression and microarray technologies. We focus on the applicability of DNA expression analysis to renal disease, especially as some investigators have developed custom serial analysis of gene expression kidney libraries and kidney disease-specific 'designer chip' microarrays. Data analysis techniques and statistics are also discussed, since the challenge is generation of accurate messenger RNA profiles and interpretation of data in a manner that is both coherent and reproducible. SUMMARY Because kidney disease pathophysiology is complex, expression analysis can identify candidate nephropathy pathogenesis genes and gene networks, which eventually could become targets for therapeutic intervention.
Affiliation(s)
- Patrick S Hayden
- Departments of Medicine and Physiology and Biophysics, School of Medicine, Case Western Reserve University, Cleveland, Ohio, USA
|
20
|
Abstract
An essential step in Serial Analysis of Gene Expression (SAGE) is tag mapping, which refers to the unambiguous determination of the gene represented by a SAGE tag. Current resources for tag mapping are incomplete, and thus do not allow assessment of the efficacy of SAGE in transcript identification. A method of tag mapping is described here and applied to the Drosophila melanogaster and Caenorhabditis elegans genomes, which permits detailed SAGE assessment and provides tag-mapping resources that were unavailable previously for these organisms. In our method, a conceptual transcriptome is constructed using genomic sequence and annotation by extending predicted coding regions to include UTRs on the basis of EST and cDNA alignments, UTR length distributions, and polyadenylation signals. Analysis of extracted tags suggests that, using the standard SAGE procedure, expression of 8% of D. melanogaster and 15% of C. elegans genes cannot be detected unambiguously by SAGE due to shared sequence or lack of NlaIII-anchoring enzyme sites. Both increasing tag length by 2-3 bp and using Sau3A instead of NlaIII as the anchoring enzyme increase the potential for transcript detection. This work identifies and quantifies genes not amenable to SAGE analysis, in addition to providing tag-to-gene mappings for two model organisms.
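The tag extraction and ambiguity check described in this abstract can be illustrated with a minimal sketch. It assumes the usual short-SAGE convention of taking the 10 bp immediately downstream of the 3'-most NlaIII site (CATG) in each transcript. The function names and toy sequences are hypothetical, and this is not the authors' pipeline.

```python
from collections import defaultdict

ANCHOR = "CATG"  # NlaIII recognition site
TAG_LEN = 10     # assumed short-SAGE tag length


def extract_tag(transcript, anchor=ANCHOR, tag_len=TAG_LEN):
    """Return the tag downstream of the 3'-most anchor site, or None."""
    seq = transcript.upper()
    pos = seq.rfind(anchor)
    if pos == -1:
        return None  # no anchoring-enzyme site: gene is invisible to SAGE
    start = pos + len(anchor)
    tag = seq[start:start + tag_len]
    # Simplification: a site too close to the 3' end yields no usable tag.
    return tag if len(tag) == tag_len else None


def classify_genes(transcripts):
    """Split genes into unambiguous, ambiguous (shared tag) and no-tag sets."""
    tag_to_genes = defaultdict(set)
    no_tag = set()
    for gene, seq in transcripts.items():
        tag = extract_tag(seq)
        if tag is None:
            no_tag.add(gene)
        else:
            tag_to_genes[tag].add(gene)
    unambiguous, ambiguous = set(), set()
    for genes in tag_to_genes.values():
        (unambiguous if len(genes) == 1 else ambiguous).update(genes)
    return unambiguous, ambiguous, no_tag


# Hypothetical toy "conceptual transcriptome" (gene name -> transcript).
toy = {
    "geneA": "GGGCATGAAACCCGGGTTTAAA",
    "geneB": "TTTCATGAAACCCGGGTCCC",   # shares its tag with geneA
    "geneC": "ACGTACGTACGT",           # lacks a CATG site
    "geneD": "CCCATGACGTACGTACGG",
}

unambiguous, ambiguous, no_tag = classify_genes(toy)
print(sorted(unambiguous), sorted(ambiguous), sorted(no_tag))
```

Changing `ANCHOR` to `"GATC"` or raising `TAG_LEN` gives a rough way to explore the effect of anchoring enzyme and tag length that the abstract reports, though real analyses would work from annotated transcript models rather than toy strings.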
Affiliation(s)
- Erin D Pleasance
- Canada's Michael Smith Genome Sciences Centre, British Columbia Cancer Agency, Vancouver V5Z 4E6, Canada
|
21
|
Unneberg P, Wennborg A, Larsson M. Transcript identification by analysis of short sequence tags--influence of tag length, restriction site and transcript database. Nucleic Acids Res 2003; 31:2217-26. [PMID: 12682372 PMCID: PMC153741 DOI: 10.1093/nar/gkg313] [Citation(s) in RCA: 16] [Impact Index Per Article: 0.8] [Indexed: 11/14/2022] Open
Abstract
There exist a number of gene expression profiling techniques that utilize restriction enzymes for generation of short expressed sequence tags. We have studied how the choice of restriction enzyme influences various characteristics of tags generated in an experiment. We have also investigated various aspects of in silico transcript identification that these profiling methods rely on. First, analysis of 14 248 mRNA sequences derived from the RefSeq transcript database showed that 1-30% of the sequences lack a given restriction enzyme recognition site. Moreover, 1-5% of the transcripts have recognition sites located less than 10 bases from the poly(A) tail. The uniqueness of 10 bp tags lies in the range 90-95%, which increases only slightly with longer tags, due to the existence of closely related transcripts. Furthermore, 3-30% of upstream 10 bp tags are identical to 3' tags, introducing a risk of misclassification if upstream tags are present in a sample. Second, we found that a sequence length of 16-17 bp, including the recognition site, is sufficient for unique transcript identification by BLAST based sequence alignment to the UniGene Human non-redundant database. Third, we constructed a tag-to-gene mapping for UniGene and compared it to an existing mapping database. The mappings agreed to 79-83%, where the selection of representative sequences in the UniGene clusters is the main cause of the disagreement. The results of this study may serve to improve the interpretation of sequence-based expression studies and the design of hybridization arrays, by identifying short tags that have a high reliability and separating them from tags that carry an inherent ambiguity in their capacity to discriminate between genes. To this end, supplementary information in the form of a web companion to this paper is located at http://biobase.biotech.kth.se/tagseq.
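The first analysis in this abstract, counting transcripts that lack a given recognition site or carry it too close to the 3' end, can be sketched as below. The recognition sequences for NlaIII (CATG) and Sau3AI (GATC) are standard, but the function name, the assumption that sequences are supplied with the poly(A) tail already trimmed, and the toy data are illustrative only and do not come from the paper.

```python
SITES = {"NlaIII": "CATG", "Sau3AI": "GATC"}


def survey(transcripts, site, min_dist=10):
    """Fractions of transcripts lacking `site`, or whose 3'-most site
    ends fewer than `min_dist` bases before the 3' end of the sequence.

    Assumption: each sequence ends where the poly(A) tail would begin.
    """
    lacking = too_close = 0
    for seq in transcripts:
        pos = seq.upper().rfind(site)
        if pos == -1:
            lacking += 1
        elif len(seq) - (pos + len(site)) < min_dist:
            too_close += 1
    n = len(transcripts)
    return {"lacking": lacking / n, "too_close": too_close / n}


# Hypothetical toy transcripts, not real RefSeq entries.
toy_transcripts = [
    "AAA" + "CATG" + "C" * 13,                    # CATG only, far from 3' end
    "T" * 10 + "GATC" + "AAA" + "CATG" + "CC",    # both sites near the 3' end
    "ACGTACGTACGTACGT",                           # neither site
    "GATC" + "CATG" + "A" * 12,                   # both sites, far from 3' end
]

for enzyme, site in SITES.items():
    print(enzyme, survey(toy_transcripts, site))
```

Run over a real transcript set, the same two fractions would correspond to the 1-30% and 1-5% ranges quoted above, with the exact values depending on the enzyme chosen.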
Affiliation(s)
- Per Unneberg
- Department of Biotechnology, Royal Institute of Technology (KTH), Roslagsvägen 30B, S-106 91 Stockholm, Sweden.
|
22
|
Abstract
The human genome sequence is the book of our life. Buried in this large volume are our genes, which are scattered as small DNA fragments throughout the genome and comprise a small percentage of the total text. Finding these indistinct 'needles' in a vast genomic 'haystack' can be extremely challenging. In response to this challenge, computational prediction approaches have proliferated in recent years that predict the location and structure of genes. Here, I discuss these approaches and explain why they have become essential for the analyses of newly sequenced genomes.
Affiliation(s)
- Michael Q Zhang
- Watson School of Biological Sciences, Cold Spring Harbor Laboratory, 1 Bungtown Road, PO Box 100, Cold Spring Harbor, New York 11724, USA.
|
23
|
Current Awareness on Comparative and Functional Genomics. Comp Funct Genomics 2002. [PMCID: PMC2447335 DOI: 10.1002/cfg.120] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/11/2022] Open
|