1. So AP, Turner RFB, Haynes CA. Minimizing loss of sequence information in SAGE ditags by modulating the temperature dependent 3'→5' exonuclease activity of DNA polymerases on 3'-terminal isoheptyl amino groups. Biotechnol Bioeng 2006; 94:54-65. [PMID: 16552775 DOI: 10.1002/bit.20805]
Abstract
Numerous steps are required to prepare a sequencing library for serial analysis of gene expression (SAGE) from an original mRNA sample. Inefficiencies in these steps, however, can lead to a cumulative loss of sample during processing, yielding a library of short sequence tags (SSTs) that represents only a minute fraction of the original starting sample, potentially compromising the quality of the analysis and necessitating relatively large amounts of starting material. We show here that the higher molecular weight (HMW) amplification products commonly observed following PCR amplification of ditags are a direct result of HMW ligation products created during ditag formation. Using model tags, we demonstrate that the formation of these HMW ligation products becomes permissible following the release of the 3'-terminal isoheptyl amine (3'-IHA) from the SST during the fill-in reaction with the Klenow fragment (KF) of DNA polymerase (DNAP) I and is mediated by its 3'→5' exonuclease activity. We further show that the incorporation of SSTs into HMW ligation products can lead to a loss of sequence information from SAGE analysis, potentially skewing sequencing results away from the true distribution in the original sample. By modifying fill-in conditions through the use of Vent DNAP at 12 °C and by including terminal phosphorothioate linkages within the SAGE adaptors to specifically inhibit exonucleolytic removal of the 3'-terminal amine, we are able to maximize the yield of ditags and bypass the need for gel purification via PAGE following PCR. The modifications described here, combined with the modifications described previously by our group for adaptor ligation, ensure that the full sequence information content in SSTs derived from the transcriptome is preserved in the pool of amplified ditags prior to the creation of a SAGE library.
Affiliation(s)
- Austin P So
- Michael Smith Laboratories, University of British Columbia, Vancouver, British Columbia V6T 1Z3
2. Silva APM, De Souza JES, Galante PAF, Riggins GJ, De Souza SJ, Camargo AA. The impact of SNPs on the interpretation of SAGE and MPSS experimental data. Nucleic Acids Res 2004; 32:6104-10. [PMID: 15562001 PMCID: PMC534621 DOI: 10.1093/nar/gkh937]
Abstract
Serial Analysis of Gene Expression (SAGE) and Massively Parallel Signature Sequencing (MPSS) are powerful techniques for gene expression analysis. A crucial step in analyzing SAGE and MPSS data is the assignment of experimentally obtained tags to a known transcript. However, tag-to-transcript assignment is not a straightforward process, since alternative tags for a given transcript can also be experimentally obtained. Here, we have evaluated the impact of Single Nucleotide Polymorphisms (SNPs) on the generation of alternative SAGE and MPSS tags. This was achieved through the construction of a reference database of SNP-associated alternative tags, which has been integrated with SAGE Genie. A total of 2020 SNP-associated alternative tags were catalogued in our reference database and at least one SNP-associated alternative tag was observed for approximately 8.6% of all known human genes. A significant fraction (61.9%) of these alternative tags matched a list of experimentally obtained tags, validating their existence. In addition, the origin of four out of five SNP-associated alternative MPSS tags was experimentally confirmed through the use of the GLGI-MPSS protocol (Generation of Long cDNA fragments for Gene Identification). The availability of our SNP-associated alternative tag database will certainly improve the interpretation of SAGE and MPSS experiments.
Affiliation(s)
- Ana Paula M Silva
- Laboratory of Molecular Biology, Ludwig Institute for Cancer Research, 01509-010, São Paulo, SP, Brazil
3. Tuteja R, Tuteja N. Serial analysis of gene expression (SAGE): unraveling the bioinformatics tools. Bioessays 2004; 26:916-22. [PMID: 15273993 DOI: 10.1002/bies.20070]
Abstract
Serial analysis of gene expression (SAGE) is a powerful technique that can be used for global analysis of gene expression. Its chief advantage over other methods is that it does not require prior knowledge of the genes of interest and provides qualitative and quantitative data on potentially every transcribed sequence in a particular cell or tissue type. This is a technique of expression profiling, which permits simultaneous, comparative and quantitative analysis of gene-specific, 9- to 13-basepair sequences. These short sequences, called SAGE tags, are linked together for efficient sequencing. The sequencing data are then analyzed to identify each gene expressed in the cell and the levels at which each gene is expressed. The main benefits of SAGE include the digital output and the identification of novel genes. In this review, we present an outline of the method, various bioinformatics methods for data analysis and general applications of this important technology.
Affiliation(s)
- Renu Tuteja
- International Centre for Genetic Engineering and Biotechnology, Aruna Asaf Ali Marg, New Delhi, India.
4. Beisel KW, Shiraki T, Morris KA, Pompeia C, Kachar B, Arakawa T, Bono H, Kawai J, Hayashizaki Y, Carninci P. Identification of unique transcripts from a mouse full-length, subtracted inner ear cDNA library. Genomics 2004; 83:1012-23. [PMID: 15177555 DOI: 10.1016/j.ygeno.2004.01.006] [Received: 10/09/2003] [Revised: 12/15/2003] [Accepted: 01/25/2004]
Abstract
A small-scale full-length library construction approach was developed to facilitate production of a mouse full-length cDNA encyclopedia representing approximately 250 enriched, normalized, and/or subtracted cDNA libraries. One library produced using this approach was a subtracted adult mouse inner ear cDNA library (sIEa). The average size of the inserts was approximately 2.5 kb, with the majority ranging from 0.5 to 7.0 kb. From this library 22,574 sequence reads were obtained from 15,958 independent clones. Sequencing and chromosomal localization established 5240 clusters, with 1302 clusters being unique and 359 representing new ESTs. Our sIEa library contributed 56.1% of the 7773 nonredundant UniGene clusters associated with the four mouse inner ear libraries in the NCBI dbEST. Based on homologous chromosomal regions between human and mouse, we identified 1018 UniGene clusters associated with the deafness locus critical regions. Of these, 59 clusters were found only in our sIEa library and represented approximately 50% of the identified critical regions.
Affiliation(s)
- Kirk W Beisel
- Department of Biomedical Sciences, Creighton University, 2500 California, Omaha, NE 68178, USA.
5. Snape JR, Maund SJ, Pickford DB, Hutchinson TH. Ecotoxicogenomics: the challenge of integrating genomics into aquatic and terrestrial ecotoxicology. Aquat Toxicol 2004; 67:143-154. [PMID: 15003699 DOI: 10.1016/j.aquatox.2003.11.011] [Received: 10/20/2002] [Accepted: 11/30/2003]
Abstract
Rapid progress in the field of genomics (the study of how an individual's entire genetic make-up, the genome, translates into biological functions) is beginning to provide tools that may assist our understanding of how chemicals can impact on human and ecosystem health. In many ways, if scientific and regulatory efforts in the 20th century have sought to establish which chemicals cause damage to ecosystems, then the challenge in ecotoxicology for the 21st century is to understand the mechanisms of toxicity to different wildlife species. In the human context, 'toxicogenomics' is the study of expression of genes important in adaptive responses to toxic exposures and a reflection of the toxic processes per se. Given the parallel implications for ecological (environmental) risk assessment, we propose the term 'ecotoxicogenomics' to describe the integration of genomics (transcriptomics, proteomics and metabolomics) into ecotoxicology. Ecotoxicogenomics is defined as the study of gene and protein expression in non-target organisms that is important in responses to environmental toxicant exposures. The potential of ecotoxicogenomic tools in ecological risk assessment seems great. Many of the standardized methods used to assess potential impact of chemicals on aquatic organisms rely on measuring whole-organism responses (e.g. mortality, growth, reproduction) of generally sensitive indicator species at maintained concentrations, and deriving 'endpoints' based on these phenomena (e.g. median lethal concentrations, no observed effect concentrations, etc.). Whilst such phenomenological approaches are useful for identifying chemicals of potential concern, they provide little understanding of the mechanism of chemical toxicity. Without this understanding, it will be difficult to address some of the key challenges that currently face aquatic ecotoxicology, e.g. predicting toxicant responses across the very broad diversity of the phylogenetic groups present in aquatic ecosystems; estimating how changes at one ecological level of organisation will affect other levels (e.g. predicting population-level effects); predicting the influence of time-varying exposure on toxicant responses. Ecotoxicogenomic tools may provide us with a better mechanistic understanding of aquatic ecotoxicology. For ecotoxicogenomics to fulfil its potential, collaborative efforts are necessary through the parallel use of model microorganisms (e.g. Saccharomyces cerevisiae) together with aquatic (e.g. Danio rerio, Daphnia magna, Lemna minor and Xenopus tropicalis) and terrestrial (e.g. Arabidopsis thaliana, Caenorhabditis elegans and Eisenia foetida) plants, animals and microorganisms.
Affiliation(s)
- Jason R Snape
- AstraZeneca Global Safety Health and Environment, Brixham Environmental Laboratory, Freshwater Quarry, Brixham, Devon TQ5 8BA, UK.
6. Sun M, Zhou G, Lee S, Chen J, Shi RZ, Wang SM. SAGE is far more sensitive than EST for detecting low-abundance transcripts. BMC Genomics 2004; 5:1. [PMID: 14704093 PMCID: PMC317289 DOI: 10.1186/1471-2164-5-1] [Received: 09/03/2003] [Accepted: 01/05/2004]
Abstract
BACKGROUND: Isolation of low-abundance transcripts expressed in a genome remains a serious challenge in transcriptome studies. The sensitivity of the methods used for analysis has a direct impact on the efficiency of the detection. We compared the EST method and the SAGE method to determine which one is more sensitive, and by how much, for the detection of low-abundance transcripts. RESULTS: Using the same low-abundance transcripts detected by both methods as the targeted sequences, we observed that the SAGE method is 26 times more sensitive than the EST method for the detection of low-abundance transcripts. CONCLUSIONS: The SAGE method is more efficient than the EST method in detecting low-abundance transcripts.
Affiliation(s)
- Miao Sun
- Department of Medicine, University of Chicago, 5841 S. Maryland Avenue, MC2115, Chicago, Illinois 60637, USA
- Guolin Zhou
- Department of Medicine, University of Chicago, 5841 S. Maryland Avenue, MC2115, Chicago, Illinois 60637, USA
- Sanggyu Lee
- Department of Medicine, University of Chicago, 5841 S. Maryland Avenue, MC2115, Chicago, Illinois 60637, USA
- Jianjun Chen
- Department of Medicine, University of Chicago, 5841 S. Maryland Avenue, MC2115, Chicago, Illinois 60637, USA
- Run Zhang Shi
- Department of Medicine, University of Chicago, 5841 S. Maryland Avenue, MC2115, Chicago, Illinois 60637, USA
- San Ming Wang
- Department of Medicine, University of Chicago, 5841 S. Maryland Avenue, MC2115, Chicago, Illinois 60637, USA
- ENH Research Institute, Northwestern University, 1001 University Place, Evanston, IL 60201
7. Unneberg P, Wennborg A, Larsson M. Transcript identification by analysis of short sequence tags--influence of tag length, restriction site and transcript database. Nucleic Acids Res 2003; 31:2217-26. [PMID: 12682372 PMCID: PMC153741 DOI: 10.1093/nar/gkg313]
Abstract
There exist a number of gene expression profiling techniques that utilize restriction enzymes for generation of short expressed sequence tags. We have studied how the choice of restriction enzyme influences various characteristics of tags generated in an experiment. We have also investigated various aspects of in silico transcript identification that these profiling methods rely on. First, analysis of 14 248 mRNA sequences derived from the RefSeq transcript database showed that 1-30% of the sequences lack a given restriction enzyme recognition site. Moreover, 1-5% of the transcripts have recognition sites located less than 10 bases from the poly(A) tail. The uniqueness of 10 bp tags lies in the range 90-95%, which increases only slightly with longer tags, due to the existence of closely related transcripts. Furthermore, 3-30% of upstream 10 bp tags are identical to 3' tags, introducing a risk of misclassification if upstream tags are present in a sample. Second, we found that a sequence length of 16-17 bp, including the recognition site, is sufficient for unique transcript identification by BLAST-based sequence alignment to the UniGene Human non-redundant database. Third, we constructed a tag-to-gene mapping for UniGene and compared it to an existing mapping database. The mappings showed 79-83% agreement, with the selection of representative sequences in the UniGene clusters being the main cause of the disagreement. The results of this study may serve to improve the interpretation of sequence-based expression studies and the design of hybridization arrays, by identifying short tags that have a high reliability and separating them from tags that carry an inherent ambiguity in their capacity to discriminate between genes. To this end, supplementary information in the form of a web companion to this paper is located at http://biobase.biotech.kth.se/tagseq.
Affiliation(s)
- Per Unneberg
- Department of Biotechnology, Royal Institute of Technology (KTH), Roslagsvägen 30B, S-106 91 Stockholm, Sweden.
8. Current Awareness on Comparative and Functional Genomics. Comp Funct Genomics 2003. [PMCID: PMC2447381 DOI: 10.1002/cfg.226]
9. Lee S, Clark T, Chen J, Zhou G, Scott LR, Rowley JD, Wang SM. Correct identification of genes from serial analysis of gene expression tag sequences. Genomics 2002; 79:598-602. [PMID: 11944993 DOI: 10.1006/geno.2002.6730]
Abstract
SAGE (serial analysis of gene expression) is a remarkable technique for genome-wide analysis of gene expression. It is crucial to understand the extent to which SAGE can accurately indicate a gene or expressed sequence tag (EST) with a single tag. We analyzed the effect of the size of the SAGE tag on gene identification. Our observations indicate that SAGE tags are in general not long enough to achieve the degree of uniqueness of identification originally envisaged. Our observations also indicate that the limitation of using a SAGE tag to identify a gene can be overcome by converting SAGE tags into longer 3' EST sequences with the generation of longer cDNA fragments from SAGE tags for gene identification (GLGI) method.
Affiliation(s)
- Sanggyu Lee
- Department of Medicine, University of Chicago, 5841 S. Maryland, MC2115, Chicago, Illinois 60637, USA