1. Li QT, Huang ZZ, Chen YB, Yao HY, Ke ZH, He XX, Qiu MJ, Wang MM, Xiong ZF, Yang SL. Integrative Analysis of Siglec-15 mRNA in Human Cancers Based on Data Mining. J Cancer 2020; 11:2453-2464. [PMID: 32201516 PMCID: PMC7066007 DOI: 10.7150/jca.38747] [Citation(s) in RCA: 26] [Impact Index Per Article: 6.5] [Received: 07/25/2019] [Accepted: 01/19/2020] [Indexed: 12/16/2022]
Abstract
Objective: Cancer is expected to become the leading cause of death worldwide in the 21st century and is the single most important obstacle to extending life expectancy. Unfortunately, the most effective approach to combating cancer remains a complex and unsolved problem. Siglec-15 is a member of the Siglec family and plays a conserved regulatory role in the vertebrate immune system. Previous studies on Siglec-15 have focused on its function in osteoclast regulation. The purpose of this study was to explore the significance of Siglec-15 mRNA in human cancer, based mainly on information obtained from online databases. Method: Data were collected from several online databases. Serial analysis of gene expression (SAGE) and Virtual Northern, UALCAN database analysis, Catalog of Somatic Mutations in Cancer (COSMIC) analysis, the cBio cancer genomics portal, Cancer Regulome tools and data, Kaplan-Meier Plotter analysis and the UCSC Xena website were used to analyze the data. Results: Compared with normal tissues, Siglec-15 was widely up-regulated in tumors. Differences in Siglec-15 expression were associated with different prognoses. Siglec-15 mutations are widely observed in tumors, and Siglec-15 interacts with different genes in different cancer types. Conclusion: Siglec-15 is a potential target for expanding cancer immunotherapy.
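To illustrate what the Kaplan-Meier survival analysis mentioned in this abstract computes, here is a minimal Python sketch of the standard product-limit estimator. It is not the code behind the Kaplan-Meier Plotter; the function name `kaplan_meier` and the toy follow-up data are hypothetical. In a prognosis study, patients would typically be split into high- and low-expression groups (for example at the median Siglec-15 level, an assumption here) and one curve estimated per group.

```python
from typing import List, Tuple


def kaplan_meier(times: List[float], events: List[int]) -> List[Tuple[float, float]]:
    """Product-limit estimate of the survival function (hypothetical helper).

    times  -- follow-up time of each patient (hypothetical units, e.g. months)
    events -- 1 if death was observed at that time, 0 if the patient was censored
    Returns (time, survival) pairs at every time with at least one death.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve: List[Tuple[float, float]] = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group all patients whose follow-up ends at the same time t
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            # Multiply by the conditional probability of surviving past t
            survival *= 1.0 - deaths / n_at_risk
            curve.append((t, survival))
        # Both deaths and censored patients leave the risk set after t
        n_at_risk -= removed
    return curve


# Toy cohort with hypothetical numbers, e.g. a high-expression group
high = kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0])
print(high)
```

Comparing the curve of the high-expression group with that of the low-expression group (usually with a log-rank test, not shown) is what a statement such as "differences in expression were associated with different prognoses" summarizes.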
Affiliation(s)
- Qiu-Ting Li
- Division of Gastroenterology, Liyuan Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan 430077, China
- Zao-Zao Huang
- Yangchunhu community Hospital, Liyuan Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan 430077, China
- Yao-Bin Chen
- Institute of Pathology, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan 430030, China
- Hong-Yi Yao
- Department of Rehabilitation, Liyuan Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan 430077, China
- Zun-Hui Ke
- Wuhan Children's Hospital (Wuhan Maternal and Child Healthcare Hospital), Tongji Medical College, Huazhong University of Science & Technology, Wuhan 430015, China
- Xiao-Xiao He
- Division of Gastroenterology, Liyuan Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan 430077, China
- Meng-Jun Qiu
- Division of Gastroenterology, Liyuan Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan 430077, China
- Meng-Meng Wang
- Division of Gastroenterology, Liyuan Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan 430077, China
- Zhi-Fan Xiong
- Division of Gastroenterology, Liyuan Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan 430077, China
- Sheng-Li Yang
- Cancer Center, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, 1277 JieFang Avenue, Wuhan 430022, China
2. Corrada D, Morra G, Colombo G. Investigating allostery in molecular recognition: insights from a computational study of multiple antibody-antigen complexes. J Phys Chem B 2013; 117:535-52. [PMID: 23240736 DOI: 10.1021/jp310753z] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.5] [Indexed: 11/28/2022]
Abstract
Antibody-antigen recognition plays a key role in the immune response against pathogens. Here, we investigated various aspects of this problem by analyzing a large and diverse set of antibodies and their complexes with protein antigens through atomistic simulations. Common features of the antibody response to the presence of antigens are elucidated by analyzing the proteins' internal dynamics and coordination in different ligand states, combined with an analysis of the interaction networks implicated in stabilizing functional structures. The use of a common structural reference reveals preferential changes in dynamic coordination and intramolecular interaction networks that are induced by antigen binding and shared by all antibodies. Such changes propagate from the binding region through the whole immunoglobulin domains. Overall, complexed antibodies show more diffuse networks of nonbonded interactions and generally higher internal dynamic coordination, which preferentially involve the immunoglobulin (Ig) domains of the heavy chain. The combined results provide atomistic insights into the correlations between the modulation of conformational dynamics, structural stability, and allosteric signal transduction. In particular, the results suggest that specific networks of residues, shared among all the analyzed proteins, define the molecular pathways by which antibody structures respond to antigen binding. Our studies may have practical implications, for example for the rational design of antibodies with specifically modulated antigen-binding affinities.
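The abstract speaks of "dynamic coordination" between parts of the antibody. One common way to quantify correlated motion in a simulation is the dynamic cross-correlation matrix (DCCM), C_ij = <dr_i . dr_j> / sqrt(<|dr_i|^2> <|dr_j|^2>), where dr_i is the displacement of atom i from its mean position. The abstract does not say the authors used this exact measure, so treat the sketch below as a generic illustration; the function name `dccm` and the input layout (frames already superimposed on a common reference) are assumptions.

```python
import math
from typing import List, Sequence, Tuple

Vec = Tuple[float, float, float]


def dccm(frames: Sequence[Sequence[Vec]]) -> List[List[float]]:
    """Dynamic cross-correlation matrix for N atoms over T frames (hypothetical helper).

    frames[t][i] is the (x, y, z) position of atom i in frame t; frames are
    assumed to be already superimposed on a common reference structure.
    C[i][j] ranges from -1 (anti-correlated motion) to +1 (correlated motion).
    """
    n_frames, n_atoms = len(frames), len(frames[0])
    # Mean position of every atom over the trajectory
    mean = [[sum(f[i][k] for f in frames) / n_frames for k in range(3)]
            for i in range(n_atoms)]
    # Displacement of every atom from its mean position, frame by frame
    disp = [[[f[i][k] - mean[i][k] for k in range(3)] for i in range(n_atoms)]
            for f in frames]

    def avg_dot(i: int, j: int) -> float:
        # <dr_i . dr_j> averaged over all frames
        return sum(sum(d[i][k] * d[j][k] for k in range(3)) for d in disp) / n_frames

    msf = [avg_dot(i, i) for i in range(n_atoms)]  # mean-square fluctuations
    # An atom that never moves (msf == 0) would cause a division by zero here
    return [[avg_dot(i, j) / math.sqrt(msf[i] * msf[j]) for j in range(n_atoms)]
            for i in range(n_atoms)]


# Toy trajectory: atoms 0 and 1 move together, atom 2 moves the opposite way
traj = [
    [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (10.0, 0.0, 0.0)],
    [(1.0, 0.0, 0.0), (6.0, 0.0, 0.0), (9.0, 0.0, 0.0)],
]
c = dccm(traj)
print(c)
```

In a real study the matrix would be computed per residue (e.g. on C-alpha atoms) for the free and antigen-bound antibody, and the difference between the two matrices would highlight binding-induced changes in coordination.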
Affiliation(s)
- Dario Corrada
- Istituto di Chimica del Riconoscimento Molecolare - Consiglio Nazionale delle Ricerche (CNR-ICRM), via Mario Bianco 9, 20131 Milano, Italy
3. Bianchetti L, Kieffer D, Féderkeil R, Poch O. Increased frequency of single base substitutions in a population of transcripts expressed in cancer cells. BMC Cancer 2012; 12:509. [PMID: 23137041 PMCID: PMC3522053 DOI: 10.1186/1471-2407-12-509] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/22/2012] [Accepted: 10/09/2012] [Indexed: 12/03/2022]
Abstract
Background: Single Base Substitutions (SBS) that alter transcripts expressed in cancer originate from somatic mutations. However, recent studies report SBS in transcripts that are not supported by the genomic DNA of tumor cells. Methods: We used sequence-based whole-genome expression profiling, namely Long-SAGE (L-SAGE) and Tag-seq (a combination of L-SAGE and deep sequencing), and computational methods to identify transcripts with greater SBS frequencies in cancer. Millions of tags produced by 40 healthy and 47 cancer L-SAGE experiments were compared with 1,959 Reference Tags (RT), i.e. tags matching the human genome exactly once. Similarly, tens of millions of tags produced by 7 healthy and 8 cancer Tag-seq experiments were compared with 8,572 RT. For each transcript, SBS frequencies in healthy and cancer cells were statistically tested for equality. Results: In the L-SAGE and Tag-seq experiments, 372 and 4,289 transcripts, respectively, showed greater SBS frequencies in cancer. Increased SBS frequencies could not be attributed to known Single Nucleotide Polymorphisms (SNP), catalogued somatic mutations or RNA-editing enzymes. Hypothesizing that Single Tags (ST), i.e. tags sequenced only once, were indicators of SBS, we observed that ST proportions were heterogeneously distributed across Embryonic Stem Cells (ESC), healthy differentiated cells and cancer cells. ESC had the lowest ST proportions, whereas cancer cells had the highest. Finally, in a series of experiments carried out on a single patient at 1 healthy and 3 consecutive tumor stages, we showed that SBS frequencies increased during cancer progression. Conclusion: If the mechanisms generating these base substitutions can be identified, increased SBS frequency in transcripts could become a useful new biomarker of cancer. As sequencing costs fall, sequence-based whole-genome expression profiling could be used to characterize increased SBS frequency in patients' tumors and aid diagnosis.
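The abstract states that SBS frequencies in healthy and cancer cells were "statistically tested for equality" without naming the test. As one hedged possibility, a pooled two-proportion z-test compares the fraction of substituted tags in the two groups. The function name `two_proportion_z_test` and the counts below are hypothetical, and the authors' actual test may differ.

```python
import math
from typing import Tuple


def two_proportion_z_test(x1: int, n1: int, x2: int, n2: int) -> Tuple[float, float]:
    """Pooled z-test of H0: p1 == p2 for two binomial proportions (hypothetical helper).

    x1, n1 -- tags carrying a substitution / all tags of a transcript, healthy libraries
    x2, n2 -- the same counts for cancer libraries
    Returns (z, two-sided p-value); z > 0 means a higher frequency in cancer.
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    # Standard error under H0; it is zero (test undefined) if pooled is 0 or 1
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p2 - p1) / se
    # Two-sided p-value from the standard normal: P(|Z| >= |z|) = erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: 10 of 1,000 tags substituted in healthy cells, 40 of 1,000 in cancer
z, p = two_proportion_z_test(10, 1000, 40, 1000)
print(f"z = {z:.2f}, p = {p:.2g}")
```

Because thousands of transcripts are tested, a multiple-testing correction would normally be applied to the resulting p-values before calling a transcript significant.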
Affiliation(s)
- Laurent Bianchetti
- Plate-forme Bioinformatique de Strasbourg (BIPS), Institut de Génétique et de Biologie Moléculaire et Cellulaire (CNRS/INSERM/ULP), Illkirch, Cedex, France.
4. Vidal DO, de Souza JES, Pires LC, Masotti C, Salim ACM, Costa MCF, Galante PAF, de Souza SJ, Camargo AA. Analysis of allelic differential expression in the human genome using allele-specific serial analysis of gene expression tags. Genome 2011; 54:120-7. [PMID: 21326368 DOI: 10.1139/g10-103] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 01/22/2023]
Abstract
Recent reports have demonstrated that a significant proportion of human genes display allelic differential expression (ADE). ADE is associated with phenotypic variability and may contribute to complex genetic diseases. Here, we present a computational analysis of ADE using allele-specific serial analysis of gene expression (SAGE) tags representing 1295 human genes. We identified 472 genes for which unequal representation (>3-fold) of allele-specific SAGE tags was observed in at least one SAGE library, suggesting the occurrence of ADE. For 235 of these 472 genes, the difference in expression level between the two allele-specific SAGE tags was statistically significant (p < 0.05). Eleven candidate genes were then subjected to experimental validation, and ADE was confirmed for 8 of these 11 genes. Our results suggest that at least 25% of human genes display ADE and that allele-specific SAGE tags can be used efficiently to identify such genes.
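The ADE criteria in this abstract combine a fold-change filter (>3-fold) with a significance cut-off (p < 0.05), but the statistical test is not named. The sketch below assumes an exact two-sided binomial test against equal representation of the two alleles (p = 0.5); `flag_ade`, `binomial_two_sided_p` and their inputs are hypothetical illustrations, not the authors' pipeline.

```python
from math import comb
from typing import Tuple


def binomial_two_sided_p(k: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided binomial test: total probability of all outcomes that
    are no more likely than the observed count k out of n (hypothetical helper)."""
    probs = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = probs[k]
    # A tiny relative tolerance keeps floating-point ties from being dropped
    return min(1.0, sum(q for q in probs if q <= observed * (1 + 1e-9)))


def flag_ade(tags_a: int, tags_b: int, fold: float = 3.0,
             alpha: float = 0.05) -> Tuple[bool, bool]:
    """Return (unequal, significant) for the tag counts of two alleles in one library."""
    hi, lo = max(tags_a, tags_b), min(tags_a, tags_b)
    unequal = hi > fold * lo  # more than `fold`-fold difference between alleles
    significant = binomial_two_sided_p(tags_a, tags_a + tags_b) < alpha
    return unequal, significant


print(flag_ade(20, 2))  # (True, True): strongly skewed toward allele A
print(flag_ade(5, 5))   # (False, False): balanced alleles
```

Requiring both conditions mirrors the two-step filtering described in the abstract: 472 genes pass the fold-change filter, and 235 of them also pass the significance filter.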
Affiliation(s)
- Daniel Onofre Vidal
- Ludwig Institute for Cancer Research, São Paulo Branch, Rua João Julião, 245 - 1°Andar, 01323-903, São Paulo, SP, Brazil
5. Molina C, Zaman-Allah M, Khan F, Fatnassi N, Horres R, Rotter B, Steinhauer D, Amenc L, Drevon JJ, Winter P, Kahl G. The salt-responsive transcriptome of chickpea roots and nodules via deepSuperSAGE. BMC Plant Biol 2011; 11:31. [PMID: 21320317 PMCID: PMC3045889 DOI: 10.1186/1471-2229-11-31] [Citation(s) in RCA: 53] [Impact Index Per Article: 4.1] [Received: 07/27/2010] [Accepted: 02/14/2011] [Indexed: 05/20/2023]
Abstract
BACKGROUND: The combination of high-throughput transcript profiling and next-generation sequencing technologies is a prerequisite for comprehensive genome-wide transcriptome analysis. Our recent innovation, deepSuperSAGE, is based on an advanced SuperSAGE protocol combined with massively parallel pyrosequencing on Roche's 454 sequencing platform. To demonstrate the power of this combination, we chose the salt stress transcriptomes of roots and nodules of chickpea (Cicer arietinum L.), the third most important legume crop. While our report is more technology-oriented, it nevertheless addresses a major worldwide problem for crops generally: high salinity. Together with low temperatures and water stress, high salinity is responsible for losses of millions of tons of legume (and other) crops. Continuously deteriorating environmental conditions will combine with salinity stress to further compromise crop yields. As a good example of such stress-exposed crop plants, we started to characterize the salt stress responses of chickpea at the transcriptome level.
RESULTS: We used deepSuperSAGE to detect early global transcriptome changes in salt-stressed chickpea. We characterized the salt stress responses of 86,919 transcripts, representing 17,918 unique 26 bp deepSuperSAGE tags (UniTags), from roots of the salt-tolerant variety INRAT-93 two hours after treatment with 25 mM NaCl. Additionally, the expression of 57,281 transcripts representing 13,115 UniTags was monitored in nodules of the same plants. From a total of 144,200 analyzed 26 bp tags in roots and nodules together, 21,401 unique transcripts were identified. Of these, only 363 and 106 specific transcripts were commonly up- and down-regulated (>3.0-fold), respectively, under salt stress in both organs, indicating a differential, organ-specific response to stress. Building on recent pioneering work on massive cDNA sequencing in chickpea, more than 9,400 UniTags could be linked to UniProt entries. Additionally, gene ontology (GO) category over-representation analysis made it possible to identify enriched biological processes among the differentially expressed UniTags. Subsequently, the gathered information was cross-checked against stress-related pathways. Of the several filtered pathways, we focus here on transcripts associated with the generation and scavenging of reactive oxygen species (ROS), as well as on transcripts involved in Na+ homeostasis. Although both processes are already well characterized in other plants, the information generated in the present work is of high value: expression profiles and sequence similarity for several hundred transcripts of potential interest are now available.
CONCLUSIONS: This report demonstrates that combining the high-throughput transcriptome profiling technology SuperSAGE with a next-generation sequencing platform allows deep insight into the first molecular reactions of a plant exposed to salinity. Cross-validation with recent reports enriched the information on the salt stress dynamics of more than 9,000 chickpea ESTs and enlarged their pool of alternative transcript isoforms. As an example of the high resolution of the technology, which we term deepSuperSAGE, we demonstrate that ROS-scavenging and ROS-generating pathways undergo strong global transcriptome changes in chickpea roots and nodules as early as 2 hours after the onset of moderate salt stress (25 mM NaCl). Additionally, a set of more than 15 candidate transcripts is proposed as potential components of the salt overly sensitive (SOS) pathway in chickpea. Newly identified transcript isoforms are potential targets for breeding novel cultivars with high salinity tolerance. We demonstrate that these targets can be integrated into breeding schemes through microarray and RT-PCR assays downstream of the generation of 26 bp tags by SuperSAGE.
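The >3.0-fold criterion for transcripts commonly up- or down-regulated in both organs can be expressed as a small filter over tag counts. This sketch assumes library-size normalization to tags per million and a pseudocount of 1 for tags absent from one library; both choices, the helper names `per_million` and `regulated`, and the toy counts are hypothetical, and the deepSuperSAGE pipeline may normalize differently.

```python
from typing import Dict, Set


def per_million(counts: Dict[str, int]) -> Dict[str, float]:
    """Scale raw tag counts to tags per million (library-size normalization)."""
    total = sum(counts.values())
    return {tag: c * 1e6 / total for tag, c in counts.items()}


def regulated(control: Dict[str, int], stressed: Dict[str, int],
              fold: float = 3.0, pseudo: float = 1.0) -> Dict[str, Set[str]]:
    """Split UniTags into up- and down-regulated sets by normalized fold change."""
    c, s = per_million(control), per_million(stressed)
    up: Set[str] = set()
    down: Set[str] = set()
    for tag in set(c) | set(s):
        # The hypothetical pseudocount avoids division by zero for missing tags
        ratio = (s.get(tag, 0.0) + pseudo) / (c.get(tag, 0.0) + pseudo)
        if ratio > fold:
            up.add(tag)
        elif ratio < 1 / fold:
            down.add(tag)
    return {"up": up, "down": down}


# Hypothetical counts for three UniTags in control vs. salt-stressed roots and nodules
roots = regulated({"a": 100, "b": 100, "c": 800}, {"a": 500, "b": 20, "c": 480})
nodules = regulated({"a": 50, "b": 300, "c": 650}, {"a": 400, "b": 30, "c": 570})
common_up = roots["up"] & nodules["up"]        # regulated the same way in both organs
common_down = roots["down"] & nodules["down"]
print(common_up, common_down)
```

Intersecting the per-organ sets is what distinguishes the small "commonly regulated" groups from the much larger organ-specific responses reported in the abstract.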
Affiliation(s)
- Carlos Molina
- Molecular BioSciences, Biocenter, Johann Wolfgang Goethe University, Max-von-Laue-Str. 9, D-60439 Frankfurt am Main, Germany
- Unité de Recherche en Légumineuses, INRA-URLEG, 17 Rue Sully, 21000 Dijon, France
- Faheema Khan
- Molecular BioSciences, Biocenter, Johann Wolfgang Goethe University, Max-von-Laue-Str. 9, D-60439 Frankfurt am Main, Germany
- Molecular Ecology Laboratory, Department of Botany, Jamia Hamdard University, New Delhi, India
- Nadia Fatnassi
- Estación Experimental del Zaidín, CSIC, C/Profesor Albareda, 1, 18008-Granada, Spain
- Ralf Horres
- Molecular BioSciences, Biocenter, Johann Wolfgang Goethe University, Max-von-Laue-Str. 9, D-60439 Frankfurt am Main, Germany
- Björn Rotter
- GenXPro GmbH, Frankfurt Innovation Center FIZ Biotechnology, Altendörferallee 3, D-60438 Frankfurt am Main, Germany
- Diana Steinhauer
- GenXPro GmbH, Frankfurt Innovation Center FIZ Biotechnology, Altendörferallee 3, D-60438 Frankfurt am Main, Germany
- Laurie Amenc
- Soil Symbiosis and Environment, INRA, 1 place Viala, 34060 Montpellier-Cedex, France
- Jean-Jacques Drevon
- Soil Symbiosis and Environment, INRA, 1 place Viala, 34060 Montpellier-Cedex, France
- Peter Winter
- GenXPro GmbH, Frankfurt Innovation Center FIZ Biotechnology, Altendörferallee 3, D-60438 Frankfurt am Main, Germany
- Günter Kahl
- Molecular BioSciences, Biocenter, Johann Wolfgang Goethe University, Max-von-Laue-Str. 9, D-60439 Frankfurt am Main, Germany
6. Galante PAF, Sandhu D, de Sousa Abreu R, Gradassi M, Slager N, Vogel C, de Souza SJ, Penalva LOF. A comprehensive in silico expression analysis of RNA binding proteins in normal and tumor tissue: Identification of potential players in tumor formation. RNA Biol 2009; 6:426-33. [PMID: 19458496 DOI: 10.4161/rna.6.4.8841] [Citation(s) in RCA: 43] [Impact Index Per Article: 2.9] [Indexed: 11/19/2022]
Abstract
RNA binding proteins (RBPs) are involved in several post-transcriptional stages of gene expression and dictate the quality and quantity of the cellular proteome. When aberrantly expressed, they can lead to disease states, including cancer. A basic requirement for understanding their role in normal tissue development and cancer is the construction of comprehensive gene expression maps. To this end, we generated a list of 383 human RBPs based on the NCBI and Ensembl databases. SAGE and MPSS were then used to verify their expression levels in normal tissues, while SAGE and microarray datasets were used to compare normal and tumor tissues. As the main outcomes of our studies, we identified clusters of co-expressed or co-regulated genes that could act together in the development and maintenance of specific tissues, and we obtained a high-confidence list of RBPs aberrantly expressed in several tumor types. This latter list contains potential candidates to be explored as diagnostic and prognostic markers as well as putative targets for cancer therapy.
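Co-expression clusters such as those described for the RBPs are usually built from pairwise similarity between expression profiles across tissues. As a hedged sketch, assuming Pearson correlation and an arbitrary cut-off of 0.9 (neither is specified in the abstract), the snippet below lists gene pairs whose SAGE-style profiles are strongly correlated; the gene names, values and helper names are hypothetical.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equally long expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)  # undefined for a constant profile (sx or sy == 0)


def coexpressed_pairs(profiles: Dict[str, List[float]],
                      cutoff: float = 0.9) -> List[Tuple[str, str, float]]:
    """Gene pairs whose expression across tissues correlates above a hypothetical cutoff."""
    pairs = []
    for g1, g2 in combinations(sorted(profiles), 2):
        r = pearson(profiles[g1], profiles[g2])
        if r > cutoff:
            pairs.append((g1, g2, r))
    return pairs


# Hypothetical tags per million for three genes across four normal tissues
profiles = {
    "RBP_A": [10.0, 20.0, 30.0, 40.0],
    "RBP_B": [12.0, 22.0, 33.0, 41.0],
    "RBP_C": [40.0, 30.0, 20.0, 10.0],
}
print(coexpressed_pairs(profiles))
```

Strongly correlated pairs can then be merged into clusters (for example by hierarchical clustering on 1 - r), which is one conventional route to the co-expression groups the abstract mentions.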
Affiliation(s)
- Pedro A F Galante
- Ludwig Institute for Cancer Research-São Paulo Branch, São Paulo, Brazil
7. Philippe N, Boureux A, Bréhélin L, Tarhio J, Commes T, Rivals E. Using reads to annotate the genome: influence of length, background distribution, and sequence errors on prediction capacity. Nucleic Acids Res 2009; 37:e104. [PMID: 19531739 PMCID: PMC2731892 DOI: 10.1093/nar/gkp492] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.2] [Indexed: 11/14/2022]
Abstract
Ultra high-throughput sequencing is used to analyse the transcriptome or interactome at unprecedented depth on a genome-wide scale. These techniques yield short sequence reads that are then mapped onto a genome sequence to predict putatively transcribed or protein-interacting regions. We argue that factors such as background distribution, sequence errors, and read length affect the prediction capacity of sequence census experiments. Here we propose a computational approach to measure these factors and analyse their influence on both transcriptomic and epigenomic assays. This investigation provides new insight into both methodological and biological issues. For instance, by analysing chromatin immunoprecipitation read sets, we estimate that 4.6% of reads are affected by SNPs. We show that, although the nucleotide error probability is low, it increases significantly with position in the sequence. Choosing a read length above 19 bp practically eliminates the risk of finding irrelevant positions, while above 20 bp the number of uniquely mapped reads decreases. With our procedure, we obtain 0.6% false positives among genomic locations. Hence, even rare signatures should identify biologically relevant regions if they are mapped onto the genome. This indicates that digital transcriptomics may help to characterize the wealth of as yet undiscovered, low-abundance transcripts.
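The finding that longer reads are less likely to land on irrelevant positions follows from how quickly substrings become unique in a genome. A minimal sketch, assuming exact matching on the forward strand only (no reverse complement, sequence errors or SNPs), computes the fraction of genomic positions whose length-k substring occurs exactly once; `unique_fraction` and the toy sequence are hypothetical, and this is not the authors' procedure.

```python
from collections import Counter


def unique_fraction(genome: str, k: int) -> float:
    """Fraction of length-k windows of `genome` whose sequence occurs exactly once.

    Forward strand and exact matches only; a real analysis would also consider
    the reverse complement and mismatches.
    """
    kmers = [genome[i:i + k] for i in range(len(genome) - k + 1)]
    counts = Counter(kmers)
    return sum(1 for km in kmers if counts[km] == 1) / len(kmers)


toy = "ACGTACGTTT"  # hypothetical 10 bp "genome"
for k in (2, 4, 5):
    print(k, round(unique_fraction(toy, k), 3))
# prints: 2 0.111, then 4 0.714, then 5 1.0
```

On a real genome the same curve rises steeply with k and flattens once nearly every position is unique, which is the behaviour behind a length threshold such as the 19 to 20 bp range reported in the abstract.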
Affiliation(s)
- Nicolas Philippe
- Laboratoire d'Informatique, de Robotique et de Microélectronique, Université de Montpellier II, UMR 5506 CNRS, 34392 Montpellier, France
8. Yan H, Chin ML, Horvath EA, Kane EA, Pfleger CM. Impairment of ubiquitylation by mutation in Drosophila E1 promotes both cell-autonomous and non-cell-autonomous Ras-ERK activation in vivo. J Cell Sci 2009; 122:1461-70. [PMID: 19366732 PMCID: PMC2721006 DOI: 10.1242/jcs.042267] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.7] [Accepted: 01/12/2009] [Indexed: 12/31/2022]
Abstract
Ras signaling can promote proliferation, cell survival and differentiation. Mutations in components of the Ras pathway are found in many solid tumors and are associated with developmental disorders. We demonstrate here that Drosophila tissues containing hypomorphic mutations in E1, the most upstream enzyme in the ubiquitin pathway, display cell-autonomous upregulation of Ras-ERK activity and Ras-dependent ectopic proliferation. Ubiquitylation is widely accepted to regulate receptor tyrosine kinase (RTK) endocytosis upstream of Ras. However, although the ectopic proliferation of E1 hypomorphs is dramatically suppressed by removing one copy of Ras, removing the more upstream components Egfr, Grb2 or sos produces no suppression. Thus, decreased ubiquitylation may lead to growth-relevant Ras-ERK activation by failing to regulate a step downstream of RTK endocytosis. We further demonstrate that Drosophila Ras is ubiquitylated. Our findings suggest that Ras ubiquitylation restricts growth and proliferation in vivo. We also observed that complete inactivation of E1 causes non-autonomous activation of Ras-ERK in adjacent tissue, mimicking oncogenic Ras overexpression. We demonstrate that sufficient E1 function is required both cell autonomously and non-cell autonomously to prevent inappropriate Ras-ERK-dependent growth and proliferation in vivo, and our findings may implicate loss of Ras ubiquitylation in developmental disorders and cancer.
Affiliation(s)
- Hua Yan
- Department of Oncological Sciences, The Mount Sinai School of Medicine, New York, NY 10029, USA
9. Matukumalli LK, Schroeder SG. Sequence Based Gene Expression Analysis. Bioinformatics 2009. [DOI: 10.1007/978-0-387-92738-1_9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022]
10. Interrogating global gene expression in rat neuronal cultures using SAGE. Neurotox Res 2008; 12:209-14. [PMID: 18201949 DOI: 10.1007/bf03033905] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022]
Abstract
The normal function of the mammalian brain is regulated by complex networks of interactions between cells and molecules, which depend to a considerable extent on mechanisms of transcriptional regulation. Disruption of such interactions by neurotoxic stimuli may lead to severe forms of dementia and other neuropsychiatric disorders. Therefore, critical insight into mechanisms of neuronal dysfunction may be obtained by examining global patterns of gene expression in mammalian models of neurotoxicity. In this regard, the combined use of rat neuronal cultures and serial analysis of gene expression (SAGE) can be viewed as a general platform for the search for molecular targets involved in neurotoxic processes. Here, we discuss potential advantages of this approach, highlighting the need to generate robust SAGE libraries from rat neuronal cultures. The availability and current limitations of bioinformatics tools for SAGE data derived from rat samples are also discussed.
11. Guo M, Yang S, Rupe M, Hu B, Bickel DR, Arthur L, Smith O. Genome-wide allele-specific expression analysis using Massively Parallel Signature Sequencing (MPSS) reveals cis- and trans-effects on gene expression in maize hybrid meristem tissue. Plant Mol Biol 2008; 66:551-63. [PMID: 18224447 DOI: 10.1007/s11103-008-9290-z] [Citation(s) in RCA: 63] [Impact Index Per Article: 3.9] [Received: 07/13/2007] [Accepted: 01/08/2008] [Indexed: 05/24/2023]
Abstract
Allelic differences in expression are important genetic factors contributing to quantitative trait variation in various organisms. However, the extent of genome-wide allele-specific expression under different modes of gene regulation has not been well characterized in plants. In this study we developed a new methodology for allele-specific expression analysis by applying Massively Parallel Signature Sequencing (MPSS), an open-ended, sequencing-based mRNA profiling technology. This methodology enabled a genome-wide evaluation of cis- and trans-effects on allelic expression in six meristem stages of the maize hybrid. Data from nearly 400 pairs of MPSS allelic signature tags showed that 60% of the genes in the hybrid meristems exhibited differential allelic expression. Because both alleles are subject to the same trans-acting factors in the hybrid, the data suggest that cis-regulatory differences are abundant in the genome. Comparing the same allele expressed in the hybrid versus its inbred parents showed that 40% of the genes were differentially expressed, suggesting that different trans-acting effects are present in different genotypes. Such trans-acting effects may cause gene expression in the hybrid to deviate from additive allelic expression. With this approach we quantified gene expression in the hybrid relative to its inbred parents at the allele-specific level. Compared with measuring total transcript levels, this study provides a new level of understanding of the different modes of gene regulation in the hybrid and of the molecular basis of heterosis.
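The reasoning in this abstract (both alleles share trans-acting factors in the hybrid, so allelic imbalance there reflects cis effects, and the remainder of the parental difference reflects trans effects) is often written as a log-ratio decomposition. The sketch below assumes normalized MPSS signature counts as inputs; the function name `cis_trans` and the numbers are hypothetical, and the authors' statistical treatment is not reproduced.

```python
import math
from typing import Dict


def cis_trans(hybrid_a: float, hybrid_b: float,
              parent_a: float, parent_b: float) -> Dict[str, float]:
    """Split an allelic expression difference into cis and trans parts (log2 scale).

    hybrid_a, hybrid_b -- normalized counts of allele A and allele B in the hybrid
    parent_a, parent_b -- normalized counts of the gene in inbred parents A and B
    """
    total = math.log2(parent_a / parent_b)  # parental difference = cis + trans
    cis = math.log2(hybrid_a / hybrid_b)    # alleles share trans factors in the hybrid
    return {"total": total, "cis": cis, "trans": total - cis}


# Hypothetical gene: 8-fold parental difference, 4-fold allelic imbalance in the hybrid
print(cis_trans(hybrid_a=40, hybrid_b=10, parent_a=80, parent_b=10))
# {'total': 3.0, 'cis': 2.0, 'trans': 1.0}
```

A non-zero trans component is the signature of the "same allele, different genotype" comparison described in the abstract, where 40% of genes differed between hybrid and inbred parent.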
Affiliation(s)
- Mei Guo
- Pioneer Hi-Bred International, Inc., A DuPont Business, Johnston, IA, 50131-0552, USA.
12. Zhu J, He F, Wang J, Yu J. Modeling transcriptome based on transcript-sampling data. PLoS One 2008; 3:e1659. [PMID: 18286206 PMCID: PMC2243018 DOI: 10.1371/journal.pone.0001659] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.1] [Received: 12/10/2007] [Accepted: 01/21/2008] [Indexed: 01/10/2023]
Abstract
Background: Newly developed multiplex sequencing technology has brought transcriptome sequencing to an unprecedented depth. Millions of transcript tags can now be acquired in a single experiment through parallelization. The significant increase in throughput and reduction in cost raise some fundamental questions: How many transcript tags must be sequenced for a given transcriptome? How can we estimate the total number of unique transcripts for different cell types (transcriptome diversity) and the distribution of their copy numbers (transcriptome dynamics)? What is the probability that a transcript with a given expression level will be detected at a certain sampling depth? Methodology/Principal Findings: We developed a statistical model to evaluate these parameters based on transcriptome-sampling data. Three mixture models were evaluated for their potential to model the sampling frequencies. We demonstrated that the relative abundances of all transcripts in a transcriptome follow the generalized inverse Gaussian distribution. The widely known beta and gamma distributions failed to capture the distinctive characteristics of the relative abundance distribution, i.e., a strong skew toward zero and a long tail. An estimator of transcriptome diversity and an analytical form of the sampling growth curve were proposed in a coherent framework. Experimental data fitted this model very well, and Monte Carlo simulations based on this model replicated sampling experiments with remarkable precision. Conclusions: Taking human embryonic stem cells as a prototype, we demonstrated that sequencing tens of thousands of transcript tags in an ordinary EST/SAGE experiment was far from sufficient. To fully characterize a human transcriptome, millions of transcript tags had to be sequenced. This model lays a statistical basis for transcriptome-sampling experiments and can, in essence, be applied to all sampling-based data.
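The question of how likely a transcript is to be detected at a given sampling depth has a simple baseline answer if tags are treated as independent draws: a transcript with relative abundance p is seen at least once among N tags with probability 1 - (1 - p)^N, and summing this over all transcripts gives an expected sampling growth curve. This binomial sketch is a simplification for illustration only; it does not implement the generalized inverse Gaussian model the authors propose, and the function names are hypothetical.

```python
from typing import Sequence


def detection_probability(rel_abundance: float, depth: int) -> float:
    """P(transcript sampled at least once) when `depth` tags are drawn independently
    and each tag comes from this transcript with probability `rel_abundance`."""
    return 1.0 - (1.0 - rel_abundance) ** depth


def expected_unique(rel_abundances: Sequence[float], depth: int) -> float:
    """Expected number of distinct transcripts observed after `depth` tags,
    i.e. one point on a (binomial) sampling growth curve."""
    return sum(detection_probability(p, depth) for p in rel_abundances)


# A transcript at 1 copy per million tags, at EST/SAGE-scale vs. deep-sequencing depth
for depth in (50_000, 5_000_000):
    print(depth, round(detection_probability(1e-6, depth), 3))
```

Under these assumptions, a transcript present at one copy per million tags is detected only about 5% of the time with 50,000 tags but more than 99% of the time with 5 million tags, in line with the authors' conclusion that millions of tags are needed.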
Affiliation(s)
- Jiang Zhu
- Chinese Academy of Sciences (CAS) Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China
- Graduate University of Chinese Academy of Sciences, Beijing, China
- Fuhong He
- Chinese Academy of Sciences (CAS) Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China
- Graduate University of Chinese Academy of Sciences, Beijing, China
- Jing Wang
- Chinese Academy of Sciences (CAS) Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China
- To whom correspondence should be addressed.
- Jun Yu
- Chinese Academy of Sciences (CAS) Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China
- To whom correspondence should be addressed.
13. Roach JC, Smith KD, Strobe KL, Nissen SM, Haudenschild CD, Zhou D, Vasicek TJ, Held GA, Stolovitzky GA, Hood LE, Aderem A. Transcription factor expression in lipopolysaccharide-activated peripheral-blood-derived mononuclear cells. Proc Natl Acad Sci U S A 2007; 104:16245-50. [PMID: 17913878 PMCID: PMC2042192 DOI: 10.1073/pnas.0707757104] [Citation(s) in RCA: 52] [Impact Index Per Article: 3.1] [Indexed: 02/06/2023]
Abstract
Transcription factors play a key role in integrating and modulating biological information. In this study, we comprehensively measured the changing abundances of mRNAs over a time course of activation of human peripheral-blood-derived mononuclear cells ("macrophages") with lipopolysaccharide. Global and dynamic analysis of transcription factors in response to a physiological stimulus has yet to be achieved in a human system, and our efforts significantly advanced this goal. We used multiple global high-throughput technologies for measuring mRNA levels, including massively parallel signature sequencing and GeneChip microarrays. We identified 92 of 1,288 known human transcription factors as having significantly measurable changes during our 24-h time course. At least 42 of these changes were previously unidentified in this system. Our data demonstrate that some transcription factors operate in a functional range below 10 transcripts per cell, whereas others operate in a range three orders of magnitude greater. The highly reproducible response of many mRNAs indicates feedback control. A broad range of activation kinetics was observed; thus, combinatorial regulation by small subsets of transcription factors would permit almost any timing input to cis-regulatory elements controlling gene transcription.
Affiliation(s)
- Jared C. Roach
- Institute for Systems Biology, 1441 North 34th Street, Seattle, WA 98103
- To whom correspondence may be addressed.
- Kelly D. Smith
- Institute for Systems Biology, 1441 North 34th Street, Seattle, WA 98103
- Department of Pathology, University of Washington, Seattle, WA 98195
- Katie L. Strobe
- Institute for Systems Biology, 1441 North 34th Street, Seattle, WA 98103
- Daixing Zhou
- Illumina, 25861 Industrial Boulevard, Hayward, CA 94545
- G. A. Held
- IBM Computational Biology Center, P.O. Box 218, Yorktown Heights, NY 10598
- Leroy E. Hood
- Institute for Systems Biology, 1441 North 34th Street, Seattle, WA 98103
- To whom correspondence may be addressed.
- Alan Aderem
- Institute for Systems Biology, 1441 North 34th Street, Seattle, WA 98103
14
Hene L, Sreenu VB, Vuong MT, Abidi SHI, Sutton JK, Rowland-Jones SL, Davis SJ, Evans EJ. Deep analysis of cellular transcriptomes - LongSAGE versus classic MPSS. BMC Genomics 2007; 8:333. [PMID: 17892551 PMCID: PMC2104538 DOI: 10.1186/1471-2164-8-333] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.9] [Received: 03/09/2007] [Accepted: 09/24/2007] [Indexed: 12/14/2022]
Abstract
BACKGROUND Deep transcriptome analysis will underpin a large fraction of post-genomic biology. 'Closed' technologies, such as microarray analysis, only detect the set of transcripts chosen for analysis, whereas 'open' (e.g. tag-based) technologies are capable of identifying all possible transcripts, including those that were previously uncharacterized. Although new technologies are now emerging, at present the major resources for open-type analysis are the many publicly available SAGE (serial analysis of gene expression) and MPSS (massively parallel signature sequencing) libraries. These technologies have never been compared for their utility in the context of deep transcriptome mining. RESULTS We used a single LongSAGE library of 503,431 tags and a "classic" MPSS library of 1,744,173 tags, both prepared from the same T cell-derived RNA sample, to compare the ability of each method to probe, at considerable depth, a human cellular transcriptome. We show that even though LongSAGE is more error-prone than MPSS, our LongSAGE library nevertheless generated 6.3-fold more genome-matching (and therefore likely error-free) tags than the MPSS library. An analysis of a set of 8,132 known genes detectable by both methods, and for which there is no ambiguity about tag matching, shows that MPSS detects only about half (54%) as many transcripts as SAGE (1,955 versus 3,617). Analysis of two additional MPSS libraries shows that each library samples a different subset of transcripts, and that in combination the three MPSS libraries (4,274,992 tags in total) still only detect 73% of the genes identified in our test set using SAGE. The fraction of transcripts detected by MPSS is likely to be even lower for uncharacterized transcripts, which tend to be more weakly expressed. The source of the loss of complexity in MPSS libraries compared to SAGE is unclear, but its effects become more severe with each sequencing cycle (i.e. as MPSS tag length increases).
CONCLUSION We show that MPSS libraries are significantly less complex than much smaller SAGE libraries, revealing a serious bias in the generation of MPSS data unlikely to have been circumvented by later technological improvements. Our results emphasize the need for the rigorous testing of new expression profiling technologies.
Affiliation(s)
- Lawrence Hene, Vattipally B Sreenu, Mai T Vuong, S Hussain I Abidi, Julian K Sutton, Sarah L Rowland-Jones, Simon J Davis, Edward J Evans
- Nuffield Department of Clinical Medicine and MRC Human Immunology Unit, Weatherall Institute of Molecular Medicine, The University of Oxford, John Radcliffe Hospital, Headington, Oxford, OX3 9DS, UK
15
Rivals E, Boureux A, Lejeune M, Ottones F, Pecharromàn Pérez O, Tarhio J, Pierrat F, Ruffle F, Commes T, Marti J. Transcriptome annotation using tandem SAGE tags. Nucleic Acids Res 2007; 35:e108. [PMID: 17709346 PMCID: PMC2034470 DOI: 10.1093/nar/gkm495] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Indexed: 11/14/2022]
Abstract
Analysis of several million expressed gene signatures (tags) revealed an increasing number of different sequences, largely exceeding that of annotated genes in mammalian genomes. Serial analysis of gene expression (SAGE) can reveal new poly(A) RNAs transcribed from previously unrecognized chromosomal regions. However, conventional SAGE tags are too short to identify unique sites in large genomes unambiguously. Here, we design a novel strategy with tags anchored on two different restriction sites of cDNAs. New transcripts are then tentatively defined by the two SAGE tags in tandem and by the spanning sequence read on the genome between these tagged sites. Having developed a new algorithm to locate these tag-delimited genomic sequences (TDGS), we first validated its capacity to recognize known genes and its ability to reveal new transcripts with two SAGE libraries built in parallel from a single RNA sample. Our algorithm proves fast enough to test this strategy at a large scale. We then collected and processed the complete sets of human SAGE tags to predict yet unknown transcripts. A cross-validation with tiling array data shows that 47% of these TDGS overlap transcriptionally active regions. Our method provides a new and complementary approach for complex transcriptome annotation.
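The tandem-tag idea can be illustrated with a naive search: locate one tag, look for the other tag downstream within a bounded distance, and read off the genomic sequence between them. This is a hypothetical sketch, not the authors' TDGS algorithm (the abstract describes it only as fast enough for large-scale use). It assumes exact matches on the plus strand only, made-up tag sequences, and an illustrative `max_span` limit on region length.

```python
from __future__ import annotations


def find_all(text: str, pattern: str) -> list[int]:
    """Start positions of every exact (possibly overlapping) match of pattern."""
    hits: list[int] = []
    start = text.find(pattern)
    while start != -1:
        hits.append(start)
        start = text.find(pattern, start + 1)
    return hits


def tag_delimited_regions(
    genome: str, tag5: str, tag3: str, max_span: int
) -> list[tuple[int, int, str]]:
    """Return (start, end, sequence) for every plus-strand region that begins
    with tag5, ends with tag3 (non-overlapping), and spans at most max_span bases."""
    regions: list[tuple[int, int, str]] = []
    ends = find_all(genome, tag3)
    for s in find_all(genome, tag5):
        for e in ends:
            stop = e + len(tag3)  # end coordinate is exclusive
            if e >= s + len(tag5) and stop - s <= max_span:
                regions.append((s, stop, genome[s:stop]))
    return regions


if __name__ == "__main__":
    # Toy "genome" with two hypothetical tags placed 6 bases apart.
    genome = "AAAA" + "CATGTTT" + "CCCCCC" + "GATCGGG" + "AAAA"
    print(tag_delimited_regions(genome, "CATGTTT", "GATCGGG", max_span=30))
```

Because every pair of hits is compared, the cost grows with the product of the two hit counts; a genome-scale search would instead rely on an index (for example a suffix array) and would also scan the reverse strand.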
Affiliation(s)
- Eric Rivals, Anthony Boureux, Mireille Lejeune, Florence Ottones, Oscar Pecharromàn Pérez, Jorma Tarhio, Fabien Pierrat, Florence Ruffle, Thérèse Commes, Jacques Marti
- Laboratoire d’Informatique, de Robotique et de Microélectronique, UMR 5506 CNRS – Université de Montpellier II, 161 rue Ada, 34392 Montpellier 05, Institut de Génétique Humaine, CNRS UPR 1142, 141 rue de la Cardonille, 34396 Montpellier 05, France, Helsinki University of Technology, P.O. Box 5400, FI-02015 HUT, Finland and Skuld-Tech, 134, rue du Curat – Bat. Amarante, 34090 Montpellier, France
- Correspondence: Thérèse Commes, Tel: +33 4 67 14 42 36; correspondence may also be addressed to Jacques Marti, Tel: +33 4 67 14 42 41
16
Galante PAF, Vidal DO, de Souza JE, Camargo AA, de Souza SJ. Sense-antisense pairs in mammals: functional and evolutionary considerations. Genome Biol 2007; 8:R40. [PMID: 17371592 PMCID: PMC1868933 DOI: 10.1186/gb-2007-8-3-r40] [Citation(s) in RCA: 53] [Impact Index Per Article: 3.1] [Received: 05/03/2006] [Revised: 09/04/2006] [Accepted: 03/19/2007] [Indexed: 12/25/2022]
Abstract
Analysis of a catalog of S-AS pairs in the human and mouse genomes revealed several putative roles for natural antisense transcripts and showed that some are artifacts of cDNA library construction. Background A significant number of genes in mammalian genomes are being found to have natural antisense transcripts (NATs). These sense-antisense (S-AS) pairs are believed to be involved in several cellular phenomena. Results Here, we generated a catalog of S-AS pairs occurring in the human and mouse genomes by analyzing different sources of expressed sequences available in the public domain plus 122 massively parallel signature sequencing (MPSS) libraries from a variety of human and mouse tissues. Using this dataset of almost 20,000 S-AS pairs in both genomes we investigated, computationally and experimentally, several putative roles that have been assigned to NATs, including gene expression regulation. Furthermore, these global analyses allowed us to better dissect and propose new roles for NATs. Surprisingly, we found that a significant fraction of NATs are artifacts produced by genomic priming during cDNA library construction. Conclusion We propose an evolutionary and functional model in which alternative polyadenylation and retroposition account for the origin of a significant number of functional S-AS pairs in mammalian genomes.
Affiliation(s)
- Pedro AF Galante
- Ludwig Institute for Cancer Research, São Paulo Branch, Hospital Alemão Oswaldo Cruz, Rua João Juliao 245, 1 andar, São Paulo, SP 01323-903, Brazil
- Department of Biochemistry, University of São Paulo, Av. Prof. Lineu Prestes, 748 - sala 351, São Paulo, SP 05508-900, Brazil
- Daniel O Vidal
- Ludwig Institute for Cancer Research, São Paulo Branch, Hospital Alemão Oswaldo Cruz, Rua João Juliao 245, 1 andar, São Paulo, SP 01323-903, Brazil
- Jorge E de Souza
- Ludwig Institute for Cancer Research, São Paulo Branch, Hospital Alemão Oswaldo Cruz, Rua João Juliao 245, 1 andar, São Paulo, SP 01323-903, Brazil
- Anamaria A Camargo
- Ludwig Institute for Cancer Research, São Paulo Branch, Hospital Alemão Oswaldo Cruz, Rua João Juliao 245, 1 andar, São Paulo, SP 01323-903, Brazil
- Sandro J de Souza
- Ludwig Institute for Cancer Research, São Paulo Branch, Hospital Alemão Oswaldo Cruz, Rua João Juliao 245, 1 andar, São Paulo, SP 01323-903, Brazil
17
Liu F, Jenssen TK, Trimarchi J, Punzo C, Cepko CL, Ohno-Machado L, Hovig E, Patrick Kuo W. Comparison of hybridization-based and sequencing-based gene expression technologies on biological replicates. BMC Genomics 2007; 8:153. [PMID: 17555589 PMCID: PMC1899500 DOI: 10.1186/1471-2164-8-153] [Citation(s) in RCA: 52] [Impact Index Per Article: 3.1] [Received: 11/23/2006] [Accepted: 06/07/2007] [Indexed: 02/06/2023]
Abstract
Background High-throughput systems for gene expression profiling have been developed and have matured rapidly through the past decade. Broadly, these can be divided into two categories: hybridization-based and sequencing-based approaches. With data from different technologies being accumulated, concerns and challenges are raised about the level of agreement across technologies. As part of an ongoing large-scale cross-platform data comparison framework, we report here a comparison based on identical samples between one-dye DNA microarray platforms and MPSS (Massively Parallel Signature Sequencing). Results The DNA microarray platforms generally provided highly correlated data, while moderate correlations between microarrays and MPSS were obtained. Disagreements between the two types of technologies can be attributed to limitations inherent to both technologies. The variation found between pooled biological replicates underlines the importance of exercising caution in identification of differential expression, especially for the purposes of biomarker discovery. Conclusion Based on different principles, hybridization-based and sequencing-based technologies should be considered complementary to each other, rather than competitive alternatives for measuring gene expression, and currently, both are important tools for transcriptome profiling.
Affiliation(s)
- Fang Liu
- Department of Tumor Biology, Rikshospitalet-Radiumhospitalet Medical Center, Montebello, NO-0310 Oslo, Norway
- PubGene AS, Vinderen, NO-0319 Oslo, Norway
- Jeff Trimarchi
- Department of Genetics, Harvard Medical School, Boston, MA, USA
- Claudio Punzo
- Department of Genetics, Harvard Medical School, Boston, MA, USA
- Connie L Cepko
- Department of Genetics, Harvard Medical School, Boston, MA, USA
- Howard Hughes Medical Institute, Harvard Medical School, Boston, MA, USA
- Eivind Hovig
- Department of Tumor Biology, Rikshospitalet-Radiumhospitalet Medical Center, Montebello, NO-0310 Oslo, Norway
- Department of Medical Informatics, Rikshospitalet-Radiumhospitalet Medical Center, Montebello, NO-0310 Oslo, Norway
- Winston Patrick Kuo
- Decision Systems Group, Brigham and Women's Hospital, Boston, MA, USA
- Department of Developmental Biology, Harvard School of Dental Medicine, Boston, MA, USA
- Department of Organismic and Evolutionary Biology/Faculty of Arts and Sciences, Harvard University, Cambridge, MA, USA
18
Keime C, Sémon M, Mouchiroud D, Duret L, Gandrillon O. Unexpected observations after mapping LongSAGE tags to the human genome. BMC Bioinformatics 2007; 8:154. [PMID: 17504516 PMCID: PMC1884178 DOI: 10.1186/1471-2105-8-154] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.7] [Received: 01/04/2007] [Accepted: 05/15/2007] [Indexed: 01/15/2023]
Abstract
Background SAGE has been used widely to study the expression of known transcripts, but much less to annotate new transcribed regions. LongSAGE produces tags that are sufficiently long to be reliably mapped to a whole-genome sequence. Here we used this property to study the position of human LongSAGE tags obtained from all public libraries. We focused mainly on tags that do not map to known transcripts. Results Using a published error rate in SAGE libraries, we first removed the tags likely to result from sequencing errors. We then observed that an unexpectedly large number of the remaining tags still did not match the genome sequence. Some of these correspond to parts of human mRNAs, such as polyA tails, junctions between two exons and polymorphic regions of transcripts. Another non-negligible proportion can be attributed to contamination by murine transcripts and to residual sequencing errors. After filtering our data with these screens to ensure that our dataset is highly reliable, we studied the tags that map once to the genome. 31% of these tags correspond to unannotated transcripts. The others map to known transcribed regions, but many of them (nearly half) are located either in antisense or in new variants of these known transcripts. Conclusion We performed a comprehensive study of all publicly available human LongSAGE tags, and carefully verified the reliability of these data. We found the potential origin of many tags that did not match the human genome sequence. The properties of the remaining tags imply that the level of sequencing error may have been under-estimated. The frequency of tags matching the genome sequence once but not in an annotated exon suggests that the human transcriptome is much more complex than shown by the current human genome annotations, with many new splicing variants and antisense transcripts.
SAGE data is appropriate to map new transcripts to the genome, as demonstrated by the high rate of cross-validation of the corresponding tags using other methods.
Affiliation(s)
- Céline Keime
- Université de Lyon, Lyon, F-69003, France; Université Lyon 1, Lyon, F-69003, France, CNRS, UMR5534, Centre de génétique moléculaire et cellulaire, Villeurbanne, F-69622, France
- Université de Lyon, Lyon, F-69003, France; Université Lyon 1, Lyon, F-69003, France, CNRS, UMR5558, Laboratoire de Biométrie et Biologie Evolutive, Villeurbanne, F-69622, France
- Marie Sémon
- Smurfit Institute of Genetics, Trinity College Dublin, Dublin 2, Ireland
- Dominique Mouchiroud
- Université de Lyon, Lyon, F-69003, France; Université Lyon 1, Lyon, F-69003, France, CNRS, UMR5558, Laboratoire de Biométrie et Biologie Evolutive, Villeurbanne, F-69622, France
- Laurent Duret
- Université de Lyon, Lyon, F-69003, France; Université Lyon 1, Lyon, F-69003, France, CNRS, UMR5558, Laboratoire de Biométrie et Biologie Evolutive, Villeurbanne, F-69622, France
- Olivier Gandrillon
- Université de Lyon, Lyon, F-69003, France; Université Lyon 1, Lyon, F-69003, France, CNRS, UMR5534, Centre de génétique moléculaire et cellulaire, Villeurbanne, F-69622, France
19
Galante PAF, Trimarchi J, Cepko CL, de Souza SJ, Ohno-Machado L, Kuo WP. Automatic correspondence of tags and genes (ACTG): a tool for the analysis of SAGE, MPSS and SBS data. Bioinformatics 2007; 23:903-5. [PMID: 17277333 DOI: 10.1093/bioinformatics/btm023] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Indexed: 11/14/2022]
Abstract
UNLABELLED A critical step in any SAGE, MPSS and SBS data analysis is tag-to-gene assignment. Currently available tools are limited by a tag-by-tag annotation process and/or do not provide the dataset that is used to produce a complete tag-to-gene mapping. We developed ACTG, a web-based application that allows a large-scale tag-to-gene mapping using several reference datasets. ACTG can annotate SAGE (14 or 21 bp), MPSS (17 or 20 bp) and SBS (16 bp) data for both human and mouse organisms. AVAILABILITY http://retina.med.harvard.edu/ACTG/. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
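Tag-to-gene assignment can be sketched in two steps: derive a "virtual tag" from each reference transcript, then look observed tags up in the resulting index. This is a minimal, hypothetical illustration, not ACTG's implementation. It assumes classic SAGE conventions (the tag begins at the 3'-most NlaIII site, CATG, and is 14 bp including the site, or 21 bp for LongSAGE, matching the lengths quoted above), exact matches on the sense strand, and made-up gene names and sequences. For MPSS one would instead assume the DpnII site (GATC) with 17 or 20 bp signatures.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Iterable, Optional


def virtual_tag(transcript: str, site: str = "CATG", tag_len: int = 14) -> Optional[str]:
    """Return the tag that begins at the 3'-most occurrence of `site`.

    `tag_len` includes the site itself (14 = classic SAGE, 21 = LongSAGE).
    Returns None if the site is absent or the transcript ends before a full tag.
    """
    seq = transcript.upper()
    pos = seq.rfind(site)  # 3'-most anchoring-enzyme site
    if pos == -1 or pos + tag_len > len(seq):
        return None
    return seq[pos:pos + tag_len]


def build_tag_index(transcripts: dict[str, str], **tag_opts) -> dict[str, set[str]]:
    """Map each virtual tag to the set of genes whose transcript yields it."""
    index: dict[str, set[str]] = defaultdict(set)
    for gene, seq in transcripts.items():
        tag = virtual_tag(seq, **tag_opts)
        if tag is not None:
            index[tag].add(gene)
    return dict(index)


def assign_tags(observed: Iterable[str], index: dict[str, set[str]]) -> dict[str, list[str]]:
    """Empty list = unassigned (orphan) tag; more than one gene = ambiguous tag."""
    return {tag: sorted(index.get(tag.upper(), set())) for tag in observed}


if __name__ == "__main__":
    # Hypothetical reference transcripts (sense strand, 5' -> 3').
    reference = {
        "GENE_A": "TTT" + "CATG" + "ACGTACGTAC" + "AAAA",
        "GENE_B": "CATG" + "G" * 10 + "CATG" + "ACGTACGTAC" + "AAAA",
        "GENE_C": "GGGG" + "CATG" + "TTTTT",  # too short after the site: no tag
    }
    index = build_tag_index(reference)
    print(assign_tags(["CATGACGTACGTAC", "CATGTTTTTTTTTT"], index))
```

Keying the index by tag makes ambiguity explicit: when several transcripts share the same short tag, the tag maps to several genes, which is one reason longer tags such as LongSAGE tags are easier to assign reliably.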
Affiliation(s)
- Pedro A F Galante
- Departamento de Bioquímica, Instituto de Química, Universidade de São Paulo, São Paulo, Brazil.
20
Vos JB, Datson NA, Rabe KF, Hiemstra PS. Exploring host-pathogen interactions at the epithelial surface: application of transcriptomics in lung biology. Am J Physiol Lung Cell Mol Physiol 2007; 292:L367-77. [PMID: 17041013 DOI: 10.1152/ajplung.00242.2006] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Indexed: 11/22/2022]
Abstract
The epithelial surface of the airways is the largest barrier-forming interface between the human body and the outside world. It is now well recognized that, at this strategic position, airway epithelial cells play a prominent role in host defense by recognizing and responding to microbial exposure. Conversely, inhaled microorganisms also respond to contact with epithelial cells. Our understanding of this cross talk is limited, requiring sophisticated experimental approaches to analyze these complex interactions. High-throughput technologies, such as DNA microarray analysis and serial analysis of gene expression (SAGE), have been developed to screen for gene expression levels at large scale within single experiments. Since their introduction, these hypothesis-generating technologies have been widely used in diverse areas such as oncology and brain research. Successful application of these genomics-based technologies has also revealed novel insights into host-pathogen interactions in both the host and pathogen. This review aims to provide an overview of the SAGE and microarray technology illustrated by their application in the analysis of host-pathogen interactions. In particular, the interactions between epithelial cells in the human lungs and clinically relevant microorganisms are the central focus of this review.
Affiliation(s)
- Joost B Vos
- Department of Pulmonology, Leiden Amsterdam Center for Drug Research, Leiden University Medical Center, Leiden, The Netherlands
21
Wang SM. Understanding SAGE data. Trends Genet 2006; 23:42-50. [PMID: 17109989 DOI: 10.1016/j.tig.2006.11.001] [Citation(s) in RCA: 43] [Impact Index Per Article: 2.4] [Received: 04/12/2006] [Revised: 10/05/2006] [Accepted: 11/01/2006] [Indexed: 02/08/2023]
Abstract
Serial analysis of gene expression (SAGE) is a method for identifying and quantifying transcripts from eukaryotic genomes. Since its invention, SAGE has been widely applied to analyzing gene expression in many biological and medical studies. Vast amounts of SAGE data have been collected and more than a thousand SAGE-related studies have been published since the mid-1990s. The principle of SAGE has been developed to address specific issues such as determination of normal gene structure and identification of abnormal genome structural changes. This review focuses on the general features of SAGE data, including the specificity of SAGE tags with respect to their original transcripts, the quantitative nature of SAGE data for differentially expressed genes, the reproducibility, the comparability of SAGE with microarray and the future potential of SAGE. Understanding these basic features should aid the proper interpretation of SAGE data to address biological and medical questions.
Affiliation(s)
- San Ming Wang
- Center for Functional Genomics, ENH Research Institute, Robert H. Lurie Comprehensive Cancer Center, Northwestern University, 1001 University Place, Evanston, IL 60201, USA.
22
Richards M, Tan SP, Chan WK, Bongso A. Reverse Serial Analysis of Gene Expression (SAGE) Characterization of Orphan SAGE Tags from Human Embryonic Stem Cells Identifies the Presence of Novel Transcripts and Antisense Transcription of Key Pluripotency Genes. Stem Cells 2006; 24:1162-73. [PMID: 16456128 DOI: 10.1634/stemcells.2005-0304] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.1] [Indexed: 01/27/2023]
Abstract
Serial analysis of gene expression (SAGE) is a powerful technique for the analysis of gene expression. A significant portion of SAGE tags, designated as orphan tags, however, cannot be reliably assigned to known transcripts. We used an improved reverse SAGE (rSAGE) strategy to convert human embryonic stem cell (hESC)-specific orphan SAGE tags into longer 3' cDNAs. We show that the systematic analysis of these 3' cDNAs permitted the discovery of hESC-specific novel transcripts and cis-natural antisense transcripts (cis-NATs) and improved the assignment of SAGE tags that resulted from splice variants, insertion/deletion, and single-nucleotide polymorphisms. More importantly, this is the first description of cis-NATs for several key pluripotency markers in hESCs and mouse embryonic stem cells, suggesting that the formation of short interfering RNA could be an important regulatory mechanism. A systematic large-scale analysis of the remaining orphan SAGE tags in the hESC SAGE libraries by rSAGE or other 3' cDNA extension strategies should unravel additional novel transcripts and cis-NATs that are specifically expressed in hESCs. Besides contributing to the complete catalog of human transcripts, many of them should prove to be a valuable resource for the elucidation of the molecular pathways involved in the self-renewal and lineage commitment of hESCs.
Affiliation(s)
- Mark Richards
- Department of Biological Sciences, National University of Singapore, 14 Science Drive 4, Singapore 117543
23
Fedurco M, Romieu A, Williams S, Lawrence I, Turcatti G. BTA, a novel reagent for DNA attachment on glass and efficient generation of solid-phase amplified DNA colonies. Nucleic Acids Res 2006; 34:e22. [PMID: 16473845 PMCID: PMC1363783 DOI: 10.1093/nar/gnj023] [Citation(s) in RCA: 157] [Impact Index Per Article: 8.7] [Indexed: 12/26/2022]
Abstract
The tricarboxylate reagent benzene-1,3,5-triacetic acid (BTA) was used to attach 5′-aminated DNA primers and templates on an aminosilanized glass surface for subsequent generation of DNA colonies by in situ solid-phase amplification. We have characterized the derivatized surfaces for the chemical attachment of oligonucleotides and evaluated the properties relevant for the amplification process: surface density, thermal stability towards thermocycling, functionalization reproducibility and storage stability. The derivatization process, first developed for glass slides, was then adapted to microfabricated glass channels containing integrated fluidic connections. This implementation substantially reduced reaction times and reagent consumption and enabled process automation. Innovative analytical methods for the characterization of attached DNA were developed for assessing the surface immobilized DNA content after amplification. The results obtained showed that the BTA chemistry is suitable for forming highly dense arrays of DNA colonies with optimal surface coverage of about 10 million colonies/cm² from the amplification of immobilized single template DNA molecules. We also demonstrate that the dsDNA colonies generated can be quantitatively processed in situ by restriction enzyme digestion. DNA colonies generated using the BTA reagent can be used for further sequence analysis in an unprecedented parallel fashion for low-cost genomic studies.
Affiliation(s)
- Gerardo Turcatti
- To whom correspondence should be addressed at EPFL, School of Life Sciences, Station 15, AAB013, CH-1015 Lausanne, Switzerland. Tel: +41 21 693 9666; Fax: +41 21 693 9667
24
Abstract
The ability to form tenable hypotheses regarding the neurobiological basis of normative functions as well as mechanisms underlying neurodegenerative and neuropsychiatric disorders is often limited by the highly complex brain circuitry and the cellular and molecular mosaics therein. The brain is an intricate structure with heterogeneous neuronal and nonneuronal cell populations dispersed throughout the central nervous system. Varied and diverse brain functions are mediated through gene expression, and ultimately protein expression, within these cell types and interconnected circuits. Large-scale high-throughput analysis of gene expression in brain regions and individual cell populations using modern functional genomics technologies has enabled the simultaneous quantitative assessment of dozens to hundreds to thousands of genes. Technical and experimental advances in the accession of tissues, RNA amplification technologies, and the refinement of downstream genetic methodologies including microarray analysis and real-time quantitative PCR have generated a wellspring of informative studies pertinent to understanding brain structure and function. In this review, we outline the advantages as well as some of the potential challenges of applying high throughput functional genomics technologies toward a better understanding of brain tissues and diseases using animal models as well as human postmortem tissues.
25
Harbers M, Carninci P. Tag-based approaches for transcriptome research and genome annotation. Nat Methods 2005; 2:495-502. [PMID: 15973418 DOI: 10.1038/nmeth768] [Citation(s) in RCA: 141] [Impact Index Per Article: 7.4] [Indexed: 01/06/2023]
Abstract
With the increasing number of whole genome sequences available, genomic research has shifted toward the annotation of functional elements and transcribed regions. Thus, the related field of transcriptome research requires accurate methods for the profiling of genes that are not biased by known sequence information, and that also allow for the identification of promoter regions. Starting with serial analysis of gene expression (SAGE), methods making use of short sequencing tags have greatly contributed to transcriptome studies. Here we review recent developments in the use of short sequencing tags in expression profiling, gene discovery and genome annotation. These tags are obtained from the 5' end of mRNAs, both terminal ends of mRNAs, or genomic regions. The 5' end-specific tags, with their ability to identify transcripts along with their transcriptional start sites, will be of particular interest for gene network studies and may become one of the most important approaches in systems biology.
Affiliation(s)
- Matthias Harbers
- K.K. Dnaform, Tsukuba Branch, 3-1 Chuo 8-chome, Ami Machi, Inashiki Gun, Ibaraki, 300-0332, Japan.
26
Vos JB, van Sterkenburg MA, Rabe KF, Schalkwijk J, Hiemstra PS, Datson NA. Transcriptional response of bronchial epithelial cells to Pseudomonas aeruginosa: identification of early mediators of host defense. Physiol Genomics 2005; 21:324-36. [PMID: 15701729 DOI: 10.1152/physiolgenomics.00289.2004] [Citation(s) in RCA: 75] [Impact Index Per Article: 3.9] [Indexed: 01/15/2023]
Abstract
The airway epithelium responds to microbial exposure by altering expression of a variety of genes to increase innate host defense. We aimed to delineate the early transcriptional response in human primary bronchial epithelial cells exposed for 6 h to a mixture of IL-1beta and TNF-alpha or heat-inactivated Pseudomonas aeruginosa. Because molecular mechanisms of epithelial innate host defense are not fully understood, the open-ended expression-profiling technique SAGE was applied to construct gene expression profiles covering 30,000 genes: 292 genes were found to be differentially expressed. Expression of seven genes was confirmed by real-time qPCR. Among differentially expressed genes, four classes or families were identified: keratins, proteinase inhibitors, S100 calcium-binding proteins, and IL-1 family members. Marked transcriptional changes were observed for keratins that form a key component of the cytoskeleton in epithelial cells. Expression of antimicrobial proteinase inhibitors SLPI and elafin was elevated after microbial or cytokine exposure. Interestingly, expression of numerous S100 family members was observed, and eight members, including S100A8 and S100A9, were among the most differentially expressed genes. Differential expression was also observed for the IL-1 family members IL-1beta, IL-1 receptor antagonist, and IL-1F9, a newly discovered IL-1 family member. Clustering of differentially expressed genes into biological processes revealed that the early inflammatory response in airway epithelial cells to IL-1beta-TNF-alpha and P. aeruginosa is characterized by expression of genes involved in epithelial barrier formation and host defense.
Affiliation(s)
- Joost B Vos
- Department of Pulmonology, Leiden University Medical Center, Leiden, The Netherlands.