276
|
Quackenbush J. John Quackenbush talks about the clinical promise of genetic microarrays. Interviewed by Brian Vastag. JAMA 2003; 289:159-60, 163. [PMID: 12517207 DOI: 10.1001/jama.289.2.159] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/14/2022]
|
277
|
Wang HY, Malek RL, Kwitek AE, Greene AS, Luu TV, Behbahani B, Frank B, Quackenbush J, Lee NH. Assessing unmodified 70-mer oligonucleotide probe performance on glass-slide microarrays. Genome Biol 2003; 4:R5. [PMID: 12540297 PMCID: PMC151289 DOI: 10.1186/gb-2003-4-1-r5] [Citation(s) in RCA: 114] [Impact Index Per Article: 5.4] [Received: 08/14/2002] [Revised: 10/17/2002] [Accepted: 11/08/2002] [Indexed: 11/30/2022] Open
Abstract
BACKGROUND Long oligonucleotide microarrays are potentially more cost- and management-efficient than cDNA microarrays, but there is little information on the relative performance of these two probe types. The feasibility of using unmodified oligonucleotides to accurately measure changes in gene expression is also unclear. RESULTS Unmodified sense and antisense 70-mer oligonucleotides representing 75 known rat genes and 10 Arabidopsis control genes were synthesized, printed and UV cross-linked onto glass slides. Printed alongside were PCR-amplified cDNA clones corresponding to the same genes, enabling us to compare the two probe types simultaneously. Our study was designed to evaluate the mRNA profiles of heart and brain, along with Arabidopsis cRNA spiked into the labeling reaction at different relative copy number. Hybridization signal intensity did not correlate with probe type but depended on the extent of UV irradiation. To determine the effect of oligonucleotide concentration on hybridization signal, 70-mers were serially diluted. No significant change in gene-expression ratio or loss in hybridization signal was detected, even at the lowest concentration tested (6.25 µM). In many instances, signal intensity actually increased with decreasing concentration. The correlation coefficient between oligonucleotide and cDNA probes for identifying differentially expressed genes was 0.80, with an average coefficient of variation of 13.4%. Approximately 8% of the genes showed discordant results with the two probe types, and in each case the cDNA results were more accurate, as determined by real-time PCR. CONCLUSIONS Microarrays of UV cross-linked unmodified oligonucleotides provided sensitive and specific measurements for most of the genes studied.
|
278
|
Zhu Y, King BL, Parvizi B, Brunk BP, Stoeckert CJ, Quackenbush J, Richardson J, Bult CJ. Integrating computationally assembled mouse transcript sequences with the Mouse Genome Informatics (MGI) database. Genome Biol 2003; 4:R16. [PMID: 12620126 PMCID: PMC151306 DOI: 10.1186/gb-2003-4-2-r16] [Citation(s) in RCA: 17] [Impact Index Per Article: 0.8] [Received: 10/09/2002] [Revised: 11/27/2002] [Accepted: 12/19/2002] [Indexed: 12/27/2022] Open
Abstract
Databases of experimentally generated and computationally derived transcript sequences are valuable resources for genome analysis and annotation. The utility of such databases is enhanced when the sequences they contain are integrated with such biological information as genomic location, gene function, gene expression and phenotypic variation. We present the analysis and results of a semi-automated process of connecting transcript assemblies with highly curated biological information for mouse genes that is available through the Mouse Genome Informatics (MGI) database.
|
279
|
Yuan Q, Ouyang S, Liu J, Suh B, Cheung F, Sultana R, Lee D, Quackenbush J, Buell CR. The TIGR rice genome annotation resource: annotating the rice genome and creating resources for plant biologists. Nucleic Acids Res 2003; 31:229-33. [PMID: 12519988 PMCID: PMC165506 DOI: 10.1093/nar/gkg059] [Citation(s) in RCA: 115] [Impact Index Per Article: 5.5] [Indexed: 11/13/2022] Open
Abstract
Rice is not only a major food staple for the world's population but also a model species for a major group of flowering plants, the monocotyledonous plants. Draft genomic sequences of two subspecies of rice, Oryza sativa ssp. japonica and ssp. indica, are publicly available. To provide the community with a resource to data-mine the rice genome, we have constructed an annotation resource for rice (http://www.tigr.org/tdb/e2k1/osa1/). In this resource, we have annotated the rice genome for gene content, identified motifs/domains within the predicted genes, constructed a rice repeat database, identified related sequences in other plant species, and identified syntenic sequences between rice and maize. All of the data is available through web-based interfaces, FTP downloads, and a Distributed Annotation System.
|
280
|
Abstract
A report on the Wellcome Trust/Cold Spring Harbor Genome Informatics meeting, Cold Spring Harbor, USA, 7-11 May 2003.
|
281
|
Okazaki Y, Furuno M, Kasukawa T, Adachi J, Bono H, Kondo S, Nikaido I, Osato N, Saito R, Suzuki H, Yamanaka I, Kiyosawa H, Yagi K, Tomaru Y, Hasegawa Y, Nogami A, Schönbach C, Gojobori T, Baldarelli R, Hill DP, Bult C, Hume DA, Quackenbush J, Schriml LM, Kanapin A, Matsuda H, Batalov S, Beisel KW, Blake JA, Bradt D, Brusic V, Chothia C, Corbani LE, Cousins S, Dalla E, Dragani TA, Fletcher CF, Forrest A, Frazer KS, Gaasterland T, Gariboldi M, Gissi C, Godzik A, Gough J, Grimmond S, Gustincich S, Hirokawa N, Jackson IJ, Jarvis ED, Kanai A, Kawaji H, Kawasawa Y, Kedzierski RM, King BL, Konagaya A, Kurochkin IV, Lee Y, Lenhard B, Lyons PA, Maglott DR, Maltais L, Marchionni L, McKenzie L, Miki H, Nagashima T, Numata K, Okido T, Pavan WJ, Pertea G, Pesole G, Petrovsky N, Pillai R, Pontius JU, Qi D, Ramachandran S, Ravasi T, Reed JC, Reed DJ, Reid J, Ring BZ, Ringwald M, Sandelin A, Schneider C, Semple CAM, Setou M, Shimada K, Sultana R, Takenaka Y, Taylor MS, Teasdale RD, Tomita M, Verardo R, Wagner L, Wahlestedt C, Wang Y, Watanabe Y, Wells C, Wilming LG, Wynshaw-Boris A, Yanagisawa M, Yang I, Yang L, Yuan Z, Zavolan M, Zhu Y, Zimmer A, Carninci P, Hayatsu N, Hirozane-Kishikawa T, Konno H, Nakamura M, Sakazume N, Sato K, Shiraki T, Waki K, Kawai J, Aizawa K, Arakawa T, Fukuda S, Hara A, Hashizume W, Imotani K, Ishii Y, Itoh M, Kagawa I, Miyazaki A, Sakai K, Sasaki D, Shibata K, Shinagawa A, Yasunishi A, Yoshino M, Waterston R, Lander ES, Rogers J, Birney E, Hayashizaki Y. Analysis of the mouse transcriptome based on functional annotation of 60,770 full-length cDNAs. Nature 2002; 420:563-73. [PMID: 12466851 DOI: 10.1038/nature01266] [Citation(s) in RCA: 1226] [Impact Index Per Article: 55.7] [Received: 09/19/2002] [Accepted: 10/28/2002] [Indexed: 01/10/2023]
Abstract
Only a small proportion of the mouse genome is transcribed into mature messenger RNA transcripts. There is an international collaborative effort to identify all full-length mRNA transcripts from the mouse, and to ensure that each is represented in a physical collection of clones. Here we report the manual annotation of 60,770 full-length mouse complementary DNA sequences. These are clustered into 33,409 'transcriptional units', contributing 90.1% of a newly established mouse transcriptome database. Of these transcriptional units, 4,258 are new protein-coding and 11,665 are new non-coding messages, indicating that non-coding RNA is a major component of the transcriptome. 41% of all transcriptional units showed evidence of alternative splicing. In protein-coding transcripts, 79% of splice variations altered the protein product. Whole-transcriptome analyses resulted in the identification of 2,431 sense-antisense pairs. The present work, completely supported by physical clones, provides the most comprehensive survey of a mammalian transcriptome so far, and is a valuable resource for functional genomics.
MESH Headings
- Alternative Splicing/genetics
- Amino Acid Motifs
- Animals
- Chromosomes, Mammalian/genetics
- Cloning, Molecular
- DNA, Complementary/genetics
- Databases, Genetic
- Expressed Sequence Tags
- Genes/genetics
- Genomics/methods
- Humans
- Membrane Proteins/genetics
- Mice/genetics
- Physical Chromosome Mapping
- Protein Structure, Tertiary
- Proteome/chemistry
- Proteome/genetics
- RNA, Antisense/genetics
- RNA, Messenger/analysis
- RNA, Messenger/genetics
- RNA, Untranslated/analysis
- RNA, Untranslated/genetics
- Transcription Initiation Site
- Transcription, Genetic/genetics
|
282
|
Abstract
Underlying every microarray experiment is an experimental question that one would like to address. Finding a useful and satisfactory answer relies on careful experimental design and the use of a variety of data-mining tools to explore the relationships between genes or reveal patterns of expression. While other sections of this issue deal with these lofty issues, this review focuses on the much more mundane but indispensable tasks of 'normalizing' data from individual hybridizations to make meaningful comparisons of expression levels, and of 'transforming' them to select genes for further analysis and data mining.
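As a rough illustration of the two tasks this abstract names, the sketch below log-transforms two-channel intensities and then centers the resulting ratios on their median. It is a minimal example under stated assumptions, not the review's own procedure: the function names, the choice of a global median, and the sample intensities are all hypothetical.

```python
import math
from statistics import median

def log2_ratios(cy5, cy3):
    """Transform paired channel intensities into log2(Cy5/Cy3) ratios.

    Spots with a non-positive intensity in either channel are skipped,
    since their log ratio is undefined.
    """
    return [math.log2(r / g) for r, g in zip(cy5, cy3) if r > 0 and g > 0]

def median_center(ratios):
    """Shift log2 ratios so that their median is zero.

    This is one simple global normalization (an assumption of this sketch);
    it presumes that most genes are not differentially expressed.
    """
    m = median(ratios)
    return [x - m for x in ratios]

# Hypothetical intensities for four spots on one hybridization.
cy5 = [200.0, 400.0, 800.0, 1600.0]
cy3 = [100.0, 200.0, 400.0, 400.0]
normalized = median_center(log2_ratios(cy5, cy3))
```

Here the first three spots share a twofold dye imbalance, so median centering moves them to zero and leaves only the fourth spot with a nonzero log ratio.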
|
283
|
Ball CA, Sherlock G, Parkinson H, Rocca-Sera P, Brooksbank C, Causton HC, Cavalieri D, Gaasterland T, Hingamp P, Holstege F, Ringwald M, Spellman P, Stoeckert CJ, Stewart JE, Taylor R, Brazma A, Quackenbush J. The underlying principles of scientific publication. Bioinformatics 2002; 18:1409. [PMID: 12424109 DOI: 10.1093/bioinformatics/18.11.1409] [Citation(s) in RCA: 33] [Impact Index Per Article: 1.5] [Indexed: 11/14/2022] Open
|
284
|
Nene V, Lee D, Quackenbush J, Skilton R, Mwaura S, Gardner MJ, Bishop R. AvGI, an index of genes transcribed in the salivary glands of the ixodid tick Amblyomma variegatum. Int J Parasitol 2002; 32:1447-56. [PMID: 12392910 DOI: 10.1016/s0020-7519(02)00159-5] [Citation(s) in RCA: 66] [Impact Index Per Article: 3.0] [Indexed: 11/20/2022]
Abstract
Random clones from a cDNA library made from mRNA purified from dissected salivary glands of feeding female Amblyomma variegatum ticks were subjected to single pass sequence analysis. A total of 3992 sequences with an average read length of 580 nucleotides have been used to construct a gene index called AvGI that consists of 2109 non-redundant sequences. A provisional gene identity has been assigned to 39% of the database entries by sequence similarity searches against a non-redundant amino acid database and a protein database that has been assigned gene ontology terms. Homologs of genes encoding basic cellular functions including previously characterised enzyme activities, such as stearoyl CoA saturase and protein phosphatase, of ixodid tick salivary glands were found. Several families of abundant cDNA sequences that may code for protein components of tick cement and A. variegatum proteins which may contribute to anti-haemostatic and anti-inflammatory responses, and, one with potential immunosuppressive activity, were also identified. Interference with the function of such proteins might disrupt the life cycle of A. variegatum and help to control this ectoparasite or to reduce its ability to transmit disease causing organisms. AvGI represents an electronic knowledge base, which can be used to launch investigations of the biology of the salivary glands of this tick species. The database may be accessed via the World Wide Web at http://www.tigr.org/tdb/tgi.shtml.
|
285
|
Ball CA, Sherlock G, Parkinson H, Rocca-Sera P, Brooksbank C, Causton HC, Cavalieri D, Gaasterland T, Hingamp P, Holstege F, Ringwald M, Spellman P, Stoeckert CJ, Stewart JE, Taylor R, Brazma A, Quackenbush J. Standards for microarray data. Science 2002; 298:539. [PMID: 12387284 DOI: 10.1126/science.298.5593.539b] [Citation(s) in RCA: 122] [Impact Index Per Article: 5.5] [Indexed: 01/26/2023]
|
286
|
Malek RL, Irby RB, Guo QM, Lee K, Wong S, He M, Tsai J, Frank B, Liu ET, Quackenbush J, Jove R, Yeatman TJ, Lee NH. Identification of Src transformation fingerprint in human colon cancer. Oncogene 2002; 21:7256-65. [PMID: 12370817 DOI: 10.1038/sj.onc.1205900] [Citation(s) in RCA: 63] [Impact Index Per Article: 2.9] [Received: 04/05/2002] [Revised: 07/12/2002] [Accepted: 07/31/2002] [Indexed: 11/09/2022]
Abstract
We used a classical rodent model of transformation to understand the transcriptional processes, and hence the molecular and cellular events a given cell undergoes when progressing from a normal to a transformed phenotype. Src activation is evident in 80% of human colon cancer, yet the myriad of cellular processes affected at the level of gene expression has yet to be fully documented. We identified a Src 'transformation fingerprint' within the gene expression profiles of Src-transformed rat 3Y1 fibroblasts demonstrating a progression in transformation characteristics. To evaluate the role of this gene set in human cancer development and progression, we extracted the orthologous genes present on the Affymetrix Hu95A GeneChip (12k named genes) and compared expression profiles between the Src-induced rodent cell line model of transformation and staged colon tumors where Src is known to be activated. A similar gene expression pattern between the cell line model and staged colon tumors for components of the cell cycle, cytoskeletal associated proteins, transcription factors and lysosomal proteins suggests the need for co-regulation of several cellular processes in the progression of cancer. Genes not previously implicated in tumorigenesis were detected, as well as a set of 14 novel, highly conserved genes with heretofore unknown function. These studies define a set of transformation-associated genes whose up-regulation has implications for understanding Src-mediated transformation, and strengthen the role of Src in the development and progression of human colon cancer. Supportive Supplemental Data can be viewed at http://pga.tigr.org/PGApubs.shtml.
|
287
|
Carlton JM, Angiuoli SV, Suh BB, Kooij TW, Pertea M, Silva JC, Ermolaeva MD, Allen JE, Selengut JD, Koo HL, Peterson JD, Pop M, Kosack DS, Shumway MF, Bidwell SL, Shallom SJ, van Aken SE, Riedmuller SB, Feldblyum TV, Cho JK, Quackenbush J, Sedegah M, Shoaibi A, Cummings LM, Florens L, Yates JR, Raine JD, Sinden RE, Harris MA, Cunningham DA, Preiser PR, Bergman LW, Vaidya AB, van Lin LH, Janse CJ, Waters AP, Smith HO, White OR, Salzberg SL, Venter JC, Fraser CM, Hoffman SL, Gardner MJ, Carucci DJ. Genome sequence and comparative analysis of the model rodent malaria parasite Plasmodium yoelii yoelii. Nature 2002; 419:512-9. [PMID: 12368865 DOI: 10.1038/nature01099] [Citation(s) in RCA: 535] [Impact Index Per Article: 24.3] [Received: 07/31/2002] [Accepted: 08/30/2002] [Indexed: 12/18/2022]
Abstract
Species of malaria parasite that infect rodents have long been used as models for malaria disease research. Here we report the whole-genome shotgun sequence of one species, Plasmodium yoelii yoelii, and comparative studies with the genome of the human malaria parasite Plasmodium falciparum clone 3D7. A synteny map of 2,212 P. y. yoelii contiguous DNA sequences (contigs) aligned to 14 P. falciparum chromosomes reveals marked conservation of gene synteny within the body of each chromosome. Of about 5,300 P. falciparum genes, more than 3,300 P. y. yoelii orthologues of predominantly metabolic function were identified. Over 800 copies of a variant antigen gene located in subtelomeric regions were found. This is the first genome sequence of a model eukaryotic parasite, and it provides insight into the use of such systems in the modelling of Plasmodium biology and disease.
|
288
|
Kim H, Zhao B, Snesrud EC, Haas BJ, Town CD, Quackenbush J. Use of RNA and genomic DNA references for inferred comparisons in DNA microarray analyses. Biotechniques 2002; 33:924-30. [PMID: 12398202 DOI: 10.2144/02334mt06] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.2] [Indexed: 11/23/2022] Open
Abstract
In most microarray assays, labeled cDNA molecules derived from reference and query RNA samples are co-hybridized to probes arrayed on a glass surface. Gene expression profiles are then calculated for each gene based on the relative hybridization intensities measured between the two samples. The most commonly used reference samples are typically isolates from a single representative RNA source (RNA-0) or pooled mixtures of RNA derived from a plurality of sources (RNA-p). Genomic DNA offers an alternative reference nucleic acid with a number of potential advantages, including stability, reproducibility, and a potentially uniform representation of all genes, as each unique gene should have equal representation in a haploid genome. Using hydrogen peroxide-treated Arabidopsis thaliana plants as a model, we evaluated genomic DNA and RNA-p as reference samples and compared expression levels inferred through the reference relative to unexposed plants with expression levels measured directly using an RNA-0 reference. Our analysis demonstrates that while genomic DNA can serve as a reasonable reference source for microarray assays, a much greater correlation with direct measurements can be achieved using an RNA-based reference sample.
|
289
|
Fahrenkrug SC, Smith TPL, Freking BA, Cho J, White J, Vallet J, Wise T, Rohrer G, Pertea G, Sultana R, Quackenbush J, Keele JW. Porcine gene discovery by normalized cDNA-library sequencing and EST cluster assembly. Mamm Genome 2002; 13:475-8. [PMID: 12226715 DOI: 10.1007/s00335-001-2072-4] [Citation(s) in RCA: 62] [Impact Index Per Article: 2.8] [Received: 04/04/2001] [Accepted: 04/05/2002] [Indexed: 10/26/2022]
Abstract
Genetic and environmental factors affect the efficiency of pork production by influencing gene expression during porcine reproduction, tissue development, and growth. The identification and functional analysis of gene products important to these processes would be greatly enhanced by the development of a database of expressed porcine gene sequence. Two normalized porcine cDNA libraries (MARC 1PIG and MARC 2PIG), derived respectively from embryonic and reproductive tissues, were constructed, sequenced, and analyzed. A total of 66,245 clones from these two libraries were 5′-end sequenced and deposited in GenBank. Cluster analysis revealed that within-library redundancy is low, and comparison of all porcine ESTs with the human database suggests that the sequences from these two libraries represent portions of a significant number of independent pig genes. A Porcine Gene Index (PGI), comprising 15,616 tentative consensus sequences and 31,466 singletons, includes all sequences in public repositories and has been developed to facilitate further comparative map development and characterization of porcine genes (http://www.tigr.org/tdb/ssgi/). The clones and sequences from these libraries provide a catalog of expressed porcine genes and a resource for development of high-density hybridization arrays for transcriptional profiling of porcine tissues. In addition, comparison of porcine ESTs with sequences from other species serves as a valuable resource for comparative map development. Both arrayed cDNA libraries are available for unrestricted public use.
|
290
|
Sonstegard TS, Capuco AV, White J, Van Tassell CP, Connor EE, Cho J, Sultana R, Shade L, Wray JE, Wells KD, Quackenbush J. Analysis of bovine mammary gland EST and functional annotation of the Bos taurus gene index. Mamm Genome 2002; 13:373-9. [PMID: 12140684 DOI: 10.1007/s00335-001-2145-4] [Citation(s) in RCA: 38] [Impact Index Per Article: 1.7] [Received: 11/12/2001] [Accepted: 03/13/2002] [Indexed: 11/30/2022]
Abstract
Functional genomic studies of the mammary gland require an appropriate collection of cDNA sequences to assess gene expression patterns from the different developmental and operational states of underlying cell types. To better capture the range of gene expression, a normalized cDNA library was constructed from pooled bovine mammary tissues, and 23,202 expressed sequence tags (EST) were produced and deposited into GenBank. Assembly of these EST with sequences in the Bos taurus Gene Index (BtGI) helped to form 5751 of the current 23,883 tentative consensus (TC) sequences. The majority (87%) of these 5751 assemblies contained only one to three mammary-derived EST. In contrast, 18% of the mammary EST assembled with TC sequences corresponding to 12 genes. These results suggest library normalization was only partially effective, because the reduction in EST for genes abundantly transcribed during lactation could be attributed to pooling. For better assessment of novel content in the mammary library and to add to existing annotation of all bovine sequence elements, gene ontology assignments, and comparative sequence analyses against human genome sequence, human and rodent gene indices, and an index of orthologous alignments of genes across eukaryotes (TOGA) were performed, and results were added to existing BtGI annotation. Over 35,000 of the bovine elements significantly matched human genome sequence, and the positions of some alignments (3%) were unique relative to those using human expressed sequences. Because 3445 TC sequences had no significant match with any data set, mammary-derived cDNA clones representing 23 of these elements were analyzed further for expression and novelty. Only one clone met criteria suggesting the corresponding gene was a divergent ortholog or expressed sequence unique to cattle. 
These results demonstrate that bovine sequence expression data serve as a resource for characterizing mammalian transcriptomes and identifying those genes potentially unique to ruminants.
|
291
|
Andersson T, Unneberg P, Nilsson P, Odeberg J, Quackenbush J, Lundeberg J. Monitoring of representational difference analysis subtraction procedures by global microarrays. Biotechniques 2002; 32:1348-50, 1352, 1354-6, 1358. [PMID: 12074166 DOI: 10.2144/02326mt06] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.2] [Indexed: 11/23/2022] Open
Abstract
Various approaches to the study of differential gene expression are applied to compare cell lines and tissue samples in a wide range of biological contexts. The compromise between focusing on only the important genes in certain cellular processes and achieving a complete picture is critical for the selection of strategy. We demonstrate how global microarray technology can be used for the exploration of the differentially expressed genes extracted through representational difference analysis (RDA). The subtraction of ubiquitous gene fragments from the two samples was demonstrated using cDNA microarrays including more than 32 000 spotted, PCR-amplified human clones. Hybridizations indicated the expression of 9100 of the microarray elements in a macrophage/foam cell atherosclerosis model system, of which many were removed during the RDA process. The stepwise subtraction procedure was demonstrated to yield an efficient enrichment of gene fragments overrepresented in either sample (18% in the representations, 86% after the first subtraction, and 88% after the second subtraction), many of which were impossible to detect in the starting material. Interestingly, the method allowed for the observation of the differential expression of several members of the low-abundant nuclear receptor gene family. We also observed a certain background level in the difference products of nondifferentially expressed gene fragments, warranting a verification strategy for selected candidate genes. The differential expression of several genes was verified by real-time PCR.
|
292
|
Agrawal D, Chen T, Irby R, Quackenbush J, Chambers AF, Szabo M, Cantor A, Coppola D, Yeatman TJ. Osteopontin identified as lead marker of colon cancer progression, using pooled sample expression profiling. J Natl Cancer Inst 2002; 94:513-21. [PMID: 11929952 DOI: 10.1093/jnci/94.7.513] [Citation(s) in RCA: 313] [Impact Index Per Article: 14.2] [Indexed: 11/13/2022] Open
Abstract
BACKGROUND New tumor markers and markers of tumor progression are needed for improved staging and for better assessment of treatment of many cancers. Gene expression profiling techniques offer the opportunity to discover such markers. We investigated the feasibility of a sample pooling strategy in combination with a novel analysis algorithm to identify markers. METHODS Total RNA from human colon tumors (n = 60) of multiple stages (adenomas; cancers with modified Astler-Coller stages B, C, and D; and liver metastases) was pooled within stages and compared with pooled normal mucosal specimens (n = 10) by using oligonucleotide expression arrays. Genes that showed consistent increases or decreases in their expression through tumor progression were identified. Northern blot analysis was used to validate the findings. All statistical tests were two-sided. RESULTS More than 300 candidate tumor markers and more than 100 markers of tumor progression were identified. Northern analysis of 11 candidate tumor markers confirmed the gene expression changes. The gene for the secreted integrin-binding protein osteopontin was most consistently differentially expressed in conjunction with tumor progression. Its potential as a progression marker was validated (Spearman's rho = 0.903; P<.001) with northern blot analysis using RNA from an independent set of 10 normal and 43 tumor samples representing all stages. Moreover, a statistically significant correlation between osteopontin protein expression and advancing tumor stage was identified with the use of 303 additional specimens (human cancer = 185, adenomas = 67, and normal mucosal specimens = 51) (Spearman's rho = 0.667; P<.001). CONCLUSIONS Sample pooling can be a powerful, cost-effective, and rapid means of identifying the most common changes in a gene expression profile. We identified osteopontin as a clinically useful marker of tumor progression by use of gene expression profiling on pooled samples.
|
293
|
Lee Y, Sultana R, Pertea G, Cho J, Karamycheva S, Tsai J, Parvizi B, Cheung F, Antonescu V, White J, Holt I, Liang F, Quackenbush J. Cross-referencing eukaryotic genomes: TIGR Orthologous Gene Alignments (TOGA). Genome Res 2002; 12:493-502. [PMID: 11875039 PMCID: PMC155294 DOI: 10.1101/gr.212002] [Citation(s) in RCA: 122] [Impact Index Per Article: 5.5] [Indexed: 01/07/2023]
Abstract
Comparative genomics promises to rapidly accelerate the identification and functional classification of biologically important human genes. We developed the TIGR Orthologous Gene Alignment (TOGA; <http://www.tigr.org/tdb/toga/toga.shtml>) database to provide a cross-reference between fully and partially sequenced eukaryotic transcribed sequences. Starting with the assembled expressed sequence tag (EST) and gene sequences that comprise the 28 TIGR Gene Indices, we used high-stringency pair-wise sequence searches and a reflexive, transitive closure process to associate sequence-specific best hits, generating 32,652 tentative ortholog groups (TOGs). This has allowed us to identify putative orthologs and paralogs for known genes, as well as those that exist only as uncharacterized ESTs and to provide links to additional information including genome sequence and mapping data. TOGA provides an important new resource for the analysis of gene function in eukaryotes. In addition, an analysis of the most widely represented sequences can begin to provide insight into eukaryotic biological processes.
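The grouping step described in this abstract can be pictured as follows. The sketch treats "reflexive" as meaning reciprocal best hits (an assumption) and builds groups by transitive closure with a union-find structure. The function name, the input format, and the sequence identifiers are hypothetical; the abstract does not specify TOGA's actual implementation.

```python
def ortholog_groups(best_hits):
    """Group sequences connected by reciprocal best hits.

    best_hits is an iterable of (query, best_hit) pairs. A link is kept only
    when the reverse pair is also present; groups are then formed by
    transitive closure over the kept links using union-find.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    pairs = set(best_hits)
    for a, b in pairs:
        if (b, a) in pairs:  # reciprocal best hit
            union(a, b)

    groups = {}
    for x in list(parent):
        groups.setdefault(find(x), set()).add(x)
    return sorted(sorted(group) for group in groups.values())

# Hypothetical best hits between three gene indices.
hits = [
    ("human|A", "mouse|A"), ("mouse|A", "human|A"),
    ("mouse|A", "rat|A"), ("rat|A", "mouse|A"),
    ("human|B", "mouse|C"),  # one-directional, so it is ignored
]
groups = ortholog_groups(hits)
```

In this example the human and rat sequences land in one group although they were never compared directly, because each is a reciprocal best hit of the same mouse sequence.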
|
294
|
Abstract
A versatile, platform independent and easy to use Java suite for large-scale gene expression analysis was developed. Genesis integrates various tools for microarray data analysis such as filters, normalization and visualization tools, distance measures as well as common clustering algorithms including hierarchical clustering, self-organizing maps, k-means, principal component analysis, and support vector machines. The results of the clustering are transparent across all implemented methods and enable the analysis of the outcome of different algorithms and parameters. Additionally, mapping of gene expression data onto chromosomal sequences was implemented to enhance promoter analysis and investigation of transcriptional control mechanisms.
|
295
|
Yang IV, Chen E, Hasseman JP, Liang W, Frank BC, Wang S, Sharov V, Saeed AI, White J, Li J, Lee NH, Yeatman TJ, Quackenbush J. Within the fold: assessing differential expression measures and reproducibility in microarray assays. Genome Biol 2002; 3:research0062. [PMID: 12429061 PMCID: PMC133446 DOI: 10.1186/gb-2002-3-11-research0062] [Citation(s) in RCA: 131] [Impact Index Per Article: 6.0] [Received: 05/22/2002] [Revised: 08/28/2002] [Accepted: 09/19/2002] [Indexed: 11/12/2022] Open
Abstract
BACKGROUND 'Fold-change' cutoffs have been widely used in microarray assays to identify genes that are differentially expressed between query and reference samples. More accurate measures of differential expression and effective data-normalization strategies are required to identify high-confidence sets of genes with biologically meaningful changes in transcription. Further, the analysis of a large number of expression profiles is facilitated by a common reference sample, the construction of which must be carefully addressed. RESULTS We carried out a series of 'self-self' hybridizations in which aliquots of the same RNA sample were labeled separately with Cy3 and Cy5 fluorescent dyes and co-hybridized to the same microarray. From this, we can analyze the intensity-dependent behavior of microarray data, define a statistically significant measure of differential expression that exploits the structure of the fluorescent signals, and measure the inherent reproducibility of the technique. We also devised a simple procedure for identifying and eliminating low-quality data for replicates within and between slides. We examine the properties required of a universal reference RNA sample and show how pooling a small number of samples with a diverse representation of expressed genes can outperform more complex mixtures as a reference sample. CONCLUSION Analysis of cell-line samples can identify systematic structure in measured gene-expression levels. A general procedure for analyzing cDNA microarray data is proposed and validated. We show that pooled reference samples should be based not only on the expression of individual genes in each cell line but also on the expression levels of genes within cell lines.
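One way to read the self-self design in this abstract: because both channels carry the same RNA, any spread in the log ratios reflects technical noise, and that spread can serve as a yardstick for a later query-versus-reference hybridization. The sketch below uses a single global mean and standard deviation, which is a simplification; the abstract describes an intensity-dependent measure whose details are not given here. All names, the cutoff, and the intensities are hypothetical.

```python
import math
from statistics import mean, stdev

def log_ratios(cy5, cy3):
    """Return log2(Cy5/Cy3) for paired, strictly positive intensities."""
    return [math.log2(r / g) for r, g in zip(cy5, cy3)]

def noise_model(self_cy5, self_cy3):
    """Estimate the mean and standard deviation of log ratios from a
    self-self hybridization, where no true differential expression exists."""
    ratios = log_ratios(self_cy5, self_cy3)
    return mean(ratios), stdev(ratios)

def flag_differential(cy5, cy3, mu, sigma, z_cutoff=2.0):
    """Flag spots whose log ratio lies more than z_cutoff standard
    deviations from the self-self mean."""
    return [abs((x - mu) / sigma) > z_cutoff for x in log_ratios(cy5, cy3)]

# Hypothetical self-self hybridization: same RNA in both channels.
mu, sigma = noise_model([110.0, 95.0, 102.0, 98.0, 105.0], [100.0] * 5)

# Hypothetical query hybridization: only the second spot changes fourfold.
flags = flag_differential([100.0, 400.0, 105.0], [100.0] * 3, mu, sigma)
```

A fixed fold-change cutoff would apply the same threshold everywhere; tying the threshold to measured noise is what lets the cutoff tighten or loosen with data quality. Binning spots by intensity and estimating the spread per bin would be a natural next step toward the intensity-dependent behavior the abstract mentions.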
|
296
|
Litwin CM, Quackenbush J. Characterization of a Vibrio vulnificus LysR homologue, HupR, which regulates expression of the haem uptake outer membrane protein, HupA. Microb Pathog 2001; 31:295-307. [PMID: 11747377 DOI: 10.1006/mpat.2001.0472] [Citation(s) in RCA: 32] [Impact Index Per Article: 1.4] [Indexed: 11/22/2022]
Abstract
In Vibrio vulnificus, the ability to acquire iron from the host has been shown to correlate with virulence. Here, we show that the DNA upstream of hupA (haem uptake receptor) in V. vulnificus encodes a protein in the inverse orientation to hupA (named hupR). HupR shares homology with the LysR family of positive transcriptional activators. A hupA-lacZ fusion contained on a plasmid was transformed into Fur(-), Fur(+) and HupR(-) strains of V. vulnificus. The beta-galactosidase assays and Northern blot analysis showed that transcription of hupA is negatively regulated by iron and the Fur repressor in V. vulnificus. Under low-iron conditions with added haemin, the expression of hupA in the hupR mutant was significantly lower than in the wild-type. This diminished response to haem was detected by both Northern blot and hupA-lacZ fusion analysis. The haem response of hupA in the hupR mutant was restored to wild-type levels when complemented with hupR in trans. These studies suggest that HupR may act as a positive regulator of hupA transcription under low-iron conditions in the presence of haemin.
MESH Headings
- Amino Acid Sequence
- Bacterial Outer Membrane Proteins/analysis
- Bacterial Outer Membrane Proteins/chemistry
- Bacterial Outer Membrane Proteins/genetics
- Bacterial Proteins/analysis
- Bacterial Proteins/chemistry
- Bacterial Proteins/genetics
- Base Sequence
- Blotting, Northern
- Carrier Proteins/chemistry
- Carrier Proteins/genetics
- DNA-Binding Proteins
- Dose-Response Relationship, Drug
- Electrophoresis, Polyacrylamide Gel
- Gene Expression Regulation, Bacterial/drug effects
- Gene Expression Regulation, Bacterial/genetics
- Genes, Bacterial
- Hemin/pharmacology
- Molecular Sequence Data
- Promoter Regions, Genetic
- RNA, Bacterial/analysis
- Sequence Homology, Amino Acid
- Transcription Factors/analysis
- Transcription Factors/chemistry
- Transcription Factors/genetics
- Transcription, Genetic/drug effects
- Vibrio/genetics
- Vibrio/growth & development
- Vibrio/metabolism
|
297
|
Brazma A, Hingamp P, Quackenbush J, Sherlock G, Spellman P, Stoeckert C, Aach J, Ansorge W, Ball CA, Causton HC, Gaasterland T, Glenisson P, Holstege FC, Kim IF, Markowitz V, Matese JC, Parkinson H, Robinson A, Sarkans U, Schulze-Kremer S, Stewart J, Taylor R, Vilo J, Vingron M. Minimum information about a microarray experiment (MIAME)-toward standards for microarray data. Nat Genet 2001; 29:365-71. [PMID: 11726920 DOI: 10.1038/ng1201-365] [Citation(s) in RCA: 2652] [Impact Index Per Article: 115.3] [Indexed: 02/06/2023]
Abstract
Microarray analysis has become a widely used tool for the generation of gene expression data on a genomic scale. Although many significant results have been derived from microarray studies, one limitation has been the lack of standards for presenting and exchanging such data. Here we present a proposal, the Minimum Information About a Microarray Experiment (MIAME), that describes the minimum information required to ensure that microarray data can be easily interpreted and that results derived from its analysis can be independently verified. The ultimate goal of this work is to establish a standard for recording and reporting microarray-based gene expression data, which will in turn facilitate the establishment of databases and public repositories and enable the development of data analysis tools. With respect to MIAME, we concentrate on defining the content and structure of the necessary information rather than the technical format for capturing it.
|
298
|
Hegde P, Qi R, Gaspard R, Abernathy K, Dharap S, Earle-Hughes J, Gay C, Nwokekeh NU, Chen T, Saeed AI, Sharov V, Lee NH, Yeatman TJ, Quackenbush J. Identification of tumor markers in models of human colorectal cancer using a 19,200-element complementary DNA microarray. Cancer Res 2001; 61:7792-7. [PMID: 11691794] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/22/2023]
Abstract
Metastasis represents a crucial transition in disease development and progression and has a profound impact on survival for a wide variety of cancers. Cell line models of metastasis have played an important role in developing our understanding of the metastatic process. We used a 19,200-element human cDNA microarray to profile transcription in three paired cell-line models of colorectal tumor metastasis. By correlating expression patterns across these cell lines, we have identified 176 genes that appear to be differentially expressed (greater than 2-fold) in all highly metastatic cell lines relative to their reference. An analysis of these genes reiterates much of our understanding of the metastatic process and suggests additional genes, many of previously uncharacterized function, that may be causatively involved in, or at least prognostic of, metastasis. Northern analysis of a limited number of these genes validates the observed pattern of expression and suggests that further investigation and functional characterization of the identified genes is warranted.
|
299
|
Quackenbush J. The power of public access: the human genome project and the scientific process. Nat Genet 2001; 29:4-6. [PMID: 11528377 DOI: 10.1038/ng0901-4] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.3] [Indexed: 11/09/2022]
Abstract
The scientific process, and scientific progress, require a critical examination of all published reports. Recent publications detailing errors in the draft human genome sequence are an integral part of our quest to better understand nature and demonstrate the value of free access to scientific data.
|
300
|
Kappe SH, Gardner MJ, Brown SM, Ross J, Matuschewski K, Ribeiro JM, Adams JH, Quackenbush J, Cho J, Carucci DJ, Hoffman SL, Nussenzweig V. Exploring the transcriptome of the malaria sporozoite stage. Proc Natl Acad Sci U S A 2001; 98:9895-900. [PMID: 11493695 PMCID: PMC55549 DOI: 10.1073/pnas.171185198] [Citation(s) in RCA: 105] [Impact Index Per Article: 4.6] [Received: 04/13/2001] [Indexed: 11/18/2022] Open
Abstract
Most studies of gene expression in Plasmodium have been concerned with asexual and/or sexual erythrocytic stages. Identification and cloning of genes expressed in the preerythrocytic stages lag far behind. We have constructed a high quality cDNA library of the Plasmodium sporozoite stage by using the rodent malaria parasite P. yoelii, an important model for malaria vaccine development. The technical obstacles associated with limited amounts of RNA material were overcome by PCR-amplifying the transcriptome before cloning. Contamination with mosquito RNA was negligible. Generation of 1,972 expressed sequence tags (EST) resulted in a total of 1,547 unique sequences, allowing insight into sporozoite gene expression. The circumsporozoite protein (CS) and the sporozoite surface protein 2 (SSP2) are well represented in the data set. A BLASTX search with all tags of the nonredundant protein database gave only 161 unique significant matches (P(N) ≤ 10^-4), whereas 1,386 of the unique sequences represented novel sporozoite-expressed genes. We identified ESTs for three proteins that may be involved in host cell invasion and documented their expression in sporozoites. These data should facilitate our understanding of the preerythrocytic Plasmodium life cycle stages and the development of preerythrocytic vaccines.
|