151
|
Mar JC, Matigian NA, Quackenbush J, Wells CA. attract: A method for identifying core pathways that define cellular phenotypes. PLoS One 2011; 6:e25445. [PMID: 22022396 PMCID: PMC3194807 DOI: 10.1371/journal.pone.0025445] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.6] [Received: 06/16/2011] [Accepted: 09/05/2011] [Indexed: 11/23/2022]
Abstract
attract is a knowledge-driven analytical approach for identifying and annotating the gene-sets that best discriminate between cell phenotypes. attract finds distinguishing patterns within pathways, decomposes pathways into meta-genes representative of these patterns, and then generates synexpression groups of highly correlated genes from the entire transcriptome dataset. attract can be applied to a wide range of biological systems, is freely available as a Bioconductor package, and has been incorporated into the MeV software system.
|
152
|
Abstract
Summary: RNA-Seq is an exciting methodology that leverages the power of high-throughput sequencing to measure RNA transcript counts at an unprecedented accuracy. However, the data generated from this process are extremely large, and biologist-friendly tools with which to analyze them are sorely lacking. MultiExperiment Viewer (MeV) is a Java-based desktop application that allows advanced analysis of gene expression data through an intuitive graphical user interface. Here, we report a significant enhancement to MeV that allows analysis of RNA-Seq data with these familiar, powerful tools. We also report the addition to MeV of several RNA-Seq-specific functions, addressing the differences in analysis requirements between this data type and traditional gene expression data. These tools include automatic conversion functions from raw count data to processed RPKM or FPKM values, differential expression detection, and functional annotation enrichment detection based on published methods. Availability: MeV version 4.7 is written in Java and is freely available for download under the terms of the open-source Artistic License version 2.0. The website (http://mev.tm4.org/) hosts a full user manual as well as a short quick-start guide suitable for new users. Contact: johnq@jimmy.harvard.edu
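The conversion from raw counts to RPKM mentioned in this abstract can be illustrated with a minimal Python sketch. It uses the standard RPKM definition (reads per kilobase of transcript per million mapped reads) and is not MeV's Java implementation; the function name `rpkm`, the gene identifiers, and the input layout are assumptions made only for this example.

```python
def rpkm(counts, lengths_bp):
    """Convert raw read counts to RPKM.

    counts:     dict mapping gene id -> raw read count (hypothetical layout)
    lengths_bp: dict mapping gene id -> transcript length in base pairs
    """
    total_reads = sum(counts.values())
    if total_reads == 0:
        raise ValueError("no mapped reads in this sample")
    # RPKM = count * 1e9 / (length in bp * total mapped reads)
    return {
        gene: count * 1e9 / (lengths_bp[gene] * total_reads)
        for gene, count in counts.items()
    }


# Toy sample: geneB has 3x the reads but is also 3x longer,
# so both genes end up with the same RPKM.
example = rpkm({"geneA": 500, "geneB": 1500},
               {"geneA": 1000, "geneB": 3000})
```

Dividing by transcript length and by sequencing depth is what makes values comparable across genes and across samples; FPKM applies the same scaling to fragments rather than individual reads.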
|
153
|
Quackenbush J. SP 130 Driving discovery through data integration and analysis. Eur J Cancer 2011. [DOI: 10.1016/s0959-8049(11)72607-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/15/2022]
|
154
|
Schröder MS, Culhane AC, Quackenbush J, Haibe-Kains B. survcomp: an R/Bioconductor package for performance assessment and comparison of survival models. Bioinformatics 2011; 27:3206-8. [PMID: 21903630 DOI: 10.1093/bioinformatics/btr511] [Citation(s) in RCA: 319] [Impact Index Per Article: 24.5] [Indexed: 12/13/2022]
Abstract
SUMMARY The survcomp package provides functions to assess and statistically compare the performance of survival/risk prediction models. It implements state-of-the-art statistics to (i) measure the performance of risk prediction models; (ii) combine these statistical estimates from multiple datasets using a meta-analytical framework; and (iii) statistically compare the performance of competitive models.
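One widely used measure of risk-model performance of the kind this abstract refers to is the concordance index. The Python sketch below is a plain pairwise version of Harrell's C written for illustration; it is not the survcomp API (survcomp is an R package), and the function name and argument layout are assumptions.

```python
def concordance_index(times, events, risks):
    """Pairwise concordance between predicted risk and observed survival.

    times:  observed follow-up times
    events: 1 if the event was observed, 0 if the subject was censored
    risks:  predicted risk scores (higher = expected earlier event)
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when subject i had an observed event
            # strictly before subject j's follow-up time.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5  # ties in risk count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


perfect = concordance_index([1, 2, 3, 4], [1, 1, 1, 1], [4, 3, 2, 1])
reversed_model = concordance_index([1, 2, 3, 4], [1, 1, 1, 1], [1, 2, 3, 4])
```

A value of 1.0 means the risk scores order every comparable pair correctly, 0.5 corresponds to random ordering, and 0.0 to a perfectly inverted ordering.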
|
155
|
Mar JC, Matigian NA, Mackay-Sim A, Mellick GD, Sue CM, Silburn PA, McGrath JJ, Quackenbush J, Wells CA. Variance of gene expression identifies altered network constraints in neurological disease. PLoS Genet 2011; 7:e1002207. [PMID: 21852951 PMCID: PMC3154954 DOI: 10.1371/journal.pgen.1002207] [Citation(s) in RCA: 95] [Impact Index Per Article: 7.3] [Received: 02/15/2011] [Accepted: 06/11/2011] [Indexed: 11/26/2022]
Abstract
Gene expression analysis has become a ubiquitous tool for studying a wide range of human diseases. In a typical analysis we compare distinct phenotypic groups and attempt to identify genes that are, on average, significantly different between them. Here we describe an innovative approach to the analysis of gene expression data, one that identifies differences in expression variance between groups as an informative metric of the group phenotype. We find that genes with different expression variance profiles are not randomly distributed across cell signaling networks. Genes with low-expression variance, or higher constraint, are significantly more connected to other network members and tend to function as core members of signal transduction pathways. Genes with higher expression variance have fewer network connections and also tend to sit on the periphery of the cell. Using neural stem cells derived from patients suffering from Schizophrenia (SZ), Parkinson's disease (PD), and a healthy control group, we find marked differences in expression variance in cell signaling pathways that shed new light on potential mechanisms associated with these diverse neurological disorders. In particular, we find that expression variance of core networks in the SZ patient group was considerably constrained, while in contrast the PD patient group demonstrated much greater variance than expected. One hypothesis is that diminished variance in SZ patients corresponds to an increased degree of constraint in these pathways and a corresponding reduction in robustness of the stem cell networks. These results underscore the role that variation plays in biological systems and suggest that analysis of expression variance is far more important in disease than previously recognized. Furthermore, modeling patterns of variability in gene expression could fundamentally alter the way in which we think about how cellular networks are affected by disease processes. 
Genes are a repository of information that provides the framework for cellular processes, with the flow of information from gene (DNA) to phenotype passing through an intermediate molecule, the messenger RNA. We understand that sequence variations in a gene may lead to phenotypic variations, but less well understood is how variation in the information flow itself might also affect phenotype. In this study we demonstrated that disease phenotypes were correlated with expression variance. A change in expression variance might imply that the genetic networks representing information flow were less robust. Surprisingly, we found that too little and too much variance were equally detrimental in the context of neurological disease.
|
156
|
Sathirapongsasuti JF, Lee H, Horst BAJ, Brunner G, Cochran AJ, Binder S, Quackenbush J, Nelson SF. Exome sequencing-based copy-number variation and loss of heterozygosity detection: ExomeCNV. Bioinformatics 2011; 27:2648-54. [PMID: 21828086 DOI: 10.1093/bioinformatics/btr462] [Citation(s) in RCA: 300] [Impact Index Per Article: 23.1] [Indexed: 02/06/2023]
Abstract
MOTIVATION The ability to detect copy-number variation (CNV) and loss of heterozygosity (LOH) from exome sequencing data extends the utility of this powerful approach that has mainly been used for point or small insertion/deletion detection. RESULTS We present ExomeCNV, a statistical method to detect CNV and LOH using depth-of-coverage and B-allele frequencies, from mapped short sequence reads, and we assess both the method's power and the effects of confounding variables. We apply our method to a cancer exome resequencing dataset. As expected, accuracy and resolution are dependent on depth-of-coverage and capture probe design. AVAILABILITY CRAN package 'ExomeCNV'. CONTACT fsathira@fas.harvard.edu; snelson@ucla.edu SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
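The depth-of-coverage signal this abstract describes can be illustrated with a small Python sketch that computes a library-size-normalized log2 coverage ratio per exon. This shows only the raw signal, not ExomeCNV's statistical test or its R interface; the function name and the list-based input are assumptions for the example.

```python
import math


def log2_coverage_ratio(tumor_depth, normal_depth):
    """Per-exon log2 ratio of tumor to normal depth-of-coverage.

    Both inputs are lists of read depths over the same exons, in the
    same order. Each depth is divided by its sample total so that a
    difference in sequencing depth is not mistaken for a copy-number
    change.
    """
    tumor_total = sum(tumor_depth)
    normal_total = sum(normal_depth)
    ratios = []
    for t, n in zip(tumor_depth, normal_depth):
        if t == 0 or n == 0:
            ratios.append(None)  # ratio undefined without coverage
        else:
            ratios.append(math.log2((t / tumor_total) / (n / normal_total)))
    return ratios


# Toy data: exon 0 has half the normalized depth in the tumor,
# exon 3 has 1.5x the normalized depth.
ratios = log2_coverage_ratio([50, 100, 100, 150], [100, 100, 100, 100])
```

Values near 0 suggest no change, negative values suggest loss, and positive values suggest gain. As the abstract notes, how reliably such shifts can be called depends on depth-of-coverage and capture probe design.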
|
157
|
Marko NF, Quackenbush J, Weil RJ. Why is there a lack of consensus on molecular subgroups of glioblastoma? Understanding the nature of biological and statistical variability in glioblastoma expression data. PLoS One 2011; 6:e20826. [PMID: 21829433 PMCID: PMC3145641 DOI: 10.1371/journal.pone.0020826] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Received: 02/25/2011] [Accepted: 05/09/2011] [Indexed: 12/31/2022]
Abstract
INTRODUCTION Gene expression patterns characterizing clinically-relevant molecular subgroups of glioblastoma are difficult to reproduce. We suspect a combination of biological and analytic factors confounds interpretation of glioblastoma expression data. We seek to clarify the nature and relative contributions of these factors, to focus additional investigations, and to improve the accuracy and consistency of translational glioblastoma analyses. METHODS We analyzed gene expression and clinical data for 340 glioblastomas in The Cancer Genome Atlas (TCGA). We developed a logic model to analyze potential sources of biological, technical, and analytic variability and used standard linear classifiers and linear dimensional reduction algorithms to investigate the nature and relative contributions of each factor. RESULTS Commonly-described sources of classification error, including individual sample characteristics, batch effects, and analytic and technical noise make measurable but proportionally minor contributions to inconsistent molecular classification. Our analysis suggests that three, previously underappreciated factors may account for a larger fraction of classification errors: inherent non-linear/non-orthogonal relationships among the genes used in conjunction with classification algorithms that assume linearity; skewed data distributions assumed to be Gaussian; and biologic variability (noise) among tumors, of which we propose three types. CONCLUSIONS Our analysis of the TCGA data demonstrates a contributory role for technical factors in molecular classification inconsistencies in glioblastoma but also suggests that biological variability, abnormal data distribution, and non-linear relationships among genes may be responsible for a proportionally larger component of classification error. These findings may have important implications for both glioblastoma research and for translational application of other large-volume biological databases.
|
158
|
Jirawatnotai S, Hu Y, Michowski W, Elias JE, Becks L, Bienvenu F, Zagozdzon A, Goswami T, Wang YE, Clark AB, Kunkel TA, van Harn T, Xia B, Correll M, Quackenbush J, Livingston DM, Gygi SP, Sicinski P. A function for cyclin D1 in DNA repair uncovered by protein interactome analyses in human cancers. Nature 2011; 474:230-4. [PMID: 21654808 DOI: 10.1038/nature10155] [Citation(s) in RCA: 250] [Impact Index Per Article: 19.2] [Received: 07/13/2010] [Accepted: 04/21/2011] [Indexed: 12/16/2022]
Abstract
Cyclin D1 is a component of the core cell cycle machinery. Abnormally high levels of cyclin D1 are detected in many human cancer types. To elucidate the molecular functions of cyclin D1 in human cancers, we performed a proteomic screen for cyclin D1 protein partners in several types of human tumours. Analyses of cyclin D1 interactors revealed a network of DNA repair proteins, including RAD51, a recombinase that drives the homologous recombination process. We found that cyclin D1 directly binds RAD51, and that cyclin D1-RAD51 interaction is induced by radiation. Like RAD51, cyclin D1 is recruited to DNA damage sites in a BRCA2-dependent fashion. Reduction of cyclin D1 levels in human cancer cells impaired recruitment of RAD51 to damaged DNA, impeded the homologous recombination-mediated DNA repair, and increased sensitivity of cells to radiation in vitro and in vivo. This effect was seen in cancer cells lacking the retinoblastoma protein, which do not require D-cyclins for proliferation. These findings reveal an unexpected function of a core cell cycle protein in DNA repair and suggest that targeting cyclin D1 may be beneficial also in retinoblastoma-negative cancers which are currently thought to be unaffected by cyclin D1 inhibition.
|
159
|
Cassidy PB, Quackenbush J, Campbell J, Moos PJ, Leachman SA. Abstract 1867: The natural product sulforaphane protects epidermal tissue from individuals at increased risk for melanoma from the effects of UV radiation. Cancer Res 2011. [DOI: 10.1158/1538-7445.am2011-1867] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
Abstract
Our study focuses on the design and testing of melanoma-prevention strategies for use in high-risk populations. Individuals who have germline loss-of-function (LOF) mutations at the highly polymorphic MC1R locus are at 4-fold increased risk for melanoma. Much of this increase likely arises from the loss of MC1R-mediated upregulation of antioxidant, DNA repair, pigment synthesis and anti-apoptotic pathways that protect epidermal melanocytes from the mutagenic effects of UV radiation. We have found that the natural product sulforaphane (SF) protects normal human melanocytes in culture from UV-induced apoptosis and oxidative stress in the absence of MC1R stimulation by its ligand α-MSH. This led us to propose that SF might be useful for protecting the skin of human subjects with LOF MC1R mutations from the harmful effects of UV light. We have sequenced MC1R in volunteers both with and without the red hair phenotype that is typical for humans with LOF mutations. Sequencing was necessary in all volunteers because many individuals with heterozygous LOF mutations do not have the signature pigmentary phenotype, but are still at increased (2-fold) risk for melanoma. Shave biopsies were harvested from those with either wild-type MC1R or two LOF alleles. The epidermal tissues were treated ex vivo with UV and/or SF then analyzed histologically and by qPCR. We found that SF ameliorated many of the effects of UV radiation in tissues from donors with LOF MC1R. This study demonstrates the promise of SF as an agent for protection of human skin in individuals with high-risk MC1R genotypes from the harmful effects of UV radiation.
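Citation Format: Cassidy PB, Quackenbush J, Campbell J, Moos PJ, Leachman SA. The natural product sulforaphane protects epidermal tissue from individuals at increased risk for melanoma from the effects of UV radiation [abstract]. In: Proceedings of the 102nd Annual Meeting of the American Association for Cancer Research; 2011 Apr 2-6; Orlando, FL. Philadelphia (PA): AACR; Cancer Res 2011;71(8 Suppl):Abstract nr 1867. doi:10.1158/1538-7445.AM2011-1867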
|
160
|
Mar JC, Wells CA, Quackenbush J. Defining an informativeness metric for clustering gene expression data. Bioinformatics 2011; 27:1094-100. [PMID: 21330289 PMCID: PMC3072547 DOI: 10.1093/bioinformatics/btr074] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.9] [Indexed: 11/15/2022]
Abstract
Motivation: Unsupervised ‘cluster’ analysis is an invaluable tool for exploratory microarray data analysis, as it organizes the data into groups of genes or samples in which the elements share common patterns. Once the data are clustered, finding the optimal number of informative subgroups within a dataset is a problem that, while important for understanding the underlying phenotypes, is one for which there is no robust, widely accepted solution. Results: To address this problem we developed an ‘informativeness metric’ based on a simple analysis of variance statistic that identifies the number of clusters which best separate phenotypic groups. The performance of the informativeness metric has been tested on both experimental and simulated datasets, and we contrast these results with those obtained using alternative methods such as the gap statistic. Availability: The method has been implemented in the Bioconductor R package attract; it is also freely available from http://compbio.dfci.harvard.edu/pubs/attract_1.0.1.zip. Contact: jess@jimmy.harvard.edu; johnq@jimmy.harvard.edu Supplementary information: Supplementary data are available at Bioinformatics online.
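The abstract describes the metric only as being based on a simple analysis of variance statistic. As a hedged illustration of that building block, the Python sketch below computes a one-way ANOVA F-statistic for expression values grouped by phenotype. It is not the informativeness metric as implemented in attract; the function name and input layout are assumptions for the example.

```python
def anova_f(groups):
    """One-way ANOVA F-statistic.

    groups: list of lists, one list of expression values per
            phenotypic group (hypothetical layout).
    """
    k = len(groups)
    n = sum(len(g) for g in groups)
    if k < 2 or n <= k:
        raise ValueError("need at least two groups and more values than groups")
    means = [sum(g) / len(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / n
    # Between-group sum of squares: how far group means sit from the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    # Within-group sum of squares: spread of values around their own group mean.
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, means) for x in g)
    if ss_within == 0:
        raise ValueError("zero within-group variance")
    return (ss_between / (k - 1)) / (ss_within / (n - k))


f_stat = anova_f([[1, 2, 3], [4, 5, 6]])
```

A larger F means the phenotypic groups are better separated relative to the spread within each group, which is the sense in which a statistic of this kind can be used to compare candidate clusterings.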
|
161
|
Liu F, White JA, Antonescu C, Gusenleitner D, Quackenbush J. GCOD - GeneChip Oncology Database. BMC Bioinformatics 2011; 12:46. [PMID: 21291543 PMCID: PMC3045303 DOI: 10.1186/1471-2105-12-46] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Received: 09/13/2010] [Accepted: 02/03/2011] [Indexed: 11/10/2022]
Abstract
Background DNA microarrays have become a nearly ubiquitous tool for the study of human disease, and nowhere is this more true than in cancer. With hundreds of studies and thousands of expression profiles representing the majority of human cancers completed and in public databases, the challenge has been effectively accessing and using this wealth of data. Description To address this issue we have collected published human cancer gene expression datasets generated on the Affymetrix GeneChip platform, and carefully annotated those studies with a focus on providing accurate sample annotation. To facilitate comparison between datasets, we implemented a consistent data normalization and transformation protocol and then applied stringent quality control procedures to flag low-quality assays. Conclusion The resulting resource, the GeneChip Oncology Database, is available through a publicly accessible website that provides several query options and analytical tools through an intuitive interface.
|
162
|
Chervitz SA, Deutsch EW, Field D, Parkinson H, Quackenbush J, Rocca-Serra P, Sansone SA, Stoeckert CJ, Taylor CF, Taylor R, Ball CA. Data standards for Omics data: the basis of data sharing and reuse. Methods Mol Biol 2011; 719:31-69. [PMID: 21370078 PMCID: PMC4152841 DOI: 10.1007/978-1-61779-027-0_2] [Citation(s) in RCA: 49] [Impact Index Per Article: 3.8] [Indexed: 12/04/2022]
Abstract
To facilitate sharing of Omics data, many groups of scientists have been working to establish the relevant data standards. The main components of data sharing standards are experiment description standards, data exchange standards, terminology standards, and experiment execution standards. Here we provide a survey of existing and emerging standards that are intended to assist the free and open exchange of large-format data.
|
163
|
Cannon EKS, Birkett SM, Braun BL, Kodavali S, Jennewein DM, Yilmaz A, Antonescu V, Antonescu C, Harper LC, Gardiner JM, Schaeffer ML, Campbell DA, Andorf CM, Andorf D, Lisch D, Koch KE, McCarty DR, Quackenbush J, Grotewold E, Lushbough CM, Sen TZ, Lawrence CJ. POPcorn: An Online Resource Providing Access to Distributed and Diverse Maize Project Data. Int J Plant Genomics 2011; 2011:923035. [PMID: 22253616 PMCID: PMC3255282 DOI: 10.1155/2011/923035] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Received: 08/16/2011] [Accepted: 11/29/2011] [Indexed: 05/21/2023]
Abstract
The purpose of the online resource presented here, POPcorn (Project Portal for corn), is to enhance accessibility of maize genetic and genomic resources for plant biologists. Currently, many online locations are difficult to find, some are best searched independently, and individual project websites often degrade over time, sometimes disappearing entirely. The POPcorn site makes available (1) a centralized, web-accessible resource to search and browse descriptions of ongoing maize genomics projects, (2) a single, stand-alone tool that uses web services and minimal data warehousing to search for sequence matches in online resources of diverse offsite projects, and (3) a set of tools that enables researchers to migrate their data to the long-term model organism database for maize genetic and genomic information: MaizeGDB. Examples demonstrating POPcorn's utility are provided herein.
|
164
|
Chittenden TW, Pak J, Rubio R, Cheng H, Holton K, Prendergast N, Glinskii V, Cai Y, Culhane A, Bentink S, Schwede M, Mar JC, Howe EA, Aryee M, Sultana R, Lanahan AA, Taylor JM, Holmes C, Hahn WC, Zhao JJ, Iglehart JD, Quackenbush J. Therapeutic implications of GIPC1 silencing in cancer. PLoS One 2010; 5:e15581. [PMID: 21209904 PMCID: PMC3012716 DOI: 10.1371/journal.pone.0015581] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.6] [Received: 10/20/2010] [Accepted: 11/12/2010] [Indexed: 12/31/2022]
Abstract
GIPC1 is a cytoplasmic scaffold protein that interacts with numerous receptor signaling complexes, and emerging evidence suggests that it plays a role in tumorigenesis. GIPC1 is highly expressed in a number of human malignancies, including breast, ovarian, gastric, and pancreatic cancers. Suppression of GIPC1 in human pancreatic cancer cells inhibits in vivo tumor growth in immunodeficient mice. To better understand GIPC1 function, we suppressed its expression in human breast and colorectal cancer cell lines and human mammary epithelial cells (HMECs) and assayed both gene expression and cellular phenotype. Suppression of GIPC1 promotes apoptosis in MCF-7, MDA-MB-231, SKBR-3, SW480, and SW620 cells and impairs anchorage-independent colony formation of HMECs. These observations indicate GIPC1 plays an essential role in oncogenic transformation, and its expression is necessary for the survival of human breast and colorectal cancer cells. Additionally, a GIPC1 knock-down gene signature was used to interrogate publicly available breast and ovarian cancer microarray datasets. This GIPC1 signature statistically correlates with a number of breast and ovarian cancer phenotypes and clinical outcomes, including patient survival. Taken together, these data indicate that GIPC1 inhibition may represent a new target for therapeutic development for the treatment of human cancers.
|
165
|
Tanaka N, Huttenhower C, Nosho K, Baba Y, Shima K, Quackenbush J, Haigis KM, Giovannucci E, Fuchs CS, Ogino S. Novel application of structural equation modeling to correlation structure analysis of CpG island methylation in colorectal cancer. Am J Pathol 2010; 177:2731-40. [PMID: 21037082 DOI: 10.2353/ajpath.2010.100361] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.4] [Indexed: 02/01/2023]
Abstract
The CpG island methylator phenotype (CIMP-high, CIMP1) is a distinct phenotype associated with microsatellite instability (MSI) and BRAF mutation in colon cancer. Recent evidence suggests the presence of KRAS mutation-associated CIMP subtype (CIMP-low, CIMP2). We used cluster analysis, principal component analysis (PCA), and structural equation modeling (SEM), a novel strategy, to decipher the correlation structure of CpG island hypermethylation. Using a database of 861 colon and rectal cancers, DNA methylation at 16 CpG islands [CACNA1G, CDKN2A (p16/ink4a), CHFR, CRABP1, HIC1, IGF2, IGFBP3, MGMT, MINT-1, MINT-31, MLH1, NEUROG1, p14 (CDKN2A/arf), RUNX3, SOCS1, and WRN] was quantified by real-time PCR. Tumors were categorized into three groups: Group 1 with wild-type KRAS/BRAF (N = 440); Group 2 with mutant KRAS and wild-type BRAF (N = 308); and Group 3 with wild-type KRAS and mutant BRAF (N = 107). Tumors with mutant KRAS/BRAF (N = 6) were excluded. In unsupervised hierarchical clustering analysis, all but six markers (CACNA1G, IGF2, RUNX3, MGMT, MINT-1, and SOCS1) were differentially clustered with CIMP-high and CIMP-low according to KRAS and BRAF status. In SEM, the correlation structures between CIMP, locus-specific CpG island methylation, and MSI differed according to KRAS and BRAF status, which was consistent with PCA results. In conclusion, KRAS and BRAF mutations appear to differentially influence correlation structure of CpG island methylation. Our novel data suggest two distinct perturbations, resulting in differential locus-specific propensity of CpG methylation.
|
166
|
Ilhan A, Wagner L, Maj M, Woehrer A, Czech T, Heinzl H, Marosi C, Base W, Preusser M, Jeuken JW, Navis AC, Sijben A, Boots-Sprenger SH, Bleeker FE, Gijtenbeek JM, Wesseling P, Seyed Sadr E, Tessier A, Seyed Sadr M, Alshami J, Anan M, Sabau C, Del Maestro R, Agnihotri S, Gajadhar A, Wolf A, Mischel PM, Hawkins C, Guha A, Guan X, Chance MR, Barnholtz-Sloan JS, Larson JD, Rodriguez FJ, Demer AM, Sarver AL, Dubac A, Jenkins RB, Dupuy AJ, Copeland NG, Jenkins NA, Taylor MD, Largaespada DA, Lusis EA, Stuart JE, Scheck AC, Coons SW, Lal A, Perry A, Gutmann DH, Barnholtz-Sloan JS, Adams MD, Cohen M, Devine K, Wolinsky Y, Bambakidis N, Selman W, Miller R, Sloan AE, Suchorska B, Mehrkens JH, Eigenbrod S, Eroes CA, Tonn JC, Kretzschmar HA, Kreth FW, Buczkowicz P, Bartels U, Morrison A, Zarghooni M, Bouffet E, Hawkins C, Kollmeyer TM, Wrensch M, Decker PA, Xiao Y, Rynearson AL, Fink S, Kosel ML, Johnson DR, Lachance DH, Yang P, Fridley BL, Wiemels J, Wiencke J, Jenkins RB, Zhou YH, Hess KR, Yu L, Raj VR, Liu L, Alfred Yung WK, Hutchins LF, Linskey ME, Roldan G, Kachra R, McIntyre JB, Magliocco A, Easaw J, Hamilton M, Northcott PA, Van Meter T, Eberhart C, Weiss W, Rutka JT, Gupta N, Korshunov A, French P, Kros J, Michiels E, Kloosterhof N, Hauser P, Montange MF, Jouvet A, Bouffet E, Jung S, Kim SK, Wang KC, Cho BK, Di Rocco C, Massimi L, Leonard J, Scheurlen W, Pfister S, Robinson S, Yang SH, Yoo JY, Cho DG, Kim HK, Kim SW, Lee SW, Fink S, Kollmeyer T, Rynearson A, Decker P, Sicotte H, Yang P, Jenkins R, Lai A, Kharbanda S, Tran A, Pope W, Solis O, Peale F, Forrest W, Purjara K, Carrillo J, Pandita A, Ellingson B, Bowers C, Soriano R, Mohan S, Yong W, Aldape K, Mischel P, Liau L, Nghiemphu P, James CD, Prados M, Westphal M, Lamszus K, Cloughesy T, Phillips H, Thon N, Kreth S, Eigenbrod S, Lutz J, Ledderose C, Tonn JC, Kretzschmar H, Kreth FW, Mokhtari K, Ducray F, Kros JM, Gorlia T, Idbaih A, Marie Y, Taphoorn M, Wesseling P, Brandes AA, Hoang-Xuan K, Delattre JY, Van den Bent 
M, Sanson M, Lavon I, Shahar T, Granit A, Smith Y, Nossek E, Siegal T, Ram Z, Marko NF, Quackenbush J, Weil RJ, Ducray F, Criniere E, Idbaih A, Paris S, Marie Y, Carpentier C, Houillier C, Dieme M, Adam C, Hoang-Xuan K, Delattre JY, Duyckaerts C, Sanson M, Mokhtari K, Zinn PO, Kozono D, Kasper EM, Warnke PC, Chin L, Chen CC, Saito K, Mukasa A, Saito N, Stieber D, Lenkiewicz E, Evers L, Vallar L, Bjerkvig R, Barrett M, Niclou SP, Gorlia T, Brandes A, Stupp R, Rampling R, Fumoleau P, Dittrich C, Campone M, Twelves C, Raymond E, Lacombe D, van den Bent MJ, Potter N, Ashmore S, Karakoula K, Ward S, Suarez-Merino B, Luxsuwong M, Thomas DG, Darling J, Warr T, Gutman DA, Cooper L, Kong J, Chisolm C, Van Meir EG, Saltz JH, Moreno CS, Brat DJ, Brennan CW, Brat DJ, Aldape KD, Cohen M, Lehman NL, McLendon RE, Miller R, Schniederjan M, Vandenberg SR, Weaver K, Phillips S, Pierce L, Christensen B, Smith A, Zheng S, Koestler D, Houseman EA, Marsit CJ, Wiemels JL, Nelson HH, Karagas MR, Wrensch MR, Kelsey KT, Wiencke JK, Al-Nedawi K, Meehan B, Micallef J, Guha A, Rak J. -Omics and Prognostic Markers. Neuro Oncol 2010. [DOI: 10.1093/neuonc/noq116.s8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022] Open
|
167
|
Antonescu C, Antonescu V, Sultana R, Quackenbush J. Using the DFCI gene index databases for biological discovery. Curr Protoc Bioinformatics 2010; Chapter 1:1.6.1-1.6.36. [PMID: 20205187 DOI: 10.1002/0471250953.bi0106s29] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Indexed: 11/07/2022]
Abstract
The DFCI Gene Index Web pages provide access to analyses of ESTs and gene sequences for nearly 114 species, as well as a number of resources derived from these. Each species-specific database is presented using a common format with a home page. A variety of methods exist that allow users to search each species-specific database. Methods implemented currently include nucleotide or protein sequence queries using WU-BLAST, text-based searches using various sequence identifiers, searches by gene, tissue and library name, and searches using functional classes through Gene Ontology assignments. This protocol provides guidance for using the Gene Index Databases to extract information.
|
168
|
Colak D, Chishti MA, Al-Bakheet AB, Al-Qahtani A, Shoukri MM, Goyns MH, Ozand PT, Quackenbush J, Park BH, Kaya N. Integrative and comparative genomics analysis of early hepatocellular carcinoma differentiated from liver regeneration in young and old. Mol Cancer 2010; 9:146. [PMID: 20540791 PMCID: PMC2898705 DOI: 10.1186/1476-4598-9-146] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.5] [Received: 12/07/2009] [Accepted: 06/12/2010] [Indexed: 02/08/2023]
Abstract
BACKGROUND Hepatocellular carcinoma (HCC) is the third-leading cause of cancer-related deaths worldwide. It is often diagnosed at an advanced stage, and hence typically has a poor prognosis. To identify distinct molecular mechanisms for early HCC we developed a rat model of liver regeneration post-hepatectomy, as well as liver cells undergoing malignant transformation and compared them to normal liver using a microarray approach. Subsequently, we performed cross-species comparative analysis coupled with copy number alterations (CNA) of independent early human HCC microarray studies to facilitate the identification of critical regulatory modules conserved across species. RESULTS We identified 35 signature genes conserved across species, and shared among different types of early human HCCs. Over 70% of signature genes were cancer-related, and more than 50% of the conserved genes were mapped to human genomic CNA regions. Functional annotation revealed genes already implicated in HCC, as well as novel genes which were not previously reported in liver tumors. A subset of differentially expressed genes was validated using quantitative RT-PCR. Concordance was also confirmed for a significant number of genes and pathways in five independent validation microarray datasets. Our results indicated alterations in a number of cancer related pathways, including p53, p38 MAPK, ERK/MAPK, PI3K/AKT, and TGF-beta signaling pathways, and potential critical regulatory role of MYC, ERBB2, HNF4A, and SMAD3 for early HCC transformation. CONCLUSIONS The integrative analysis of transcriptional deregulation, genomic CNA and comparative cross species analysis brings new insights into the molecular profile of early hepatoma formation. This approach may lead to robust biomarkers for the detection of early human HCC.
|
169
|
Tseveleki V, Rubio R, Vamvakas SS, White J, Taoufik E, Petit E, Quackenbush J, Probert L. Comparative gene expression analysis in mouse models for multiple sclerosis, Alzheimer's disease and stroke for identifying commonly regulated and disease-specific gene changes. Genomics 2010; 96:82-91. [PMID: 20435134 DOI: 10.1016/j.ygeno.2010.04.004] [Citation(s) in RCA: 66] [Impact Index Per Article: 4.7] [Received: 03/09/2010] [Revised: 04/22/2010] [Accepted: 04/22/2010] [Indexed: 12/17/2022]
Abstract
The brain responds to injury and infection by activating innate defense and tissue repair mechanisms. Working on the hypothesis that the brain defense response involves common genes and pathways across diverse pathologies, we analysed global gene expression in brain from mouse models representing three major central nervous system disorders (cerebral stroke, multiple sclerosis and Alzheimer's disease) and compared it to normal brain using DNA microarray expression profiling. A comparison of dysregulated genes across disease models revealed common genes and pathways, including key components of estrogen and TGF-beta signaling pathways that have been associated with neuroprotection, as well as a neurodegeneration mediator, TRPM7. Further, for each disease model, we discovered collections of differentially expressed genes that provide novel insight into the individual pathology and its associated mechanisms. Our data provide a resource for exploring the complex molecular mechanisms that underlie brain neurodegeneration and a new approach for identifying generic and disease-specific targets for therapy.
|
170
|
Wang ZC, Culhane A, Drapkin R, Fatima A, Tian R, Daniels KE, Kantoff E, Liu J, Quackenbush J, Richardson AL, Berkowitz RS, Iglehart JD, Matulonis UA. Abstract 2135: Genetic relationships between ovarian and breast cancers. Cancer Res 2010. [DOI: 10.1158/1538-7445.am10-2135] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
Abstract
Background: Ovarian cancer is the most fatal gynecological cancer, and breast cancer is the most common malignant disease in women. The existence of a common etiology to these diseases is suggested by the fact that BRCA1 or BRCA2 germline mutation-associated tumors are primarily restricted to the breast and the ovary. In the ovary, BRCA mutations are associated with high-grade serous carcinomas (HGSC), while in the breast, BRCA1 mutation carriers can develop tumors that are phenotypic copies of sporadic basal-like breast cancer (BLBC).
Hypothesis/objective: We hypothesize that HGSCs and BLBCs are similar with respect to genetic instability and chromosomal aberrations and that these similarities may reflect their common pathogenesis. The objective of our proposed project is to identify the common patterns of loss of heterozygosity (LOH) and regions of gain or amplification shared by the two diseases.
Materials and Methods: Fifty-two HGSCs were collected, and tumor cells were enriched by needle microdissection from frozen tissue sections, where available, to remove stroma before DNA isolation. DNA was subjected to Affymetrix 250K SNP array analysis. SNP array data from a DFCI-BWH cohort of breast cancers were analyzed for comparison. dChip software was used for LOH and copy number analysis.
Results: LOH pattern-based hierarchical clustering defined two subgroups in HGSC: a major subgroup (∼60% of the cases) with high-level LOH [mean fraction of LOH/genome (FLOH) 36.6%] and a minor subgroup (∼40% of the cases) with lower levels of LOH (mean FLOH 13%). These data suggest differences in chromosomal instability between the two subgroups. The major subgroup of ovarian tumors shared similarly high levels of LOH with BRCA1-associated or sporadic BLBCs. In contrast, the lower frequency of LOH in the minor subgroup of serous tumors is similar to that in HER2-positive or estrogen receptor-positive luminal breast tumors. LOH on chromosome 17 is the most common alteration, occurring in ∼78% of the cases. However, frequent LOH on chromosomes 4, 5, 9, 13, 22, and X is characteristic of the major ovarian cancer subgroup defined by this analysis. These features are also found in BLBC, but not in non-basal-like breast tumors. Chromosomal copy gain/amplification is most common on chromosome 8q (>40%), and frequent on 3q, 19q12, 19q13, and 20q (>15%) in the two subgroups of serous ovarian cancer. These changes on 8q and 20q are shared by a proportion of breast cancers.
Conclusion: A major subset of serous ovarian cancer, like breast BLBCs, reveals a high degree of chromosomal instability with high levels of LOH. This suggests one or more defects in the maintenance of genome stability. Another subset of serous cancer, like non-basal-like breast tumors, has a lower degree of genome instability with low levels of LOH. We propose that these features may reflect a different pathogenesis of the two subgroups of serous ovarian tumors, and are possibly associated with differences in response to chemotherapy.
Citation Format: {Authors}. {Abstract title} [abstract]. In: Proceedings of the 101st Annual Meeting of the American Association for Cancer Research; 2010 Apr 17-21; Washington, DC. Philadelphia (PA): AACR; Cancer Res 2010;70(8 Suppl):Abstract nr 2135.
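The FLOH statistic reported in the abstract above can be illustrated with a minimal sketch. The abstract does not say how the denominator is defined, so the choice here (markers with an informative call) is an assumption, as are the function name `floh` and the toy marker calls; this is not the dChip procedure.

```python
from statistics import mean

def floh(loh_calls):
    """Fraction of informative markers called as LOH for one tumor.

    loh_calls: iterable of True (LOH), False (retention) or None (no call).
    Assumption: uninformative markers (None) are excluded from the denominator.
    """
    informative = [call for call in loh_calls if call is not None]
    if not informative:
        raise ValueError("no informative markers")
    # True counts as 1 and False as 0, so sum() gives the number of LOH calls.
    return sum(informative) / len(informative)

# Hypothetical per-marker calls for two tumors (toy data, not from the study).
tumor_a = [True, True, False, None, True]
tumor_b = [False, False, True, False]

floh_a = floh(tumor_a)  # 3 LOH calls out of 4 informative markers
floh_b = floh(tumor_b)  # 1 LOH call out of 4 informative markers
mean_floh = mean([floh_a, floh_b])
```

A subgroup's mean FLOH, as quoted in the Results, would then be the mean of this per-tumor value over the tumors assigned to that subgroup by clustering.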
|
171
|
Ogino S, Tanaka N, Nosho K, Huttenhower C, Quackenbush J, Haigis K, Fuchs C. Abstract 4922: KRAS and BRAF mutations influence correlation structure of locus-specific CpG island methylation and CIMP in colorectal cancer. Cancer Res 2010. [DOI: 10.1158/1538-7445.am10-4922] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
Abstract
Background: The CpG island methylator phenotype (CIMP-high or CIMP1) is a distinct phenotype associated with microsatellite instability (MSI) and BRAF mutation in colon cancer. Recent investigations suggested that a separate KRAS mutation-associated CIMP subtype (CIMP-low or CIMP2) exists. However, no study has deciphered the interrelationship between KRAS, BRAF, CIMP and locus-specific CpG island methylation using causal modeling by structural equation model (SEM) analysis.
Design: Utilizing a database of 861 colorectal cancers, DNA methylation at 16 CpG islands [CACNA1G, CDKN2A (p16), CHFR, CRABP1, HIC1, IGF2, IGFBP3, MGMT, MINT-1, MINT-31, MLH1, NEUROG1, p14 (CDKN2A/ARF), RUNX3, SOCS1, and WRN] was quantified by real-time PCR (MethyLight). We also examined MSI status and DNMT3B expression. To evaluate the effect of KRAS and BRAF mutation on the correlation structure of the CpG island markers and MSI, we categorized tumors into 4 groups: Group 1 (N=440) with wild-type KRAS and BRAF; Group 2 (N=308) with mutant KRAS and wild-type BRAF; Group 3 (N=107) with wild-type KRAS and mutant BRAF; and Group 4 (N=6) with mutant KRAS and BRAF, which was excluded from further analyses. We analyzed data using clustering analysis, principal component analysis (PCA) and SEM.
Results: In unsupervised disjoint clustering analysis of the methylation markers, only 4 markers (CACNA1G, IGF2, RUNX3, SOCS1) clustered with CIMP-high regardless of KRAS or BRAF mutation. In Groups 1 and 2, two latent CIMP classes, CIMP-low and CIMP-high, were assumed. However, in Group 3, CIMP-high status and CIMP-low status clustered together, so only one latent class, “CIMP”, was assumed. PCA yielded similar findings to clustering. Based on the results of the cluster analysis and PCA and the literature data, we applied causal modeling by SEM analysis. In SEM analyses, the correlation structures between CIMP, locus-specific CpG island methylation and MSI differed according to KRAS and BRAF status, in agreement with the PCA results. Differences between Group 1, Group 2 and Group 3 were highly significant (p<0.0001).
Conclusions: KRAS and BRAF mutations influence correlation structures between locus-specific CpG island methylation and CIMP status in colorectal cancer. Our data suggest a role for KRAS and BRAF mutations in modifying the propensity for CpG island methylation in a locus-specific manner.
Citation Format: {Authors}. {Abstract title} [abstract]. In: Proceedings of the 101st Annual Meeting of the American Association for Cancer Research; 2010 Apr 17-21; Washington, DC. Philadelphia (PA): AACR; Cancer Res 2010;70(8 Suppl):Abstract nr 4922.
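The four-group categorization described in the Design of the abstract above can be sketched as follows. The function name `assign_group` and the toy tumor records are hypothetical and only illustrate the grouping rule; the reported group sizes (440, 308, 107 and 6) sum to the 861 tumors in the database.

```python
from collections import Counter

def assign_group(kras_mutant: bool, braf_mutant: bool) -> int:
    """Map KRAS/BRAF mutation status to the group number used in the abstract."""
    if not kras_mutant and not braf_mutant:
        return 1  # wild-type KRAS and BRAF
    if kras_mutant and not braf_mutant:
        return 2  # mutant KRAS, wild-type BRAF
    if not kras_mutant and braf_mutant:
        return 3  # wild-type KRAS, mutant BRAF
    return 4      # mutant KRAS and BRAF

# Hypothetical toy records: (tumor_id, kras_mutant, braf_mutant).
tumors = [
    ("T1", False, False),
    ("T2", True, False),
    ("T3", False, True),
    ("T4", True, True),
    ("T5", False, False),
]

groups = {tumor_id: assign_group(kras, braf) for tumor_id, kras, braf in tumors}
counts = Counter(groups.values())

# Group 4 is excluded from further analyses, as stated in the abstract.
analyzed = {tumor_id: g for tumor_id, g in groups.items() if g != 4}
```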
|
172
|
Marko NF, Quackenbush J, Weil RJ. Abstract B8: Why is there no consensus on GBM subgroups? Understanding the nature of biological and statistical variability in The Cancer Genome Atlas GBM data and the implications for molecular tumor classification. Clin Cancer Res 2010. [DOI: 10.1158/1078-0432.tcme10-b8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
Abstract
Introduction: Clinical differences among patients with glioblastoma (GBM) suggest the existence of discrete subgroups of this disease. Such groups are not recognized histologically, engendering interest in molecular classification strategies for GBM. Numerous studies have described molecular fingerprints characteristic of GBM subclasses with unique genotypes and phenotypes, but these classifications have been difficult to reproduce. Accordingly, there remains little consensus regarding GBM subclasses, their characteristic molecular signatures, and the clinically relevant genotype-phenotype correlations of the putative classes. We hypothesize that a combination of biological and mathematical factors confounds interpretation of these data, and we have undertaken a comprehensive investigation into the nature and relative contributions of these factors to inconsistencies in molecular classification of GBMs.
Methods: We analyzed gene expression and clinical data for all 340 GBMs in The Cancer Genome Atlas (TCGA) profiled using the Affymetrix HT-HG-U133A platform. We created a logic model for systematically analyzing the sources of biological, technical, and mathematical variability inherent in this dataset and in the analytic strategies commonly employed in its interpretation. We then used standard linear classifiers and linear dimensionality reduction algorithms in conjunction with our logic model to investigate the nature and relative contributions of each factor.
Results: Gene expression data can be used in conjunction with unbiased linear classifiers to distinguish GBMs from other tumors, suggesting that valid biological data are contained in the dataset. However, the same classifiers fail to segregate GBMs into clinically relevant molecular subgroups. Further investigation reveals that commonly described sources of classification error, including individual sample characteristics, batch effects, and analytic and platform (technical) noise, make a measurable but proportionally minor contribution to inaccurate classification. Instead, our analysis suggests that three previously underappreciated classes of variability may account for a larger fraction of classification errors: biologic variability (noise) among tumors, of which we describe three types; skewed data distributions incorrectly assumed to be normal; and inherent nonlinear/nonorthogonal relationships among the variables (genes) used in conjunction with classification algorithms that assume linearity.
Conclusions: Technical sources of variability are often assumed to be the primary source of inaccurate molecular classification of GBMs. Our analysis of the TCGA data suggests a contributory role for these factors, and we believe that additional research in modeling this error is critical to improving classification accuracy. Nevertheless, our analysis also suggests that three rarely discussed factors (biological variability, abnormal data distribution, and nonlinear relationships among genes) may together be responsible for a proportionally larger component of classification error. Additional research is necessary to better characterize the nature and relative magnitude of each of these effects. Subsequent efforts can then be made to develop strategies capable of more appropriately identifying and addressing these factors, thereby improving the accuracy and precision of future molecular classifiers for GBM.
Citation Information: Clin Cancer Res 2010;16(7 Suppl):B8
|
173
|
Aryee MJ, Gutiérrez-Pabello JA, Kramnik I, Maiti T, Quackenbush J. An improved empirical bayes approach to estimating differential gene expression in microarray time-course data: BETR (Bayesian Estimation of Temporal Regulation). BMC Bioinformatics 2009; 10:409. [PMID: 20003283 PMCID: PMC2801687 DOI: 10.1186/1471-2105-10-409] [Citation(s) in RCA: 84] [Impact Index Per Article: 5.6] [Received: 06/10/2009] [Accepted: 12/10/2009] [Indexed: 12/02/2022] Open
Abstract
Background: Microarray gene expression time-course experiments provide the opportunity to observe the evolution of transcriptional programs that cells use to respond to internal and external stimuli. Most commonly used methods for identifying differentially expressed genes treat each time point as independent and ignore important correlations, including those within samples and between sampling times. Therefore they do not make full use of the information intrinsic to the data, leading to a loss of power.
Results: We present a flexible random-effects model that takes such correlations into account, improving our ability to detect genes that have sustained differential expression over more than one time point. By modeling the joint distribution of the samples that have been profiled across all time points, we gain sensitivity compared to a marginal analysis that examines each time point in isolation. We assign each gene a probability of differential expression using an empirical Bayes approach that reduces the effective number of parameters to be estimated.
Conclusions: Based on results from theory, simulated data, and application to the genomic data presented here, we show that BETR has increased power to detect subtle differential expression in time-series data. The open-source R package betr is available through Bioconductor. BETR has also been incorporated in the freely-available, open-source MeV software tool available from http://www.tm4.org/mev.html.
|
174
|
April C, Klotzle B, Royce T, Wickham-Garcia E, Boyaniwsky T, Izzo J, Cox D, Jones W, Rubio R, Holton K, Matulonis U, Quackenbush J, Fan JB. Whole-genome gene expression profiling of formalin-fixed, paraffin-embedded tissue samples. PLoS One 2009; 4:e8162. [PMID: 19997620 PMCID: PMC2780295 DOI: 10.1371/journal.pone.0008162] [Citation(s) in RCA: 99] [Impact Index Per Article: 6.6] [Received: 07/30/2009] [Accepted: 10/30/2009] [Indexed: 11/30/2022] Open
Abstract
Background: We have developed a gene expression assay (Whole-Genome DASL®) capable of generating whole-genome gene expression profiles from degraded samples such as formalin-fixed, paraffin-embedded (FFPE) specimens.
Methodology/Principal Findings: We demonstrated a similar level of sensitivity in gene detection between matched fresh-frozen (FF) and FFPE samples, with the number and overlap of probes detected in the FFPE samples being approximately 88% and 95% of that in the corresponding FF samples, respectively; 74% of the differentially expressed probes overlapped between the FF and FFPE pairs. The WG-DASL assay is also able to detect 1.3–1.5-fold and 1.5–2-fold changes in intact and FFPE samples, respectively. The dynamic range for the assay is ∼3 logs. Comparing the WG-DASL assay with an in vitro transcription-based labeling method yielded fold-change correlations of R² ∼0.83, while fold-change comparisons with quantitative RT-PCR assays yielded R² ∼0.86 and R² ∼0.55 for intact and FFPE samples, respectively. Additionally, the WG-DASL assay yielded high self-correlations (R² >0.98) with low intact RNA inputs ranging from 1 ng to 100 ng; reproducible expression profiles were also obtained with 250 pg total RNA (R² ∼0.92), with ∼71% of the probes detected in 100 ng total RNA also detected at the 250 pg level. When FFPE samples were assayed, 1 ng total RNA yielded self-correlations of R² ∼0.80, while still maintaining a correlation of R² ∼0.75 with standard FFPE inputs (200 ng).
Conclusions/Significance: Taken together, these results show that the WG-DASL assay provides a reliable platform for genome-wide expression profiling in archived materials. It also possesses utility within clinical settings where only limited quantities of samples may be available (e.g. microdissected material) or when minimally invasive procedures are performed (e.g. biopsied specimens).
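As a rough illustration of the fold-change correlations quoted in the abstract above, the sketch below computes log2 fold changes for the same genes on two assays and then the squared Pearson correlation between them. The abstract does not specify how its correlations were computed, so the use of log2 ratios, the function names and the toy intensities are all assumptions.

```python
import math

def log2_fold_changes(treated, control):
    """Per-gene log2 ratio of treated to control signal."""
    return [math.log2(t / c) for t, c in zip(treated, control)]

def r_squared(x, y):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)

# Hypothetical intensities for three genes on two assays (toy data).
assay_a = log2_fold_changes(treated=[200.0, 50.0, 400.0], control=[100.0, 100.0, 100.0])
assay_b = log2_fold_changes(treated=[40.0, 10.0, 80.0], control=[20.0, 20.0, 20.0])

# Both assays see the same fold changes here, so the agreement is perfect.
agreement = r_squared(assay_a, assay_b)
```

With real data, the two lists would hold fold changes for the same probes or genes measured by the two methods being compared.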
|
175
|
Abstract
Genomic data often persist far beyond the initial study in which they were generated. But the true value of the data is tied to their being both used and useful, and the usefulness of the data relies intimately on how well annotated they are. While standards such as MIAME have been in existence for nearly a decade, we cannot think that the problem is solved or that we can ignore the need to develop better, more effective methods for capturing the essence of the meta-data that is ultimately required to guarantee utility of the data.
|