51
|
Zhong Y, Wan YW, Pang K, Chow LML, Liu Z. Digital sorting of complex tissues for cell type-specific gene expression profiles. BMC Bioinformatics 2013; 14:89. [PMID: 23497278 PMCID: PMC3626856 DOI: 10.1186/1471-2105-14-89] [Citation(s) in RCA: 139] [Impact Index Per Article: 12.6] [Received: 09/13/2012] [Accepted: 02/14/2013] [Indexed: 11/29/2022] Open
Abstract
Background Cellular heterogeneity is present in almost all gene expression profiles. However, transcriptome analysis of tissue specimens often ignores the cellular heterogeneity present in these samples. Standard deconvolution algorithms require prior knowledge of the cell type frequencies within a tissue or their in vitro expression profiles. Furthermore, these algorithms tend to report biased estimations. Results Here, we describe a Digital Sorting Algorithm (DSA) for extracting cell-type specific gene expression profiles from mixed tissue samples that is unbiased and does not require prior knowledge of cell type frequencies. Conclusions The results suggest that DSA is a specific and sensitive algorithm in gene expression profile deconvolution and will be useful in studying individual cell types of complex tissues.
Affiliation(s)
- Yi Zhong
- Department of Pediatrics, Neurological Research Institute, Baylor College of Medicine, Houston, TX, USA
|
52
|
PERT: a method for expression deconvolution of human blood samples from varied microenvironmental and developmental conditions. PLoS Comput Biol 2012; 8:e1002838. [PMID: 23284283 PMCID: PMC3527275 DOI: 10.1371/journal.pcbi.1002838] [Citation(s) in RCA: 82] [Impact Index Per Article: 6.8] [Received: 06/05/2012] [Accepted: 10/26/2012] [Indexed: 12/30/2022] Open
Abstract
The cellular composition of heterogeneous samples can be predicted using an expression deconvolution algorithm to decompose their gene expression profiles based on pre-defined, reference gene expression profiles of the constituent populations in these samples. However, the expression profiles of the actual constituent populations are often perturbed from those of the reference profiles due to gene expression changes in cells associated with microenvironmental or developmental effects. Existing deconvolution algorithms do not account for these changes and give incorrect results when benchmarked against those measured by well-established flow cytometry, even after batch correction was applied. We introduce PERT, a new probabilistic expression deconvolution method that detects and accounts for a shared, multiplicative perturbation in the reference profiles when performing expression deconvolution. We applied PERT and three other state-of-the-art expression deconvolution methods to predict cell frequencies within heterogeneous human blood samples that were collected under several conditions (uncultured mono-nucleated and lineage-depleted cells, and culture-derived lineage-depleted cells). Only PERT's predicted proportions of the constituent populations matched those assigned by flow cytometry. Genes associated with cell cycle processes were highly enriched among those with the largest predicted expression changes between the cultured and uncultured conditions. We anticipate that PERT will be widely applicable to expression deconvolution strategies that use profiles from reference populations that vary from the corresponding constituent populations in cellular state but not cellular phenotypic identity.
|
53
|
Kuhn A, Kumar A, Beilina A, Dillman A, Cookson MR, Singleton AB. Cell population-specific expression analysis of human cerebellum. BMC Genomics 2012; 13:610. [PMID: 23145530 PMCID: PMC3561119 DOI: 10.1186/1471-2164-13-610] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.7] [Received: 06/07/2012] [Accepted: 10/09/2012] [Indexed: 11/10/2022] Open
Abstract
Background Interpreting gene expression profiles obtained from heterogeneous samples can be difficult because bulk gene expression measures are not resolved to individual cell populations. We have recently devised Population-Specific Expression Analysis (PSEA), a statistical method that identifies individual cell types expressing genes of interest and achieves quantitative estimates of cell type-specific expression levels. This procedure makes use of marker gene expression and circumvents the need for additional experimental information like tissue composition. Results To systematically assess the performance of statistical deconvolution, we applied PSEA to gene expression profiles from cerebellum tissue samples and compared with parallel, experimental separation methods. Owing to the particular histological organization of the cerebellum, we could obtain cellular expression data from in situ hybridization and laser-capture microdissection experiments and successfully validated computational predictions made with PSEA. Upon statistical deconvolution of whole tissue samples, we identified a set of transcripts showing age-related expression changes in the astrocyte population. Conclusions PSEA can predict cell-type specific expression levels from tissues homogenates on a genome-wide scale. It thus represents a computational alternative to experimental separation methods and allowed us to identify age-related expression changes in the astrocytes of the cerebellum. These molecular changes might underlie important physiological modifications previously observed in the aging brain.
Affiliation(s)
- Alexandre Kuhn
- Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health, Bethesda, MD, USA.
|
54
|
Jia Z, Wang Y, Hu Y, McLaren C, Yu Y, Ye K, Xia XQ, Koziol JA, Lernhardt W, McClelland M, Mercola D. A sample selection strategy to boost the statistical power of signature detection in cancer expression profile studies. Anticancer Agents Med Chem 2012; 13:203-11. [PMID: 22934703 DOI: 10.2174/1871520611313020004] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 04/30/2012] [Revised: 05/01/2012] [Accepted: 05/05/2012] [Indexed: 11/22/2022]
Abstract
In case-control profiling studies, increasing the sample size does not always improve statistical power because the variance may also be increased if samples are highly heterogeneous. For instance, tumor samples used for gene expression assay are often heterogeneous in terms of tissue composition or mechanism of progression, or both; however, such variation is rarely taken into account in expression profiles analysis. We use a prostate cancer prognosis study as an example to demonstrate that solely recruiting more patient samples may not increase power for biomarker detection at all. In response to the heterogeneity due to mixed tissue, we developed a sample selection strategy termed Stepwise Enrichment by which samples are systematically culled based on tumor content and analyzed with t-test to determine an optimal threshold for tissue percentage. The selected tissue-percentage threshold identified the most significant data by balancing the sample size and the sample homogeneity; therefore, the power is substantially increased for identifying the prognostic biomarkers in prostate tumor epithelium cells as well as in prostate stroma cells. This strategy can be generally applied to profiling studies where the level of sample heterogeneity can be measured or estimated.
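The culling idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the use of Welch's t statistic, the single-gene setting, and the toy data are all assumptions made for illustration. For each candidate tumor-percentage cutoff, samples below the cutoff are dropped, a t statistic is computed between cases and controls, and the cutoff with the largest absolute t is kept.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's two-sample t statistic (assumed here; the abstract only says 't-test').

    Raises ZeroDivisionError if both groups have zero variance.
    """
    return (mean(a) - mean(b)) / sqrt(variance(a) / len(a) + variance(b) / len(b))


def stepwise_enrichment(samples, thresholds):
    """Pick the tumor-percentage cutoff that maximizes |t| for one gene.

    samples: list of (tumor_percent, is_case, expression) tuples (hypothetical layout).
    Returns (cutoff, abs_t), or None if no cutoff leaves 2+ samples per group.
    """
    best = None
    for cutoff in thresholds:
        kept = [s for s in samples if s[0] >= cutoff]
        cases = [expr for _, is_case, expr in kept if is_case]
        controls = [expr for _, is_case, expr in kept if not is_case]
        if len(cases) < 2 or len(controls) < 2:
            continue  # too few samples left to estimate a variance
        t = abs(welch_t(cases, controls))
        if best is None or t > best[1]:
            best = (cutoff, t)
    return best


# Toy data (invented): low-purity samples blur the case/control difference.
toy_samples = [
    (90, True, 10.0), (80, True, 11.0), (70, True, 10.5),
    (90, False, 5.0), (80, False, 5.5), (70, False, 6.0),
    (50, True, 8.0), (50, False, 7.5),
    (20, True, 4.0), (20, False, 11.0),
]
best_cutoff, best_t = stepwise_enrichment(toy_samples, [0, 50, 70])
```

In this toy set the strictest cutoff wins because the discarded low-purity samples add more variance than the extra sample size compensates for, which is the trade-off the abstract describes.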
Affiliation(s)
- Zhenyu Jia
- Department of Pathology and Laboratory Medicine, University of California, Irvine, CA 92697, USA.
|
55
|
Shannon CP, Hollander Z, Wilson-McManus J, Balshaw R, Ng RT, McMaster R, McManus BM, Keown PA, Tebbutt SJ. White blood cell differentials enrich whole blood expression data in the context of acute cardiac allograft rejection. Bioinform Biol Insights 2012; 6:49-61. [PMID: 22550401 PMCID: PMC3329187 DOI: 10.4137/bbi.s9197] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Indexed: 12/20/2022] Open
Abstract
Acute cardiac allograft rejection is a serious complication of heart transplantation. Investigating molecular processes in whole blood via microarrays is a promising avenue of research in transplantation, particularly due to the non-invasive nature of blood sampling. However, whole blood is a complex tissue and the consequent heterogeneity in composition amongst samples is ignored in traditional microarray analysis. This complicates the biological interpretation of microarray data. Here we have applied a statistical deconvolution approach, cell-specific significance analysis of microarrays (csSAM), to whole blood samples from subjects either undergoing acute heart allograft rejection (AR) or not (NR). We identified eight differentially expressed probe-sets significantly correlated to monocytes (mapping to 6 genes, all down-regulated in ARs versus NRs) at a false discovery rate (FDR) ≤ 15%. None of the genes identified are present in a biomarker panel of acute heart rejection previously published by our group and discovered in the same data.
|
56
|
Optimal deconvolution of transcriptional profiling data using quadratic programming with application to complex clinical blood samples. PLoS One 2011; 6:e27156. [PMID: 22110609 PMCID: PMC3217948 DOI: 10.1371/journal.pone.0027156] [Citation(s) in RCA: 94] [Impact Index Per Article: 7.2] [Received: 04/12/2011] [Accepted: 10/11/2011] [Indexed: 11/19/2022] Open
Abstract
Large-scale molecular profiling technologies have assisted the identification of disease biomarkers and facilitated the basic understanding of cellular processes. However, samples collected from human subjects in clinical trials possess a level of complexity, arising from multiple cell types, that can obfuscate the analysis of data derived from them. Failure to identify, quantify, and incorporate sources of heterogeneity into an analysis can have widespread and detrimental effects on subsequent statistical studies. We describe an approach that builds upon a linear latent variable model, in which expression levels from mixed cell populations are modeled as the weighted average of expression from different cell types. We solve these equations using quadratic programming, which efficiently identifies the globally optimal solution while preserving non-negativity of the fraction of the cells. We applied our method to various existing platforms to estimate proportions of different pure cell or tissue types and gene expression profilings of distinct phenotypes, with a focus on complex samples collected in clinical trials. We tested our methods on several well controlled benchmark data sets with known mixing fractions of pure cell or tissue types and mRNA expression profiling data from samples collected in a clinical trial. Accurate agreement between predicted and actual mixing fractions was observed. In addition, our method was able to predict mixing fractions for more than ten species of circulating cells and to provide accurate estimates for relatively rare cell types (<10% total population). Furthermore, accurate changes in leukocyte trafficking associated with Fingolimod (FTY720) treatment were identified that were consistent with previous results generated by both cell counts and flow cytometry.
These data suggest that our method can solve one of the open questions regarding the analysis of complex transcriptional data: namely, how to identify the optimal mixing fractions in a given experiment.
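The linear mixing model in this abstract can be made concrete with a small sketch. This is not the published method's code: a dedicated quadratic programming solver is replaced here by projected gradient descent, which solves the same constrained least-squares problem on small inputs. The sum-to-one constraint, the function names, and the toy signature matrix are assumptions for illustration; the abstract itself only states the weighted-average model and non-negativity.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto {f : f >= 0, sum(f) == 1}."""
    u = sorted(v, reverse=True)
    cumulative = 0.0
    theta = 0.0
    for j, u_j in enumerate(u, start=1):
        cumulative += u_j
        t = (cumulative - 1.0) / j
        if u_j - t > 0:
            theta = t  # the last j passing this check defines the shift
    return [max(x - theta, 0.0) for x in v]


def estimate_fractions(signatures, mixture, n_iter=2000):
    """Minimize ||signatures @ f - mixture||^2 over the probability simplex.

    signatures: rows are genes, columns are cell types (pure expression profiles).
    mixture: one expression value per gene for the mixed sample.
    """
    n_genes, n_types = len(signatures), len(signatures[0])
    # 1 / squared Frobenius norm is a safe (conservative) step size.
    step = 1.0 / sum(x * x for row in signatures for x in row)
    f = [1.0 / n_types] * n_types
    for _ in range(n_iter):
        residual = [sum(s * w for s, w in zip(row, f)) - m
                    for row, m in zip(signatures, mixture)]
        grad = [sum(signatures[g][k] * residual[g] for g in range(n_genes))
                for k in range(n_types)]
        f = project_to_simplex([w - step * d for w, d in zip(f, grad)])
    return f


# Toy example (invented numbers): 4 genes, 3 cell types, noise-free mixture.
toy_signatures = [[10.0, 1.0, 1.0], [1.0, 8.0, 1.0], [1.0, 1.0, 9.0], [5.0, 5.0, 5.0]]
true_fractions = [0.6, 0.3, 0.1]
toy_mixture = [sum(s * w for s, w in zip(row, true_fractions)) for row in toy_signatures]
estimated = estimate_fractions(toy_signatures, toy_mixture)
```

Because the objective is convex and the constraint set is convex, any solver that converges reaches the global optimum, which is the property the abstract attributes to quadratic programming. On real data a library QP or non-negative least-squares routine would be the more practical choice.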
|
57
|
Investigation of variation in gene expression profiling of human blood by extended principle component analysis. PLoS One 2011; 6:e26905. [PMID: 22046403 PMCID: PMC3203156 DOI: 10.1371/journal.pone.0026905] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.5] [Received: 05/22/2011] [Accepted: 10/06/2011] [Indexed: 01/08/2023] Open
Abstract
Background Human peripheral blood is a promising material for biomedical research. However, various kinds of biological and technological factors result in a large degree of variation in blood gene expression profiles. Methodology/Principal Findings Human peripheral blood samples were drawn from healthy volunteers and analysed using the Human Genome U133Plus2 Microarray. We applied a novel approach using the Principle Component Analysis and Eigen-R2 methods to dissect the overall variation of blood gene expression profiles with respect to the interested biological and technological factors. The results indicated that the predominating sources of the variation could be traced to the individual heterogeneity of the relative proportions of different blood cell types (leukocyte subsets and erythrocytes). The physiological factors like age, gender and BMI were demonstrated to be associated with 5.3% to 9.2% of the total variation in the blood gene expression profiles. We investigated the gene expression profiles of samples from the same donors but with different levels of RNA quality. Although the proportion of variation associated to the RNA Integrity Number was mild (2.1%), the significant impact of RNA quality on the expression of individual genes was observed. Conclusions By characterizing the major sources of variation in blood gene expression profiles, such variability can be minimized by modifications to study designs. Increasing sample size, balancing confounding factors between study groups, using rigorous selection criteria for sample quality, and well controlled experimental processes will significantly improve the accuracy and reproducibility of blood transcriptome study.
|
58
|
Population-specific expression analysis (PSEA) reveals molecular changes in diseased brain. Nat Methods 2011; 8:945-7. [PMID: 21983921 DOI: 10.1038/nmeth.1710] [Citation(s) in RCA: 127] [Impact Index Per Article: 9.8] [Received: 05/23/2011] [Accepted: 08/04/2011] [Indexed: 11/08/2022]
Abstract
Human diseases are often accompanied by histological changes that confound interpretation of molecular analyses and identification of disease-related effects. We developed population-specific expression analysis (PSEA), a computational method of analyzing gene expression in samples of varying composition that can improve analyses of quantitative molecular data in many biological contexts. PSEA of brains from individuals with Huntington's disease revealed myelin-related abnormalities that were undetected using standard differential expression analysis.
|
59
|
Gaujoux R, Seoighe C. Semi-supervised Nonnegative Matrix Factorization for gene expression deconvolution: a case study. Infect Genet Evol 2011; 12:913-21. [PMID: 21930246 DOI: 10.1016/j.meegid.2011.08.014] [Citation(s) in RCA: 81] [Impact Index Per Article: 6.2] [Received: 06/03/2011] [Revised: 08/10/2011] [Accepted: 08/11/2011] [Indexed: 10/17/2022]
Abstract
Heterogeneity in sample composition is an inherent issue in many gene expression studies and, in many cases, should be taken into account in the downstream analysis to enable correct interpretation of the underlying biological processes. Typical examples are infectious diseases or immunology-related studies using blood samples, where, for example, the proportions of lymphocyte sub-populations are expected to vary between cases and controls. Nonnegative Matrix Factorization (NMF) is an unsupervised learning technique that has been applied successfully in several fields, notably in bioinformatics where its ability to extract meaningful information from high-dimensional data such as gene expression microarrays has been demonstrated. Very recently, it has been applied to biomarker discovery and gene expression deconvolution in heterogeneous tissue samples. Being essentially unsupervised, standard NMF methods are not guaranteed to find components corresponding to the cell types of interest in the sample, which may jeopardize the correct estimation of cell proportions. We have investigated the use of prior knowledge, in the form of a set of marker genes, to improve gene expression deconvolution with NMF algorithms. We found that this improves the consistency with which both cell type proportions and cell type gene expression signatures are estimated. The proposed method was tested on a microarray dataset consisting of pure cell types mixed in known proportions. Pearson correlation coefficients between true and estimated cell type proportions improved substantially (typically from about 0.5 to approximately 0.8) with the semi-supervised (marker-guided) versions of commonly used NMF algorithms. Furthermore known marker genes associated with each cell type were assigned to the correct cell type more frequently for the guided versions. 
We conclude that the use of marker genes improves the accuracy of gene expression deconvolution using NMF and suggest modifications to how the marker gene information is used that may lead to further improvements.
Affiliation(s)
- Renaud Gaujoux
- Computational Biology Group, Institute of Infectious Disease and Molecular Medicine, University of Cape Town, South Africa.
|
60
|
Miller JA, Cai C, Langfelder P, Geschwind DH, Kurian SM, Salomon DR, Horvath S. Strategies for aggregating gene expression data: the collapseRows R function. BMC Bioinformatics 2011; 12:322. [PMID: 21816037 PMCID: PMC3166942 DOI: 10.1186/1471-2105-12-322] [Citation(s) in RCA: 229] [Impact Index Per Article: 17.6] [Received: 05/23/2011] [Accepted: 08/04/2011] [Indexed: 12/19/2022] Open
Abstract
Background Genomic and other high dimensional analyses often require one to summarize multiple related variables by a single representative. This task is also variously referred to as collapsing, combining, reducing, or aggregating variables. Examples include summarizing several probe measurements corresponding to a single gene, representing the expression profiles of a co-expression module by a single expression profile, and aggregating cell-type marker information to de-convolute expression data. Several standard statistical summary techniques can be used, but network methods also provide useful alternative methods to find representatives. Currently few collapsing functions are developed and widely applied. Results We introduce the R function collapseRows that implements several collapsing methods and evaluate its performance in three applications. First, we study a crucial step of the meta-analysis of microarray data: the merging of independent gene expression data sets, which may have been measured on different platforms. Toward this end, we collapse multiple microarray probes for a single gene and then merge the data by gene identifier. We find that choosing the probe with the highest average expression leads to best between-study consistency. Second, we study methods for summarizing the gene expression profiles of a co-expression module. Several gene co-expression network analysis applications show that the optimal collapsing strategy depends on the analysis goal. Third, we study aggregating the information of cell type marker genes when the aim is to predict the abundance of cell types in a tissue sample based on gene expression data ("expression deconvolution"). We apply different collapsing methods to predict cell type abundances in peripheral human blood and in mixtures of blood cell lines. Interestingly, the most accurate prediction method involves choosing the most highly connected "hub" marker gene. 
Finally, to facilitate biological interpretation of collapsed gene lists, we introduce the function userListEnrichment, which assesses the enrichment of gene lists for known brain and blood cell type markers, and for other published biological pathways. Conclusions The R function collapseRows implements several standard and network-based collapsing methods. In various genomic applications we provide evidence that both types of methods are robust and biologically relevant tools.
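One collapsing rule named in this abstract, keeping the probe with the highest average expression for each gene, is simple enough to sketch. The code below is a plain Python illustration, not the collapseRows R function and not its interface: the function name, the dictionary-based inputs, and the probe and gene identifiers are hypothetical.

```python
def collapse_by_max_mean(probe_values, probe_to_gene):
    """For each gene, keep the probe whose mean expression across samples is highest.

    probe_values: dict mapping probe id -> list of expression values (one per sample).
    probe_to_gene: dict mapping probe id -> gene identifier.
    Returns a dict mapping gene identifier -> selected probe id.
    """
    best = {}  # gene -> (mean expression, probe id)
    for probe, values in probe_values.items():
        gene = probe_to_gene[probe]
        avg = sum(values) / len(values)
        if gene not in best or avg > best[gene][0]:
            best[gene] = (avg, probe)
    return {gene: probe for gene, (_, probe) in best.items()}


# Toy input (invented identifiers and values): two probes for GENE_A, one for GENE_B.
toy_values = {
    "probe_1": [2.0, 3.0, 4.0],
    "probe_2": [6.0, 7.0, 8.0],
    "probe_3": [1.0, 1.5, 2.0],
}
toy_annotation = {"probe_1": "GENE_A", "probe_2": "GENE_A", "probe_3": "GENE_B"}
selected = collapse_by_max_mean(toy_values, toy_annotation)
```

After collapsing, each data set has one row per gene, so data sets measured on different platforms can be merged by gene identifier as the abstract describes.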
Affiliation(s)
- Jeremy A Miller
- Interdepartmental Program for Neuroscience, UCLA, Los Angeles, California, USA
|
61
|
Elloumi F, Hu Z, Li Y, Parker JS, Gulley ML, Amos KD, Troester MA. Systematic bias in genomic classification due to contaminating non-neoplastic tissue in breast tumor samples. BMC Med Genomics 2011; 4:54. [PMID: 21718502 PMCID: PMC3151208 DOI: 10.1186/1755-8794-4-54] [Citation(s) in RCA: 70] [Impact Index Per Article: 5.4] [Received: 02/22/2011] [Accepted: 06/30/2011] [Indexed: 12/15/2022] Open
Abstract
Background Genomic tests are available to predict breast cancer recurrence and to guide clinical decision making. These predictors provide recurrence risk scores along with a measure of uncertainty, usually a confidence interval. The confidence interval conveys random error and not systematic bias. Standard tumor sampling methods make this problematic, as it is common to have a substantial proportion (typically 30-50%) of a tumor sample comprised of histologically benign tissue. This "normal" tissue could represent a source of non-random error or systematic bias in genomic classification. Methods To assess the performance characteristics of genomic classification to systematic error from normal contamination, we collected 55 tumor samples and paired tumor-adjacent normal tissue. Using genomic signatures from the tumor and paired normal, we evaluated how increasing normal contamination altered recurrence risk scores for various genomic predictors. Results Simulations of normal tissue contamination caused misclassification of tumors in all predictors evaluated, but different breast cancer predictors showed different types of vulnerability to normal tissue bias. While two predictors had unpredictable direction of bias (either higher or lower risk of relapse resulted from normal contamination), one signature showed predictable direction of normal tissue effects. Due to this predictable direction of effect, this signature (the PAM50) was adjusted for normal tissue contamination and these corrections improved sensitivity and negative predictive value. For all three assays quality control standards and/or appropriate bias adjustment strategies can be used to improve assay reliability. Conclusions Normal tissue sampled concurrently with tumor is an important source of bias in breast genomic predictors. 
All genomic predictors show some sensitivity to normal tissue contamination and ideal strategies for mitigating this bias vary depending upon the particular genes and computational methods used in the predictor.
Affiliation(s)
- Fathi Elloumi
- Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
|
62
|
Rivas LA, Aguirre J, Blanco Y, González-Toril E, Parro V. Graph-based deconvolution analysis of multiplex sandwich microarray immunoassays: applications for environmental monitoring. Environ Microbiol 2011; 13:1421-32. [PMID: 21401847 DOI: 10.1111/j.1462-2920.2011.02442.x] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.1] [Indexed: 11/30/2022]
Abstract
The sandwich microarray immunoassay (SMI) is a powerful technique for the analysis and characterization of environmental samples, from the identification of microorganisms to specific bioanalytes. As the number of antibodies increases, however, unspecific binding and cross-reactivity can become a problem. To cope with such difficulties, we present here the concept of antibody graph associated to a sandwich antibody microarray. Antibody graphs give valuable information about the antibody cross-reactivity network and all the players involved in the sandwich format: capturing and tracer antibodies, the antigenic sample and the degree of cross-reactivity between antibodies. Making use of the information contained in the antibody graph, we have developed a deconvolution method that disentangles the antibody cross-reactivity events and gives qualitative information about the composition of the experimental sample under study. We have validated the method by using a 66 antibody-containing microarray to describe known antigenic mixtures as well as natural environmental samples characterized by 16S-RNA gene phylogenetic analysis. The application of our antibody graph and deconvolution method allowed us to discriminate between true specific antigen-antibody reactions and spurious signals on a microarray designed for environmental monitoring.
Affiliation(s)
- Luis A Rivas
- Department of Molecular Evolution, Centro de Astrobiología (INTA-CSIC), Madrid, Spain.
|
63
|
|
64
|
Abstract
Cell type heterogeneity may have a substantial effect on gene expression profiling of human tissue. Several in silico methods for deconvoluting a gene expression profile into cell-type-specific subprofiles have been published but not widely used. Here, we consider recent methods and the experimental validations available for them. Shen-Orr et al. recently developed an approach called cell-type-specific significance analysis of microarray for deconvoluting gene expression. This method requires the measurement of the proportion of each cell type in each sample and the expression profiles of the heterogeneous samples. It determines how gene expression varies among pre-defined phenotypes for each cell type. Gene expression can vary substantially among cell types and sample heterogeneity can mask the identification of biologically important phenotypic correlations. Consequently, the deconvolution approach can be useful in the analysis of mixtures of cell populations in clinical samples.
Affiliation(s)
- Yingdong Zhao
- Biometric Research Branch, Division of Cancer Treatment and Diagnosis, National Cancer Institute, National Institutes of Health, Bethesda, Maryland, USA.
|
65
|
Camp JT, Elloumi F, Roman-Perez E, Rein J, Stewart DA, Harrell JC, Perou CM, Troester MA. Interactions with fibroblasts are distinct in Basal-like and luminal breast cancers. Mol Cancer Res 2010; 9:3-13. [PMID: 21131600 DOI: 10.1158/1541-7786.mcr-10-0372] [Citation(s) in RCA: 85] [Impact Index Per Article: 6.1] [Indexed: 01/08/2023]
Abstract
Basal-like breast cancers have several well-characterized distinguishing molecular features, but most of these are features of the cancer cells themselves. The unique stromal-epithelial interactions, and more generally, microenvironmental features of basal-like breast cancers have not been well characterized. To identify characteristic microenvironment features of basal-like breast cancer, we performed cocultures of several basal-like breast cancer cell lines with fibroblasts and compared these with cocultures of luminal breast cancer cell lines with fibroblasts. Interactions between basal-like cancer cells and fibroblasts induced expression of numerous interleukins and chemokines, including IL-6, IL-8, CXCL1, CXCL3, and TGFβ. Under the influence of fibroblasts, basal-like breast cancer cell lines also showed increased migration in vitro. Migration was less pronounced for luminal lines; but, these lines were more likely to have altered proliferation. These differences were relevant to tumor biology in vivo, as the gene set that distinguished luminal and basal-like stromal interactions in coculture also distinguishes basal-like from luminal tumors with 98% accuracy in 10-fold cross-validation and 100% accuracy in an independent test set. However, comparisons between cocultures where cells were in direct contact and cocultures where interaction was solely through soluble factors suggest that there is an important impact of direct cell-to-cell contact. The phenotypes and gene expression changes invoked by cancer cell interactions with fibroblasts support the microenvironment and cell-cell interactions as intrinsic features of breast cancer subtypes.
Affiliation(s)
- J Terese Camp
- Department of Epidemiology, University of North Carolina at Chapel Hill, Campus Box 7435, 135 Dauer Ln, Chapel Hill, NC 27599, USA
|
66
|
Chaussabel D, Pascual V, Banchereau J. Assessing the human immune system through blood transcriptomics. BMC Biol 2010; 8:84. [PMID: 20619006 PMCID: PMC2895587 DOI: 10.1186/1741-7007-8-84] [Citation(s) in RCA: 174] [Impact Index Per Article: 12.4] [Received: 05/13/2010] [Accepted: 06/15/2010] [Indexed: 02/07/2023] Open
Abstract
Blood is the pipeline of the immune system. Assessing changes in transcript abundance in blood on a genome-wide scale affords a comprehensive view of the status of the immune system in health and disease. This review summarizes the work that has used this approach to identify therapeutic targets and biomarker signatures in the field of autoimmunity and infectious disease. Recent technological and methodological advances that will carry the blood transcriptome research field forward are also discussed.
Affiliation(s)
- Damien Chaussabel
- Baylor Institute for Immunology Research and Baylor Research Institute, 3434 Live Oak, Dallas, TX 75204, USA
- Virginia Pascual
- Baylor Institute for Immunology Research and Baylor Research Institute, 3434 Live Oak, Dallas, TX 75204, USA
- Jacques Banchereau
- Baylor Institute for Immunology Research and Baylor Research Institute, 3434 Live Oak, Dallas, TX 75204, USA
|
67
|
Shen-Orr SS, Tibshirani R, Khatri P, Bodian DL, Staedtler F, Perry NM, Hastie T, Sarwal MM, Davis MM, Butte AJ. Cell type-specific gene expression differences in complex tissues. Nat Methods 2010; 7:287-9. [PMID: 20208531 DOI: 10.1038/nmeth.1439] [Citation(s) in RCA: 350] [Impact Index Per Article: 25.0] [Received: 12/02/2009] [Accepted: 01/09/2010] [Indexed: 12/13/2022]
Abstract
We describe cell type-specific significance analysis of microarrays (csSAM) for analyzing differential gene expression for each cell type in a biological sample from microarray data and relative cell-type frequencies. First, we validated csSAM with predesigned mixtures and then applied it to whole-blood gene expression datasets from stable post-transplant kidney transplant recipients and those experiencing acute transplant rejection, which revealed hundreds of differentially expressed genes that were otherwise undetectable.
Affiliation(s)
- Shai S Shen-Orr
- Department of Pediatrics, Stanford University School of Medicine, Stanford, California, USA
|
68
|
Abstract
MOTIVATION Global expression patterns within cells are used for purposes ranging from the identification of disease biomarkers to basic understanding of cellular processes. Unfortunately, tissue samples used in cancer studies are usually composed of multiple cell types and the non-cancerous portions can significantly affect expression profiles. This severely limits the conclusions that can be made about the specificity of gene expression in the cell-type of interest. However, statistical analysis can be used to identify differentially expressed genes that are related to the biological question being studied. RESULTS We propose a statistical approach to expression deconvolution from mixed tissue samples in which the proportion of each component cell type is unknown. Our method estimates the proportion of each component in a mixed tissue sample; this estimate can be used to provide estimates of gene expression from each component. We demonstrate our technique on xenograft samples from breast cancer research and publicly available experimental datasets found in the National Center for Biotechnology Information Gene Expression Omnibus repository. AVAILABILITY R code (http://www.r-project.org/) for estimating sample proportions is freely available to non-commercial users and available at http://www.med.miami.edu/medicine/x2691.xml.
Affiliation(s)
- Jennifer Clarke
- Department of Medicine, University of Miami, 1120 NW 14th St, Suite 611, Miami, FL 33136, USA.
|
69
|
Repsilber D, Kern S, Telaar A, Walzl G, Black GF, Selbig J, Parida SK, Kaufmann SHE, Jacobsen M. Biomarker discovery in heterogeneous tissue samples -taking the in-silico deconfounding approach. BMC Bioinformatics 2010; 11:27. [PMID: 20070912 PMCID: PMC3098067 DOI: 10.1186/1471-2105-11-27] [Citation(s) in RCA: 69] [Impact Index Per Article: 4.9] [Received: 09/03/2009] [Accepted: 01/14/2010] [Indexed: 11/24/2022] Open
Abstract
Background For heterogeneous tissues, such as blood, measurements of gene expression are confounded by relative proportions of cell types involved. Conclusions have to rely on estimation of gene expression signals for homogeneous cell populations, e.g. by applying micro-dissection, fluorescence activated cell sorting, or in-silico deconfounding. We studied feasibility and validity of a non-negative matrix decomposition algorithm using experimental gene expression data for blood and sorted cells from the same donor samples. Our objective was to optimize the algorithm regarding detection of differentially expressed genes and to enable its use for classification in the difficult scenario of reversely regulated genes. This would be of importance for the identification of candidate biomarkers in heterogeneous tissues. Results Experimental data and simulation studies involving noise parameters estimated from these data revealed that for valid detection of differential gene expression, quantile normalization and use of non-log data are optimal. We demonstrate the feasibility of predicting proportions of constituting cell types from gene expression data of single samples, as a prerequisite for a deconfounding-based classification approach. Classification cross-validation errors with and without using deconfounding results are reported as well as sample-size dependencies. Implementation of the algorithm, simulation and analysis scripts are available. Conclusions The deconfounding algorithm without decorrelation using quantile normalization on non-log data is proposed for biomarkers that are difficult to detect, and for cases where confounding by varying proportions of cell types is the suspected reason. 
In this case, a deconfounding ranking approach can be used as a powerful alternative to, or complement of, other statistical learning approaches to define candidate biomarkers for molecular diagnosis and prediction in biomedicine, in realistically noisy conditions and with moderate sample sizes.
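The abstract above recommends quantile normalization applied to non-log data but does not spell out the procedure. The sketch below shows the commonly used form of quantile normalization, in which every value is replaced by the across-sample mean of the values that share its rank. The function name, the tie handling (ties broken by position), and the toy data are assumptions for illustration, not the authors' scripts.

```python
def quantile_normalize(samples):
    """Quantile-normalize equal-length expression vectors (one list per sample).

    Each value is replaced by the mean, across samples, of the values that
    occupy the same rank. Ties are broken by position, a simplification.
    """
    n_genes = len(samples[0])
    if any(len(s) != n_genes for s in samples):
        raise ValueError("all samples must cover the same genes")
    # Sort every sample, then average across samples rank by rank.
    sorted_samples = [sorted(s) for s in samples]
    rank_means = [
        sum(col[r] for col in sorted_samples) / len(samples)
        for r in range(n_genes)
    ]
    normalized = []
    for s in samples:
        # Gene indices ordered by ascending expression in this sample.
        order = sorted(range(n_genes), key=lambda g: s[g])
        out = [0.0] * n_genes
        for rank, gene in enumerate(order):
            out[gene] = rank_means[rank]
        normalized.append(out)
    return normalized


# Two toy samples on a linear (non-log) scale, three genes each.
raw = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]
norm = quantile_normalize(raw)
```

After normalization every sample contains the same set of values, so differences between samples lie only in which gene holds which rank.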
Affiliation(s)
- Dirk Repsilber
- Department of Genetics and Biometry, Research Institute for Biology of Farm Animals, Wilhelm-Stahl Allee 2, D 18196 Dummerstorf, Germany.
|
70
|
Siegal-Gaskins D, Ash JN, Crosson S. Model-based deconvolution of cell cycle time-series data reveals gene expression details at high resolution. PLoS Comput Biol 2009; 5:e1000460. [PMID: 19680537 PMCID: PMC2718844 DOI: 10.1371/journal.pcbi.1000460] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.9] [Received: 03/25/2009] [Accepted: 07/08/2009] [Indexed: 11/23/2022] Open
Abstract
In both prokaryotic and eukaryotic cells, gene expression is regulated across the cell cycle to ensure “just-in-time” assembly of select cellular structures and molecular machines. However, present in all time-series gene expression measurements is variability that arises from both systematic error in the cell synchrony process and variance in the timing of cell division at the level of the single cell. Thus, gene or protein expression data collected from a population of synchronized cells is an inaccurate measure of what occurs in the average single-cell across a cell cycle. Here, we present a general computational method to extract “single-cell”-like information from population-level time-series expression data. This method removes the effects of 1) variance in growth rate and 2) variance in the physiological and developmental state of the cell. Moreover, this method represents an advance in the deconvolution of molecular expression data in its flexibility, minimal assumptions, and the use of a cross-validation analysis to determine the appropriate level of regularization. Applying our deconvolution algorithm to cell cycle gene expression data from the dimorphic bacterium Caulobacter crescentus, we recovered critical features of cell cycle regulation in essential genes, including ctrA and ftsZ, that were obscured in population-based measurements. In doing so, we highlight the problem with using population data alone to decipher cellular regulatory mechanisms and demonstrate how our deconvolution algorithm can be applied to produce a more realistic picture of temporal regulation in a cell.
Time-series analyses of cellular regulatory processes have successfully drawn attention to the importance of temporal regulation in biological systems. A number of model systems can be synchronized such that data collected on cell populations better reflect the dynamic properties of the individual cell. However, experimental synchronization is never perfect, and the degree of synchrony that does exist at the outset of an experiment is quickly lost over time as cells grow at different rates and enter different developmental or physiological states on cell division. Thus, data collected from a population of synchronized cells can lead to incorrect models of temporal regulation. Here we demonstrate that the problem of relating population data to the individual cell can be resolved with a computational method that effectively removes the effects of both imperfect synchrony and time-dependent loss of synchrony. Application of this deconvolution algorithm to a cell cycle time-series data set from the model bacterium Caulobacter crescentus uncovers critical temporal details in the expression of essential genes that are not evident in the raw population-based data. The deconvolution routine presented here is a robust and general tool for extracting biochemical parameters of the average single cell from population time-series data.
Affiliation(s)
- Dan Siegal-Gaskins
- Mathematical Biosciences Institute, Ohio State University, Columbus, OH, USA.
|
71
|
Abbas AR, Wolslegel K, Seshasayee D, Modrusan Z, Clark HF. Deconvolution of blood microarray data identifies cellular activation patterns in systemic lupus erythematosus. PLoS One 2009; 4:e6098. [PMID: 19568420 PMCID: PMC2699551 DOI: 10.1371/journal.pone.0006098] [Citation(s) in RCA: 293] [Impact Index Per Article: 19.5] [Received: 02/27/2009] [Accepted: 06/02/2009] [Indexed: 02/04/2023] Open
Abstract
Systemic Lupus Erythematosus (SLE) is a systemic autoimmune disease with a complex spectrum of cellular and molecular characteristics including several dramatic changes in the populations of peripheral leukocytes. These changes include general leukopenia, activation of B and T cells, and maturation of granulocytes. The manifestation of SLE in peripheral blood is central to the disease but is incompletely understood. A technique for rigorously characterizing changes in mixed populations of cells, microarray expression deconvolution, has been applied to several areas of biology but not to SLE or to blood. Here we demonstrate that microarray expression deconvolution accurately quantifies the constituents of real blood samples and mixtures of immune-derived cell lines. We characterize a broad spectrum of peripheral leukocyte cell types and states in SLE to uncover novel patterns including: specific activation of NK and T helper lymphocytes, relationships of these patterns to each other, and correlations to clinical variables and measures. The expansion and activation of monocytes, NK cells, and T helper cells in SLE at least partly underlie this disease's prominent interferon signature. These and other patterns of leukocyte dynamics uncovered here correlate with disease severity and treatment, suggest potential new treatments, and extend our understanding of lupus pathology as a complex autoimmune disease involving many arms of the immune system.
Affiliation(s)
- Alexander R Abbas
- Department of Bioinformatics, Genentech Inc, South San Francisco, CA, USA.
|
72
|
Mizuno H, Nakanishi Y, Ishii N, Sarai A, Kitada K. A signature-based method for indexing cell cycle phase distribution from microarray profiles. BMC Genomics 2009; 10:137. [PMID: 19331659 PMCID: PMC2676301 DOI: 10.1186/1471-2164-10-137] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.7] [Received: 10/21/2008] [Accepted: 03/30/2009] [Indexed: 12/31/2022] Open
Abstract
Background The cell cycle machinery interprets oncogenic signals and reflects the biology of cancers. To date, various methods for cell cycle phase estimation such as mitotic index, S phase fraction, and immunohistochemistry have provided valuable information on cancers (e.g. proliferation rate). However, those methods rely on one or few measurements and the scope of the information is limited. There is a need for more systematic cell cycle analysis methods. Results We developed a signature-based method for indexing cell cycle phase distribution from microarray profiles under consideration of cycling and non-cycling cells. A cell cycle signature masterset, composed of genes which express preferentially in cycling cells and in a cell cycle-regulated manner, was created to index the proportion of cycling cells in the sample. Cell cycle signature subsets, composed of genes whose expressions peak at specific stages of the cell cycle, were also created to index the proportion of cells in the corresponding stages. The method was validated using cell cycle datasets and quiescence-induced cell datasets. Analyses of a mouse tumor model dataset and human breast cancer datasets revealed variations in the proportion of cycling cells. When the influence of non-cycling cells was taken into account, "buried" cell cycle phase distributions were depicted that were oncogenic-event specific in the mouse tumor model dataset and were associated with patients' prognosis in the human breast cancer datasets. Conclusion The signature-based cell cycle analysis method presented in this report, would potentially be of value for cancer characterization and diagnostics.
Affiliation(s)
- Hideaki Mizuno
- Kamakura Research Laboratories, Chugai Pharmaceutical Co Ltd, Kamakura, Kanagawa, Japan.
|
73
|
Papanikolaou NA, Papavassiliou AG. Protein complex, gene, and regulatory modules in cancer heterogeneity. Mol Med 2008; 14:543-5. [PMID: 18654660 DOI: 10.2119/2008-00083.papanikolaou] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 07/18/2008] [Accepted: 07/18/2008] [Indexed: 11/06/2022] Open
|
74
|
Buess M, Nuyten DSA, Hastie T, Nielsen T, Pesich R, Brown PO. Characterization of heterotypic interaction effects in vitro to deconvolute global gene expression profiles in cancer. Genome Biol 2008; 8:R191. [PMID: 17868458 PMCID: PMC2375029 DOI: 10.1186/gb-2007-8-9-r191] [Citation(s) in RCA: 66] [Impact Index Per Article: 4.1] [Received: 03/26/2007] [Revised: 06/14/2007] [Accepted: 09/14/2007] [Indexed: 01/10/2023] Open
Abstract
In an effort to deconvolute global gene-expression profiles, an interaction between some breast cancer cells and stromal fibroblasts was found to induce an interferon response, which may be associated with a greater propensity for tumor progression. Background Perturbations in cell-cell interactions are a key feature of cancer. However, little is known about the systematic effects of cell-cell interaction on global gene expression in cancer. Results We used an ex vivo model to simulate tumor-stroma interaction by systematically co-cultivating breast cancer cells with stromal fibroblasts and determined associated gene expression changes with cDNA microarrays. In the complex picture of epithelial-mesenchymal interaction effects, a prominent characteristic was an induction of interferon-response genes (IRGs) in a subset of cancer cells. In close proximity to these cancer cells, the fibroblasts secreted type I interferons, which, in turn, induced expression of the IRGs in the tumor cells. Paralleling this model, immunohistochemical analysis of human breast cancer tissues showed that STAT1, the key transcriptional activator of the IRGs, and itself an IRG, was expressed in a subset of the cancers, with a striking pattern of elevated expression in the cancer cells in close proximity to the stroma. In vivo, expression of the IRGs was remarkably coherent, providing a basis for segregation of 295 early-stage breast cancers into two groups. Tumors with high compared to low expression levels of IRGs were associated with significantly shorter overall survival; 59% versus 80% at 10 years (log-rank p = 0.001). Conclusion In an effort to deconvolute global gene expression profiles of breast cancer by systematic characterization of heterotypic interaction effects in vitro, we found that an interaction between some breast cancer cells and stromal fibroblasts can induce an interferon-response, and that this response may be associated with a greater propensity for tumor progression.
Affiliation(s)
- Martin Buess
- Department of Biochemistry, Stanford University School of Medicine, Stanford, CA 94305, USA
- Dimitry SA Nuyten
- Departments of Radiation Oncology and Diagnostic Oncology, Netherlands Cancer Institute, 1066 CX Amsterdam, The Netherlands
- Trevor Hastie
- Department of Statistics, Stanford University School of Medicine, Stanford, CA 94305, USA
- Torsten Nielsen
- Department of Pathology and Laboratory Medicine, University of British Columbia, Vancouver, BC, Canada, V5Z 1M9
- Robert Pesich
- Department of Biochemistry, Stanford University School of Medicine, Stanford, CA 94305, USA
- Patrick O Brown
- Department of Biochemistry, Stanford University School of Medicine, Stanford, CA 94305, USA
- Howard Hughes Medical Institute, Stanford University School of Medicine, Stanford, CA 94305, USA
|
75
|
Jacobsen M, Mattow J, Repsilber D, Kaufmann SH. Novel strategies to identify biomarkers in tuberculosis. Biol Chem 2008; 389:487-95. [DOI: 10.1515/bc.2008.053] [Citation(s) in RCA: 46] [Impact Index Per Article: 2.9] [Indexed: 11/15/2022]
Abstract
The more we learn about the immune response against tuberculosis (TB) and particularly about the features which distinguish protective immunity, disease susceptibility and pathology, the better we can define biomarkers which correlate with these different stages of infection. The most widely used biomarker in TB, which without a doubt is an important component of protective immunity, is IFNγ secreted by antigen-specific CD4 T-cells. However, the complexity of the immune response against TB makes it more than likely that additional biomarkers are required for a reliable correlate of protection. As a corollary, we assume that a set of biomarkers will be required, termed a biosignature.
|
76
|
Gosink MM, Petrie HT, Tsinoremas NF. Electronically subtracting expression patterns from a mixed cell population. Bioinformatics 2007; 23:3328-34. [PMID: 17956877 DOI: 10.1093/bioinformatics/btm508] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.4] [Indexed: 12/17/2022]
Abstract
MOTIVATION Biological samples frequently contain multiple cell-types that each can play a crucial role in the development and/or regulation of adjacent cells or tissues. The search for biomarkers, or expression patterns of, one cell-type in those samples can be a complex and time-consuming process. Ordinarily, extensive laboratory bench work must be performed to separate the mixed cell population into its subcomponents, such that each can be accurately characterized. RESULTS We have developed a methodology to electronically subtract gene expression in one or more components of a mixed cell population from a mixture, to reveal the expression patterns of other minor or difficult to isolate components. Examination of simulated data indicates that this procedure can reliably determine the expression patterns in cell-types that contribute as little as 5% of the total expression in a mixed cell population. We re-analyzed microarray expression data from the viral infection of macrophages and from the T-cells of wild type and Foxp3 deletion mice. Using our subtraction methodology, we were able to substantially improve the identification of genes involved in processes of subcomponent portions of these samples.
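The subtraction idea in the abstract above can be illustrated with a simple linear mixing assumption: if a mixture is a weighted sum of a known component and the remaining cells, the remainder's profile follows by subtracting the known part and rescaling. The sketch below assumes linear-scale expression and a known mixing fraction; the function name and the toy numbers are hypothetical, and the published methodology may differ in its details.

```python
def subtract_component(mixture, known, known_fraction):
    """Estimate the expression profile of the remaining part of a mixture.

    Assumes, gene by gene and on a linear (non-log) scale:
        mixture = known_fraction * known + (1 - known_fraction) * rest
    """
    if not 0.0 <= known_fraction < 1.0:
        raise ValueError("known_fraction must be in [0, 1)")
    if len(mixture) != len(known):
        raise ValueError("profiles must cover the same genes")
    rest_fraction = 1.0 - known_fraction
    # Negative estimates (possible with noisy data) are clipped to zero.
    return [
        max(0.0, (m - known_fraction * k) / rest_fraction)
        for m, k in zip(mixture, known)
    ]


# Toy example: the component of interest contributes 5% of total expression.
known = [100.0, 20.0, 0.0]
rest_true = [10.0, 200.0, 50.0]
mixture = [0.95 * k + 0.05 * r for k, r in zip(known, rest_true)]
rest_est = subtract_component(mixture, known, 0.95)
```

Because the residual is divided by a small fraction, measurement noise in the mixture is amplified in the estimate, which is one reason the contribution of the minor component matters.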
Affiliation(s)
- Mark M Gosink
- Scientific Computing, Scripps Florida, 5353 Parkside Dr Jupiter, FL 33458, USA.
|
77
|
Li JZ, Meng F, Tsavaler L, Evans SJ, Choudary PV, Tomita H, Vawter MP, Walsh D, Shokoohi V, Chung T, Bunney WE, Jones EG, Akil H, Watson SJ, Myers RM. Sample matching by inferred agonal stress in gene expression analyses of the brain. BMC Genomics 2007; 8:336. [PMID: 17892578 PMCID: PMC2213675 DOI: 10.1186/1471-2164-8-336] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.1] [Received: 03/05/2007] [Accepted: 09/24/2007] [Indexed: 12/26/2022] Open
Abstract
Background Gene expression patterns in the brain are strongly influenced by the severity and duration of physiological stress at the time of death. This agonal effect, if not well controlled, can lead to spurious findings and diminished statistical power in case-control comparisons. While some recent studies match samples by tissue pH and clinically recorded agonal conditions, we found that these indicators were sometimes at odds with observed stress-related gene expression patterns, and that matching by these criteria still sometimes results in identifying case-control differences that are primarily driven by residual agonal effects. This problem is analogous to the one encountered in genetic association studies, where self-reported race and ethnicity are often imprecise proxies for an individual's actual genetic ancestry. Results We developed an Agonal Stress Rating (ASR) system that evaluates each sample's degree of stress based on gene expression data, and used ASRs in post hoc sample matching or covariate analysis. While gene expression patterns are generally correlated across different brain regions, we found strong region-region differences in empirical ASRs in many subjects that likely reflect inter-individual variabilities in local structure or function, resulting in region-specific vulnerability to agonal stress. Conclusion Variation of agonal stress across different brain regions differs between individuals, revealing a new level of complexity for gene expression studies of brain tissues. The Agonal Stress Ratings quantitatively assess each sample's extent of regulatory response to agonal stress, and allow a strong control of this important confounder.
Affiliation(s)
- Jun Z Li
- Stanford Human Genome Center, Stanford University, Palo Alto, CA, USA
- Department of Genetics, Stanford University, Palo Alto, CA, USA
- Fan Meng
- Molecular & Behavioral Neuroscience Institute, University of Michigan, Ann Arbor, MI, USA
- Larisa Tsavaler
- Stanford Human Genome Center, Stanford University, Palo Alto, CA, USA
- Simon J Evans
- Molecular & Behavioral Neuroscience Institute, University of Michigan, Ann Arbor, MI, USA
- Hiroaki Tomita
- Department of Psychiatry & Human Behavior, University of California, Irvine, CA, USA
- Marquis P Vawter
- Department of Psychiatry & Human Behavior, University of California, Irvine, CA, USA
- David Walsh
- Department of Psychiatry & Human Behavior, University of California, Irvine, CA, USA
- Vida Shokoohi
- Stanford Human Genome Center, Stanford University, Palo Alto, CA, USA
- Tisha Chung
- Stanford Human Genome Center, Stanford University, Palo Alto, CA, USA
- William E Bunney
- Department of Psychiatry & Human Behavior, University of California, Irvine, CA, USA
- Edward G Jones
- Center for Neuroscience, University of California, Davis, CA, USA
- Huda Akil
- Molecular & Behavioral Neuroscience Institute, University of Michigan, Ann Arbor, MI, USA
- Stanley J Watson
- Molecular & Behavioral Neuroscience Institute, University of Michigan, Ann Arbor, MI, USA
- Richard M Myers
- Stanford Human Genome Center, Stanford University, Palo Alto, CA, USA
- Department of Genetics, Stanford University, Palo Alto, CA, USA
|
78
|
Abstract
DNA microarrays make it possible, for the first time, to record the complete genomic signals that guide the progression of cellular processes. Future discovery in biology and medicine will come from the mathematical modeling of these data, which hold the key to fundamental understanding of life on the molecular level, as well as answers to questions regarding diagnosis, treatment, and drug development. This chapter reviews the first data-driven models that were created from these genome-scale data, through adaptations and generalizations of mathematical frameworks from matrix algebra that have proven successful in describing the physical world, in such diverse areas as mechanics and perception: the singular value decomposition model, the generalized singular value decomposition model comparative model, and the pseudoinverse projection integrative model. These models provide mathematical descriptions of the genetic networks that generate and sense the measured data, where the mathematical variables and operations represent biological reality. The variables, patterns uncovered in the data, correlate with activities of cellular elements such as regulators or transcription factors that drive the measured signals and cellular states where these elements are active. The operations, such as data reconstruction, rotation, and classification in subspaces of selected patterns, simulate experimental observation of only the cellular programs that these patterns represent. These models are illustrated in the analyses of RNA expression data from yeast and human during their cell cycle programs and DNA-binding data from yeast cell cycle transcription factors and replication initiation proteins. 
Two alternative pictures of RNA expression oscillations during the cell cycle that emerge from these analyses, which parallel well-known designs of physical oscillators, convey the capacity of the models to elucidate the design principles of cellular systems, as well as guide the design of synthetic ones. In these analyses, the power of the models to predict previously unknown biological principles is demonstrated with a prediction of a novel mechanism of regulation that correlates DNA replication initiation with cell cycle-regulated RNA transcription in yeast. These models may become the foundation of a future in which biological systems are modeled as physical systems are today.
Affiliation(s)
- Orly Alter
- Department of Biomedical Engineering, Institute for Cellular and Molecular Biology and Institute for Computational Engineering and Sciences, University of Texas at Austin, Austin, TX, USA
|
79
|
Hoffmann M, Pohlers D, Koczan D, Thiesen HJ, Wölfl S, Kinne RW. Robust computational reconstitution - a new method for the comparative analysis of gene expression in tissues and isolated cell fractions. BMC Bioinformatics 2006; 7:369. [PMID: 16889662 PMCID: PMC1574358 DOI: 10.1186/1471-2105-7-369] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.6] [Received: 06/01/2006] [Accepted: 08/04/2006] [Indexed: 11/29/2022] Open
Abstract
Background Biological tissues consist of various cell types that differentially contribute to physiological and pathophysiological processes. Determining and analyzing cell type-specific gene expression under diverse conditions is therefore a central aim of biomedical research. The present study compares gene expression profiles in whole tissues and isolated cell fractions purified from these tissues in patients with rheumatoid arthritis and osteoarthritis. Results The expression profiles of the whole tissues were compared to computationally reconstituted expression profiles that combine the expression profiles of the isolated cell fractions (macrophages, fibroblasts, and non-adherent cells) according to their relative mRNA proportions in the tissue. The mRNA proportions were determined by trimmed robust regression using only the most robustly-expressed genes (1/3 to 1/2 of all measured genes), i.e. those showing the most similar expression in tissue and isolated cell fractions. The relative mRNA proportions were determined using several different chip evaluation methods, among which the MAS 5.0 signal algorithm appeared to be most robust. The computed mRNA proportions agreed well with the cell proportions determined by immunohistochemistry except for a minor number of outliers. Genes that were either regulated (i.e. differentially-expressed in tissue and isolated cell fractions) or robustly-expressed in all patients were identified using different test statistics. Conclusion Robust Computational Reconstitution uses an intermediate number of robustly-expressed genes to estimate the relative mRNA proportions. This avoids both the exclusive dependence on the robust expression of individual, highly cell type-specific marker genes and the bias towards an equal distribution upon inclusion of all genes for computation.
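The reconstitution step described in the abstract above, combining isolated-fraction profiles according to their relative mRNA proportions, amounts to a proportion-weighted sum per gene. The sketch below shows only that step, with hypothetical names and toy values; the trimmed robust regression the authors use to determine the proportions is not reproduced here.

```python
def reconstitute(fraction_profiles, proportions):
    """Combine isolated-fraction profiles into one tissue-like profile.

    fraction_profiles: one expression list per cell fraction (same genes).
    proportions: relative mRNA proportion of each fraction, summing to 1.
    """
    if len(fraction_profiles) != len(proportions):
        raise ValueError("need one proportion per fraction")
    if abs(sum(proportions) - 1.0) > 1e-9:
        raise ValueError("proportions must sum to 1")
    n_genes = len(fraction_profiles[0])
    if any(len(p) != n_genes for p in fraction_profiles):
        raise ValueError("profiles must cover the same genes")
    # Proportion-weighted sum of the fraction profiles, gene by gene.
    return [
        sum(w * profile[g] for w, profile in zip(proportions, fraction_profiles))
        for g in range(n_genes)
    ]


# Toy profiles for the three fractions named in the abstract (values invented).
macrophages = [8.0, 1.0, 4.0]
fibroblasts = [2.0, 9.0, 4.0]
non_adherent = [1.0, 1.0, 4.0]
profile = reconstitute([macrophages, fibroblasts, non_adherent], [0.5, 0.25, 0.25])
```

Comparing such a reconstituted profile with the measured whole-tissue profile, gene by gene, is what separates genes that behave alike in tissue and isolated fractions from those that do not.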
Affiliation(s)
- Martin Hoffmann
- Leibniz Institute for Natural Products Research and Infection Biology – Hans Knöll Institute, Beutenbergstr. 11a, Jena, Germany
- Dirk Pohlers
- Experimental Rheumatology Unit, Department of Orthopedics, Friedrich Schiller University Jena, Jena, Germany
- Dirk Koczan
- Institute of Immunology, University of Rostock, Rostock, Germany
- Stefan Wölfl
- Department of Pharmacy and Molecular Biotechnology, Ruprecht Karls University Heidelberg, Heidelberg, Germany
- Raimund W Kinne
- Experimental Rheumatology Unit, Department of Orthopedics, Friedrich Schiller University Jena, Jena, Germany
|
80
|
Wentzell PD, Karakach TK, Roy S, Martinez MJ, Allen CP, Werner-Washburne M. Multivariate curve resolution of time course microarray data. BMC Bioinformatics 2006; 7:343. [PMID: 16839419 PMCID: PMC1539028 DOI: 10.1186/1471-2105-7-343] [Citation(s) in RCA: 91] [Impact Index Per Article: 5.1] [Received: 03/18/2006] [Accepted: 07/13/2006] [Indexed: 11/17/2022] Open
Abstract
Background Modeling of gene expression data from time course experiments often involves the use of linear models such as those obtained from principal component analysis (PCA), independent component analysis (ICA), or other methods. Such methods do not generally yield factors with a clear biological interpretation. Moreover, implicit assumptions about the measurement errors often limit the application of these methods to log-transformed data, destroying linear structure in the untransformed expression data. Results In this work, a method for the linear decomposition of gene expression data by multivariate curve resolution (MCR) is introduced. The MCR method is based on an alternating least-squares (ALS) algorithm implemented with a weighted least squares approach. The new method, MCR-WALS, extracts a small number of basis functions from untransformed microarray data using only non-negativity constraints. Measurement error information can be incorporated into the modeling process and missing data can be imputed. The utility of the method is demonstrated through its application to yeast cell cycle data. Conclusion Profiles extracted by MCR-WALS exhibit a strong correlation with cell cycle-associated genes, but also suggest new insights into the regulation of those genes. The unique features of the MCR-WALS algorithm are its freedom from assumptions about the underlying linear model other than the non-negativity of gene expression, its ability to analyze non-log-transformed data, and its use of measurement error information to obtain a weighted model and accommodate missing measurements.
Affiliation(s)
- Peter D Wentzell
- Department of Chemistry, Dalhousie University, Halifax, NS B3H 4J3, Canada
- Tobias K Karakach
- Department of Chemistry, Dalhousie University, Halifax, NS B3H 4J3, Canada
- Sushmita Roy
- Department of Biology, University of New Mexico, Albuquerque, NM 87131, USA
- M Juanita Martinez
- Department of Biology, University of New Mexico, Albuquerque, NM 87131, USA
|
81
|
Wang M, Master SR, Chodosh LA. Computational expression deconvolution in a complex mammalian organ. BMC Bioinformatics 2006; 7:328. [PMID: 16817968 PMCID: PMC1559723 DOI: 10.1186/1471-2105-7-328] [Citation(s) in RCA: 65] [Impact Index Per Article: 3.6] [Received: 02/20/2006] [Accepted: 07/03/2006] [Indexed: 11/28/2022] Open
Abstract
Background Microarray expression profiling has been widely used to identify differentially expressed genes in complex cellular systems. However, while such methods can be used to directly infer intracellular regulation within homogeneous cell populations, interpretation of in vivo gene expression data derived from complex organs composed of multiple cell types is more problematic. Specifically, observed changes in gene expression may be due either to changes in gene regulation within a given cell type or to changes in the relative abundance of expressing cell types. Consequently, bona fide changes in intrinsic gene regulation may be either mimicked or masked by changes in the relative proportion of different cell types. To date, few analytical approaches have addressed this problem. Results We have chosen to apply a computational method for deconvoluting gene expression profiles derived from intact tissues by using reference expression data for purified populations of the constituent cell types of the mammary gland. These data were used to estimate changes in the relative proportions of different cell types during murine mammary gland development and Ras-induced mammary tumorigenesis. These computational estimates of changing compartment sizes were then used to enrich lists of differentially expressed genes for transcripts that change as a function of intrinsic intracellular regulation rather than shifts in the relative abundance of expressing cell types. Using this approach, we have demonstrated that adjusting mammary gene expression profiles for changes in three principal compartments – epithelium, white adipose tissue, and brown adipose tissue – is sufficient both to reduce false-positive changes in gene expression due solely to changes in compartment sizes and to reduce false-negative changes by unmasking genuine alterations in gene expression that were otherwise obscured by changes in compartment sizes. 
Conclusion By adjusting gene expression values for changes in the sizes of cell type-specific compartments, this computational deconvolution method has the potential to increase both the sensitivity and specificity of differential gene expression experiments performed on complex tissues. Given the necessity for understanding complex biological processes such as development and carcinogenesis within the context of intact tissues, this approach offers substantial utility and should be broadly applicable to identifying gene expression changes in tissues composed of multiple cell types.
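The idea of estimating compartment proportions from reference profiles of purified cell types can be illustrated with a small NumPy sketch. This is not the authors' implementation: the abstract does not state the estimator, so ordinary least squares followed by clipping and renormalization is an assumption, and the function name `estimate_proportions` and the toy numbers are hypothetical.

```python
import numpy as np

def estimate_proportions(reference, mixed):
    """Estimate cell-type proportions in a mixed sample (illustrative only).

    reference: (genes x cell_types) expression of purified cell types
    mixed:     (genes,) expression measured in the intact tissue
    Assumes the mixed profile is a linear combination of the reference
    profiles and that at least one fitted weight is positive.
    """
    # Ordinary least-squares fit of the mixed profile onto the references.
    weights, *_ = np.linalg.lstsq(reference, mixed, rcond=None)
    # Proportions cannot be negative; clip, then rescale to sum to one.
    weights = np.clip(weights, 0.0, None)
    return weights / weights.sum()

# Toy reference profiles for three compartments (columns) over four genes.
reference = np.array([
    [12.0, 1.0, 2.0],
    [1.0, 9.0, 3.0],
    [2.0, 2.0, 11.0],
    [5.0, 5.0, 5.0],
])
true_props = np.array([0.6, 0.3, 0.1])
mixed = reference @ true_props  # noiseless synthetic tissue profile

estimated = estimate_proportions(reference, mixed)
```

On noiseless synthetic data the fit recovers the mixing proportions; real data would need noise handling and a check that the reference profiles are sufficiently distinct.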
Affiliation(s)
- Min Wang
- Departments of Cancer Biology, Medicine, and Cell & Developmental Biology, and the Abramson Family Cancer Research Institute, University of Pennsylvania, 612 BRB II/III, 421 Curie Blvd, Philadelphia, PA 19104, USA
- Stephen R Master
- Departments of Cancer Biology, Medicine, and Cell & Developmental Biology, and the Abramson Family Cancer Research Institute, University of Pennsylvania, 612 BRB II/III, 421 Curie Blvd, Philadelphia, PA 19104, USA
- Department of Pathology and Laboratory Medicine, University of Pennsylvania, 613A Stellar-Chance Labs, 422 Curie Blvd., Philadelphia, PA 19104, USA
- Lewis A Chodosh
- Departments of Cancer Biology, Medicine, and Cell & Developmental Biology, and the Abramson Family Cancer Research Institute, University of Pennsylvania, 612 BRB II/III, 421 Curie Blvd, Philadelphia, PA 19104, USA
|
82
|
Lu P, Rangan A, Chan SY, Appling DR, Hoffman DW, Marcotte EM. Global metabolic changes following loss of a feedback loop reveal dynamic steady states of the yeast metabolome. Metab Eng 2006; 9:8-20. [PMID: 17049899 DOI: 10.1016/j.ymben.2006.06.003] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.8] [Received: 10/03/2005] [Revised: 05/27/2006] [Accepted: 06/20/2006] [Indexed: 11/16/2022]
Abstract
Metabolic enzymes control cellular metabolite concentrations dynamically in response to changing environmental and intracellular conditions. Such real-time feedback regulation suggests the global metabolome may sample distinct dynamic steady states, forming "basins of stability" in the energy landscape of possible metabolite concentrations and enzymatic activities. Using metabolite, protein and transcriptional profiling, we characterize three dynamic steady states of the yeast metabolome that form by perturbing synthesis of the universal methyl donor S-adenosylmethionine (AdoMet). Conversion between these states is driven by replacement of serine with glycine+formate in the media, loss of feedback inhibition control by the metabolic enzyme Met13, or both. The latter causes hyperaccumulation of methionine and AdoMet, and dramatic global compensatory changes in the metabolome, including differences in amino acid and sugar metabolism, and possibly in the global nitrogen balance, ultimately leading to a G1/S phase cell cycle delay. Global metabolic changes are not necessarily accompanied by global transcriptional changes, and metabolite-controlled post-transcriptional regulation of metabolic enzymes is clearly evident.
Affiliation(s)
- Peng Lu
- Center for Systems and Synthetic Biology, University of Texas, 1 University Station, Austin, TX 78712-0159, USA
|
83
|
Fannin RD, Auman JT, Bruno ME, Sieber SO, Ward SM, Tucker CJ, Merrick BA, Paules RS. Differential gene expression profiling in whole blood during acute systemic inflammation in lipopolysaccharide-treated rats. Physiol Genomics 2006; 21:92-104. [PMID: 15781589 DOI: 10.1152/physiolgenomics.00190.2004] [Citation(s) in RCA: 34] [Impact Index Per Article: 1.9] [Indexed: 01/10/2023] Open
Abstract
Microarrays have been used to evaluate the expression of thousands of genes in various tissues. However, few studies have investigated the change in gene expression profiles in one of the most easily accessible tissues, whole blood. We utilized an acute inflammation model to investigate the possibility of using a cDNA microarray to measure the gene expression profile in the cells of whole blood. Blood was collected from male Sprague-Dawley rats at 2 and 6 h after treatment with 5 mg/kg (ip) LPS. Hematology showed marked neutrophilia accompanied by lymphopenia at both time points. TNF-alpha and IL-6 levels were markedly elevated at 2 h, indicating acute inflammation, but by 6 h the levels had declined. Total RNA was isolated from whole blood and hybridized to the National Institute of Environmental Health Sciences Rat Chip v.3.0. LPS treatment caused 226 and 180 genes to be differentially expressed at 2 and 6 h, respectively. Many of the differentially expressed genes are involved in inflammation and the acute phase response, but differential expression was also noted in genes involved in the cytoskeleton, cell adhesion, oxidative respiration, and transcription. Real-time RT-PCR confirmed the differential regulation of a representative subset of genes. Principal component analysis of gene expression discriminated between the acute inflammatory response apparent at 2 h and the observed recovery underway at 6 h. These studies indicate that, in whole blood, changes in gene expression profiles can be detected that are reflective of inflammation, despite the adaptive shifts in leukocyte populations that accompany such inflammatory processes.
Affiliation(s)
- Rick D Fannin
- National Center for Toxicogenomics, National Institute of Environmental Health Sciences, National Institutes of Health, Research Triangle Park, North Carolina 27709, USA.
|
84
|
Bowers PM, O'Connor BD, Cokus SJ, Sprinzak E, Yeates TO, Eisenberg D. Utilizing logical relationships in genomic data to decipher cellular processes. FEBS J 2005; 272:5110-8. [PMID: 16218945 DOI: 10.1111/j.1742-4658.2005.04946.x] [Citation(s) in RCA: 28] [Impact Index Per Article: 1.5] [Indexed: 01/07/2023]
Abstract
The wealth of available genomic data has spawned a corresponding interest in computational methods that can impart biological meaning and context to these experiments. Traditional computational methods have drawn relationships between pairs of proteins or genes based on notions of equality or similarity between their patterns of occurrence or behavior. For example, two genes displaying similar variation in expression, over a number of experiments, may be predicted to be functionally related. We have introduced a natural extension of these approaches, instead identifying logical relationships involving triplets of proteins. Triplets provide for various discrete kinds of logic relationships, leading to detailed inferences about biological associations. For instance, a protein C might be encoded within an organism if, and only if, two other proteins A and B are also both encoded within the organism, thus suggesting that gene C is functionally related to genes A and B. The method has been applied fruitfully to both phylogenetic and microarray expression data, and has been used to associate logical combinations of protein activity with disease state phenotypes, revealing previously unknown ternary relationships among proteins, and illustrating the inherent complexities that arise in biological data.
Affiliation(s)
- Peter M Bowers
- Howard Hughes Medical Institute, University of California, Los Angeles, CA 90095, USA
|
85
|
Segal E, Friedman N, Kaminski N, Regev A, Koller D. From signatures to models: understanding cancer using microarrays. Nat Genet 2005; 37 Suppl:S38-45. [PMID: 15920529 DOI: 10.1038/ng1561] [Citation(s) in RCA: 283] [Impact Index Per Article: 14.9] [Indexed: 01/11/2023]
Abstract
Genomics has the potential to revolutionize the diagnosis and management of cancer by offering an unprecedented comprehensive view of the molecular underpinnings of pathology. Computational analysis is essential to transform the masses of generated data into a mechanistic understanding of disease. Here we review current research aimed at uncovering the modular organization and function of transcriptional networks and responses in cancer. We first describe how methods that analyze biological processes in terms of higher-level modules can identify robust signatures of disease mechanisms. We then discuss methods that aim to identify the regulatory mechanisms underlying these modules and processes. Finally, we show how comparative analysis, combining human data with model organisms, can lead to more robust findings. We conclude by discussing the challenges of generalizing these methods from cells to tissues and the opportunities they offer to improve cancer diagnosis and management.
Affiliation(s)
- Eran Segal
- Center for Studies in Physics and Biology, Rockefeller University, New York, USA
|
86
|
de Ridder D, van der Linden CE, Schonewille T, Dik WA, Reinders MJT, van Dongen JJM, Staal FJT. Purity for clarity: the need for purification of tumor cells in DNA microarray studies. Leukemia 2005; 19:618-27. [PMID: 15744349 DOI: 10.1038/sj.leu.2403685] [Citation(s) in RCA: 30] [Impact Index Per Article: 1.6] [Indexed: 02/07/2023]
Abstract
It is now well established that gene expression profiling using DNA microarrays can provide novel information about various types of hematological malignancies, which may lead to identification of novel diagnostic markers. However, to successfully use microarrays for this purpose, the quality and reproducibility of the procedure need to be guaranteed. The quality of microarray analyses may be severely reduced, if variable frequencies of nontarget cells are present in the starting material. To systematically investigate the influence of different types of impurity, we determined gene expression profiles of leukemic samples containing different percentages of nonleukemic leukocytes. Furthermore, we used computer simulations to study the effect of different kinds of impurity as an alternative to conducting hundreds of microarray experiments on samples with various levels of purity. As expected, the percentage of erroneously identified genes rose with the increase of contaminating nontarget cells in the samples. The simulations demonstrated that a tumor load of less than 75% can lead to up to 25% erroneously identified genes. A tumor load of at least 90% leads to identification of at most 5% false-positive genes. We therefore propose that in order to draw well-founded conclusions, the percentage of target cells in microarray experiment samples should be at least 90%.
Affiliation(s)
- D de Ridder
- Department of Immunology, Erasmus MC, University Medical Center Rotterdam, Rotterdam, The Netherlands
|
87
|
Lähdesmäki H, Shmulevich I, Dunmire V, Yli-Harja O, Zhang W. In silico microdissection of microarray data from heterogeneous cell populations. BMC Bioinformatics 2005; 6:54. [PMID: 15766384 PMCID: PMC1274251 DOI: 10.1186/1471-2105-6-54] [Citation(s) in RCA: 66] [Impact Index Per Article: 3.5] [Received: 10/28/2004] [Accepted: 03/14/2005] [Indexed: 11/10/2022] Open
Abstract
Background Very few analytical approaches have been reported to resolve the variability in microarray measurements stemming from sample heterogeneity. For example, tissue samples used in cancer studies are usually contaminated with the surrounding or infiltrating cell types. This heterogeneity in the sample preparation hinders further statistical analysis, significantly so if different samples contain different proportions of these cell types. Thus, sample heterogeneity can result in the identification of differentially expressed genes that may be unrelated to the biological question being studied. Similarly, irrelevant gene combinations can be discovered in the case of gene expression based classification. Results We propose a computational framework for removing the effects of sample heterogeneity by "microdissecting" microarray data in silico. The computational method provides estimates of the expression values of the pure (non-heterogeneous) cell samples. The inversion of the sample heterogeneity can be facilitated by providing accurate estimates of the mixing percentages of different cell types in each measurement. For those cases where no such information is available, we develop an optimization-based method for joint estimation of the mixing percentages and the expression values of the pure cell samples. We also consider the problem of selecting the correct number of cell types. Conclusion The efficiency of the proposed methods is illustrated by applying them to carefully controlled cDNA microarray data obtained from heterogeneous samples. The results demonstrate that the methods are capable of reconstructing both the sample and cell type specific expression values from heterogeneous mixtures and that the mixing percentages of different cell types can also be estimated. Furthermore, a general purpose model selection method can be used to select the correct number of cell types.
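For the case where the mixing percentages are known, the inversion of sample heterogeneity can be sketched as a linear least-squares problem. The sketch below assumes a linear mixing model (measured = mixing × pure) and uses ordinary least squares, which may differ from the estimator in the paper; the function name `estimate_pure_profiles` and the toy values are hypothetical.

```python
import numpy as np

def estimate_pure_profiles(mixing, measured):
    """Estimate pure cell-type expression from heterogeneous samples.

    mixing:   (samples x cell_types) known mixing fractions per sample
    measured: (samples x genes) expression measured in the mixtures
    Returns a (cell_types x genes) matrix of estimated pure profiles,
    assuming measured ~= mixing @ pure.
    """
    pure, *_ = np.linalg.lstsq(mixing, measured, rcond=None)
    return pure

# Four mixed samples of two cell types; each row of `mixing` sums to one.
mixing = np.array([
    [0.9, 0.1],
    [0.5, 0.5],
    [0.2, 0.8],
    [0.7, 0.3],
])
# Hypothetical pure profiles for three genes, used to simulate the mixtures.
pure_true = np.array([
    [10.0, 2.0, 5.0],
    [1.0, 8.0, 5.0],
])
measured = mixing @ pure_true  # noiseless synthetic measurements

pure_est = estimate_pure_profiles(mixing, measured)
```

The system is solvable only when there are at least as many samples as cell types and the samples differ in composition; identical mixing rows would leave the pure profiles undetermined.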
Affiliation(s)
- Harri Lähdesmäki
- Institute of Signal Processing, Tampere University of Technology, P.O.Box 553, 33101 Tampere, Finland
- Ilya Shmulevich
- Cancer Genomics Laboratory, University of Texas M. D. Anderson Cancer Center, 1515 Holcombe Blvd., Box 85, Houston, TX 77030, USA
- Valerie Dunmire
- Cancer Genomics Laboratory, University of Texas M. D. Anderson Cancer Center, 1515 Holcombe Blvd., Box 85, Houston, TX 77030, USA
- Olli Yli-Harja
- Institute of Signal Processing, Tampere University of Technology, P.O.Box 553, 33101 Tampere, Finland
- Wei Zhang
- Cancer Genomics Laboratory, University of Texas M. D. Anderson Cancer Center, 1515 Holcombe Blvd., Box 85, Houston, TX 77030, USA
|
88
|
Han CB, Mao XY, Xin Y, Wang SC, Ma JM, Zhao YJ. Quantitative analysis of tumor mitochondrial RNA using microarray. World J Gastroenterol 2005; 11:36-40. [PMID: 15609393 PMCID: PMC4205380 DOI: 10.3748/wjg.v11.i1.36] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 02/06/2023] Open
Abstract
AIM: To design a novel method to rapidly detect the quantitative alteration of mtRNA in patients with tumors.
METHODS: Oligo 6.22 and Primer Premier 5.0 bio-soft were used to design 15 pairs of primers of mtRNA cDNA probes in light of the functional and structural property of mtDNA, and then RT-PCR amplification was used to produce 15 probes of mtRNA from one normal gastric mucosal tissue. Total RNA extracted from 9 gastric cancers and corresponding normal gastric mucosal tissues was reverse transcribed into cDNA labeled with fluorescein. The spotted mtDNA microarrays were made and hybridized. Finally, the microarrays were scanned with a GeneTAC™ laser scanner to get the hybridized results. Northern blot was used to confirm the microarray results.
RESULTS: The hybridized spots were distinct with clear and consistent backgrounds. After data was standardized according to the housekeeping genes, the results showed that the expression levels of some mitochondrial genes in gastric carcinoma were different from those in the corresponding non-cancerous regions.
CONCLUSION: The mtDNA expression microarray can rapidly, massively and exactly detect the quantity of mtRNA in tissues and cells. In addition, the whole expressive information of mtRNA from a tumor patient on just one slide can be obtained using this method, providing an effective method to investigate the relationship between mtDNA expression and tumorigenesis.
Affiliation(s)
- Cheng-Bo Han
- Cancer Institute, First Affiliated Hospital, China Medical University, Shenyang 110001, Liaoning Province, China.
|
89
|
Alter O, Golub GH. Integrative analysis of genome-scale data by using pseudoinverse projection predicts novel correlation between DNA replication and RNA transcription. Proc Natl Acad Sci U S A 2004; 101:16577-82. [PMID: 15545604 PMCID: PMC534520 DOI: 10.1073/pnas.0406767101] [Citation(s) in RCA: 58] [Impact Index Per Article: 2.9] [Indexed: 11/18/2022] Open
Abstract
We describe an integrative data-driven mathematical framework that formulates any number of genome-scale molecular biological data sets in terms of one chosen set of data samples, or of profiles extracted mathematically from data samples, designated the "basis" set. By using pseudoinverse projection, the molecular biological profiles of the data samples are least-squares-approximated as superpositions of the basis profiles. Reconstruction of the data in the basis simulates experimental observation of only the cellular states manifest in the data that correspond to those of the basis. Classification of the data samples according to their reconstruction in the basis, rather than their overall measured profiles, maps the cellular states of the data onto those of the basis and gives a global picture of the correlations and possibly also causal coordination of these two sets of states. We illustrate this framework with an integration of yeast genome-scale proteins' DNA-binding data with cell cycle mRNA expression time course data. Novel correlation between DNA replication initiation and RNA transcription during the yeast cell cycle, which might be due to a previously unknown mechanism of regulation, is predicted.
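The pseudoinverse projection step, in which data profiles are least-squares-approximated as superpositions of basis profiles, can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' code: the names `basis`, `data`, and `project_onto_basis` and the random toy matrices are assumptions made for the example.

```python
import numpy as np

def project_onto_basis(basis, data):
    """Approximate each data column as a superposition of basis columns.

    basis: (genes x k) matrix whose columns are the basis profiles
    data:  (genes x n) matrix whose columns are the data-sample profiles
    Returns (coeffs, reconstruction) with data ~= basis @ coeffs in the
    least-squares sense.
    """
    # The Moore-Penrose pseudoinverse yields the least-squares coefficients.
    coeffs = np.linalg.pinv(basis) @ data
    reconstruction = basis @ coeffs
    return coeffs, reconstruction

rng = np.random.default_rng(0)
basis = rng.normal(size=(50, 3))        # three hypothetical basis profiles
true_coeffs = rng.normal(size=(3, 4))   # how four samples combine them
data = basis @ true_coeffs              # noiseless synthetic data samples

coeffs, reconstruction = project_onto_basis(basis, data)
```

Samples can then be compared or classified by their columns of `coeffs` (their reconstruction in the basis) rather than by their full measured profiles, which is the use the abstract describes.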
Affiliation(s)
- Orly Alter
- Department of Biomedical Engineering and Institute for Cellular and Molecular Biology, University of Texas, Austin, TX 78712, USA.
|
90
|
Abstract
In many of the model organisms used to study development, it is becoming relatively routine to carry out global analyses of gene function. These analyses take many forms, from microarray analyses to the construction of physical interaction maps to the systematic analyses of loss-of-function phenotypes. Such large-scale datasets can be integrated to generate complex gene networks, and we explore how these gene networks can contribute to an understanding of developmental pathways. In particular, we examine how combining large-scale expression experiments and gene networks may move us towards a molecular description of the events of development, embodied in a succession of stage-specific subnetworks sampled from an organism's overall gene network.
Affiliation(s)
- Andrew G Fraser
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK.
|
91
|
Prince JT, Carlson MW, Wang R, Lu P, Marcotte EM. The need for a public proteomics repository. Nat Biotechnol 2004; 22:471-2. [PMID: 15085804 DOI: 10.1038/nbt0404-471] [Citation(s) in RCA: 128] [Impact Index Per Article: 6.4] [Indexed: 11/08/2022]
Affiliation(s)
- John T Prince
- Center for Systems and Synthetic Biology & Institute for Cellular & Molecular Biology, University of Texas at Austin, Austin, Texas 78712, USA
|
92
|
Stuart RO, Wachsman W, Berry CC, Wang-Rodriguez J, Wasserman L, Klacansky I, Masys D, Arden K, Goodison S, McClelland M, Wang Y, Sawyers A, Kalcheva I, Tarin D, Mercola D. In silico dissection of cell-type-associated patterns of gene expression in prostate cancer. Proc Natl Acad Sci U S A 2004; 101:615-20. [PMID: 14722351 PMCID: PMC327196 DOI: 10.1073/pnas.2536479100] [Citation(s) in RCA: 158] [Impact Index Per Article: 7.9] [Indexed: 11/18/2022] Open
Abstract
Prostate tumors are complex entities composed of malignant cells mixed and interacting with nonmalignant cells. However, molecular analyses by standard gene expression profiling are limited because spatial information and nontumor cell types are lost in sample preparation. We scored 88 prostate specimens for relative content of tumor, benign hyperplastic epithelium, stroma, and dilated cystic glands. The proportions of these cell types were then linked in silico to gene expression levels determined by microarray analysis, revealing unique cell-specific profiles. Gene expression differences for malignant and nonmalignant epithelial cells (tumor versus benign hyperplastic epithelium) could be identified without being confounded by contributions from stroma that dominate many samples or sacrificing possible paracrine influences. Cell-specific expression of selected genes was validated by immunohistochemistry and quantitative PCR. The results provide patterns of gene expression for these three lineages with relevance to pathogenetic, diagnostic, and therapeutic considerations.
Affiliation(s)
- Robert O Stuart
- Veterans Affairs San Diego Healthcare System, and Department of Medicine and John and Rebecca Moores UCSD Cancer Center, University of California at San Diego, La Jolla, USA
|
93
|
Kawahara N, Wang Y, Mukasa A, Furuya K, Shimizu T, Hamakubo T, Aburatani H, Kodama T, Kirino T. Genome-wide gene expression analysis for induced ischemic tolerance and delayed neuronal death following transient global ischemia in rats. J Cereb Blood Flow Metab 2004; 24:212-23. [PMID: 14747748 DOI: 10.1097/01.wcb.0000106012.33322.a2] [Citation(s) in RCA: 88] [Impact Index Per Article: 4.4] [Indexed: 11/25/2022]
Abstract
Genome-wide gene expression analysis of the hippocampal CA1 region was conducted in a rat global ischemia model for delayed neuronal death and induced ischemic tolerance using an oligonucleotide-based DNA microarray containing 8,799 probes. The results showed that expression levels of 246 transcripts were increased and 213 were decreased following ischemia, corresponding to 5.1% of the represented probe sets. These changes were divided into seven expression clusters using hierarchical cluster analysis, each with distinct conditions and time-specific patterns. Ischemic tolerance was associated with transient up-regulation of transcription factors (c-Fos, JunB, Egr-1, -2, -4, NGFI-B), Hsp70 and MAP kinase cascade-related genes (MKP-1), which are implicated in cell survival. Delayed neuronal death exhibited complex long-lasting changes of expression, such as up-regulation of proapoptotic genes (GADD153, Smad2, Dral, Caspase-2 and -3) and down-regulation of genes implicated in survival signaling (MKK2, and PI4 kinase, DAG/PKC signaling pathways), suggesting an imbalance between death and survival signals. Our study provides a differential gene expression profile between delayed neuronal death and induced ischemic tolerance in a genome-wide analysis, and contributes to further understanding of the complex molecular pathophysiology in cerebral ischemia.
Affiliation(s)
- Nobutaka Kawahara
- Department of Neurosurgery, Faculty of Medicine, University of Tokyo, Japan.
|