1
|
Hu P, Wang X, Haitsma JJ, Furmli S, Masoom H, Liu M, Imai Y, Slutsky AS, Beyene J, Greenwood CMT, dos Santos C. Microarray meta-analysis identifies acute lung injury biomarkers in donor lungs that predict development of primary graft failure in recipients. PLoS One 2012; 7:e45506. [PMID: 23071521 PMCID: PMC3470558 DOI: 10.1371/journal.pone.0045506] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.4] [Received: 09/11/2011] [Accepted: 08/23/2012] [Indexed: 11/19/2022] Open
Abstract
Objectives To perform a meta-analysis of gene expression microarray data from animal studies of lung injury, and to identify an injury-specific gene expression signature capable of predicting the development of lung injury in humans. Methods We performed a microarray meta-analysis using 77 microarray chips across six platforms, two species and different animal lung injury models exposed to lung injury with and/or without mechanical ventilation. Individual gene chips were classified and grouped based on the strategy used to induce lung injury. Effect size (change in gene expression) was calculated between non-injurious and injurious conditions, comparing two main strategies to pool chips: (1) one-hit and (2) two-hit lung injury models. A random effects model was used to integrate individual effect sizes calculated from each experiment. Classification models were built using the gene expression signatures generated by the meta-analysis to predict the development of lung injury in human lung transplant recipients. Results Two injury-specific lists of differentially expressed genes generated from our meta-analysis of lung injury models were validated using external data sets and prospective data from animal models of ventilator-induced lung injury (VILI). Pathway analysis of gene sets revealed that both new and previously implicated VILI-related pathways are enriched with differentially regulated genes. A classification model based on gene expression signatures identified in animal models of lung injury predicted development of primary graft failure (PGF) in lung transplant recipients with greater than 80% accuracy based upon injury profiles from transplant donors. We also found that better classifier performance can be achieved by using meta-analysis to identify differentially expressed genes than by using single study-based differential analysis. Conclusion Taken together, our data suggest that microarray analysis of gene expression data allows for the detection of “injury” gene predictors that can classify lung injury samples and identify patients at risk for clinically relevant lung injury complications.
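The abstract above says that a random effects model was used to integrate per-experiment effect sizes but does not name the estimator. As a minimal sketch, assuming the common DerSimonian-Laird approach (an assumption, not confirmed by the abstract), pooling one gene's effect sizes across studies could look like this. The function name `random_effects_pool` and the input values are hypothetical.

```python
import math


def random_effects_pool(effects, variances):
    """Pool per-study effect sizes for one gene (DerSimonian-Laird sketch).

    effects   -- effect size from each study (hypothetical inputs)
    variances -- within-study variance of each effect size
    Returns (pooled effect, standard error, between-study variance tau^2).
    """
    k = len(effects)
    # Fixed-effect (inverse-variance) weights and pooled estimate.
    w = [1.0 / v for v in variances]
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    # Cochran's Q measures heterogeneity around the fixed-effect estimate.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sw - sum(wi ** 2 for wi in w) / sw
    # Method-of-moments estimate of between-study variance, truncated at 0.
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    # Random-effects weights add tau^2 to every within-study variance.
    w_star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return pooled, se, tau2


if __name__ == "__main__":
    pooled, se, tau2 = random_effects_pool([0.8, 1.1, 0.4], [0.10, 0.15, 0.20])
    print(f"pooled={pooled:.3f} se={se:.3f} tau2={tau2:.3f}")
```

When the studies agree closely, the between-study variance is truncated to zero and the result reduces to the fixed-effect estimate; larger disagreement widens the standard error.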
Affiliation(s)
- Pingzhao Hu
  - The Centre for Applied Genomics, Hospital for Sick Children, Toronto, Ontario, Canada
- Xinchen Wang
  - Keenan Research Center at the Li Ka Shing Knowledge Institute of St. Michael's Hospital, Interdepartmental Division of Critical Care, University of Toronto, Toronto, Ontario, Canada
- Jack J. Haitsma
  - Keenan Research Center at the Li Ka Shing Knowledge Institute of St. Michael's Hospital, Interdepartmental Division of Critical Care, University of Toronto, Toronto, Ontario, Canada
- Suleiman Furmli
  - Keenan Research Center at the Li Ka Shing Knowledge Institute of St. Michael's Hospital, Interdepartmental Division of Critical Care, University of Toronto, Toronto, Ontario, Canada
- Hussain Masoom
  - Keenan Research Center at the Li Ka Shing Knowledge Institute of St. Michael's Hospital, Interdepartmental Division of Critical Care, University of Toronto, Toronto, Ontario, Canada
- Mingyao Liu
  - Thoracic Surgery Research Laboratory, Toronto General Research Institute, University Health Network, Toronto, Ontario, Canada
- Yumiko Imai
  - Biological Informatics and Experimental Therapeutics, Akita University Graduate School of Medicine, Akita City, Akita, Japan
- Arthur S. Slutsky
  - Keenan Research Center at the Li Ka Shing Knowledge Institute of St. Michael's Hospital, Interdepartmental Division of Critical Care, University of Toronto, Toronto, Ontario, Canada
- Joseph Beyene
  - Program in Population Genomics, Department of Clinical Epidemiology and Biostatistics, McMaster University, Hamilton, Ontario, Canada
- Celia M. T. Greenwood
  - Centre for Clinical Epidemiology, Lady Davis Institute and Department of Epidemiology, Biostatistics and Occupational Health, McGill University, Montreal, Quebec, Canada
- Claudia dos Santos
  - Keenan Research Center at the Li Ka Shing Knowledge Institute of St. Michael's Hospital, Interdepartmental Division of Critical Care, University of Toronto, Toronto, Ontario, Canada
|
2
|
Tian S, Krueger JG, Li K, Jabbari A, Brodmerkel C, Lowes MA, Suárez-Fariñas M. Meta-analysis derived (MAD) transcriptome of psoriasis defines the "core" pathogenesis of disease. PLoS One 2012; 7:e44274. [PMID: 22957057 PMCID: PMC3434204 DOI: 10.1371/journal.pone.0044274] [Citation(s) in RCA: 131] [Impact Index Per Article: 10.9] [Received: 05/24/2012] [Accepted: 07/31/2012] [Indexed: 12/14/2022] Open
Abstract
The cause of psoriasis, a common chronic inflammatory skin disease, is not fully understood. Microarray experiments have been widely used in recent years to identify genes associated with psoriasis pathology, by comparing expression levels of lesional (LS) with adjacent non-lesional (NL) skin. It is commonly observed that the differentially expressed genes (DEGs) differ greatly across experiments, due to variations introduced in the microarray experiment pipeline. Therefore, a statistically based meta-analytic approach, which combines the results of individual studies, is warranted. In this study, a meta-analysis was conducted on 5 microarray data sets, including 193 LS and NL pairs. We termed this the Meta-Analysis Derived (MAD) transcriptome. In the “MAD-5” transcriptome, 677 genes were up-regulated and 443 were down-regulated in LS skin compared to NL skin. This represents a much larger set than the intersection of DEGs of these 5 studies, which consisted of 100 DEGs. We also analyzed 3 of the studies conducted on the Affymetrix hgu133plus2 chips and found a greater number of DEGs (1084 up- and 748 down-regulated). Top canonical pathways over-represented in the MAD transcriptome include Atherosclerosis Signaling and Fatty Acid Metabolism, while several “new” genes identified are involved in Cardiovascular Development and Lipid Metabolism. These findings highlight the relationship between psoriasis and systemic manifestations such as the metabolic syndrome and cardiovascular disease. Then, the Meta Threshold Gradient Descent Regularization (MTGDR) algorithm was used to select potential markers distinguishing LS and NL skin. The resulting set (20 genes) contained many genes that were part of the residual disease genomic profile (RDGP) or “molecular scar” after successful treatment, and also genes subject to differential methylation in LS tissues. To conclude, this MAD transcriptome yielded a reference list of reliable psoriasis DEGs, and represents a robust pool of candidates for further discovery of pathogenesis and treatment evaluation.
Affiliation(s)
- Suyan Tian
  - Center for Clinical and Translational Science, The Rockefeller University, New York, New York, United States of America
- James G. Krueger
  - Laboratory for Investigative Dermatology, The Rockefeller University, New York, New York, United States of America
- Katherine Li
  - Immunology & Biomarkers, Janssen Research & Development, Radnor, Pennsylvania, United States of America
- Ali Jabbari
  - Laboratory for Investigative Dermatology, The Rockefeller University, New York, New York, United States of America
- Carrie Brodmerkel
  - Immunology & Biomarkers, Janssen Research & Development, Radnor, Pennsylvania, United States of America
- Michelle A. Lowes
  - Laboratory for Investigative Dermatology, The Rockefeller University, New York, New York, United States of America
- Mayte Suárez-Fariñas
  - Center for Clinical and Translational Science, The Rockefeller University, New York, New York, United States of America
  - Laboratory for Investigative Dermatology, The Rockefeller University, New York, New York, United States of America
|
3
|
Ma S, Dai Y, Huang J, Xie Y. Identification of Breast Cancer Prognosis Markers via Integrative Analysis. Comput Stat Data Anal 2012; 56:2718-2728. [PMID: 22773869 DOI: 10.1016/j.csda.2012.02.017] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 12/20/2022]
Abstract
In breast cancer research, it is of great interest to identify genomic markers associated with prognosis. Multiple gene profiling studies have been conducted for such a purpose. Genomic markers identified from the analysis of single datasets often do not have satisfactory reproducibility. Among the multiple possible reasons, the most important one is the small sample sizes of individual studies. A cost-effective solution is to pool data from multiple comparable studies and conduct integrative analysis. In this study, we collect four breast cancer prognosis studies with gene expression measurements. We describe the relationship between prognosis and gene expressions using the accelerated failure time (AFT) models. We adopt a 2-norm group bridge penalization approach for marker identification. This integrative analysis approach can effectively identify markers with consistent effects across multiple datasets and naturally accommodate the heterogeneity among studies. Statistical and simulation studies demonstrate satisfactory performance of this approach. Breast cancer prognosis markers identified using this approach have sound biological implications and satisfactory prediction performance.
|
4
|
Song R, Huang J, Ma S. Integrative prescreening in analysis of multiple cancer genomic studies. BMC Bioinformatics 2012; 13:168. [PMID: 22799431 PMCID: PMC3436748 DOI: 10.1186/1471-2105-13-168] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Received: 08/16/2011] [Accepted: 05/18/2012] [Indexed: 12/03/2022] Open
Abstract
BACKGROUND In high throughput cancer genomic studies, results from the analysis of single datasets often suffer from a lack of reproducibility because of small sample sizes. Integrative analysis can effectively pool and analyze multiple datasets and provides a cost effective way to improve reproducibility. In integrative analysis, simultaneously analyzing all genes profiled may incur high computational cost. A computationally affordable remedy is prescreening, which fits marginal models, can be conducted in a parallel manner, and has low computational cost. RESULTS An integrative prescreening approach is developed for the analysis of multiple cancer genomic datasets. Simulation shows that the proposed integrative prescreening has better performance than alternatives, particularly including prescreening with individual datasets, an intensity approach and meta-analysis. We also analyze multiple microarray gene profiling studies on liver and pancreatic cancers using the proposed approach. CONCLUSIONS The proposed integrative prescreening provides an effective way to reduce the dimensionality in cancer genomic studies. It can be coupled with existing analysis methods to identify cancer markers.
Affiliation(s)
- Rui Song
  - Department of Statistics, Colorado State University, Fort Collins, USA
- Jian Huang
  - Department of Statistics and Actuarial Science, University of Iowa, Iowa City, USA
- Shuangge Ma
  - School of Public Health, Yale University, New Haven, USA
|
5
|
Tsoi LC, Qin T, Slate EH, Zheng WJ. Consistent Differential Expression Pattern (CDEP) on microarray to identify genes related to metastatic behavior. BMC Bioinformatics 2011; 12:438. [PMID: 22078224 PMCID: PMC3251006 DOI: 10.1186/1471-2105-12-438] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Received: 06/03/2011] [Accepted: 11/11/2011] [Indexed: 01/03/2023] Open
Abstract
Background To utilize the large volume of gene expression information generated from different microarray experiments, several meta-analysis techniques have been developed. Despite these efforts, there remain significant challenges to effectively increasing the statistical power and decreasing the Type I error rate while pooling the heterogeneous datasets from public resources. The objective of this study is to develop a novel meta-analysis approach, Consistent Differential Expression Pattern (CDEP), to identify genes with common differential expression patterns across different datasets. Results We combined False Discovery Rate (FDR) estimation and the non-parametric RankProd approach to estimate the Type I error rate in each microarray dataset of the meta-analysis. These Type I error rates from all datasets were then used to identify genes with common differential expression patterns. Our simulation study showed that CDEP achieved higher statistical power and maintained low Type I error rate when compared with two recently proposed meta-analysis approaches. We applied CDEP to analyze microarray data from different laboratories that compared transcription profiles between metastatic and primary cancer of different types. Many genes identified as differentially expressed consistently across different cancer types are in pathways related to metastatic behavior, such as ECM-receptor interaction, focal adhesion, and blood vessel development. We also identified novel genes such as AMIGO2, Gem, and CXCL11 that have not been shown to associate with, but may play roles in, metastasis. Conclusions CDEP is a flexible approach that borrows information from each dataset in a meta-analysis in order to identify genes being differentially expressed consistently. We have shown that CDEP can gain higher statistical power than other existing approaches under a variety of settings considered in the simulation study, suggesting its robustness and insensitivity to data variation commonly associated with microarray experiments. Availability: CDEP is implemented in R and freely available at: http://genomebioinfo.musc.edu/CDEP/ Contact: zhengw@musc.edu
Affiliation(s)
- Lam C Tsoi
  - Department of Biochemistry and Molecular Biology, Medical University of South Carolina, 135 Cannon St, Charleston, SC 29425, USA
|
6
|
Huang Y, Huang J, Shia BC, Ma S. Identification of cancer genomic markers via integrative sparse boosting. Biostatistics 2011; 13:509-22. [PMID: 22045909 DOI: 10.1093/biostatistics/kxr033] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.5] [Indexed: 11/14/2022] Open
Abstract
In high-throughput cancer genomic studies, markers identified from the analysis of single data sets often suffer a lack of reproducibility because of the small sample sizes. An ideal solution is to conduct large-scale prospective studies, which are extremely expensive and time consuming. A cost-effective remedy is to pool data from multiple comparable studies and conduct integrative analysis. Integrative analysis of multiple data sets is challenging because of the high dimensionality of genomic measurements and heterogeneity among studies. In this article, we propose a sparse boosting approach for marker identification in integrative analysis of multiple heterogeneous cancer diagnosis studies with gene expression measurements. The proposed approach can effectively accommodate the heterogeneity among multiple studies and identify markers with consistent effects across studies. Simulation shows that the proposed approach has satisfactory identification results and outperforms alternatives including an intensity approach and meta-analysis. The proposed approach is used to identify markers of pancreatic cancer and liver cancer.
Affiliation(s)
- Yuan Huang
  - Department of Statistics, Penn State University, 301 Thomas Building, State College, PA 16801, USA
|
7
|
Ma S, Huang J, Wei F, Xie Y, Fang K. Integrative analysis of multiple cancer prognosis studies with gene expression measurements. Stat Med 2011; 30:3361-71. [PMID: 22105693 DOI: 10.1002/sim.4337] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.2] [Received: 12/01/2010] [Accepted: 06/07/2011] [Indexed: 11/11/2022]
Abstract
Although microarray gene profiling studies in cancer research have been successful in identifying genetic variants predisposing to the development and progression of cancer, the markers identified from analysis of single datasets often suffer from low reproducibility. Among multiple possible causes, the most important one is the small sample size, and hence the lack of power, of single studies. Integrative analysis jointly considers multiple heterogeneous studies, has a significantly larger sample size, and can improve reproducibility. In this article, we focus on cancer prognosis studies, where the response variables are progression-free, overall, or other types of survival. A group minimax concave penalty (GMCP) penalized integrative analysis approach is proposed for analyzing multiple heterogeneous cancer prognosis studies with microarray gene expression measurements. An efficient group coordinate descent algorithm is developed. The GMCP can automatically accommodate the heterogeneity across multiple datasets, and the identified markers have consistent effects across multiple studies. Simulation studies show that the GMCP provides significantly improved selection results as compared with the existing meta-analysis approaches, intensity approaches, and group Lasso penalized integrative analysis. We apply the GMCP to four microarray studies and identify genes associated with the prognosis of breast cancer.
Affiliation(s)
- Shuangge Ma
  - School of Public Health, Yale University, New Haven, CT, USA.
|
8
|
Ma S, Huang J, Song X. Integrative analysis and variable selection with multiple high-dimensional data sets. Biostatistics 2011; 12:763-75. [PMID: 21415015 DOI: 10.1093/biostatistics/kxr004] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.5] [Indexed: 11/14/2022] Open
Abstract
In high-throughput -omics studies, markers identified from analysis of single data sets often suffer from a lack of reproducibility because of sample limitation. A cost-effective remedy is to pool data from multiple comparable studies and conduct integrative analysis. Integrative analysis of multiple -omics data sets is challenging because of the high dimensionality of data and heterogeneity among studies. In this article, for marker selection in integrative analysis of data from multiple heterogeneous studies, we propose a 2-norm group bridge penalization approach. This approach can effectively identify markers with consistent effects across multiple studies and accommodate the heterogeneity among studies. We propose an efficient computational algorithm and establish the asymptotic consistency property. Simulations and applications in cancer profiling studies show satisfactory performance of the proposed approach.
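The abstract above does not write out the penalized criterion. As a rough illustration only, a 2-norm group bridge objective is usually stated in a form like the following. The notation is an assumption made here for illustration and is not quoted from the paper: $M$ studies, $d$ markers, $L_m$ the loss for study $m$, $\beta_m$ the coefficient vector of study $m$, and $\beta^{j}$ the group of coefficients of marker $j$ across all studies.

```latex
\hat{\beta} \;=\; \arg\min_{\beta} \; \sum_{m=1}^{M} L_m\!\left(\beta_m\right)
\;+\; \lambda \sum_{j=1}^{d} \left\lVert \beta^{j} \right\rVert_2^{\gamma},
\qquad
\beta^{j} = \left(\beta^{j}_{1}, \dots, \beta^{j}_{M}\right),
\quad 0 < \gamma < 1 .
```

Under this form, the 2-norm ties together each marker's coefficients across studies, so a marker tends to be selected in all studies or in none, while the individual coefficients within a selected group remain free to differ. That is one way to read the abstract's claim of consistent effects alongside accommodated heterogeneity.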
Affiliation(s)
- Shuangge Ma
  - School of Public Health, Yale University, 60 College Street, New Haven, CT 06520, USA.
|
9
|
Campain A, Yang YH. Comparison study of microarray meta-analysis methods. BMC Bioinformatics 2010; 11:408. [PMID: 20678237 PMCID: PMC2922198 DOI: 10.1186/1471-2105-11-408] [Citation(s) in RCA: 71] [Impact Index Per Article: 5.1] [Received: 04/13/2010] [Accepted: 08/03/2010] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Meta-analysis methods exist for combining multiple microarray datasets. However, there is a wide range of issues associated with microarray meta-analysis and a limited ability to compare the performance of different meta-analysis methods. RESULTS We compare eight meta-analysis methods: five existing methods, two naive methods and a novel approach (mDEDS). Comparisons are performed using simulated data and two biological case studies with varying degrees of meta-analysis complexity. The performance of meta-analysis methods is assessed via ROC curves and prediction accuracy where applicable. CONCLUSIONS Existing meta-analysis methods vary in their ability to perform successful meta-analysis. This success is very dependent on the complexity of the data and type of analysis. Our proposed method, mDEDS, performs competitively as a meta-analysis tool even as complexity increases. Because of the varying abilities of compared meta-analysis methods, care should be taken when considering the meta-analysis method used for particular research.
Affiliation(s)
- Anna Campain
  - School of Mathematics and Statistics, Center of Mathematical Biology, University of Sydney, F07 Sydney, NSW 2006, Australia.
|
10
|
Dreyfuss JM, Johnson MD, Park PJ. Meta-analysis of glioblastoma multiforme versus anaplastic astrocytoma identifies robust gene markers. Mol Cancer 2009; 8:71. [PMID: 19732454 PMCID: PMC2743637 DOI: 10.1186/1476-4598-8-71] [Citation(s) in RCA: 48] [Impact Index Per Article: 3.2] [Received: 02/23/2009] [Accepted: 09/04/2009] [Indexed: 01/14/2023] Open
Abstract
Background Anaplastic astrocytoma (AA) and its more aggressive counterpart, glioblastoma multiforme (GBM), are the most common intrinsic brain tumors in adults and are almost universally fatal. A deeper understanding of the molecular relationship of these tumor types is necessary to derive insights into the diagnosis, prognosis, and treatment of gliomas. Although genomewide profiling of expression levels with microarrays can be used to identify differentially expressed genes between these tumor types, comparative studies so far have resulted in gene lists that show little overlap. Results To achieve a more accurate and stable list of the differentially expressed genes and pathways between primary GBM and AA, we performed a meta-analysis using publicly available genome-scale mRNA data sets. There were four data sets with sufficiently large sample sizes of both GBMs and AAs, all of which coincidentally used human U133 platforms from Affymetrix, allowing for easier and more precise integration of data. After scoring genes and pathways within each data set, we combined the statistics across studies using the nonparametric rank sum method to identify the features that differentiate GBMs and AAs. We found >900 statistically significant probe sets after correction for multiple testing from the >22,000 tested. We also used the rank sum approach to select >20 significant Biocarta pathways after correction for multiple testing out of >175 pathways examined. The most significant pathway was the hypoxia-inducible factor (HIF) pathway. Our analysis suggests that many of the most statistically significant genes work together in a HIF1A/VEGF-regulated network to increase angiogenesis and invasion in GBM when compared to AA. Conclusion We have performed a meta-analysis of genome-scale mRNA expression data for 289 human malignant gliomas and have identified a list of >900 probe sets and >20 pathways that are significantly different between GBM and AA. These feature lists could be utilized to aid in diagnosis, prognosis, and grade reduction of high-grade gliomas and to identify genes that were not previously suspected of playing an important role in glioma biology. More generally, this approach suggests that combined analysis of existing data sets can reveal new insights and that the large amount of publicly available cancer data sets should be further utilized in a similar manner.
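The abstract above describes scoring features within each data set and then combining the statistics across studies with a nonparametric rank sum. The exact scoring and significance procedure is not given in the abstract, so the following is only a sketch of the general idea under stated assumptions: each study supplies a score per gene (higher meaning stronger evidence), genes are ranked within each study, and ranks are summed over the genes common to all studies. The function name `rank_sum_combine` and the gene labels are hypothetical, and the multiple-testing step is omitted.

```python
def rank_sum_combine(scores_by_study):
    """Combine per-study gene scores by summing within-study ranks.

    scores_by_study -- list of dicts mapping gene -> score, where a higher
                       score means stronger evidence (an assumption here).
    Returns a list of (gene, rank_sum) pairs, smallest rank sum first.
    """
    # Only genes measured in every study can be ranked consistently.
    common = set(scores_by_study[0])
    for scores in scores_by_study[1:]:
        common &= set(scores)

    totals = {gene: 0 for gene in common}
    for scores in scores_by_study:
        # Rank 1 is the highest score; ties are broken by gene name
        # so the result is deterministic.
        ordered = sorted(common, key=lambda g: (-scores[g], g))
        for rank, gene in enumerate(ordered, start=1):
            totals[gene] += rank

    # A small rank sum means the gene is near the top in every study.
    return sorted(totals.items(), key=lambda kv: (kv[1], kv[0]))


if __name__ == "__main__":
    studies = [
        {"GENE_A": 3.0, "GENE_B": 1.0, "GENE_C": 2.0},
        {"GENE_A": 2.5, "GENE_B": 0.1, "GENE_C": 2.0, "GENE_D": 9.0},
    ]
    for gene, total in rank_sum_combine(studies):
        print(gene, total)
```

Because only ranks are combined, the approach does not require the studies to share a common measurement scale, which is one reason rank-based combination is attractive for pooling independent data sets.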
Affiliation(s)
- Jonathan M Dreyfuss
  - Partners HealthCare Center for Personalized Genetic Medicine, Brigham and Women's Hospital, Boston, MA 02115, USA.
|
11
|
Ma S, Huang J. Regularized gene selection in cancer microarray meta-analysis. BMC Bioinformatics 2009; 10:1. [PMID: 19118496 PMCID: PMC2631520 DOI: 10.1186/1471-2105-10-1] [Citation(s) in RCA: 140] [Impact Index Per Article: 9.3] [Received: 09/18/2008] [Accepted: 01/01/2009] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND In cancer studies, it is common that multiple microarray experiments are conducted to measure the same clinical outcome and expressions of the same set of genes. An important goal of such experiments is to identify a subset of genes that can potentially serve as predictive markers for cancer development and progression. Analyses of individual experiments may lead to unreliable gene selection results because of the small sample sizes. Meta analysis can be used to pool multiple experiments, increase statistical power, and achieve more reliable gene selection. The meta analysis of cancer microarray data is challenging because of the high dimensionality of gene expressions and the differences in experimental settings amongst different experiments. RESULTS We propose a Meta Threshold Gradient Descent Regularization (MTGDR) approach for gene selection in the meta analysis of cancer microarray data. The MTGDR has many advantages over existing approaches. It allows different experiments to have different experimental settings. It can account for the joint effects of multiple genes on cancer, and it can select the same set of cancer-associated genes across multiple experiments. Simulation studies and analyses of multiple pancreatic and liver cancer experiments demonstrate the superior performance of the MTGDR. CONCLUSION The MTGDR provides an effective way of analyzing multiple cancer microarray studies and selecting reliable cancer-associated genes.
Affiliation(s)
- Shuangge Ma
  - Department of Epidemiology and Public Health, Yale University, New Haven, CT 06520, USA.
|
12
|
Novak JP, Kim SY, Xu J, Modlich O, Volsky DJ, Honys D, Slonczewski JL, Bell DA, Blattner FR, Blumwald E, Boerma M, Cosio M, Gatalica Z, Hajduch M, Hidalgo J, McInnes RR, Miller III MC, Penkowa M, Rolph MS, Sottosanto J, St-Arnaud R, Szego MJ, Twell D, Wang C. Generalization of DNA microarray dispersion properties: microarray equivalent of t-distribution. Biol Direct 2006; 1:27. [PMID: 16959036 PMCID: PMC1586001 DOI: 10.1186/1745-6150-1-27] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.7] [Received: 09/01/2006] [Accepted: 09/07/2006] [Indexed: 01/12/2023] Open
Abstract
Background DNA microarrays are a powerful technology that can provide a wealth of gene expression data for disease studies, drug development, and a wide scope of other investigations. Because of the large volume and inherent variability of DNA microarray data, many new statistical methods have been developed for evaluating the significance of the observed differences in gene expression. However, until now little attention has been given to the characterization of dispersion of DNA microarray data. Results Here we examine the expression data obtained from 682 Affymetrix GeneChips® with 22 different types and we demonstrate that the Gaussian (normal) frequency distribution is characteristic for the variability of gene expression values. However, typically 5 to 15% of the samples deviate from normality. Furthermore, it is shown that the frequency distributions of the difference of expression in subsets of ordered, consecutive pairs of genes (consecutive samples) in pair-wise comparisons of replicate experiments are also normal. We describe a consecutive sampling method, which is employed to calculate the characteristic function approximating standard deviation, and show that the standard deviation derived from the consecutive samples is equivalent to the standard deviation obtained from individual genes. Finally, we determine the boundaries of probability intervals and demonstrate that the coefficients defining the intervals are independent of sample characteristics, variability of data, laboratory conditions and type of chips. These coefficients are very closely correlated with Student's t-distribution. Conclusion In this study we ascertained that the non-systematic variations possess a Gaussian distribution, determined the probability intervals and demonstrated that the Kα coefficients defining these intervals are invariant; these coefficients offer a convenient universal measure of dispersion of data. The fact that the Kα distributions are so close to the t-distribution and independent of conditions and type of arrays suggests that the quantitative data provided by Affymetrix technology give a "true" representation of the physical processes involved in measurement of RNA abundance. Reviewers This article was reviewed by Yoav Gilad (nominated by Doron Lancet), Sach Mukherjee (nominated by Sandrine Dudoit) and Amir Niknejad and Shmuel Friedland (nominated by Neil Smalheiser).
Affiliation(s)
- Jaroslav P Novak
  - McGill University and Genome Québec Innovation Centre, 740 Docteur Penfield Avenue, Montreal, Québec, H3A 1A4, Canada
- Seon-Young Kim
  - Human Genomics Laboratory, Genome Research Center, 52 Eoeun-dong, Yuseong-gu, Daejon, 305-333, Korea
- Jun Xu
  - Transcriptional Genomics Core, Cedars-Sinai Medical Center, Los Angeles, CA 90048, USA
- Olga Modlich
  - Institut fur Onkologische Chemie, Heinrich Heine Universitat Dusseldorf, Moorenstr. 5, D-40225 Dusseldorf, Germany
- David J Volsky
  - St. Luke's-Roosevelt Hospital Center and Columbia University, Molecular Virology Division, 432 West 58th Street, Antenucci Building, Room 709, New York, NY 10019, USA
- David Honys
  - Institute of Experimental Botany AS CR, Rozvojová 135, CZ-165 02, Praha 6, Czech Republic and Charles University in Prague, Department of Plant Physiology, Viničná 5, 12844, Praha 2, Czech Republic
- Joan L Slonczewski
  - Department of Biology, Higley Hall, 202 N. College Dr., Kenyon College, Gambier, OH 43022, USA
- Douglas A Bell
  - Environmental Genomics Section, C3-03, PO Box 12233, National Institute of Environmental Health Sciences, Research Triangle Park, NC 27709, USA
- Fred R Blattner
  - Department of Genetics, 425 Henry Mall, University of Wisconsin, Madison, WI 53706, USA
- Eduardo Blumwald
  - Department of Plant Sciences, University of California, One Shields Ave, Davis, CA 95616, USA
- Marjan Boerma
  - Department of Pharmaceutical Sciences, University of Arkansas for Medical Sciences, 4301 West Markham, Slot 522-3, Little Rock AR 72205, USA
- Manuel Cosio
  - Respiratory Division, Department of Medicine, McGill University, Montreal, Quebec, Canada
- Zoran Gatalica
  - Department of Pathology, Creighton University School of Medicine, 601 North 30th Street, Omaha, NE, 68131-2197, USA
- Marian Hajduch
  - Laboratory of Experimental Medicine, Department of Pediatrics, Faculty of Medicine and Dentistry, Palacky University in Olomouc, Puskinova 6, 775 20 Olomouc, Czech Republic
- Juan Hidalgo
  - Institute of Neurosciences and Department of Cellular Biology, Physiology and Immunology, Animal Physiology unit, Faculty of Sciences, Autonomous University of Barcelona, Bellaterra, Barcelona, 08193, Spain
- Roderick R McInnes
  - Programs in Genetics and Developmental Biology, The Research Institute, The Hospital for Sick Children, Toronto, Canada M5G 1X8; Departments of Molecular and Medical Genetics and Pediatrics, University of Toronto, Toronto, M5S 1A1, Canada
- Merrill C Miller III
  - Environmental Genomics Section, C3-03, PO Box 12233, National Institute of Environmental Health Sciences, Research Triangle Park, NC 27709, USA
- Milena Penkowa
  - Section of Neuroprotection, Centre of Inflammation and Metabolism, The Faculty of Health Sciences, University of Copenhagen, Blegdamsvej 3, DK-2200, Copenhagen Denmark
- Michael S Rolph
  - Arthritis and Inflammation Research Program, Garvan Institute of Medical Research, 384 Victoria St, Darlinghurst NSW 2010, Australia
- Jordan Sottosanto
  - Department of Plant Sciences, University of California, One Shields Ave, Davis, CA 95616, USA
- Rene St-Arnaud
  - Genetics Unit, Shriners Hospital for Children and Departments of Surgery and Human Genetics, McGill University, Montréal H3A 2T5, Québec, Canada
- Michael J Szego
  - Programs in Genetics and Developmental Biology, The Research Institute, The Hospital for Sick Children, Toronto, Canada M5G 1X8; Departments of Molecular and Medical Genetics, University of Toronto, Toronto, M5S 1A1, Canada
- David Twell
  - Department of Biology, University of Leicester, LE1 7RH Leicester, UK
- Charles Wang
  - Transcriptional Genomics Core, Cedars-Sinai Medical Center, Los Angeles, CA 90048, USA
  - Department of Medicine, Cedars-Sinai Medical Center, David Geffen School of Medicine, UCLA, Los Angeles, CA 90048, USA
|
13
|
Current Awareness on Comparative and Functional Genomics. Comp Funct Genomics 2005. [PMCID: PMC2447491 DOI: 10.1002/cfg.425] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022] Open
|