1
Prediction of early breast cancer patient survival using ensembles of hypoxia signatures. PLoS One 2018; 13:e0204123. [PMID: 30216362 PMCID: PMC6138385 DOI: 10.1371/journal.pone.0204123] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 01/03/2018] [Accepted: 09/04/2018] [Indexed: 12/20/2022]
Abstract
Background Biomarkers are a key component of precision medicine. However, full clinical integration of biomarkers has been met with challenges, partly attributed to analytical difficulties. It has been shown that biomarker reproducibility is susceptible to data preprocessing approaches. Here, we systematically evaluated machine-learning ensembles of preprocessing methods as a general strategy to improve biomarker performance for prediction of survival from early breast cancer. Results We risk stratified breast cancer patients into either low-risk or high-risk groups based on four published hypoxia signatures (Buffa, Winter, Hu, and Sorensen), using 24 different preprocessing approaches for microarray normalization. The 24 binary risk profiles determined for each hypoxia signature were combined using a random forest to evaluate the efficacy of a preprocessing ensemble classifier. We demonstrate that the best way of merging preprocessing methods varies from signature to signature, and that there is likely no ‘best’ preprocessing pipeline that is universal across datasets, highlighting the need to evaluate ensembles of preprocessing algorithms. Further, we developed novel signatures for each preprocessing method and the risk classifications from each were incorporated in a meta-random forest model. Interestingly, the classifications of these biomarkers and their ensemble show striking consistency, demonstrating that similar intrinsic biological information is being faithfully represented. As such, these classification patterns further confirm that there is a subset of patients whose prognosis is consistently challenging to predict. Conclusions Performance of different prognostic signatures varies with pre-processing method. A simple classifier by unanimous voting of classifications is a reliable way of improving on single preprocessing methods. Future signatures will likely require integration of intrinsic and extrinsic clinico-pathological variables to better predict disease-related outcomes.
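The unanimous-voting classifier mentioned in the conclusions can be illustrated with a minimal sketch. The patient identifiers and risk calls below are invented for illustration, and leaving non-unanimous patients unclassified (`None`) is an assumption; the abstract does not say how such patients are handled.

```python
from typing import Dict, List, Optional


def unanimous_vote(calls: List[int]) -> Optional[int]:
    """Return the shared label if every preprocessing pipeline agrees, else None."""
    if not calls:
        return None
    first = calls[0]
    return first if all(c == first for c in calls) else None


# Hypothetical data: one binary risk call (1 = high risk, 0 = low risk)
# per preprocessing pipeline for each patient.
risk_calls: Dict[str, List[int]] = {
    "patient_A": [1, 1, 1, 1],
    "patient_B": [0, 0, 0, 0],
    "patient_C": [1, 0, 1, 1],  # pipelines disagree
}

# Assumption: patients without a unanimous call stay unclassified (None).
ensemble = {patient: unanimous_vote(calls) for patient, calls in risk_calls.items()}
```

Patients such as `patient_C`, whose calls differ between pipelines, are the ones a voting rule of this kind flags as hard to classify.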
2
Fox NS, Starmans MHW, Haider S, Lambin P, Boutros PC. Ensemble analyses improve signatures of tumour hypoxia and reveal inter-platform differences. BMC Bioinformatics 2014; 15:170. [PMID: 24902696 PMCID: PMC4061774 DOI: 10.1186/1471-2105-15-170] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.3] [Received: 11/02/2013] [Accepted: 05/27/2014] [Indexed: 12/24/2022]
Abstract
Background The reproducibility of transcriptomic biomarkers across datasets remains poor, limiting clinical application. We and others have suggested that this is in part caused by differential error structures between datasets and their incomplete removal by pre-processing algorithms. Methods To test this hypothesis, we systematically assessed the effects of pre-processing on biomarker classification using 24 different pre-processing methods and 15 distinct signatures of tumour hypoxia in 10 datasets (2,143 patients). Results We confirm strong pre-processing effects for all datasets and signatures, and find that these differ between microarray versions. Importantly, exploiting different pre-processing techniques in an ensemble technique improved classification for a majority of signatures. Conclusions Assessing biomarkers using an ensemble of pre-processing techniques shows clear value across multiple diseases, datasets and biomarkers. Importantly, ensemble classification improves biomarkers with initially good results but does not result in spuriously improved performance for poor biomarkers. While further research is required, this approach has the potential to become a standard for transcriptomic biomarkers.
Affiliation(s)
- Paul C Boutros
- Informatics and Bio-computing Platform, Ontario Institute for Cancer Research, Toronto, Canada.
3
Östlund G, Sonnhammer EL. Avoiding pitfalls in gene (co)expression meta-analysis. Genomics 2014; 103:21-30. [DOI: 10.1016/j.ygeno.2013.10.006] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Received: 05/09/2013] [Revised: 09/30/2013] [Accepted: 10/22/2013] [Indexed: 11/16/2022]
4
Awofala AA. Application of microarray technology in Drosophila ethanol behavioral research. ACTA ACUST UNITED AC 2012. [DOI: 10.1007/s11515-011-1177-8] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 11/29/2022]
5
Unifying gene expression measures from multiple platforms using factor analysis. PLoS One 2011; 6:e17691. [PMID: 21436879 PMCID: PMC3059153 DOI: 10.1371/journal.pone.0017691] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Received: 08/25/2010] [Accepted: 02/10/2011] [Indexed: 11/19/2022]
Abstract
In the Cancer Genome Atlas (TCGA) project, gene expression of the same set of samples is measured multiple times on different microarray platforms. There are two main advantages to combining these measurements. First, we have the opportunity to obtain a more precise and accurate estimate of expression levels than using the individual platforms alone. Second, the combined measure simplifies downstream analysis by eliminating the need to work with three sets of expression measures and to consolidate results from the three platforms. We propose to use factor analysis (FA) to obtain a unified gene expression measure (UE) from multiple platforms. The UE is a weighted average of the three platforms, and is shown to perform well in terms of accuracy and precision. In addition, the FA model produces parameter estimates that allow the assessment of the model fit. The R code is provided in File S2. Gene-level FA measurements for the TCGA data sets are available from http://tcga-data.nci.nih.gov/docs/publications/unified_expression/.
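To make the idea of a unified measure as a weighted average concrete, here is a minimal sketch using the Bartlett factor score for a one-factor model. This is an assumed, textbook estimator chosen for illustration; the abstract does not state which estimator the authors use, and the loadings, noise variances and expression values below are invented.

```python
from typing import Optional, Sequence


def bartlett_score(
    x: Sequence[float],
    loadings: Sequence[float],
    noise_vars: Sequence[float],
    means: Optional[Sequence[float]] = None,
) -> float:
    """Bartlett factor score for a one-factor model x_j = mean_j + loading_j * f + e_j.

    Each platform is weighted by loading_j / noise_var_j, so noisier platforms
    contribute less to the combined value.
    """
    if means is None:
        means = [0.0] * len(x)
    if not (len(x) == len(loadings) == len(noise_vars) == len(means)):
        raise ValueError("all inputs must have the same length")
    numerator = sum(
        l * (xi - m) / v for xi, l, v, m in zip(x, loadings, noise_vars, means)
    )
    denominator = sum(l * l / v for l, v in zip(loadings, noise_vars))
    return numerator / denominator


# Hypothetical values for one gene in one sample measured on three platforms.
unified = bartlett_score(
    x=[2.0, 4.0, 8.0],
    loadings=[1.0, 1.0, 1.0],    # assumed equal loadings
    noise_vars=[1.0, 2.0, 4.0],  # assumed platform-specific error variances
)
```

With equal loadings the score reduces to an inverse-variance weighted mean, which is why the least noisy platform dominates the result.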
6
Liu X, Li Z, Wen J, Cai Q, Xu Y, Zhang X. Prediction of multiple drug resistance phenotype in cancer cell lines using gene expression profiles and phylogenetic trees. CHINESE SCIENCE BULLETIN-CHINESE 2010. [DOI: 10.1007/s11434-010-4131-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]
7
Diaz-Romero J, Romeo S, Bovée JVMG, Hogendoorn PCW, Heini PF, Mainil-Varlet P. Hierarchical clustering of flow cytometry data for the study of conventional central chondrosarcoma. J Cell Physiol 2010; 225:601-11. [PMID: 20506378 DOI: 10.1002/jcp.22245] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Indexed: 11/07/2022]
Abstract
We have investigated the use of hierarchical clustering of flow cytometry data to classify samples of conventional central chondrosarcoma, a malignant cartilage-forming tumor of uncertain cellular origin, according to similarities with surface marker profiles of several known cell types. Human primary chondrosarcoma cells, articular chondrocytes, mesenchymal stem cells, fibroblasts, and a panel of tumor cell lines of chondrocytic or epithelial origin were clustered based on the expression profile of eleven surface markers. For clustering, eight hierarchical clustering algorithms, three distance metrics, as well as several approaches for data preprocessing, including multivariate outlier detection, logarithmic transformation, and z-score normalization, were systematically evaluated. By selecting clustering approaches shown to give reproducible results for cluster recovery of known cell types, primary conventional central chondrosarcoma cells could be grouped in two main clusters with distinctive marker expression signatures: one group clustering together with mesenchymal stem cells (CD49b-high/CD10-low/CD221-high) and a second group clustering close to fibroblasts (CD49b-low/CD10-high/CD221-low). Hierarchical clustering also revealed substantial differences between primary conventional central chondrosarcoma cells and established chondrosarcoma cell lines, with the latter not only segregating apart from primary tumor cells and normal tissue cells, but clustering together with cell lines from epithelial lineage. Our study provides a foundation for the use of hierarchical clustering applied to flow cytometry data as a powerful tool to classify samples according to marker expression patterns, which could help uncover new cancer subtypes.
Affiliation(s)
- Jose Diaz-Romero
- Osteoarticular Research Group, Institute of Pathology, University of Bern, Bern, Switzerland.
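Two of the preprocessing steps named in the abstract for entry 7, logarithmic transformation followed by z-score normalization, can be sketched as below. The base-10 logarithm, the population standard deviation and the marker intensities are all assumptions made for illustration; the abstract does not specify them.

```python
import math
from statistics import mean, pstdev
from typing import List, Sequence


def log_zscore(values: Sequence[float]) -> List[float]:
    """Log10-transform positive intensities, then centre and scale to unit variance."""
    logged = [math.log10(v) for v in values]
    mu = mean(logged)
    sd = pstdev(logged)  # assumption: population standard deviation
    if sd == 0:
        # A constant marker carries no information; return zeros instead of dividing by 0.
        return [0.0] * len(logged)
    return [(v - mu) / sd for v in logged]


# Hypothetical fluorescence intensities of one surface marker across three samples.
z = log_zscore([1.0, 10.0, 100.0])
```

After this step every marker has mean 0 and unit variance, so no single marker dominates the distance metric used for clustering.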
8
Griffiths W, Koal T, Wang Y, Kohl M, Enot D, Deigner HP. Targeted Metabolomics for Biomarker Discovery. Angew Chem Int Ed Engl 2010; 49:5426-45. [DOI: 10.1002/anie.200905579] [Citation(s) in RCA: 259] [Impact Index Per Article: 18.5] [Indexed: 01/06/2023]
9
Griffiths W, Koal T, Wang Y, Kohl M, Enot D, Deigner HP. “Targeted Metabolomics” in der Biomarkerforschung. Angew Chem Int Ed Engl 2010. [DOI: 10.1002/ange.200905579] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Indexed: 01/06/2023]
10
Sontrop HMJ, Moerland PD, van den Ham R, Reinders MJT, Verhaegh WFJ. A comprehensive sensitivity analysis of microarray breast cancer classification under feature variability. BMC Bioinformatics 2009; 10:389. [PMID: 19941644 PMCID: PMC2789744 DOI: 10.1186/1471-2105-10-389] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Received: 07/30/2009] [Accepted: 11/26/2009] [Indexed: 01/01/2023]
Abstract
BACKGROUND Large discrepancies in signature composition and outcome concordance have been observed between different microarray breast cancer expression profiling studies. This is often ascribed to differences in array platform as well as biological variability. We conjecture that other reasons for the observed discrepancies are the measurement error associated with each feature and the choice of preprocessing method. Microarray data are known to be subject to technical variation and the confidence intervals around individual point estimates of expression levels can be wide. Furthermore, the estimated expression values also vary depending on the selected preprocessing scheme. In microarray breast cancer classification studies, however, these two forms of feature variability are almost always ignored and hence their exact role is unclear. RESULTS We have performed a comprehensive sensitivity analysis of microarray breast cancer classification under the two types of feature variability mentioned above. We used data from six state of the art preprocessing methods, using a compendium consisting of eight different datasets, involving 1131 hybridizations, containing data from both one and two-color array technology. For a wide range of classifiers, we performed a joint study on performance, concordance and stability. In the stability analysis we explicitly tested classifiers for their noise tolerance by using perturbed expression profiles that are based on uncertainty information directly related to the preprocessing methods. Our results indicate that signature composition is strongly influenced by feature variability, even if the array platform and the stratification of patient samples are identical. In addition, we show that there is often a high level of discordance between individual class assignments for signatures constructed on data coming from different preprocessing schemes, even if the actual signature composition is identical. 
CONCLUSION Feature variability can have a strong impact on breast cancer signature composition, as well as the classification of individual patient samples. We therefore strongly recommend that feature variability is considered in analyzing data from microarray breast cancer expression profiling experiments.
11
Kim SY. Descriptive and Systematic Comparison of Clustering Methods in Microarray Data Analysis. KOREAN JOURNAL OF APPLIED STATISTICS 2009. [DOI: 10.5351/kjas.2009.22.1.089] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/11/2022]
12
Wilder SP, Kaisaki PJ, Argoud K, Salhan A, Ragoussis J, Bihoreau MT, Gauguier D. Comparative analysis of methods for gene transcription profiling data derived from different microarray technologies in rat and mouse models of diabetes. BMC Genomics 2009; 10:63. [PMID: 19196459 PMCID: PMC2652496 DOI: 10.1186/1471-2164-10-63] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Received: 05/15/2008] [Accepted: 02/05/2009] [Indexed: 02/05/2023]
Abstract
Background Microarray technologies are widely used to quantify the abundance of transcripts corresponding to thousands of genes. To maximise the robustness of transcriptome results, we have tested the performance and reproducibility of rat and mouse gene expression data obtained with Affymetrix, Illumina and Operon platforms. Results We present a thorough analysis of the degree of reproducibility provided by analysing the transcriptomic profile of the same animals of several experimental groups under different popular microarray technologies in different tissues. Concordant results from inter- and intra-platform comparisons were maximised by testing many popular computational methods for generating fold changes and significances and by only considering oligonucleotides giving high expression levels. The choice of Affymetrix signal extraction technique was shown to have the greatest effect on the concordance across platforms. In both species, when choosing optimal methods, the agreement between data generated on the Affymetrix and Illumina was excellent; this was verified using qRT-PCR on a selection of genes present on all platforms. Conclusion This study provides an extensive assessment of analytical methods best suited for processing data from different microarray technologies and can assist integration of technologically different gene expression datasets in biological systems.
Affiliation(s)
- Steven P Wilder
- The Wellcome Trust Centre for Human Genetics, University of Oxford, Headington, Oxford, UK.
13
Abstract
Calibration of microarray measurements aims at removing systematic biases from the probe-level data to get expression estimates that linearly correlate with the transcript abundance in the studied samples. The improvement of calibration methods is an essential prerequisite for estimating absolute expression levels, which, in turn, are required for quantitative analyses of transcriptional regulation, for example, in the context of gene profiling of diseases. We address hybridization on microarrays as a reaction process in a complex environment and express the measured intensities as a function of the input quantities of the experiment. Popular calibration methods such as MAS5, dChip, RMA, gcRMA, vsn, and PLIER are briefly reviewed and assessed in light of the hybridization model and of previous benchmark studies. We present our hook method, a new calibration approach that is based on a graphical summary of the actual hybridization characteristics of a particular microarray. Although single-chip related, hook performs as well as the multi-chip-related gcRMA, presently one of the best state-of-the-art methods for estimating expression values. The hook method, in addition, provides a set of chip summary characteristics that evaluate the performance of a given hybridization. The algorithm of the method is briefly described and its performance is exemplified.
14
Sun Z, Wigle DA, Yang P. Non-overlapping and non-cell-type-specific gene expression signatures predict lung cancer survival. J Clin Oncol 2008; 26:877-83. [PMID: 18281660 DOI: 10.1200/jco.2007.13.1516] [Citation(s) in RCA: 86] [Impact Index Per Article: 5.4] [Indexed: 11/20/2022]
Abstract
PURPOSE Gene expression profiling for outcome prediction of non-small-cell lung cancer (NSCLC) remains clouded by heterogeneous and unvalidated results. This study applied multivariate approaches to identify and evaluate value-added gene expression signatures in two types of NSCLC. MATERIALS AND METHODS Two NSCLC oligonucleotide microarray data sets of adenocarcinoma and squamous cell carcinoma were used as training sets to select prognostic genes independent of conventional predictors. The top 50 genes from each set were used to predict the outcomes of two independent validation data sets of 84 and 91 NSCLC cases. RESULTS Adenocarcinomas with the 50-gene signature from adenocarcinoma in both validation data sets had a 2.4-fold (95% CI, 1.3 to 4.4 and 1.0 to 5.8) increased mortality after adjustment for conventional predictors. Squamous cell carcinoma with this high-risk signature had an adjusted risk of 1.1 (95% CI, 0.4 to 3.2) in one data set and 2.5 (95% CI, 1.1 to 5.8) in another consisting of stage I tumors. Adenocarcinoma with the 50-gene signature from squamous cell carcinoma had an elevated risk of 3.5 (95% CI, 1.4 to 9.0) after adjustment for conventional predictors. Squamous cell carcinoma with this high risk signature had an adjusted risk of 1.8 (95% CI, 0.7 to 4.6). Despite the little overlap in individual genes, the two gene signatures had significant functional connectedness in molecular pathways. CONCLUSION Two non-overlapping but functionally related gene expression signatures provide consistently improved survival prediction for NSCLC regardless of histologic cell type. Multiple sets of genes may exist for NSCLC with predictive value, but ones with independent predictive value beyond clinical predictors will be required for clinical translation.
Affiliation(s)
- Zhifu Sun
- Department of Health Sciences Research, College of Medicine, Mayo Clinic, 200 First St SW, Rochester, MN 55905, USA.
15
Lau SK, Boutros PC, Pintilie M, Blackhall FH, Zhu CQ, Strumpf D, Johnston MR, Darling G, Keshavjee S, Waddell TK, Liu N, Lau D, Penn LZ, Shepherd FA, Jurisica I, Der SD, Tsao MS. Three-gene prognostic classifier for early-stage non small-cell lung cancer. J Clin Oncol 2007; 25:5562-9. [PMID: 18065728 DOI: 10.1200/jco.2007.12.0352] [Citation(s) in RCA: 181] [Impact Index Per Article: 10.6] [Indexed: 11/20/2022]
Abstract
PURPOSE Several microarray studies have reported gene expression signatures that classify non-small-cell lung carcinoma (NSCLC) patients into different prognostic groups. However, the prognostic gene lists reported to date overlap poorly across studies, and few have been validated independently using more quantitative assay methods. PATIENTS AND METHODS The expression of 158 putative prognostic genes identified in previous microarray studies was analyzed by reverse transcription quantitative polymerase chain reaction in the tumors of 147 NSCLC patients. Concordance indices and risk scores were used to identify a stage-independent set of genes that could classify patients with significantly different prognoses. RESULTS We have identified a three-gene classifier (STX1A, HIF1A, and CCR7) for overall survival (hazard ratio = 3.8; 95% CI, 1.7 to 8.2; P < .001). The classifier was also able to stratify stage I and II patients and further improved the predictive ability of clinical factors such as histology and tumor stage. The predictive value of this three-gene classifier was validated in two large independent microarray data sets from Harvard and Duke Universities. CONCLUSION We have identified a new three-gene classifier that is independent of and improves on stage to stratify early-stage NSCLC patients with significantly different prognoses. This classifier may be tested further for its potential value to improve the selection of resected NSCLC patients in adjuvant therapy.
Affiliation(s)
- Suzanne K Lau
- Princess Margaret Hospital, 610 University Ave, Toronto, Ontario, Canada
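The abstract for entry 15 relies on concordance indices to rank candidate genes. A minimal sketch of Harrell's concordance index is shown below; the function name and the toy survival data are illustrative, and whether the study used this exact variant (including the half-credit for tied risk scores) is not stated in the abstract.

```python
from typing import Sequence


def concordance_index(
    times: Sequence[float],
    events: Sequence[int],
    risk_scores: Sequence[float],
) -> float:
    """Fraction of comparable patient pairs in which the higher risk score
    belongs to the patient with the shorter observed survival time."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when patient i had an observed event
            # strictly before patient j's (event or censoring) time.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # assumption: ties count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical cohort: survival times, event indicators (1 = death observed), risk scores.
perfect = concordance_index([1, 2, 3, 4], [1, 1, 1, 1], [4.0, 3.0, 2.0, 1.0])
reversed_ranking = concordance_index([1, 2, 3, 4], [1, 1, 1, 1], [1.0, 2.0, 3.0, 4.0])
```

A value of 0.5 corresponds to random ranking, so a candidate gene or risk score is only informative to the extent that its index departs from 0.5.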
16
Graze RM, Barmina O, Tufts D, Naderi E, Harmon KL, Persianinova M, Nuzhdin SV. New candidate genes for sex-comb divergence between Drosophila mauritiana and Drosophila simulans. Genetics 2007; 176:2561-76. [PMID: 17565959 PMCID: PMC1950655 DOI: 10.1534/genetics.106.067686] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Indexed: 11/18/2022]
Abstract
A large-effect QTL for divergence in sex-comb tooth number between Drosophila simulans and D. mauritiana was previously mapped to 73A-84AB. Here we identify genes that are likely contributors to this divergence. We first improved the mapping resolution in the 73A-84AB region using 12 introgression lines and 62 recombinant nearly isogenic lines. To further narrow the list of candidate genes, we assayed leg-specific expression and identified genes with transcript-level evolution consistent with a potential role in sex-comb divergence. Sex combs are formed on the prothoracic (front) legs, but not on the mesothoracic (middle) legs of Drosophila males. We extracted RNA from the prothoracic and mesothoracic pupal legs of two species to determine which of the genes expressed differently between leg types were also divergent for gene expression. Two good functional candidate genes, Scr and dsx, are located in one of our fine-scale QTL regions. In addition, three previously uncharacterized genes (CG15186, CG2016, and CG2791) emerged as new candidates. These genes are located in regions strongly associated with sex-comb tooth number differences and are expressed differently between leg tissues and between species. Further supporting the potential involvement of these genes in sex-comb divergence, we found a significant difference in sex-comb tooth number between co-isogenic D. melanogaster lines with and without P-element insertions at CG2791.
Affiliation(s)
- Rita M Graze
- Genetics Graduate Group, Center for Genetics and Development, University of California-Davis, 1 Shields Avenue, Davis, CA 95616.
17
Heber S, Sick B. Quality assessment of Affymetrix GeneChip data. OMICS-A JOURNAL OF INTEGRATIVE BIOLOGY 2006; 10:358-68. [PMID: 17069513 DOI: 10.1089/omi.2006.10.358] [Citation(s) in RCA: 79] [Impact Index Per Article: 4.4] [Indexed: 11/12/2022]
Abstract
Affymetrix GeneChips are one of the best established microarray platforms. This powerful technique allows users to measure the expression of thousands of genes simultaneously. However, a microarray experiment is a sophisticated and time consuming endeavor with many potential sources of unwanted variation that could compromise the results if left uncontrolled. Increasing data volume and data complexity have triggered growing concern and awareness of the importance of assessing the quality of generated microarray data. In this review, we give an overview of current methods and software tools for quality assessment of Affymetrix GeneChip data. We focus on quality metrics, diagnostic plots, probe-level methods, pseudo-images, and classification methods to identify corrupted chips. We also describe RNA quality assessment methods which play an important role in challenging RNA sources like formalin embedded biopsies, laser-micro dissected samples, or single cells. No wet-lab methods are discussed in this paper.
Affiliation(s)
- Steffen Heber
- Department of Computer Science, North Carolina State University, Raleigh, North Carolina, USA
18
Culligan KM, Robertson CE, Foreman J, Doerner P, Britt AB. ATR and ATM play both distinct and additive roles in response to ionizing radiation. THE PLANT JOURNAL : FOR CELL AND MOLECULAR BIOLOGY 2006; 48:947-61. [PMID: 17227549 DOI: 10.1111/j.1365-313x.2006.02931.x] [Citation(s) in RCA: 234] [Impact Index Per Article: 13.0] [Indexed: 05/13/2023]
Abstract
The ATR and ATM protein kinases are known to be involved in a wide variety of responses to DNA damage. The Arabidopsis thaliana genome includes both ATR and ATM orthologs, and plants with null alleles of these genes are viable. Arabidopsis atr and atm mutants display hypersensitivity to gamma-irradiation. To further characterize the roles of ATM and ATR in response to ionizing radiation, we performed a short-term global transcription analysis in wild-type and mutant lines. We found that hundreds of genes are upregulated in response to gamma-irradiation, and that the induction of virtually all of these genes is dependent on ATM, but not ATR. The transcript of CYCB1;1 is unique among the cyclin transcripts in being rapidly and powerfully upregulated in response to ionizing radiation, while other G(2)-associated transcripts are suppressed. We found that both ATM and ATR contribute to the induction of a CYCB1;1:GUS fusion by IR, but only ATR is required for the persistence of this response. We propose that this upregulation of CYCB1;1 does not reflect the accumulation of cells in G(2), but instead reflects a still unknown role for this cyclin in DNA damage response.
Affiliation(s)
- Kevin M Culligan
- Department of Biochemistry and Molecular Biology, University of New Hampshire, Durham, NH 03824, USA.
19
Holloway AJ, Oshlack A, Diyagama DS, Bowtell DDL, Smyth GK. Statistical analysis of an RNA titration series evaluates microarray precision and sensitivity on a whole-array basis. BMC Bioinformatics 2006; 7:511. [PMID: 17118209 PMCID: PMC1664592 DOI: 10.1186/1471-2105-7-511] [Citation(s) in RCA: 16] [Impact Index Per Article: 0.9] [Received: 06/27/2006] [Accepted: 11/22/2006] [Indexed: 12/29/2022]
Abstract
BACKGROUND Concerns are often raised about the accuracy of microarray technologies and the degree of cross-platform agreement, but there are yet no methods which can unambiguously evaluate precision and sensitivity for these technologies on a whole-array basis. RESULTS A methodology is described for evaluating the precision and sensitivity of whole-genome gene expression technologies such as microarrays. The method consists of an easy-to-construct titration series of RNA samples and an associated statistical analysis using non-linear regression. The method evaluates the precision and responsiveness of each microarray platform on a whole-array basis, i.e., using all the probes, without the need to match probes across platforms. An experiment is conducted to assess and compare four widely used microarray platforms. All four platforms are shown to have satisfactory precision but the commercial platforms are superior for resolving differential expression for genes at lower expression levels. The effective precision of the two-color platforms is improved by allowing for probe-specific dye-effects in the statistical model. The methodology is used to compare three data extraction algorithms for the Affymetrix platforms, demonstrating poor performance for the commonly used proprietary algorithm relative to the other algorithms. For probes which can be matched across platforms, the cross-platform variability is decomposed into within-platform and between-platform components, showing that platform disagreement is almost entirely systematic rather than due to measurement variability. CONCLUSION The results demonstrate good precision and sensitivity for all the platforms, but highlight the need for improved probe annotation. They quantify the extent to which cross-platform measures can be expected to be less accurate than within-platform comparisons for predicting disease progression or outcome.
Affiliation(s)
- Andrew J Holloway
- Ian Potter Foundation Centre for Cancer Genomics and Predictive Medicine, Peter MacCallum Cancer Centre, St Andrew's Place, East Melbourne, Victoria 3002, Australia
- Alicia Oshlack
- Walter and Eliza Hall Institute, 1G Royal Parade, Parkville, Victoria 3050, Australia
- Dileepa S Diyagama
- Ian Potter Foundation Centre for Cancer Genomics and Predictive Medicine, Peter MacCallum Cancer Centre, St Andrew's Place, East Melbourne, Victoria 3002, Australia
- David DL Bowtell
- Ian Potter Foundation Centre for Cancer Genomics and Predictive Medicine, Peter MacCallum Cancer Centre, St Andrew's Place, East Melbourne, Victoria 3002, Australia
- Gordon K Smyth
- Walter and Eliza Hall Institute, 1G Royal Parade, Parkville, Victoria 3050, Australia