Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Luo J, Schumacher M, Scherer A, Sanoudou D, Megherbi D, Davison T, Shi T, Tong W, Shi L, Hong H, Zhao C, Elloumi F, Shi W, Thomas R, Lin S, Tillinghast G, Liu G, Zhou Y, Herman D, Li Y, Deng Y, Fang H, Bushel P, Woods M, Zhang J. A comparison of batch effect removal methods for enhancement of prediction performance using MAQC-II microarray gene expression data. Pharmacogenomics J 2010;10:278-91. [PMID: 20676067 PMCID: PMC2920074 DOI: 10.1038/tpj.2010.57] [Citation(s) in RCA: 202] [Impact Index Per Article: 13.5] [Indexed: 12/12/2022]

Number

Cited by Other Article(s)

151

Parker HS, Corrada Bravo H, Leek JT. Removing batch effects for prediction problems with frozen surrogate variable analysis. PeerJ 2014;2:e561. [PMID: 25332844 PMCID: PMC4179553 DOI: 10.7717/peerj.561] [Citation(s) in RCA: 48] [Impact Index Per Article: 4.4] [Received: 06/20/2014] [Accepted: 08/15/2014] [Indexed: 01/06/2023] Open

152

Richard AC, Lyons PA, Peters JE, Biasci D, Flint SM, Lee JC, McKinney EF, Siegel RM, Smith KGC. Comparison of gene expression microarray data with count-based RNA measurements informs microarray interpretation. BMC Genomics 2014;15:649. [PMID: 25091430 PMCID: PMC4143561 DOI: 10.1186/1471-2164-15-649] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.5] [Received: 04/04/2014] [Accepted: 07/17/2014] [Indexed: 01/02/2023] Open

Abstract

BACKGROUND

Although numerous investigations have compared gene expression microarray platforms, preprocessing methods and batch correction algorithms using constructed spike-in or dilution datasets, there remains a paucity of studies examining the properties of microarray data using diverse biological samples. Most microarray experiments seek to identify subtle differences between samples with variable background noise, a scenario poorly represented by constructed datasets. Thus, microarray users lack important information regarding the complexities introduced in real-world experimental settings. The recent development of a multiplexed, digital technology for nucleic acid measurement enables counting of individual RNA molecules without amplification and, for the first time, permits such a study.

RESULTS

Using a set of human leukocyte subset RNA samples, we compared previously acquired microarray expression values with RNA molecule counts determined by the nCounter Analysis System (NanoString Technologies) in selected genes. We found that gene measurements across samples correlated well between the two platforms, particularly for high-variance genes, while genes deemed unexpressed by the nCounter generally had both low expression and low variance on the microarray. Confirming previous findings from spike-in and dilution datasets, this "gold-standard" comparison demonstrated signal compression that varied dramatically by expression level and, to a lesser extent, by dataset. Most importantly, examination of three different cell types revealed that noise levels differed across tissues.

CONCLUSIONS

Microarray measurements generally correlate with relative RNA molecule counts within optimal ranges but suffer from expression-dependent accuracy bias and precision that varies across datasets. We urge microarray users to consider expression-level effects in signal interpretation and to evaluate noise properties in each dataset independently.

153

Lee JA, Dobbin KK, Ahn J. Covariance adjustment for batch effect in gene expression data. Stat Med 2014;33:2681-95. [PMID: 24687561 PMCID: PMC4065794 DOI: 10.1002/sim.6157] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.0] [Received: 07/13/2013] [Revised: 11/26/2013] [Accepted: 02/28/2014] [Indexed: 02/01/2023]

154

Larsen MJ, Thomassen M, Tan Q, Sørensen KP, Kruse TA. Microarray-based RNA profiling of breast cancer: batch effect removal improves cross-platform consistency. Biomed Res Int 2014;2014:651751. [PMID: 25101291 PMCID: PMC4101981 DOI: 10.1155/2014/651751] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.8] [Received: 01/22/2014] [Revised: 04/17/2014] [Accepted: 06/09/2014] [Indexed: 12/13/2022]

155

Soneson C, Gerster S, Delorenzi M. Batch effect confounding leads to strong bias in performance estimates obtained by cross-validation. PLoS One 2014;9:e100335. [PMID: 24967636 PMCID: PMC4072626 DOI: 10.1371/journal.pone.0100335] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.9] [Received: 01/22/2014] [Accepted: 05/26/2014] [Indexed: 01/05/2023] Open

Abstract

Background

With the large amount of biological data that is currently publicly available, many investigators combine multiple data sets to increase the sample size and potentially also the power of their analyses. However, technical differences (“batch effects”) as well as differences in sample composition between the data sets may significantly affect the ability to draw generalizable conclusions from such studies.

Focus

The current study focuses on the construction of classifiers, and the use of cross-validation to estimate their performance. In particular, we investigate the impact of batch effects and differences in sample composition between batches on the accuracy of the classification performance estimate obtained via cross-validation. The focus on estimation bias is a main difference compared to previous studies, which have mostly focused on the predictive performance and how it relates to the presence of batch effects.

Data

We work on simulated data sets. To have realistic intensity distributions, we use real gene expression data as the basis for our simulation. Random samples from this expression matrix are selected and assigned to group 1 (e.g., ‘control’) or group 2 (e.g., ‘treated’). We introduce batch effects and select some features to be differentially expressed between the two groups. We consider several scenarios for our study, most importantly different levels of confounding between groups and batch effects.

Methods

We focus on well-known classifiers: logistic regression, Support Vector Machines (SVM), k-nearest neighbors (kNN) and Random Forests (RF). Feature selection is performed with the Wilcoxon test or the lasso. Parameter tuning and feature selection, as well as the estimation of the prediction performance of each classifier, is performed within a nested cross-validation scheme. The estimated classification performance is then compared to what is obtained when applying the classifier to independent data.

156

Parker HS, Leek JT, Favorov AV, Considine M, Xia X, Chavan S, Chung CH, Fertig EJ. Preserving biological heterogeneity with a permuted surrogate variable analysis for genomics batch correction. Bioinformatics 2014;30:2757-63. [PMID: 24907368 DOI: 10.1093/bioinformatics/btu375] [Citation(s) in RCA: 97] [Impact Index Per Article: 8.8] [Indexed: 12/22/2022]

Affiliation(s)

Hilary S Parker, Jeffrey T Leek, Alexander V Favorov, Michael Considine, Xiaoxin Xia, Sameer Chavan, Christine H Chung, Elana J Fertig
Department of Biostatistics, Bloomberg School of Public Health, Johns Hopkins University; Department of Oncology, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University, Baltimore, MD 21205, USA; Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow 119333, Russia; Research Institute for Genetics and Selection of Industrial Microorganisms "GosNIIGenetika", Moscow 117545, Russia; Department of Statistics and Biostatistics, Rutgers University, NJ 08854, USA; and Division of Allergy & Clinical Immunology, Department of Medicine, Johns Hopkins University, Baltimore, MD 21224, USA

157

Sordillo J, Raby BA. Gene expression profiling in asthma. Adv Exp Med Biol 2014;795:157-81. [PMID: 24162908 DOI: 10.1007/978-1-4614-8603-9_10] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Indexed: 02/01/2023]

158

Cangelosi D, Muselli M, Parodi S, Blengio F, Becherini P, Versteeg R, Conte M, Varesio L. Use of Attribute Driven Incremental Discretization and Logic Learning Machine to build a prognostic classifier for neuroblastoma patients. BMC Bioinformatics 2014;15 Suppl 5:S4. [PMID: 25078098 PMCID: PMC4095004 DOI: 10.1186/1471-2105-15-s5-s4] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.5] [Indexed: 02/07/2023] Open

Abstract

BACKGROUND

A cancer patient's outcome is written, in part, in the gene expression profile of the tumor. We previously identified a 62-probe-set signature (NB-hypo) to identify tissue hypoxia in neuroblastoma tumors and showed that NB-hypo stratified neuroblastoma patients into good and poor outcome groups. It was important to develop a prognostic classifier to cluster patients into risk groups that benefit from defined therapeutic approaches. Novel classification and data discretization approaches can be instrumental for the generation of accurate predictors and robust tools for clinical decision support. We explored the application to gene expression data of Rulex, a novel software suite including the Attribute Driven Incremental Discretization technique for transforming continuous variables into simplified discrete ones and the Logic Learning Machine model for intelligible rule generation.

RESULTS

We applied Rulex components to the problem of predicting the outcome of neuroblastoma patients on the basis of the 62-probe-set NB-hypo gene expression signature. The resulting classifier consisted of 9 rules utilizing mainly two conditions of the relative expression of 11 probe sets. These rules were very effective predictors, as shown in an independent validation set, demonstrating the validity of the LLM algorithm applied to microarray data and patients' classification. The LLM performed as efficiently as Prediction Analysis of Microarray and Support Vector Machine, and outperformed other learning algorithms such as C4.5. Rulex carried out a feature selection by selecting a new signature (NB-hypo-II) of 11 probe sets that turned out to be the most relevant in predicting outcome among the 62 of the NB-hypo signature. Rules are easily interpretable as they involve only a few conditions.

CONCLUSIONS

Our findings provided evidence that the application of Rulex to the expression values of NB-hypo signature created a set of accurate, high quality, consistent and interpretable rules for the prediction of neuroblastoma patients' outcome. We identified the Rulex weighted classification as a flexible tool that can support clinical decisions. For these reasons, we consider Rulex to be a useful tool for cancer classification from microarray gene expression data.

159

Kothari S, Phan JH, Stokes TH, Osunkoya AO, Young AN, Wang MD. Removing batch effects from histopathological images for enhanced cancer diagnosis. IEEE J Biomed Health Inform 2014;18:765-72. [PMID: 24808220 PMCID: PMC5003052 DOI: 10.1109/jbhi.2013.2276766] [Citation(s) in RCA: 38] [Impact Index Per Article: 3.5] [Indexed: 11/06/2022]

160

Lê Cao KA, Rohart F, McHugh L, Korn O, Wells CA. YuGene: A simple approach to scale gene expression data derived from different platforms for integrated analyses. Genomics 2014;103:239-51. [DOI: 10.1016/j.ygeno.2014.03.001] [Citation(s) in RCA: 37] [Impact Index Per Article: 3.4] [Received: 01/09/2014] [Revised: 03/14/2014] [Accepted: 03/16/2014] [Indexed: 01/09/2023]

161

Chen JJ, Lin WJ, Chen HC. Pharmacogenomic biomarkers for personalized medicine. Pharmacogenomics 2014;14:969-80. [PMID: 23746190 DOI: 10.2217/pgs.13.75] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.3] [Indexed: 11/21/2022] Open

162

Ford NA, Devlin KL, Lashinger LM, Hursting SD. Deconvoluting the obesity and breast cancer link: secretome, soil and seed interactions. J Mammary Gland Biol Neoplasia 2013;18:267-75. [PMID: 24091864 PMCID: PMC3874287 DOI: 10.1007/s10911-013-9301-9] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.2] [Received: 08/05/2013] [Accepted: 09/24/2013] [Indexed: 12/20/2022] Open

163

Slee RB, Grimes BR, Bansal R, Gore J, Blackburn C, Brown L, Gasaway R, Jeong J, Victorino J, March KL, Colombo R, Herbert BS, Korc M. Selective inhibition of pancreatic ductal adenocarcinoma cell growth by the mitotic MPS1 kinase inhibitor NMS-P715. Mol Cancer Ther 2013;13:307-315. [PMID: 24282275 DOI: 10.1158/1535-7163.mct-13-0324] [Citation(s) in RCA: 40] [Impact Index Per Article: 3.3] [Indexed: 11/16/2022]

Affiliation(s)

Roger B Slee Indiana University School of Medicine (IUSM) Department of Medical and Molecular Genetics, Indiana. Nerviano Medical Sciences, Nerviano, Italy; IU Melvin and Bren Simon Cancer Center (IUSCC), Nerviano Medical Sciences, Nerviano, Italy
Brenda R Grimes Indiana University School of Medicine (IUSM) Department of Medical and Molecular Genetics, Indiana. Nerviano Medical Sciences, Nerviano, Italy; IU Melvin and Bren Simon Cancer Center (IUSCC), Nerviano Medical Sciences, Nerviano, Italy; IUSCC Center for Pancreatic Cancer Research, Nerviano Medical Sciences, Nerviano, Italy
Ruchi Bansal Indiana University School of Medicine (IUSM) Department of Medical and Molecular Genetics, Indiana. Nerviano Medical Sciences, Nerviano, Italy
Jesse Gore IUSM Department of Medicine, Nerviano Medical Sciences, Nerviano, Italy
Corinne Blackburn Indiana University School of Medicine (IUSM) Department of Medical and Molecular Genetics, Indiana. Nerviano Medical Sciences, Nerviano, Italy
Lyndsey Brown Indiana University School of Medicine (IUSM) Department of Medical and Molecular Genetics, Indiana. Nerviano Medical Sciences, Nerviano, Italy
Rachel Gasaway Indiana University School of Medicine (IUSM) Department of Medical and Molecular Genetics, Indiana. Nerviano Medical Sciences, Nerviano, Italy
Jaesik Jeong IUSM Department of Biostatistics, Nerviano Medical Sciences, Nerviano, Italy
Jose Victorino Indiana University School of Medicine (IUSM) Department of Medical and Molecular Genetics, Indiana. Nerviano Medical Sciences, Nerviano, Italy; California State University Dominguez Hills, Nerviano Medical Sciences, Nerviano, Italy
Keith L March IUSM Department of Medicine, Nerviano Medical Sciences, Nerviano, Italy; IUSM Department of Biochemistry and Molecular Biology, Nerviano Medical Sciences, Nerviano, Italy; Krannert Institute of Cardiology, Nerviano Medical Sciences, Nerviano, Italy; Indiana Center for Vascular Biology, Nerviano Medical Sciences, Nerviano, Italy; R.L. Roudebush Veterans Affairs Medical Center, Nerviano Medical Sciences, Nerviano, Italy
Riccardo Colombo Indianapolis, Indiana. Nerviano Medical Sciences, Nerviano, Italy
Brittney-Shea Herbert Indiana University School of Medicine (IUSM) Department of Medical and Molecular Genetics, Indiana. Nerviano Medical Sciences, Nerviano, Italy; IU Melvin and Bren Simon Cancer Center (IUSCC), Nerviano Medical Sciences, Nerviano, Italy
Murray Korc IU Melvin and Bren Simon Cancer Center (IUSCC), Nerviano Medical Sciences, Nerviano, Italy; IUSCC Center for Pancreatic Cancer Research, Nerviano Medical Sciences, Nerviano, Italy; IUSM Department of Medicine, Nerviano Medical Sciences, Nerviano, Italy; IUSM Department of Biochemistry and Molecular Biology, Nerviano Medical Sciences, Nerviano, Italy

164

Reese SE, Archer KJ, Therneau TM, Atkinson EJ, Vachon CM, de Andrade M, Kocher JPA, Eckel-Passow JE. A new statistic for identifying batch effects in high-throughput genomic data that uses guided principal component analysis. Bioinformatics 2013;29:2877-83. [PMID: 23958724 PMCID: PMC3810845 DOI: 10.1093/bioinformatics/btt480] [Citation(s) in RCA: 92] [Impact Index Per Article: 7.7] [Received: 11/16/2012] [Revised: 07/03/2013] [Accepted: 08/14/2013] [Indexed: 12/31/2022] Open

165

Halloran PF, Pereira AB, Chang J, Matas A, Picton M, De Freitas D, Bromberg J, Serón D, Sellarés J, Einecke G, Reeve J. Microarray diagnosis of antibody-mediated rejection in kidney transplant biopsies: an international prospective study (INTERCOM). Am J Transplant 2013;13:2865-74. [PMID: 24119109 DOI: 10.1111/ajt.12465] [Citation(s) in RCA: 134] [Impact Index Per Article: 11.2] [Received: 06/17/2013] [Revised: 07/22/2013] [Accepted: 08/01/2013] [Indexed: 01/25/2023]

166

Wang X, Markowetz F, De Sousa E Melo F, Medema JP, Vermeulen L. Dissecting cancer heterogeneity--an unsupervised classification approach. Int J Biochem Cell Biol 2013;45:2574-9. [PMID: 24004832 DOI: 10.1016/j.biocel.2013.08.014] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.2] [Received: 07/28/2013] [Revised: 08/20/2013] [Accepted: 08/22/2013] [Indexed: 02/04/2023]

167

Zeller T, Blankenberg S. Blood-based gene expression tests: promises and limitations. Circ Cardiovasc Genet 2013;6:139-40. [PMID: 23591038 DOI: 10.1161/circgenetics.113.000149] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.0] [Indexed: 01/09/2023]

168

Rao SSS, Shepherd LA, Bruno AE, Liu S, Miecznikowski JC. Comparing Imputation Procedures for Affymetrix Gene Expression Datasets Using MAQC Datasets. Adv Bioinformatics 2013;2013:790567. [PMID: 24223587 PMCID: PMC3809938 DOI: 10.1155/2013/790567] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 06/26/2013] [Accepted: 08/28/2013] [Indexed: 01/13/2023] Open

169

Genetic and nongenetic variation revealed for the principal components of human gene expression. Genetics 2013;195:1117-28. [PMID: 24026092 DOI: 10.1534/genetics.113.153221] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.3] [Indexed: 11/18/2022] Open

170

Halloran PF, Pereira AB, Chang J, Matas A, Picton M, De Freitas D, Bromberg J, Serón D, Sellarés J, Einecke G, Reeve J. Potential impact of microarray diagnosis of T cell-mediated rejection in kidney transplants: The INTERCOM study. Am J Transplant 2013;13:2352-63. [PMID: 23915426 DOI: 10.1111/ajt.12387] [Citation(s) in RCA: 105] [Impact Index Per Article: 8.8] [Received: 04/02/2013] [Revised: 05/30/2013] [Accepted: 06/14/2013] [Indexed: 01/25/2023]

171

Tarca AL, Lauria M, Unger M, Bilal E, Boue S, Kumar Dey K, Hoeng J, Koeppl H, Martin F, Meyer P, Nandy P, Norel R, Peitsch M, Rice JJ, Romero R, Stolovitzky G, Talikka M, Xiang Y, Zechner C. Strengths and limitations of microarray-based phenotype prediction: lessons learned from the IMPROVER Diagnostic Signature Challenge. Bioinformatics 2013;29:2892-9. [PMID: 23966112 DOI: 10.1093/bioinformatics/btt492] [Citation(s) in RCA: 101] [Impact Index Per Article: 8.4] [Indexed: 11/14/2022]

Abstract

MOTIVATION

After more than a decade since microarrays were used to predict phenotype of biological samples, real-life applications for disease screening and identification of patients who would best benefit from treatment are still emerging. The interest of the scientific community in identifying best approaches to develop such prediction models was reaffirmed in a competition style international collaboration called IMPROVER Diagnostic Signature Challenge whose results we describe herein.

RESULTS

Fifty-four teams used public data to develop prediction models in four disease areas including multiple sclerosis, lung cancer, psoriasis and chronic obstructive pulmonary disease, and made predictions on blinded new data that we generated. Teams were scored using three metrics that captured various aspects of the quality of predictions, and best performers were awarded. This article presents the challenge results and introduces to the community the approaches of the best overall three performers, as well as an R package that implements the approach of the best overall team. The analyses of model performance data submitted in the challenge as well as additional simulations that we have performed revealed that (i) the quality of predictions depends more on the disease endpoint than on the particular approaches used in the challenge; (ii) the most important modeling factor (e.g. data preprocessing, feature selection and classifier type) is problem dependent; and (iii) for optimal results datasets and methods have to be carefully matched. Biomedical factors such as the disease severity and confidence in diagnostic were found to be associated with the misclassification rates across the different teams.

AVAILABILITY

The lung cancer dataset is available from Gene Expression Omnibus (accession, GSE43580). The maPredictDSC R package implementing the approach of the best overall team is available at www.bioconductor.org or http://bioinformaticsprb.med.wayne.edu/.

172

Kothari S, Phan JH, Stokes TH, Wang MD. Pathology imaging informatics for quantitative analysis of whole-slide images. J Am Med Inform Assoc 2013;20:1099-108. [PMID: 23959844 PMCID: PMC3822114 DOI: 10.1136/amiajnl-2012-001540] [Citation(s) in RCA: 150] [Impact Index Per Article: 12.5] [Indexed: 02/06/2023] Open

173

Gregori J, Villarreal L, Sánchez A, Baselga J, Villanueva J. An effect size filter improves the reproducibility in spectral counting-based comparative proteomics. J Proteomics 2013;95:55-65. [PMID: 23770383 DOI: 10.1016/j.jprot.2013.05.030] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.3] [Received: 12/28/2012] [Revised: 05/06/2013] [Accepted: 05/22/2013] [Indexed: 11/17/2022]

Abstract

The microarray community has shown that the low reproducibility observed in gene expression-based biomarker discovery studies is partially due to relying solely on p-values to get the lists of differentially expressed genes. Their conclusions recommended complementing the p-value cutoff with the use of effect-size criteria. The aim of this work was to evaluate the influence of such an effect-size filter on spectral counting-based comparative proteomic analysis. The results proved that the filter increased the number of true positives and decreased the number of false positives and the false discovery rate of the dataset. These results were confirmed by simulation experiments where the effect size filter was used to evaluate systematically variable fractions of differentially expressed proteins. Our results suggest that relaxing the p-value cut-off followed by a post-test filter based on effect size and signal level thresholds can increase the reproducibility of statistical results obtained in comparative proteomic analysis. Based on our work, we recommend using a filter consisting of a minimum absolute log2 fold change of 0.8 and a minimum signal of 2-4 SpC on the most abundant condition for the general practice of comparative proteomics. The implementation of feature filtering approaches could improve proteomic biomarker discovery initiatives by increasing the reproducibility of the results obtained among independent laboratories and MS platforms.

BIOLOGICAL SIGNIFICANCE

Quality control analysis of microarray-based gene expression studies pointed out that the low reproducibility observed in the lists of differentially expressed genes could be partially attributed to the fact that these lists are generated relying solely on p-values. Our study has established that the implementation of an effect size post-test filter improves the statistical results of spectral count-based quantitative proteomics. The results proved that the filter increased the number of true positives while decreasing the number of false positives and the false discovery rate of the datasets. The results presented here prove that a post-test filter applying reasonable effect size and signal level thresholds helps to increase the reproducibility of statistical results in comparative proteomic analysis. Furthermore, the implementation of feature filtering approaches could improve proteomic biomarker discovery initiatives by increasing the reproducibility of results obtained among independent laboratories and MS platforms. This article is part of a Special Issue entitled: Standardization and Quality Control in Proteomics.
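
The post-test filter recommended in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name, the relaxed p-value cutoff of 0.05, the pseudocount, and the choice of 2 SpC from the quoted 2-4 SpC range are all hypothetical; only the minimum absolute log2 fold change of 0.8 is taken from the abstract.

```python
import math

def passes_filter(p_value, mean_spc_a, mean_spc_b,
                  p_cutoff=0.05, min_abs_log2fc=0.8, min_signal=2.0,
                  pseudocount=0.5):
    """Return True if a protein passes the p-value cutoff and the post-test filter."""
    # Signal criterion: mean spectral counts in the most abundant condition.
    if max(mean_spc_a, mean_spc_b) < min_signal:
        return False
    # Effect-size criterion: absolute log2 fold change between the two conditions.
    # The pseudocount is an assumption; it avoids division by zero for absent proteins.
    log2fc = math.log2((mean_spc_a + pseudocount) / (mean_spc_b + pseudocount))
    if abs(log2fc) < min_abs_log2fc:
        return False
    # Significance criterion, applied with a (hypothetical) relaxed cutoff.
    return p_value <= p_cutoff
```

A protein with a small p-value but nearly equal counts in both conditions is rejected by the effect-size criterion, which is the behaviour the abstract credits with reducing false positives.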


174

Liu HC, Peng PC, Hsieh TC, Yeh TC, Lin CJ, Chen CY, Hou JY, Shih LY, Liang DC. Comparison of feature selection methods for cross-laboratory microarray analysis. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2013;10:593-604. [PMID: 24091394 DOI: 10.1109/tcbb.2013.70] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/25/2023]

175

Calciano M, Lemarié JC, Blondiaux E, Einstein R, Fehlbaum-Beurdeley P. A predictive microarray-based biomarker for early detection of Alzheimer’s disease intended for clinical diagnostic application. Biomarkers 2013;18:264-72. [DOI: 10.3109/1354750x.2013.773083] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022]

176

Tsuyuzaki K, Tominaga D, Kwon Y, Miyazaki S. Two-way AIC: detection of differentially expressed genes from large scale microarray meta-dataset. BMC Genomics 2013;14 Suppl 2:S9. [PMID: 23445621 PMCID: PMC3582450 DOI: 10.1186/1471-2164-14-s2-s9] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2022] Open

177

Heider A, Alt R. virtualArray: a R/bioconductor package to merge raw data from different microarray platforms. BMC Bioinformatics 2013;14:75. [PMID: 23452776 PMCID: PMC3599117 DOI: 10.1186/1471-2105-14-75] [Citation(s) in RCA: 41] [Impact Index Per Article: 3.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/14/2012] [Accepted: 02/22/2013] [Indexed: 11/10/2022] Open

Abstract

Background

Microarrays have become a routine tool to address diverse biological questions. Therefore, different types and generations of microarrays have been produced by several manufacturers over time. Likewise, the diversity of raw data deposited in public databases such as NCBI GEO or EBI ArrayExpress has grown enormously.

This has resulted in databases currently containing several hundred thousand microarray samples clustered by different species, manufacturers and chip generations. While one of the original goals of these databases was to make the data available to other researchers for independent analysis and, where appropriate, integration with their own data, current software implementations could not provide that feature.

Only those data sets generated on the same chip platform can be readily combined and even here there are batch effects to be taken care of. A straightforward approach to deal with multiple chip types and batch effects has been missing.

The software presented here was designed to solve both of these problems in a convenient and user friendly way.

Results

The virtualArray software package can combine raw data sets using almost any chip types based on current annotations from NCBI GEO or Bioconductor. After establishing congruent annotations for the raw data, virtualArray can then directly employ one of seven implemented methods to adjust for batch effects in the data resulting from differences between the chip types used. Both steps can be tuned to the preferences of the user. When the run is finished, the whole dataset is presented as a conventional Bioconductor “ExpressionSet” object, which can be used as input to other Bioconductor packages.

Conclusions

Using this software package, researchers can easily integrate their own microarray data with data from public repositories or other sources that are based on different microarray chip types. Using the default approach a robust and up-to-date batch effect correction technique is applied to the data.


178

Lazar C, Taminau J, Meganck S, Steenhoff D, Coletta A, Solís DYW, Molter C, Duque R, Bersini H, Nowé A. GENESHIFT: a nonparametric approach for integrating microarray gene expression data based on the inner product as a distance measure between the distributions of genes. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2013;10:383-392. [PMID: 23929862 DOI: 10.1109/tcbb.2013.12] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/02/2023]

Abstract

The potential of microarray gene expression (MAGE) data is only partially explored due to the limited number of samples in individual studies. This limitation can be surmounted by merging or integrating data sets originating from independent MAGE experiments, which are designed to study the same biological problem. However, this process is hindered by batch effects that are study-dependent and result in random data distortion; therefore numerical transformations are needed to render the integration of different data sets accurate and meaningful. Our contribution in this paper is two-fold. First we propose GENESHIFT, a new nonparametric batch effect removal method based on two key elements from statistics: empirical density estimation and the inner product as a distance measure between two probability density functions; second we introduce a new validation index of batch effect removal methods based on the observation that samples from two independent studies drawn from the same population should exhibit similar probability density functions. We evaluated and compared the GENESHIFT method with four other state-of-the-art methods for batch effect removal: Batch-mean centering, empirical Bayes or COMBAT, distance-weighted discrimination, and cross-platform normalization. Several validation indices providing complementary information about the efficiency of batch effect removal methods have been employed in our validation framework. The results show that none of the methods clearly outperforms the others. Moreover, most of the methods used for comparison perform very well with respect to some validation indices while performing very poorly with respect to others. GENESHIFT exhibits robust performance and its average rank is the highest among the average ranks of all methods used for comparison.
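
Of the comparison methods named in this abstract, batch-mean centering is the simplest to illustrate. The sketch below is a generic version for a single gene, assuming one expression value and one batch label per sample; the function name and data layout are hypothetical, and it does not reproduce GENESHIFT itself.

```python
from statistics import mean

def batch_mean_center(values, batches):
    """Subtract from each value the mean of its own batch (one gene across samples)."""
    # Group the expression values by batch label.
    by_batch = {}
    for value, batch in zip(values, batches):
        by_batch.setdefault(batch, []).append(value)
    # Compute the per-batch mean, then centre every sample on its batch mean.
    batch_means = {batch: mean(vals) for batch, vals in by_batch.items()}
    return [value - batch_means[batch] for value, batch in zip(values, batches)]
```

After centering, every batch has mean zero for that gene, so a constant offset between studies is removed while within-batch differences are preserved.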


179

Giordan M. A Two-Stage Procedure for the Removal of Batch Effects in Microarray Studies. STATISTICS IN BIOSCIENCES 2013. [DOI: 10.1007/s12561-013-9081-1] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/12/2023]

180

Koh CH, Wong L. Embracing noise to improve cross-batch prediction accuracy. BMC SYSTEMS BIOLOGY 2013;6 Suppl 2:S3. [PMID: 23282067 PMCID: PMC3521182 DOI: 10.1186/1752-0509-6-s2-s3] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Indexed: 11/10/2022]

181

Application of DNA microarray technology to gerontological studies. Methods Mol Biol 2013;1048:285-308. [PMID: 23929111 DOI: 10.1007/978-1-62703-556-9_19] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/27/2022]

182

Wang SY, Kuo CH, Tseng YJ. Batch Normalizer: A Fast Total Abundance Regression Calibration Method to Simultaneously Adjust Batch and Injection Order Effects in Liquid Chromatography/Time-of-Flight Mass Spectrometry-Based Metabolomics Data and Comparison with Current Calibration Methods. Anal Chem 2012;85:1037-46. [DOI: 10.1021/ac302877x] [Citation(s) in RCA: 81] [Impact Index Per Article: 6.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/31/2023]

183

Xu L, Cheng C, George EO, Homayouni R. Literature aided determination of data quality and statistical significance threshold for gene expression studies. BMC Genomics 2012;13 Suppl 8:S23. [PMID: 23282414 PMCID: PMC3535704 DOI: 10.1186/1471-2164-13-s8-s23] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/11/2022] Open

Abstract

Background

Gene expression data are noisy due to technical and biological variability. Consequently, analysis of gene expression data is complex. Different statistical methods produce distinct sets of genes. In addition, selection of expression p-value (EPv) threshold is somewhat arbitrary. In this study, we aimed to develop novel literature based approaches to integrate functional information in analysis of gene expression data.

Methods

Functional relationships between genes were derived by Latent Semantic Indexing (LSI) of Medline abstracts and used to calculate the function cohesion of gene sets. In this study, literature cohesion was applied in two ways. First, Literature-Based Functional Significance (LBFS) method was developed to calculate a p-value for the cohesion of differentially expressed genes (DEGs) in order to objectively evaluate the overall biological significance of the gene expression experiments. Second, Literature Aided Statistical Significance Threshold (LASST) was developed to determine the appropriate expression p-value threshold for a given experiment.

Results

We tested our methods on three different publicly available datasets. LBFS analysis demonstrated that only two experiments were significantly cohesive. For each experiment, we also compared the LBFS values of DEGs generated by four different statistical methods. We found that some statistical tests produced more functionally cohesive gene sets than others. However, no statistical test was consistently better for all experiments. This reemphasizes that a statistical test must be carefully selected for each expression study. Moreover, LASST analysis demonstrated that the expression p-value thresholds for some experiments were considerably lower (p < 0.02 and 0.01), suggesting that the arbitrary p-values and false discovery rate thresholds that are commonly used in expression studies may not be biologically sound.

Conclusions

We have developed robust and objective literature-based methods to evaluate the biological support for gene expression experiments and to determine the appropriate statistical significance threshold. These methods will assist investigators to more efficiently extract biologically meaningful insights from high throughput gene expression experiments.


184

Quo CF, Kaddi C, Phan JH, Zollanvari A, Xu M, Wang MD, Alterovitz G. Reverse engineering biomolecular systems using -omic data: challenges, progress and opportunities. Brief Bioinform 2012;13:430-45. [PMID: 22833495 DOI: 10.1093/bib/bbs026] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/20/2022] Open

185

G-cimp status prediction of glioblastoma samples using mRNA expression data. PLoS One 2012;7:e47839. [PMID: 23139755 PMCID: PMC3490960 DOI: 10.1371/journal.pone.0047839] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/31/2012] [Accepted: 09/21/2012] [Indexed: 11/19/2022] Open

186

Parker HS, Leek JT. The practical effect of batch on genomic prediction. Stat Appl Genet Mol Biol 2012;11:Article 10. [PMID: 22611599 DOI: 10.1515/1544-6115.1766] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/15/2022]

187

Lazar C, Meganck S, Taminau J, Steenhoff D, Coletta A, Molter C, Weiss-Solís DY, Duque R, Bersini H, Nowé A. Batch effect removal methods for microarray gene expression data integration: a survey. Brief Bioinform 2012;14:469-90. [PMID: 22851511 DOI: 10.1093/bib/bbs037] [Citation(s) in RCA: 216] [Impact Index Per Article: 16.6] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/31/2022] Open

188

Identification of a radiosensitivity signature using integrative metaanalysis of published microarray data for NCI-60 cancer cells. BMC Genomics 2012;13:348. [PMID: 22846430 PMCID: PMC3472294 DOI: 10.1186/1471-2164-13-348] [Citation(s) in RCA: 114] [Impact Index Per Article: 8.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/21/2011] [Accepted: 07/18/2012] [Indexed: 11/21/2022] Open

Abstract

Background

In the postgenome era, a prediction of response to treatment could lead to better dose selection for patients in radiotherapy. To identify a radiosensitive gene signature and elucidate related signaling pathways, four different microarray experiments were reanalyzed before radiotherapy.

Results

Radiosensitivity profiling data using clonogenic assay and gene expression profiling data from four published microarray platforms applied to NCI-60 cancer cell panel were used. The survival fraction at 2 Gy (SF2, range from 0 to 1) was calculated as a measure of radiosensitivity and a linear regression model was applied to identify genes or a gene set with a correlation between expression and radiosensitivity (SF2). Radiosensitivity signature genes were identified using significance analysis of microarrays (SAM) and gene set analysis was performed using a global test using linear regression model. Using the radiation-related signaling pathway and identified genes, a genetic network was generated. According to SAM, 31 genes were identified as common to all the microarray platforms and therefore a common radiosensitivity signature. In gene set analysis, functions in the cell cycle, DNA replication, and cell junction, including adherence and gap junctions were related to radiosensitivity. The integrin, VEGF, MAPK, p53, JAK-STAT and Wnt signaling pathways were overrepresented in radiosensitivity. Significant genes, including ACTN1, CCND1, HCLS1, ITGB5, PFN2, PTPRC, RAB13, and WAS, are adhesion-related molecules that were identified by both SAM and gene set analysis and showed interaction in the genetic network with the integrin signaling pathway.

Conclusions

Integration of four different microarray experiments and gene selection using gene set analysis discovered possible target genes and pathways relevant to radiosensitivity. Our results suggested that the identified genes are candidates for radiosensitivity biomarkers and that integrin signaling via adhesion molecules could be a target for radiosensitization.
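
The per-gene linear regression against SF2 mentioned in this abstract can be illustrated with an ordinary least-squares fit. This is a sketch only: the abstract does not state which variable is the response, so regressing expression on SF2 is an assumption, and the function name is hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of squares of x and cross-products of x and y around their means.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept
```

Here `x` would hold the SF2 value of each cell line and `y` the expression of one gene in the same cell lines; a slope far from zero indicates expression that tracks radiosensitivity.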


189

Troendle JF, Yu KF, Westfall PH, Pennello G, Schisterman EF. Comparing the Expected Misclassification Cost for Two Classifiers Based on Estimates From the Same Sample. Stat Biopharm Res 2012. [DOI: 10.1080/19466315.2012.695263] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/28/2022]

190

Kupfer P, Guthke R, Pohlers D, Huber R, Koczan D, Kinne RW. Batch correction of microarray data substantially improves the identification of genes differentially expressed in rheumatoid arthritis and osteoarthritis. BMC Med Genomics 2012;5:23. [PMID: 22682473 PMCID: PMC3528008 DOI: 10.1186/1755-8794-5-23] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/19/2011] [Accepted: 05/21/2012] [Indexed: 11/10/2022] Open

Abstract

Background

Batch effects due to sample preparation or array variation (type, charge, and/or platform) may influence the results of microarray experiments and thus mask and/or confound true biological differences. Of the published approaches for batch correction, the algorithm “Combating Batch Effects When Combining Batches of Gene Expression Microarray Data” (ComBat) appears to be most suitable for small sample sizes and multiple batches.

Methods

Synovial fibroblasts (SFB; purity > 98%) were obtained from rheumatoid arthritis (RA) and osteoarthritis (OA) patients (n = 6 each) and stimulated with TNF-α or TGF-β1 for 0, 1, 2, 4, or 12 hours. Gene expression was analyzed using Affymetrix Human Genome U133 Plus 2.0 chips, an alternative chip definition file, and normalization by Robust Multi-Array Analysis (RMA). Data were batch-corrected for different acquisition dates using ComBat and the efficacy of the correction was validated using hierarchical clustering.

Results

In contrast to the hierarchical clustering dendrogram before batch correction, in which RA and OA patients clustered randomly, batch correction led to a clear separation of RA and OA. Strikingly, this applied not only to the 0 hour time point (i.e., before stimulation with TNF-α/TGF-β1), but also to all time points following stimulation except for the late 12 hour time point. Batch-corrected data then allowed the identification of differentially expressed genes discriminating between RA and OA. Batch correction only marginally modified the original data, as demonstrated by preservation of the main Gene Ontology (GO) categories of interest, and by minimally changed mean expression levels (maximal change 4.087%) or variances for all genes of interest. Eight genes from the GO category “extracellular matrix structural constituent” (5 different collagens, biglycan, and tubulointerstitial nephritis antigen-like 1) were differentially expressed between RA and OA (RA > OA), both constitutively at time point 0, and at all time points following stimulation with either TNF-α or TGF-β1.

Conclusion

Batch correction appears to be an extremely valuable tool to eliminate non-biological batch effects, and allows the identification of genes discriminating between different joint diseases. RA-SFB show an upregulated expression of extracellular matrix components, both constitutively following isolation from the synovial membrane and upon stimulation with disease-relevant cytokines or growth factors, suggesting an “imprinted” alteration of their phenotype.


191

Verderio P. Assessing the Clinical Relevance of Oncogenic Pathways in Neoadjuvant Breast Cancer. J Clin Oncol 2012;30:1912-5. [DOI: 10.1200/jco.2012.41.7386] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/15/2023] Open

192

Gregori J, Villarreal L, Méndez O, Sánchez A, Baselga J, Villanueva J. Batch effects correction improves the sensitivity of significance tests in spectral counting-based comparative discovery proteomics. J Proteomics 2012;75:3938-51. [PMID: 22588121 DOI: 10.1016/j.jprot.2012.05.005] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/06/2012] [Revised: 04/27/2012] [Accepted: 05/02/2012] [Indexed: 02/04/2023]

193

De Serres SA, Mfarrej BG, Grafals M, Riella LV, Magee CN, Yeung MY, Dyer C, Ahmad U, Chandraker A, Najafian N. Derivation and validation of a cytokine-based assay to screen for acute rejection in renal transplant recipients. Clin J Am Soc Nephrol 2012;7:1018-25. [PMID: 22498498 DOI: 10.2215/cjn.11051011] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/23/2022]

194

Sun Z, Chai HS, Wu Y, White WM, Donkena KV, Klein CJ, Garovic VD, Therneau TM, Kocher JPA. Batch effect correction for genome-wide methylation data with Illumina Infinium platform. BMC Med Genomics 2011;4:84. [PMID: 22171553 PMCID: PMC3265417 DOI: 10.1186/1755-8794-4-84] [Citation(s) in RCA: 86] [Impact Index Per Article: 6.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/29/2011] [Accepted: 12/16/2011] [Indexed: 01/12/2023] Open

Abstract

Background

Genome-wide methylation profiling has led to more comprehensive insights into gene regulation mechanisms and potential therapeutic targets. Illumina Human Methylation BeadChip is one of the most commonly used genome-wide methylation platforms. Similar to other microarray experiments, methylation data is susceptible to various technical artifacts, particularly batch effects. To date, little attention has been given to issues related to normalization and batch effect correction for this kind of data.

Methods

We evaluated three common normalization approaches and investigated their performance in batch effect removal using three datasets with different degrees of batch effects generated from HumanMethylation27 platform: quantile normalization at average β value (QNβ); two step quantile normalization at probe signals implemented in "lumi" package of R (lumi); and quantile normalization of A and B signal separately (ABnorm). Subsequent Empirical Bayes (EB) batch adjustment was also evaluated.

Results

Each normalization could remove a portion of batch effects and their effectiveness differed depending on the severity of batch effects in a dataset. For the dataset with minor batch effects (Dataset 1), normalization alone appeared adequate and "lumi" showed the best performance. However, all methods left substantial batch effects intact in the datasets with obvious batch effects and further correction was necessary. Without any correction, 50 and 66 percent of CpGs were associated with batch effects in Datasets 2 and 3, respectively. After QNβ, lumi or ABnorm, the proportion of CpGs associated with batch effects was reduced to 24, 32, and 26 percent for Dataset 2; and 37, 46, and 35 percent for Dataset 3, respectively. Additional EB correction effectively removed such remaining non-biological effects. More importantly, the two-step procedure almost tripled the numbers of CpGs associated with the outcome of interest for the two datasets.

Conclusion

Genome-wide methylation data from Infinium Methylation BeadChip can be susceptible to batch effects with profound impacts on downstream analyses and conclusions. Normalization can reduce part but not all batch effects. EB correction along with normalization is recommended for effective batch effect removal.
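
The normalization approaches compared in this abstract are all variants of quantile normalization. The sketch below shows the generic technique only, assuming each sample is a list of values of equal length; it is not the QNβ, lumi, or ABnorm procedure, and the function name and tie handling (ties broken by position) are assumptions.

```python
def quantile_normalize(samples):
    """Force every sample to share the same distribution (the mean of sorted values)."""
    n = len(samples[0])
    # Sort each sample, then average across samples at each rank.
    sorted_samples = [sorted(s) for s in samples]
    rank_means = [sum(s[i] for s in sorted_samples) / len(samples) for i in range(n)]
    normalized = []
    for s in samples:
        # Indices of this sample's values in ascending order.
        order = sorted(range(n), key=lambda i: s[i])
        out = [0.0] * n
        # Replace each value by the cross-sample mean for its rank.
        for rank, idx in enumerate(order):
            out[idx] = rank_means[rank]
        normalized.append(out)
    return normalized
```

Each normalized sample keeps the rank order of its original values but takes its values from a common reference distribution, which is why the abstract reports that normalization removes some, but not all, batch effects.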


195

Nueda MJ, Ferrer A, Conesa A. ARSyN: a method for the identification and removal of systematic noise in multifactorial time course microarray experiments. Biostatistics 2011;13:553-66. [PMID: 22085896 DOI: 10.1093/biostatistics/kxr042] [Citation(s) in RCA: 50] [Impact Index Per Article: 3.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/14/2022] Open

Abstract

Transcriptomic profiling experiments that aim to identify responsive genes in specific biological conditions are commonly set up under defined experimental designs that try to assess the effects of factors and their interactions on gene expression. Data from these controlled experiments, however, may also contain sources of unwanted noise that can distort the signal under study, affect the residuals of applied statistical models, and hamper data analysis. Commonly, normalization methods are applied to transcriptomics data to remove technical artifacts, but these are normally based on general assumptions of transcript distribution and largely ignore both the characteristics of the experiment under consideration and the coordinative nature of gene expression. In this paper, we propose a novel methodology, ARSyN, for the preprocessing of microarray data that takes into account these last 2 aspects. By combining analysis of variance (ANOVA) modeling of gene expression values and multivariate analysis of estimated effects, the method identifies the nonstructured part of the signal associated with the experimental factors (the noise within the signal) and the structured variation of the ANOVA errors (the signal of the noise). By removing these noise fractions from the original data, we create a filtered data set that is rich in the information of interest and includes only the random noise required for inferential analysis. In this work, we focus on multifactorial time course microarray (MTCM) experiments with 2 factors: one quantitative, such as time or dosage, and the other qualitative, such as tissue, strain, or treatment. However, the method can be used in other situations such as experiments with only one factor or more complex designs with more than 2 factors. The filtered data obtained after applying ARSyN can be further analyzed with the appropriate statistical technique to obtain the biological information required.
To evaluate the performance of the filtering strategy, we have applied different statistical approaches for MTCM analysis to several real and simulated data sets, studying also the efficiency of these techniques. By comparing the results obtained with the original and ARSyN filtered data and also with other filtering techniques, we can conclude that the proposed method increases the statistical power to detect biological signals, especially in cases where there are high levels of structural noise. Software for ARSyN is freely available at http://www.ua.es/personal/mj.nueda.


196

Barbau-Piednoir E, Lievens A, Vandermassen E, Mbongolo-Mbella EG, Leunda-Casi A, Roosens N, Sneyers M, Van den Bulcke M. Four new SYBR®Green qPCR screening methods for the detection of Roundup Ready®, LibertyLink®, and CryIAb traits in genetically modified products. Eur Food Res Technol 2011. [DOI: 10.1007/s00217-011-1605-7] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/18/2023]

197

Williams R, Schuldt B, Müller FJ. A guide to stem cell identification: progress and challenges in system-wide predictive testing with complex biomarkers. Bioessays 2011;33:880-90. [PMID: 21901750 DOI: 10.1002/bies.201100073] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/13/2022]

198

Mendrick DL. Transcriptional profiling to identify biomarkers of disease and drug response. Pharmacogenomics 2011;12:235-49. [PMID: 21332316 DOI: 10.2217/pgs.10.184] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/08/2023] Open

199

Malone JH, Oliver B. Microarrays, deep sequencing and the true measure of the transcriptome. BMC Biol 2011;9:34. [PMID: 21627854 PMCID: PMC3104486 DOI: 10.1186/1741-7007-9-34] [Citation(s) in RCA: 347] [Impact Index Per Article: 24.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/12/2010] [Accepted: 05/31/2011] [Indexed: 12/11/2022] Open

200

Chen C, Grennan K, Badner J, Zhang D, Gershon E, Jin L, Liu C. Removing batch effects in analysis of expression microarray data: an evaluation of six batch adjustment methods. PLoS One 2011;6:e17238. [PMID: 21386892 PMCID: PMC3046121 DOI: 10.1371/journal.pone.0017238] [Citation(s) in RCA: 341] [Impact Index Per Article: 24.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/26/2010] [Accepted: 01/24/2011] [Indexed: 01/07/2023] Open