51
|
Golf O, Muirhead LJ, Speller A, Balog J, Abbassi-Ghadi N, Kumar S, Mróz A, Veselkov K, Takáts Z. XMS: cross-platform normalization method for multimodal mass spectrometric tissue profiling. Journal of the American Society for Mass Spectrometry 2015; 26:44-54. [PMID: 25380777 DOI: 10.1007/s13361-014-0997-6] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Received: 05/19/2014] [Revised: 09/01/2014] [Accepted: 09/02/2014] [Indexed: 06/04/2023]
Abstract
Here we present a proof-of-concept cross-platform normalization approach to convert raw mass spectra acquired by distinct desorption ionization methods and/or instrumental setups to cross-platform normalized analyte profiles. The initial step of the workflow is database-driven peak annotation followed by summarization of peak intensities of different ions from the same molecule. The resulting compound-intensity spectra are adjusted to a method-independent intensity scale by using predetermined, compound-specific normalization factors. The method is based on the assumption that distinct MS-based platforms capture a similar set of chemical species in a biological sample, though these species may exhibit platform-specific molecular ion intensity distribution patterns. The method was validated on two sample sets of (1) porcine tissue analyzed by laser desorption ionization (LDI), desorption electrospray ionization (DESI), and rapid evaporative ionization mass spectrometry (REIMS) in combination with Fourier transform-based mass spectrometry; and (2) healthy/cancerous colorectal tissue analyzed by DESI and REIMS, with the latter being combined with time-of-flight mass spectrometry. We demonstrate the capacity of our method to reduce MS-platform-specific variation, resulting in (1) high inter-platform concordance coefficients of analyte intensities; (2) clear principal component based clustering of analyte profiles according to histological tissue types, irrespective of the desorption ionization technique or mass spectrometer used; and (3) accurate "blind" classification of histologic tissue types using cross-platform normalized analyte profiles.
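The two workflow steps named in the abstract (summing the intensities of different ions of the same molecule, then rescaling by compound-specific factors) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function name, the dictionary-based inputs, and the assumption that the normalization factors are applied multiplicatively are all hypothetical.

```python
from collections import defaultdict


def cross_platform_normalize(peaks, ion_to_compound, norm_factors):
    """Collapse annotated ion peaks to compounds and rescale them.

    peaks:           {ion_label: raw intensity} for one spectrum
    ion_to_compound: {ion_label: compound name} (the peak annotation step)
    norm_factors:    {compound name: platform-specific factor}; assumed
                     here to be multiplicative (an assumption of this sketch)
    """
    compound_intensity = defaultdict(float)
    for ion, intensity in peaks.items():
        compound = ion_to_compound.get(ion)
        if compound is None:
            continue  # peak could not be annotated; skip it
        # Sum intensities of different ions that stem from the same molecule.
        compound_intensity[compound] += intensity

    # Rescale each compound; compounds without a known factor are dropped.
    return {
        compound: total * norm_factors[compound]
        for compound, total in compound_intensity.items()
        if compound in norm_factors
    }
```

Keeping the annotation map and the factors as separate inputs mirrors the abstract's description of the factors as predetermined per compound and per platform.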
Affiliation(s)
- Ottmar Golf
- Institute for Inorganic and Analytical Chemistry, Justus Liebig University, Giessen, Germany
|
52
|
A Model for Cross-Platform Searches in Temporal Microarray Data. Artif Intell Med 2015. [DOI: 10.1007/978-3-319-19551-3_19] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/26/2022]
|
53
|
Chen H, Fang Y, Zhu H, Li S, Wang T, Gu P, Fang X, Wu Y, Liang J, Zeng Y, Zhang L, Qiu W, Zhang L, Yi X. Protein-protein interaction analysis of distinct molecular pathways in two subtypes of colorectal carcinoma. Mol Med Rep 2014; 10:2868-74. [PMID: 25242495 PMCID: PMC4227423 DOI: 10.3892/mmr.2014.2585] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 11/11/2013] [Accepted: 06/02/2014] [Indexed: 12/30/2022]
Abstract
The aim of this study was to identify the molecular events that distinguish serrated colorectal carcinoma (SCRC) from conventional colorectal carcinoma (CCRC) through differential gene expression, pathway and protein-protein interaction (PPI) network analysis. The GSE4045 and GSE8671 microarray datasets were downloaded from the Gene Expression Omnibus database. We identified the genes that are differentially expressed between SCRC and normal colon tissues, between CCRC and healthy tissues, and between SCRC and CCRC using Student’s t-tests and Benjamini-Hochberg (BH) multiple testing corrections. The differentially expressed genes (DEGs) were then mapped to Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways and their enrichment for specific pathways was investigated using the Database for Annotation, Visualization and Integrated Discovery (DAVID) tool with a significance threshold of 0.1. Analysis of the potential interactions between the protein products of 220 DEGs (between CCRC and SCRC) was performed by constructing a PPI network using data from the high performance RDF database (P<0.1). The interaction between pathways was also analyzed in CCRC based on the PPI network. Our study identified thousands of genes differentially expressed in SCRC and CCRC compared to healthy tissues. The DEGs in SCRC and CCRC were enriched in cell cycle, DNA replication, and base excision repair pathways. The proteasome pathway was significantly enriched in SCRC but not in CCRC after BH adjustment. The PPI network showed that tumour necrosis factor receptor-associated factor 6 (TRAF6) and atrophin 1 (ATN1) were the most central genes in the network, with respective degrees of node predicted at 90 and 88. In conclusion, the proteasome pathway was shown to be specifically enriched in SCRC. Furthermore, TRAF6 and ATN1 may be promising biomarkers for the distinction between serrated and conventional CRC.
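For readers unfamiliar with the Benjamini-Hochberg correction mentioned in this abstract, the standard step-up adjustment can be sketched as below. The function name is hypothetical and the snippet illustrates the general procedure only; it does not reproduce the study's pipeline or data.

```python
def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values in the same order as the input."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

A gene would then be called differentially expressed when its adjusted value falls below the chosen threshold.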
Affiliation(s)
- Hanzhang Chen
- Department of Pathology, Central Hospital of Shanghai Zhabei District, Shanghai 200070, P.R. China
- Yunzhen Fang
- The Operating Room, Central Hospital of Shanghai Zhabei District, Shanghai 200070, P.R. China
- Hailong Zhu
- Department of Pathology, Tongji Hospital, Tongji University School of Medicine, Shanghai Tongji Hospital, Shanghai 200065, P.R. China
- Shuai Li
- Department of Pathology, Tongji Hospital, Tongji University School of Medicine, Shanghai Tongji Hospital, Shanghai 200065, P.R. China
- Tao Wang
- Urology Surgery, The First People's Hospital of Jingzhou, Hubei 434000, P.R. China
- Pan Gu
- Department of Pathology, Tongji Hospital, Tongji University School of Medicine, Shanghai Tongji Hospital, Shanghai 200065, P.R. China
- Xia Fang
- Hematology Department, University Medical Center of Princeton, Plainsboro, NJ 08536, USA
- Yunjin Wu
- Department of Pathology, Tongji Hospital, Tongji University School of Medicine, Shanghai Tongji Hospital, Shanghai 200065, P.R. China
- Jun Liang
- Department of Pathology, Tongji Hospital, Tongji University School of Medicine, Shanghai Tongji Hospital, Shanghai 200065, P.R. China
- Yu Zeng
- Department of Pathology, Tongji Hospital, Tongji University School of Medicine, Shanghai Tongji Hospital, Shanghai 200065, P.R. China
- Long Zhang
- Department of Pathology, Tongji Hospital, Tongji University School of Medicine, Shanghai Tongji Hospital, Shanghai 200065, P.R. China
- Weizhe Qiu
- Department of Pathology, Tongji Hospital, Tongji University School of Medicine, Shanghai Tongji Hospital, Shanghai 200065, P.R. China
- Lanjing Zhang
- Department of Pathology, University Medical Center of Princeton, Plainsboro, NJ 08536, USA
- Xianghua Yi
- Department of Pathology, Tongji Hospital, Tongji University School of Medicine, Shanghai Tongji Hospital, Shanghai 200065, P.R. China
|
54
|
Huang S, Yee C, Ching T, Yu H, Garmire LX. A novel model to combine clinical and pathway-based transcriptomic information for the prognosis prediction of breast cancer. PLoS Comput Biol 2014; 10:e1003851. [PMID: 25233347 PMCID: PMC4168973 DOI: 10.1371/journal.pcbi.1003851] [Citation(s) in RCA: 59] [Impact Index Per Article: 5.9] [Received: 03/17/2014] [Accepted: 08/08/2014] [Indexed: 01/19/2023]
Abstract
Breast cancer is the most common malignancy in women worldwide. With the increasing awareness of heterogeneity in breast cancers, better prediction of breast cancer prognosis is much needed for more personalized treatment and disease management. Towards this goal, we have developed a novel computational model for breast cancer prognosis by combining the Pathway Deregulation Score (PDS) based pathifier algorithm, Cox regression and L1-LASSO penalization method. We trained the model on a set of 236 patients with gene expression data and clinical information, and validated the performance on three diversified testing data sets of 606 patients. To evaluate the performance of the model, we conducted survival analysis of the dichotomized groups, and compared the areas under the curve based on the binary classification. The resulting prognosis genomic model is composed of fifteen pathways (e.g. P53 pathway) that had previously reported cancer relevance, and it successfully differentiated relapse in the training set (log rank p-value = 6.25e-12) and three testing data sets (log rank p-value<0.0005). Moreover, the pathway-based genomic models consistently performed better than gene-based models on all four data sets. We also find strong evidence that combining genomic information with clinical information improved the p-values of prognosis prediction by at least three orders of magnitude in comparison to using either genomic or clinical information alone. In summary, we propose a novel prognosis model that harnesses the pathway-based dysregulation as well as valuable clinical information. The selected pathways in our prognosis model are promising targets for therapeutic intervention.
With the increasing awareness of heterogeneity in breast cancers, better prediction of breast cancer prognosis is much needed early on for more personalized treatment and management. Towards this goal we propose in this study a novel pathway-based prognosis prediction model, which emphasizes individualized pathway-based risk measurement using the pathway dysregulation score (PDS). In combination with the L1-LASSO penalized feature selection and the Cox proportional hazards regression model, we have identified fifteen cancer-relevant pathways using the pathway-based genomic model that successfully differentiated the relapse in the training set as well as three diversified test sets. Moreover, given the debate whether higher-order representative features, such as GO sets, pathways and network modules, are superior to the gene-level features in the genomic models, we demonstrate that pathway-based genomic models consistently performed better than gene-based models in all four data sets. Last but not least, we show strong evidence that models that combine genomic information with clinical information improve the prognosis prediction significantly, in comparison to models that use either genomic or clinical information alone.
Affiliation(s)
- Sijia Huang
- Molecular Biosciences and Bioengineering Graduate Program, University of Hawaii at Manoa, Honolulu, Hawaii, United States of America
- Epidemiology Program, University of Hawaii Cancer Center, Honolulu, Hawaii, United States of America
- Cameron Yee
- Neurobiology Program of Biology Department, University of Washington, Seattle, Washington, United States of America
- Travers Ching
- Molecular Biosciences and Bioengineering Graduate Program, University of Hawaii at Manoa, Honolulu, Hawaii, United States of America
- Epidemiology Program, University of Hawaii Cancer Center, Honolulu, Hawaii, United States of America
- Herbert Yu
- Epidemiology Program, University of Hawaii Cancer Center, Honolulu, Hawaii, United States of America
- Lana X. Garmire
- Molecular Biosciences and Bioengineering Graduate Program, University of Hawaii at Manoa, Honolulu, Hawaii, United States of America
- Epidemiology Program, University of Hawaii Cancer Center, Honolulu, Hawaii, United States of America
|
55
|
Ali HR, Rueda OM, Chin SF, Curtis C, Dunning MJ, Aparicio SAJR, Caldas C. Genome-driven integrated classification of breast cancer validated in over 7,500 samples. Genome Biol 2014; 15:431. [PMID: 25164602 PMCID: PMC4166472 DOI: 10.1186/s13059-014-0431-1] [Citation(s) in RCA: 140] [Impact Index Per Article: 14.0] [Received: 07/22/2014] [Accepted: 08/01/2014] [Indexed: 12/16/2022]
Abstract
BACKGROUND IntClust is a classification of breast cancer comprising 10 subtypes based on molecular drivers identified through the integration of genomic and transcriptomic data from 1,000 breast tumors and validated in a further 1,000. We present a reliable method for subtyping breast tumors into the IntClust subtypes based on gene expression and demonstrate the clinical and biological validity of the IntClust classification. RESULTS We developed a gene expression-based approach for classifying breast tumors into the ten IntClust subtypes by using the ensemble profile of the index discovery dataset. We evaluate this approach in 983 independent samples for which the combined copy-number and gene expression IntClust classification was available. Only 24 samples are discordantly classified. Next, we compile a consolidated external dataset composed of a further 7,544 breast tumors. We use our approach to classify all samples into the IntClust subtypes. All ten subtypes are observable in most studies at comparable frequencies. The IntClust subtypes are significantly associated with relapse-free survival and recapitulate patterns of survival observed previously. In studies of neo-adjuvant chemotherapy, IntClust reveals distinct patterns of chemosensitivity. Finally, patterns of expression of genomic drivers reported by TCGA (The Cancer Genome Atlas) are better explained by IntClust as compared to the PAM50 classifier. CONCLUSIONS IntClust subtypes are reproducible in a large meta-analysis, show clinical validity and best capture variation in genomic drivers. IntClust is a driver-based breast cancer classification and is likely to become increasingly relevant as more targeted biological therapies become available.
Affiliation(s)
- H Raza Ali
- Cancer Research UK Cambridge Institute, University of Cambridge, Li Ka Shing Centre, Robinson Way, CB2 0RE Cambridge, UK
- Department of Pathology, University of Cambridge, Tennis Court Road, CB2 1QP Cambridge, UK
- Cambridge Experimental Cancer Medicine Centre and NIHR Cambridge Biomedical Research Centre, Cambridge University Hospitals NHS, Hills Road, CB2 0QQ Cambridge, UK
- Oscar M Rueda
- Cancer Research UK Cambridge Institute, University of Cambridge, Li Ka Shing Centre, Robinson Way, CB2 0RE Cambridge, UK
- Suet-Feung Chin
- Cancer Research UK Cambridge Institute, University of Cambridge, Li Ka Shing Centre, Robinson Way, CB2 0RE Cambridge, UK
- Christina Curtis
- Keck School of Medicine, University of Southern California, CA 90033 California, USA
- Mark J Dunning
- Cancer Research UK Cambridge Institute, University of Cambridge, Li Ka Shing Centre, Robinson Way, CB2 0RE Cambridge, UK
- Samuel AJR Aparicio
- Department of Molecular Oncology, British Columbia Cancer Research Centre, Vancouver, V5Z 1L3 British Columbia, Canada
- Carlos Caldas
- Cancer Research UK Cambridge Institute, University of Cambridge, Li Ka Shing Centre, Robinson Way, CB2 0RE Cambridge, UK
- Department of Oncology, University of Cambridge, Addenbrooke’s Hospital, Hills Road, CB2 0QQ Cambridge, UK
- Cambridge Experimental Cancer Medicine Centre and NIHR Cambridge Biomedical Research Centre, Cambridge University Hospitals NHS, Hills Road, CB2 0QQ Cambridge, UK
|
56
|
Visualized gene network reveals the novel target transcripts Sox2 and Pax6 of neuronal development in trans-placental exposure to bisphenol A. PLoS One 2014; 9:e100576. [PMID: 25051057 PMCID: PMC4106758 DOI: 10.1371/journal.pone.0100576] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Received: 01/30/2014] [Accepted: 05/26/2014] [Indexed: 12/12/2022]
Abstract
Background Bisphenol A (BPA) is a ubiquitous endocrine disrupting chemical in our daily life, and its health effect in response to prenatal exposure is still controversial. Early-life BPA exposure may impact brain development and contribute to childhood neurological disorders. The aim of the present study was to investigate molecular target genes of neuronal development in trans-placental exposure to BPA. Methodology A meta-analysis of three public microarray datasets was performed to screen for differentially expressed genes (DEGs) in exposure to BPA. The candidate genes of neuronal development were identified from gene ontology analysis in a reconstructed neuronal sub-network, and their gene expression was determined using real-time PCR in 20 umbilical cord blood samples dichotomized into high and low BPA level groups at the median of 16.8 nM. Principal Findings Among 36 neuronal transcripts sorted from DAVID ontology clusters of 457 DEGs identified using the Bioconductor limma package, we found that two neuronal genes, sex determining region Y-box 2 (Sox2) and paired box 6 (Pax6), had preferentially down-regulated expression (Bonferroni-corrected p-value <10⁻⁴ and log2-transformed fold change ≤−1.2) in response to BPA exposure. Fetal cord blood samples had clearly attenuated gene expression of Sox2 and Pax6 in the high BPA group compared with the low BPA group. The gene network visualized with Cytoscape showed that Sox2 and Pax6, which contribute to neural precursor cell proliferation and neuronal differentiation, might be down-regulated through sonic hedgehog (Shh), vascular endothelial growth factor A (VEGFA) and Notch signaling. Conclusions These results indicated that trans-placental BPA exposure down-regulated gene expression of Sox2 and Pax6, potentially underlying the adverse effect on childhood neuronal development.
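The selection rule quoted in this abstract (a Bonferroni-corrected p-value below 10⁻⁴ together with a log2 fold change of at most −1.2) amounts to a simple filter. The sketch below is illustrative only: the function name, the tuple layout, and the placeholder gene names in any usage are assumptions, and it does not reproduce the limma analysis itself.

```python
def select_down_regulated(genes, alpha=1e-4, max_log2_fc=-1.2):
    """Filter (name, raw p-value, log2 fold change) tuples.

    A gene passes when its Bonferroni-corrected p-value is below `alpha`
    and its log2 fold change is at most `max_log2_fc`.
    """
    m = len(genes)  # number of tests used for the Bonferroni correction
    selected = []
    for name, p_raw, log2_fc in genes:
        p_bonferroni = min(1.0, p_raw * m)
        if p_bonferroni < alpha and log2_fc <= max_log2_fc:
            selected.append(name)
    return selected
```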
|
57
|
Bundela S, Sharma A, Bisen PS. Potential therapeutic targets for oral cancer: ADM, TP53, EGFR, LYN, CTLA4, SKIL, CTGF, CD70. PLoS One 2014; 9:e102610. [PMID: 25029526 PMCID: PMC4110113 DOI: 10.1371/journal.pone.0102610] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.3] [Received: 12/12/2013] [Accepted: 06/20/2014] [Indexed: 12/16/2022]
Abstract
In India, oral cancer has consistently ranked among the top three causes of cancer-related deaths, and it has emerged as a top cause of cancer-related deaths among men. Lack of effective therapeutic options is one of the main challenges in clinical management of oral cancer patients. We interrogated a large pool of samples from oral cancer gene expression studies to identify potential therapeutic targets that are involved in multiple cancer hallmark events. Therapeutic strategies directed towards such targets can be expected to effectively control cancer cells. Datasets from different gene expression studies were integrated by removing batch effects and were used for downstream analyses, including differential expression analysis. Dependency network analysis was done to identify genes that undergo marked topological changes in oral cancer samples when compared with control samples. Causal reasoning analysis was carried out to identify significant hypotheses that can explain the gene expression profiles observed in oral cancer samples. A text-mining based approach was used to detect cancer hallmarks associated with genes significantly expressed in oral cancer. In all, 2365 genes were detected as differentially expressed, including some highly differentially expressed genes such as matrix metalloproteinases (MMP-1/3/10/13), chemokine (C-X-C motif) ligands (IL8, CXCL-10/-11), PTHLH, SERPINE1, NELL2, S100A7A, MAL, CRNN, TGM3, CLCA4, keratins (KRT-3/4/13/76/78), SERPINB11 and serine peptidase inhibitors (SPINK-5/7). XIST, TCEAL2, NRAS and FGFR2 are some of the important genes detected by dependency and causal network analysis. Literature mining analysis annotated 1014 genes, of which 841 were statistically significantly annotated. Integrating the output of the various analyses resulted in a list of potential therapeutic targets for oral cancer, which included ADM, TP53, EGFR, LYN, CTLA4, SKIL, CTGF and CD70.
Affiliation(s)
- Saurabh Bundela
- Defence Research Development Establishment, Defence Research Development Organization, Ministry of Defence, Govt. of India, Gwalior, Madhya Pradesh, India
- Department of Postgraduate Studies & Research in Biological Sciences, Rani Durgavati University, Jabalpur, Madhya Pradesh, India
- Anjana Sharma
- Department of Postgraduate Studies & Research in Biological Sciences, Rani Durgavati University, Jabalpur, Madhya Pradesh, India
- Prakash S. Bisen
- Defence Research Development Establishment, Defence Research Development Organization, Ministry of Defence, Govt. of India, Gwalior, Madhya Pradesh, India
- School of Studies in Biotechnology, Jiwaji University, Gwalior, Madhya Pradesh, India
|
58
|
Chou WC, Cheng AL, Brotto M, Chuang CY. Visual gene-network analysis reveals the cancer gene co-expression in human endometrial cancer. BMC Genomics 2014; 15:300. [PMID: 24758163 PMCID: PMC4234489 DOI: 10.1186/1471-2164-15-300] [Citation(s) in RCA: 78] [Impact Index Per Article: 7.8] [Received: 11/04/2013] [Accepted: 04/04/2014] [Indexed: 11/10/2022]
Abstract
Background Endometrial cancers (ECs) are the most common form of gynecologic malignancy. Recent studies have reported that ECs reveal distinct markers for molecular pathogenesis, which in turn is linked to the various histological types of ECs. To understand further the molecular events contributing to ECs and endometrial tumorigenesis in general, a more precise identification of cancer-associated molecules and signaling networks would be useful for the detection and monitoring of malignancy, improving clinical cancer therapy, and personalization of treatments. Results EC-specific gene co-expression networks were constructed by differential expression analysis and weighted gene co-expression network analysis (WGCNA). Important pathways and putative cancer hub genes contributing to tumorigenesis of ECs were identified. An elastic-net regularized classification model was built using the cancer hub gene signatures to predict the phenotypic characteristics of ECs. The 19 cancer hub gene signatures had high predictive power to distinguish among three key principal features of ECs: grade, type, and stage. Intriguingly, these hub gene networks seem to contribute to EC progression and malignancy via cell-cycle regulation, antigen processing and the citric acid (TCA) cycle. Conclusions The results of this study provide a powerful biomarker discovery platform to better understand the progression of ECs and to uncover potential therapeutic targets in the treatment of ECs. This information might lead to improved monitoring and, in turn, improved treatment of ECs, the fourth most common cancer in women.
Affiliation(s)
- Chun-Yu Chuang
- Department of Biomedical Engineering and Environmental Sciences, National Tsing Hua University, Hsinchu 30013, Taiwan.
|
59
|
Abel L, Kutschki S, Turewicz M, Eisenacher M, Stoutjesdijk J, Meyer HE, Woitalla D, May C. Autoimmune profiling with protein microarrays in clinical applications. Biochimica et Biophysica Acta - Proteins and Proteomics 2014; 1844:977-87. [PMID: 24607371 DOI: 10.1016/j.bbapap.2014.02.023] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.7] [Received: 07/31/2013] [Revised: 02/18/2014] [Accepted: 02/27/2014] [Indexed: 02/05/2023]
Abstract
In recent years, knowledge about immune-related disorders has substantially increased, especially in the field of central nervous system (CNS) disorders. Recent innovations in protein-related microarray technology have enabled the analysis of interactions between numerous samples and up to 20,000 targets. Antibodies directed against ion channels, receptors and other synaptic proteins have been identified, and their causative roles in different disorders have been identified. Knowledge about immunological disorders is likely to expand further as more antibody targets are discovered. Therefore, protein microarrays may become an established tool for routine diagnostic procedures in the future. The identification of relevant target proteins requires the development of new strategies to handle and process vast quantities of data so that these data can be evaluated and correlated with relevant clinical issues, such as disease progression, clinical manifestations and prognostic factors. This review will mainly focus on new protein array technologies, which allow the processing of a large number of samples, and their various applications with a deeper insight into their potential use as diagnostic tools in neurodegenerative diseases and other diseases. This article is part of a Special Issue entitled: Biomarkers: A Proteomic Challenge.
Affiliation(s)
- Laura Abel
- Department of Medical Proteomics/Bioanalytics, Medizinisches Proteom-Center, Ruhr-Universität Bochum, 44801 Bochum, Germany
- Simone Kutschki
- Department of Medical Proteomics/Bioanalytics, Medizinisches Proteom-Center, Ruhr-Universität Bochum, 44801 Bochum, Germany
- Michael Turewicz
- Department of Medical Proteomics/Bioanalytics, Medizinisches Proteom-Center, Ruhr-Universität Bochum, 44801 Bochum, Germany
- Martin Eisenacher
- Department of Medical Proteomics/Bioanalytics, Medizinisches Proteom-Center, Ruhr-Universität Bochum, 44801 Bochum, Germany
- Jale Stoutjesdijk
- Department of Medical Proteomics/Bioanalytics, Medizinisches Proteom-Center, Ruhr-Universität Bochum, 44801 Bochum, Germany
- Helmut E Meyer
- Department of Medical Proteomics/Bioanalytics, Medizinisches Proteom-Center, Ruhr-Universität Bochum, 44801 Bochum, Germany; Leibniz-Institut für Analytische Wissenschaften - ISAS - e.V., Dortmund, Germany
- Dirk Woitalla
- S. Josef Hospital, Ruhr-University Bochum, 44780 Bochum, Germany; St. Josef-Krankenhaus Kupferdreh, Heidbergweg 22-24, 45257 Essen, Germany
- Caroline May
- Department of Medical Proteomics/Bioanalytics, Medizinisches Proteom-Center, Ruhr-Universität Bochum, 44801 Bochum, Germany.
|
60
|
Geeleher P, Cox NJ, Huang RS. Clinical drug response can be predicted using baseline gene expression levels and in vitro drug sensitivity in cell lines. Genome Biol 2014; 15:R47. [PMID: 24580837 PMCID: PMC4054092 DOI: 10.1186/gb-2014-15-3-r47] [Citation(s) in RCA: 575] [Impact Index Per Article: 57.5] [Received: 11/23/2013] [Accepted: 03/03/2014] [Indexed: 12/16/2022]
Abstract
We demonstrate a method for the prediction of chemotherapeutic response in patients using only before-treatment baseline tumor gene expression data. First, we fitted models for whole-genome gene expression against drug sensitivity in a large panel of cell lines, using a method that allows every gene to influence the prediction. Following data homogenization and filtering, these models were applied to baseline expression levels from primary tumor biopsies, yielding an in vivo drug sensitivity prediction. We validated this approach in three independent clinical trial datasets, and obtained predictions as good as, or better than, those from gene signatures derived directly from clinical data.
|
61
|
Co-expression network analysis and genetic algorithms for gene prioritization in preeclampsia. BMC Med Genomics 2013; 6:51. [PMID: 24219996 PMCID: PMC3829810 DOI: 10.1186/1755-8794-6-51] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.7] [Received: 03/26/2013] [Accepted: 11/08/2013] [Indexed: 12/18/2022]
Abstract
BACKGROUND In this study, we explored gene prioritization in preeclampsia, combining co-expression network analysis and genetic algorithm optimization approaches. We analysed five public projects, obtaining 1,146 significant genes after cross-platform and processing of 81 and 149 microarrays in preeclamptic and normal conditions, respectively. METHODS After co-expression network construction, modular and node analyses were performed using several approaches. Moreover, genetic algorithms were also applied in combination with the nearest neighbour and discriminant analysis classification methods. RESULTS Significant differences were found in the gene connectivity distribution, both in normal and preeclampsia conditions, pointing to the need and importance of examining connectivity alongside expression for prioritization. We discuss the global as well as intra-modular connectivity for hub detection and also the utility of genetic algorithms in combination with the network information. The FLT1, LEP, INHA and ENG genes were identified in agreement with the literature; however, we also found other genes, such as FLNB, INHBA, NDRG1 and LYN, that are highly significant but underexplored during normal pregnancy or preeclampsia. CONCLUSIONS Weighted gene co-expression network analysis reveals a similar distribution along the modules detected both in normal and preeclampsia conditions. However, major differences were obtained by analysing the node connectivity. All models obtained by genetic algorithm procedures were consistent with a correct classification, higher than 90%, restricting to 30 variables in both classification methods applied. Combining the two methods, we identified well-known genes related to preeclampsia and were also led to propose new candidates, poorly explored or completely unknown in the pathogenesis of preeclampsia, which may have to be validated experimentally.
|
62
|
Deshwar AG, Morris Q. PLIDA: cross-platform gene expression normalization using perturbed topic models. Bioinformatics 2013; 30:956-61. [PMID: 24123674 DOI: 10.1093/bioinformatics/btt574] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.5] [Indexed: 11/14/2022]
Abstract
MOTIVATION Gene expression data are currently collected on a wide range of platforms. Differences between platforms make it challenging to combine and compare data collected on different platforms. We propose a new method of cross-platform normalization that uses topic models to summarize the expression patterns in each dataset before normalizing the topics learned from each dataset using per-gene multiplicative weights. RESULTS This method allows for cross-platform normalization even when samples profiled on different platforms have systematic differences, allows the simultaneous normalization of data from an arbitrary number of platforms and, after suitable training, allows for online normalization of expression data collected individually or in small batches. In addition, our method outperforms existing state-of-the-art platform normalization tools. AVAILABILITY AND IMPLEMENTATION MATLAB code is available at http://morrislab.med.utoronto.ca/plida/.
Affiliation(s)
- Amit G Deshwar
- Edward S. Rogers Sr. Department of Electrical and Computer Engineering, Department of Molecular Genetics, Department of Computer Science and Terrence Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, Ontario M5S 1A1, Canada
|
63
|
Lauss M, Visne I, Kriegner A, Ringnér M, Jönsson G, Höglund M. Monitoring of technical variation in quantitative high-throughput datasets. Cancer Inform 2013; 12:193-201. [PMID: 24092958 PMCID: PMC3785384 DOI: 10.4137/cin.s12862] [Citation(s) in RCA: 44] [Impact Index Per Article: 4.0] [Indexed: 01/08/2023]
Abstract
High-dimensional datasets can be confounded by variation from technical sources, such as batches. Undetected batch effects can have severe consequences for the validity of a study’s conclusion(s). We evaluate high-throughput RNAseq and miRNAseq as well as DNA methylation and gene expression microarray datasets, mainly from the Cancer Genome Atlas (TCGA) project, in respect to technical and biological annotations. We observe technical bias in these datasets and discuss corrective interventions. We then suggest a general procedure to control study design, detect technical bias using linear regression of principal components, correct for batch effects, and re-evaluate principal components. This procedure is implemented in the R package swamp, and as graphical user interface software. In conclusion, high-throughput platforms that generate continuous measurements are sensitive to various forms of technical bias. For such data, monitoring of technical variation is an important analysis step.
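The detection step described in this abstract, regressing principal components on technical annotations, can be illustrated with a minimal Python sketch. This is not the swamp package's implementation (swamp is an R package); the function name `pc_batch_r2`, its parameters, and the synthetic data are hypothetical and chosen only for illustration. A high R² for a leading component suggests that the component is largely explained by batch.

```python
import numpy as np

def pc_batch_r2(X, batch, n_pcs=3):
    """Return R^2 of each leading principal component regressed on batch labels.

    X     : array of shape (samples, features)
    batch : one batch label per sample
    n_pcs : number of leading components to inspect (illustrative default)
    """
    X = np.asarray(X, dtype=float)
    batch = np.asarray(batch)
    # Centre features, then obtain principal component scores via SVD.
    Xc = X - X.mean(axis=0)
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_pcs] * S[:n_pcs]
    # One-hot design matrix; its columns sum to one, so it spans the intercept.
    labels = np.unique(batch)
    D = (batch[:, None] == labels[None, :]).astype(float)
    r2 = []
    for k in range(scores.shape[1]):
        y = scores[:, k]
        beta, *_ = np.linalg.lstsq(D, y, rcond=None)
        resid = y - D @ beta
        ss_tot = np.sum((y - y.mean()) ** 2)
        r2.append(1.0 - (resid @ resid) / ss_tot)
    return np.array(r2)

# Synthetic example (assumed data): batch "B" is shifted on every feature.
rng = np.random.default_rng(0)
batch = np.repeat(["A", "B"], 10)
X = rng.normal(size=(20, 50))
X[batch == "B"] += 3.0
r2 = pc_batch_r2(X, batch)
print(r2)
```

In this toy setting the first component should be dominated by the batch shift, which is the kind of signal the procedure is meant to flag before any correction is applied.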
Affiliation(s)
- Martin Lauss
- Department of Oncology, Clinical Sciences, Lund University, Sweden
|
64
|
Meta-analysis of genetic programs between idiopathic pulmonary fibrosis and sarcoidosis. PLoS One 2013; 8:e71059. [PMID: 23967151 PMCID: PMC3743918 DOI: 10.1371/journal.pone.0071059] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.4] [Received: 03/08/2013] [Accepted: 06/24/2013] [Indexed: 11/19/2022]
Abstract
Background Idiopathic pulmonary fibrosis (IPF) and pulmonary sarcoidosis are typical interstitial lung diseases of unknown etiology that cause lethal lung damage. There are notable differences between these two pulmonary disorders, although they do share some similarities. Gene expression profiles have been reported independently, but differences at the transcriptional level between these two entities have not been investigated. Methods/Results All expression data of lung tissue samples for IPF and sarcoidosis were taken from published datasets in the Gene Expression Omnibus (GEO) repository. After cross-platform normalization, the merged sample data were grouped together and subjected to statistical analysis to find discriminating genes. Gene enrichments with their corresponding functions were analyzed by the online analysis engine “Database for Annotation, Visualization and Integrated Discovery” (DAVID) 6.7, and gene interactions and functional networks were further analyzed by STRING 9.0 and Cytoscape 3.0.0 Beta1. One hundred and thirty signature genes could potentially differentiate one disease state from another. Compared with normal lung tissue, tissue affected by IPF and sarcoidosis displayed similar signatures that concentrated on proliferation and differentiation. Distinctly expressed genes that could distinguish IPF from sarcoidosis are more enriched in processes of cilium biogenesis or degradation and regulation of T cell activation. Key discriminative network modules involve bone morphogenetic protein receptor two (BMPR2)-related and v-myb myeloblastosis viral oncogene (MYB)-related proliferation. Conclusions This study is the first attempt to examine the transcriptional regulation of IPF and sarcoidosis across different studies based on different working platforms. Groups of significant genes were found to clearly distinguish one condition from the other. While IPF and sarcoidosis share notable similarities in cell proliferation, differentiation and migration, remarkable differences between the diseases were found at the transcription level, suggesting that the two diseases are regulated by overlapping yet distinctive transcriptional networks.
|
65
|
Combining evidence of preferential gene-tissue relationships from multiple sources. PLoS One 2013; 8:e70568. [PMID: 23950964 PMCID: PMC3741196 DOI: 10.1371/journal.pone.0070568] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 01/14/2013] [Accepted: 06/21/2013] [Indexed: 11/19/2022]
Abstract
An important challenge in drug discovery and disease prognosis is to predict genes that are preferentially expressed in one or a few tissues, i.e. showing a considerably higher expression in one tissue(s) compared to the others. Although several data sources and methods have been published explicitly for this purpose, they often disagree and it is not evident how to retrieve these genes and how to distinguish true biological findings from those that are due to choice-of-method and/or experimental settings. In this work we have developed a computational approach that combines results from multiple methods and datasets with the aim to eliminate method/study-specific biases and to improve the predictability of preferentially expressed human genes. A rule-based score is used to merge and assign support to the results. Five sets of genes with known tissue specificity were used for parameter pruning and cross-validation. In total we identify 3434 tissue-specific genes. We compare the genes of highest scores with the public databases: PaGenBase (microarray), TiGER (EST) and HPA (protein expression data). The results have 85% overlap to PaGenBase, 71% to TiGER and only 28% to HPA. 99% of our predictions have support from at least one of these databases. Our approach also performs better than any of the databases on identifying drug targets and biomarkers with known tissue-specificity.
|
66
|
Turewicz M, May C, Ahrens M, Woitalla D, Gold R, Casjens S, Pesch B, Brüning T, Meyer HE, Nordhoff E, Böckmann M, Stephan C, Eisenacher M. Improving the default data analysis workflow for large autoimmune biomarker discovery studies with ProtoArrays. Proteomics 2013; 13:2083-7. [PMID: 23616427 PMCID: PMC3810711 DOI: 10.1002/pmic.201200518] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.6] [Received: 11/16/2012] [Revised: 02/19/2013] [Accepted: 03/26/2013] [Indexed: 12/11/2022]
Abstract
Contemporary protein microarrays such as the ProtoArray® are used for autoimmune antibody screening studies to discover biomarker panels. For ProtoArray data analysis, the software Prospector and a default workflow are suggested by the manufacturer. While analyzing a large data set from a discovery study for diagnostic biomarkers of Parkinson’s disease (ParkCHIP), we identified the need for distinct improvements to the suggested workflow concerning raw data acquisition, availability of normalization and preselection methods, batch effects, feature selection, and feature validation. In this work, appropriate improvements to the default workflow are proposed. It is shown that, in combination, completely automatic data acquisition as a batch, a re-implementation of Prospector’s pre-selection method, multivariate or hybrid feature selection, and validation of the selected protein panel using an independent test set define an improved workflow for large studies.
Affiliation(s)
- Michael Turewicz
- Medizinisches Proteom-Center, Ruhr-University Bochum, Bochum, Germany.
|
67
|
Lee J, Lee S. Cross Platform Data Analysis in Microarray Experiment. KOREAN JOURNAL OF APPLIED STATISTICS 2013. [DOI: 10.5351/kjas.2013.26.2.307] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/11/2022]
|
68
|
Lahti L, Torrente A, Elo LL, Brazma A, Rung J. A fully scalable online pre-processing algorithm for short oligonucleotide microarray atlases. Nucleic Acids Res 2013; 41:e110. [PMID: 23563154 PMCID: PMC3664815 DOI: 10.1093/nar/gkt229] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.2] [Indexed: 01/26/2023]
Abstract
Rapid accumulation of large and standardized microarray data collections is opening up novel opportunities for holistic characterization of genome function. The limited scalability of current preprocessing techniques has, however, formed a bottleneck for full utilization of these data resources. Although short oligonucleotide arrays constitute a major source of genome-wide profiling data, scalable probe-level techniques have been available only for few platforms based on pre-calculated probe effects from restricted reference training sets. To overcome these key limitations, we introduce a fully scalable online-learning algorithm for probe-level analysis and pre-processing of large microarray atlases involving tens of thousands of arrays. In contrast to the alternatives, our algorithm scales up linearly with respect to sample size and is applicable to all short oligonucleotide platforms. The model can use the most comprehensive data collections available to date to pinpoint individual probes affected by noise and biases, providing tools to guide array design and quality control. This is the only available algorithm that can learn probe-level parameters based on sequential hyperparameter updates at small consecutive batches of data, thus circumventing the extensive memory requirements of the standard approaches and opening up novel opportunities to take full advantage of contemporary microarray collections.
Affiliation(s)
- Leo Lahti
- Department of Veterinary Bioscience, University of Helsinki, Agnes Sjöbergin katu 2, PO Box 66, FI-00014 University of Helsinki, Finland.
|
69
|
Heider A, Alt R. virtualArray: a R/bioconductor package to merge raw data from different microarray platforms. BMC Bioinformatics 2013; 14:75. [PMID: 23452776 PMCID: PMC3599117 DOI: 10.1186/1471-2105-14-75] [Citation(s) in RCA: 41] [Impact Index Per Article: 3.7] [Received: 06/14/2012] [Accepted: 02/22/2013] [Indexed: 11/10/2022]
Abstract
Background Microarrays have become a routine tool to address diverse biological questions. Therefore, different types and generations of microarrays have been produced by several manufacturers over time. Likewise, the diversity of raw data deposited in public databases such as NCBI GEO or EBI ArrayExpress has grown enormously. This has resulted in databases currently containing several hundred thousand microarray samples clustered by different species, manufacturers and chip generations. While one of the original goals of these databases was to make the data available to other researchers for independent analysis and, where appropriate, integration with their own data, current software implementations could not provide that feature. Only those data sets generated on the same chip platform can be readily combined and even here there are batch effects to be taken care of. A straightforward approach to deal with multiple chip types and batch effects has been missing. The software presented here was designed to solve both of these problems in a convenient and user friendly way. Results The virtualArray software package can combine raw data sets using almost any chip types based on current annotations from NCBI GEO or Bioconductor. After establishing congruent annotations for the raw data, virtualArray can then directly employ one of seven implemented methods to adjust for batch effects in the data resulting from differences between the chip types used. Both steps can be tuned to the preferences of the user. When the run is finished, the whole dataset is presented as a conventional Bioconductor “ExpressionSet” object, which can be used as input to other Bioconductor packages. Conclusions Using this software package, researchers can easily integrate their own microarray data with data from public repositories or other sources that are based on different microarray chip types. 
Using the default approach, a robust and up-to-date batch effect correction technique is applied to the data.
Affiliation(s)
- Andreas Heider
- Translational Centre for Regenerative Medicine Leipzig, University of Leipzig, Semmelweisstr. 14, Leipzig 04103, Germany.
|
70
|
Abstract
Our understanding of gene expression has changed dramatically over the past decade, largely catalysed by technological developments. High-throughput experiments - microarrays and next-generation sequencing - have generated large amounts of genome-wide gene expression data that are collected in public archives. Added-value databases process, analyse and annotate these data further to make them accessible to every biologist. In this Review, we discuss the utility of the gene expression data that are in the public domain and how researchers are making use of these data. Reuse of public data can be very powerful, but there are many obstacles in data preparation and analysis and in the interpretation of the results. We will discuss these challenges and provide recommendations that we believe can improve the utility of such data.
Affiliation(s)
- Johan Rung
- EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
|
71
|
Taminau J, Meganck S, Lazar C, Steenhoff D, Coletta A, Molter C, Duque R, de Schaetzen V, Weiss Solís DY, Bersini H, Nowé A. Unlocking the potential of publicly available microarray data using inSilicoDb and inSilicoMerging R/Bioconductor packages. BMC Bioinformatics 2012; 13:335. [PMID: 23259851 PMCID: PMC3568420 DOI: 10.1186/1471-2105-13-335] [Citation(s) in RCA: 134] [Impact Index Per Article: 11.2] [Received: 05/30/2012] [Accepted: 12/18/2012] [Indexed: 12/20/2022]
Abstract
Background With an abundant amount of microarray gene expression data sets available through public repositories, new possibilities lie in combining multiple existing data sets. In this new context, analysis itself is no longer the problem, but retrieving and consistently integrating all this data before delivering it to the wide variety of existing analysis tools becomes the new bottleneck. Results We present the newly released inSilicoMerging R/Bioconductor package which, together with the earlier released inSilicoDb R/Bioconductor package, allows consistent retrieval, integration and analysis of publicly available microarray gene expression data sets. Inside the inSilicoMerging package a set of five visual and six quantitative validation measures are available as well. Conclusions By providing (i) access to uniformly curated and preprocessed data, (ii) a collection of techniques to remove the batch effects between data sets from different sources, and (iii) several validation tools enabling the inspection of the integration process, these packages enable researchers to fully explore the potential of combining gene expression data for downstream analysis. The power of using both packages is demonstrated by programmatically retrieving and integrating gene expression studies from the InSilico DB repository [https://insilicodb.org/app/].
Affiliation(s)
- Jonatan Taminau
- AI (CoMo), Vrije Universiteit Brussel, Pleinlaan 2, 1050 Brussels, Belgium.
|
72
|
Manshaei R, Sobhe Bidari P, Aliyari Shoorehdeli M, Feizi A, Lohrasebi T, Malboobi MA, Kyan M, Alirezaie J. Hybrid-controlled neurofuzzy networks analysis resulting in genetic regulatory networks reconstruction. ISRN BIOINFORMATICS 2012; 2012:419419. [PMID: 25969749 PMCID: PMC4393070 DOI: 10.5402/2012/419419] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 07/10/2012] [Accepted: 08/15/2012] [Indexed: 12/03/2022]
Abstract
Reverse engineering of gene regulatory networks (GRNs) is the process of estimating genetic interactions of a cellular system from gene expression data. In this paper, we propose a novel hybrid systematic algorithm based on a neurofuzzy network for reconstructing GRNs from observational gene expression data when only a small to medium number of measurements is available. The approach uses fuzzy logic to transform gene expression values into qualitative descriptors that can be evaluated by using a set of defined rules. The algorithm uses a neurofuzzy network to model the effects of genes on other genes, followed by four stages of decision making to extract gene interactions. One of the main features of the proposed algorithm is that an optimal number of fuzzy rules can be easily and rapidly extracted without overparameterizing. Data analysis and simulation are conducted on microarray expression profiles of the S. cerevisiae cell cycle and demonstrate that the proposed algorithm not only selects the patterns of the time series gene expression data accurately, but also provides models with better reconstruction accuracy when compared with four published algorithms: DBNs, VBEM, time delay ARACNE, and PF subjected to LASSO. The accuracy of the proposed approach is evaluated in terms of recall and F-score for the network reconstruction task.
Affiliation(s)
- Roozbeh Manshaei
- Electrical and Computer Engineering Department, Ryerson University, Toronto, ON, Canada M5B 2K3
| | - Pooya Sobhe Bidari
- Electrical and Computer Engineering Department, Ryerson University, Toronto, ON, Canada M5B 2K3
| | - Mahdi Aliyari Shoorehdeli
- Electrical and Computer Engineering Department, K.N. Toosi University of Technology, Tehran 16315-1355, Iran
| | - Amir Feizi
- Department of Chemical and Biological Engineering, Systems and Synthetic Biology Group, Chalmers University, 41296 Gothenburg, Sweden
| | - Tahmineh Lohrasebi
- National Institute of Genetic Engineering and Biotechnology (NIGEB), Tehran 14965/161, Iran
| | - Mohammad Ali Malboobi
- National Institute of Genetic Engineering and Biotechnology (NIGEB), Tehran 14965/161, Iran
| | - Matthew Kyan
- Electrical and Computer Engineering Department, Ryerson University, Toronto, ON, Canada M5B 2K3
| | - Javad Alirezaie
- Electrical and Computer Engineering Department, Ryerson University, Toronto, ON, Canada M5B 2K3
|
73
|
Vaughan AA, Dunn WB, Allwood JW, Wedge DC, Blackhall FH, Whetton AD, Dive C, Goodacre R. Liquid Chromatography–Mass Spectrometry Calibration Transfer and Metabolomics Data Fusion. Anal Chem 2012; 84:9848-57. [DOI: 10.1021/ac302227c] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.7] [Indexed: 01/05/2023]
Affiliation(s)
- Andrew A. Vaughan
- School of Chemistry, Manchester Institute of Biotechnology, University of Manchester, 131 Princess Street, Manchester, M1 7DN, United Kingdom
| | - Warwick B. Dunn
- Centre for Advanced Discovery and Experimental Therapeutics (CADET), Central Manchester NHS Foundation Trust and School of Biomedicine, University of Manchester, Manchester Academic Health Science Centre, York Place, Oxford Road, Manchester, M13 9WL, United Kingdom
- Manchester Centre for Integrative Systems Biology, Manchester Institute of Biotechnology, University of Manchester, 131 Princess Street, Manchester, M1 7DN, United Kingdom
| | - J. William Allwood
- School of Chemistry, Manchester Institute of Biotechnology, University of Manchester, 131 Princess Street, Manchester, M1 7DN, United Kingdom
| | - David C. Wedge
- School of Chemistry, Manchester Institute of Biotechnology, University of Manchester, 131 Princess Street, Manchester, M1 7DN, United Kingdom
- Cancer Genome Project, Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire, CB10 1SA, United Kingdom
| | - Fiona H. Blackhall
- Clinical and Experimental Pharmacology Group, Paterson Institute for Cancer Research and Manchester Cancer Research Centre (MCRC), Manchester Academic Health Science Centre, University of Manchester, Wilmslow Road, Withington, Manchester, M20 4BX, United Kingdom
| | - Anthony D. Whetton
- School of Cancer and Enabling Sciences, Manchester Academic Health Science Centre, University of Manchester, Manchester, M20 3LJ, United Kingdom
| | - Caroline Dive
- Clinical and Experimental Pharmacology Group, Paterson Institute for Cancer Research and Manchester Cancer Research Centre (MCRC), Manchester Academic Health Science Centre, University of Manchester, Wilmslow Road, Withington, Manchester, M20 4BX, United Kingdom
| | - Royston Goodacre
- School of Chemistry, Manchester Institute of Biotechnology, University of Manchester, 131 Princess Street, Manchester, M1 7DN, United Kingdom
- Manchester Centre for Integrative Systems Biology, Manchester Institute of Biotechnology, University of Manchester, 131 Princess Street, Manchester, M1 7DN, United Kingdom
|
74
|
Gene expression signatures of angiocidin and darapladib treatment connect to therapy options in cervical cancer. J Cancer Res Clin Oncol 2012; 139:259-67. [PMID: 23052694 DOI: 10.1007/s00432-012-1317-9] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.5] [Received: 06/13/2012] [Accepted: 09/13/2012] [Indexed: 01/24/2023]
Abstract
PURPOSE To assign functional properties to gene expression profiles of cervical cancer stages and identify clinically relevant biomarker genes. EXPERIMENTAL DESIGN Microarray samples of 24 normal and 102 cervical cancer biopsies from four publicly available studies were pooled and evaluated. High-quality microarrays were normalized using the CONOR package from the Bioconductor project. Gene expression profiling was performed using variance-component analysis to identify the most reliable probes, which were subsequently processed by gene set enrichment analysis (GSEA). RESULTS Of 22,277 probes that were subject to variance-component analysis, eleven probes had low heterogeneity, that is, a W/T ratio between 0.18 and 0.38. Seven of these probes are induced in all cervical cancer stages: GINS1, PAK2, DTL, AURKA, PRKDC, NEK2 and CEP55. The other four probes are induced in normal cervix: P11, EMP1, UPK1A and HSPC159. We performed GSEA of 9,873 probes exhibiting less variability, that is, having a W/T ratio of <0.75. Significant gene expression signatures related to treatment with angiocidin and darapladib were found repeatedly. Additionally, expression signatures of immunological diseases were found, for example graft versus host disease and acute kidney rejection. Another finding is a gene expression signature in stage IB2 that refers to MT1-MMP-dependent migration and invasion. This gene signature is accompanied by gene expression signatures that refer to ECM receptor-mediated interactions. CONCLUSION Analysis of cervical cancer patient gene expression data reveals a novel perspective on HPV-mediated transcription processes. This perspective offers a better understanding and might even lead to improvements in cancer therapy.
|
75
|
Turnbull AK, Kitchen RR, Larionov AA, Renshaw L, Dixon JM, Sims AH. Direct integration of intensity-level data from Affymetrix and Illumina microarrays improves statistical power for robust reanalysis. BMC Med Genomics 2012; 5:35. [PMID: 22909195 PMCID: PMC3443058 DOI: 10.1186/1755-8794-5-35] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.3] [Received: 02/18/2012] [Accepted: 08/15/2012] [Indexed: 12/18/2022]
Abstract
Background Affymetrix GeneChips and Illumina BeadArrays are the most widely used commercial single-channel gene expression microarrays. Public data repositories are an extremely valuable resource, providing array-derived gene expression measurements from many thousands of experiments. Unfortunately, many of these studies are underpowered, and it is desirable to improve power by combining data from more than one study; we sought to determine whether platform-specific bias precludes direct integration of probe intensity signals for combined reanalysis. Results Using Affymetrix and Illumina data from the microarray quality control project, from our own clinical samples, and from additional publicly available datasets, we evaluated several approaches to directly integrate intensity-level expression data from the two platforms. After mapping probe sequences to Ensembl genes, we demonstrate that ComBat and cross-platform normalisation (XPN) significantly outperform mean-centering and distance-weighted discrimination (DWD) in terms of minimising inter-platform variance. In particular, we observed that DWD, a popular method used in a number of previous studies, removed systematic bias at the expense of genuine biological variability, potentially reducing legitimate biological differences in integrated datasets. Conclusion Normalised and batch-corrected intensity-level data from Affymetrix and Illumina microarrays can be directly combined to generate biologically meaningful results with improved statistical power for robust, integrated reanalysis.
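For orientation, the simplest of the methods compared in this abstract, per-platform mean-centering, can be sketched in a few lines of Python. This is only the baseline that the abstract reports as being outperformed; it is not ComBat, XPN, or DWD, and the function name, platform labels, and toy matrix below are hypothetical.

```python
import numpy as np

def mean_center_by_platform(X, platform):
    """Subtract each gene's platform-specific mean from its expression values.

    X        : array of shape (samples, genes), already mapped to common genes
    platform : one platform label per sample
    """
    X = np.asarray(X, dtype=float)
    platform = np.asarray(platform)
    out = X.copy()
    for p in np.unique(platform):
        rows = platform == p
        # Per-gene mean over the samples measured on this platform only.
        out[rows] -= X[rows].mean(axis=0)
    return out

# Toy data (assumed): two samples per platform, two genes.
X = np.array([[1.0, 2.0], [3.0, 4.0], [10.0, 20.0], [12.0, 24.0]])
platform = np.array(["affy", "affy", "illumina", "illumina"])
centered = mean_center_by_platform(X, platform)
print(centered)
```

Mean-centering removes only an additive per-gene offset between platforms; it leaves platform-specific differences in scale untouched, which is one plausible reason model-based methods can do better.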
Affiliation(s)
- Arran K Turnbull
- Breakthrough Research Unit, University of Edinburgh, Crewe Road South, Edinburgh, EH4 2XR, UK
|
76
|
Lazar C, Meganck S, Taminau J, Steenhoff D, Coletta A, Molter C, Weiss-Solís DY, Duque R, Bersini H, Nowé A. Batch effect removal methods for microarray gene expression data integration: a survey. Brief Bioinform 2012; 14:469-90. [PMID: 22851511 DOI: 10.1093/bib/bbs037] [Citation(s) in RCA: 210] [Impact Index Per Article: 17.5] [Indexed: 12/31/2022]
Abstract
Genomic data integration is a key goal on the way to large-scale genomic data analysis. This process is very challenging due to the diverse sources of information resulting from genomics experiments. In this work, we review methods designed to combine genomic data recorded from microarray gene expression (MAGE) experiments. It has been acknowledged that the main source of variation between different MAGE datasets is the so-called 'batch effects'. The methods reviewed here perform data integration by removing (or, more precisely, attempting to remove) the unwanted variation associated with batch effects. They are presented in a unified framework together with a wide range of evaluation tools, which are mandatory in assessing the efficiency and the quality of the data integration process. We provide a systematic description of the MAGE data integration methodology together with some basic recommendations to help users choose the appropriate tools to integrate MAGE data for large-scale analysis, and to evaluate them from different perspectives in order to quantify their efficiency. All genomic data used in this study for illustration purposes were retrieved from InSilicoDB http://insilico.ulb.ac.be.
Affiliation(s)
- Cosmin Lazar
- Como, Vrije Universiteit Brussel, Pleinlaan 2, 1050 Brussels, Belgium.
|
77
|
Feature extraction via composite scoring and voting in breast cancer. Breast Cancer Res Treat 2012; 135:307-18. [PMID: 22833200 DOI: 10.1007/s10549-012-2177-3] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 07/09/2012] [Accepted: 07/17/2012] [Indexed: 01/22/2023]
Abstract
Identification and characterization of tumor subtypes using gene expression profiles of triple negative breast cancer patients. Microarray data of four breast cancer studies were pooled and evaluated. Molecular subtype classification was performed using random forest and a novel algorithm for feature extraction via composite scoring and voting. Biological and clinical properties were evaluated via GSEA, functional annotation clustering and clinical endpoint analysis. The subtype signatures are highly predictive for distant metastasis free survival of tamoxifen-treated patients. Consensus clustering and the novel algorithm proposed three triple negative subtypes. One subtype shows low E2F4 gene expression and is predictive for survival of ER negative breast cancer patients. The other two subtypes share commonalities with luminal B tumors. Classification of breast cancer expression profiles may reveal novel tumor subtypes, possessing clinical impact. Furthermore, subtype characterizing gene signatures might hold potential for novel strategies in cancer therapy.
|