1
Pandey D, Perumal P. O. Improved meta-analysis pipeline ameliorates distinctive gene regulators of diabetic vasculopathy in human endothelial cell (hECs) RNA-Seq data. PLoS One 2023; 18:e0293939. [PMID: 37943808 PMCID: PMC10635490 DOI: 10.1371/journal.pone.0293939] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/16/2023] [Accepted: 10/21/2023] [Indexed: 11/12/2023]
Abstract
Enormous volumes of gene expression data generated through next-generation sequencing (NGS) technologies are accessible to the scientific community via public repositories. The data harboured in these repositories are foundational for integrative studies, enabling large-scale analyses whose potential is yet to be fully realized. Prudent integration of individual gene expression data, i.e. RNA-Seq datasets, is remarkably challenging, as it encompasses a series of data analysis steps that must be completed before arriving at meaningful insights into biological questions. These insights are latent within the data and are not usually revealed by individual analyses, owing to the limited number of biological samples in individual studies. Nevertheless, a sensibly designed meta-analysis of select individual studies not only maximizes the sample size but also significantly improves the statistical power of the analysis, thereby revealing the latent insights. In the present study, a custom-built meta-analysis pipeline is presented for the integration of multiple datasets from different origins. As a case study, we tested the integration of two relevant datasets pertaining to diabetic vasculopathy retrieved from the open-source domain. We report that the meta-analysis uncovered distinctive and latent gene regulators of diabetic vasculopathy, comprising a total of 975 gene signatures (930 up-regulated and 45 down-regulated). Further investigation revealed a subset of 14 DEGs, including CTLA4, CALR, G0S2, CALCR, OMA1, and DNAJC3, as latent, i.e. novel, as these signatures have not been reported earlier. Moreover, downstream investigations, including enrichment analysis and protein-protein interaction (PPI) network analysis of the DEGs, revealed a durable disease association, signifying their potential as novel transcriptomic biomarkers of diabetic vasculopathy. While, to our knowledge, this is the first meta-analysis of individual whole-transcriptome datasets for diabetic vasculopathy, the novel meta-analysis pipeline could well be extended to study the mechanistic links of DEGs in other disease conditions.
Affiliation(s)
- Diksha Pandey
- Department of Biotechnology, National Institute of Technology, Warangal, India
- Onkara Perumal P.
- Department of Biotechnology, National Institute of Technology, Warangal, India
2
Mishra A, Chanchal S, Ashraf MZ. Host-Viral Interactions Revealed among Shared Transcriptomics Signatures of ARDS and Thrombosis: A Clue into COVID-19 Pathogenesis. TH Open 2020; 4:e403-e412. [PMID: 33354650 PMCID: PMC7746517 DOI: 10.1055/s-0040-1721706] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 06/03/2020] [Accepted: 11/02/2020] [Indexed: 01/07/2023]
Abstract
Severe novel coronavirus disease 2019 (COVID-19) infection is associated with considerable activation of coagulation pathways, endothelial damage, and subsequent thrombotic microvascular injuries. These consistent observations may have serious implications for the treatment and management of this highly pathogenic disease. As a consequence, anticoagulant therapeutic strategies, such as low molecular weight heparin, have shown some encouraging results. A cytokine burst leading to sepsis, one of the primary drivers of acute respiratory distress syndrome (ARDS), could be worsened by the accumulation of coagulation factors in the lungs of COVID-19 patients. However, the obscurity of this syndrome remains a hurdle in making decisive treatment choices. Therefore, an attempt is made to characterize shared biological mechanisms between ARDS and thrombosis using a comprehensive transcriptomics meta-analysis. We conducted an integrated gene expression meta-analysis of two independent, publicly available datasets of ARDS and venous thromboembolism (VTE). Datasets GSE76293 and GSE19151, derived from the National Centre for Biotechnology Information–Gene Expression Omnibus (NCBI-GEO) database, were used for ARDS and VTE, respectively. The integrative meta-analysis of expression data (INMEX) tool was used to preprocess the datasets, and effect size combination with random effect modeling was used to obtain differentially expressed genes (DEGs). Network construction was done for hub genes and pathway enrichment analysis. Our meta-analysis identified a total of 1,878 significant DEGs among the datasets, which, when subjected to enrichment analysis, suggested inflammation–coagulation–hypoxemia convolutions in COVID-19 pathogenesis. The top hub genes of our study, such as tumor protein 53 (TP53), lysine acetyltransferase 2B (KAT2B), DExH-box helicase 9 (DHX9), REL-associated protein (RELA), RING-box protein 1 (RBX1), and proteasome 20S subunit beta 2 (PSMB2), gave insights into the genes known to participate in host–virus interactions, which could pave the way to understanding the various strategies deployed by the virus to improve its replication and spread.
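The effect size combination with random-effects modeling mentioned in the abstract above can be illustrated with a minimal sketch for a single gene. It assumes the DerSimonian-Laird estimator, which the abstract does not name; the function name `random_effects_combine` and the input values are hypothetical, and this is not the INMEX implementation.

```python
import math


def random_effects_combine(effects, variances):
    """Pool per-study effect sizes for one gene (DerSimonian-Laird, assumed here).

    effects   -- per-study effect sizes (e.g. standardized mean differences)
    variances -- per-study sampling variances of those effect sizes
    Returns (pooled_effect, standard_error, tau2).
    """
    k = len(effects)
    w = [1.0 / v for v in variances]            # inverse-variance (fixed-effect) weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    # Cochran's Q measures heterogeneity of the studies around the fixed-effect mean.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sw - sum(wi ** 2 for wi in w) / sw
    # Between-study variance, truncated at zero.
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    w_star = [1.0 / (v + tau2) for v in variances]  # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return pooled, se, tau2


# Hypothetical effect sizes for one gene in two datasets.
pooled, se, tau2 = random_effects_combine([0.8, 1.4], [0.05, 0.08])
z_score = pooled / se  # a gene would be called a DEG if |z| exceeds a chosen threshold
```

When the studies disagree more than their sampling variances explain, `tau2` becomes positive and the pooled standard error widens, which is the reason a random-effects model is usually preferred when combining datasets from different origins.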
Affiliation(s)
- Aastha Mishra
- Department of Biotechnology, Faculty of Natural Sciences, Jamia Millia Islamia, New Delhi, India
- Shankar Chanchal
- Department of Biotechnology, Faculty of Natural Sciences, Jamia Millia Islamia, New Delhi, India
- Mohammad Z Ashraf
- Department of Biotechnology, Faculty of Natural Sciences, Jamia Millia Islamia, New Delhi, India
3
Vennou KE, Piovani D, Kontou PI, Bonovas S, Bagos PG. Multiple outcome meta-analysis of gene-expression data in inflammatory bowel disease. Genomics 2020; 112:1761-1767. [DOI: 10.1016/j.ygeno.2019.09.019] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 08/28/2019] [Revised: 09/26/2019] [Accepted: 09/27/2019] [Indexed: 01/02/2023]
4
Meta-analysis of gene expression profiles in preeclampsia. Pregnancy Hypertens 2020; 19:52-60. [DOI: 10.1016/j.preghy.2019.12.007] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 08/30/2019] [Accepted: 12/18/2019] [Indexed: 01/12/2023]
5
Kontou P, Pavlopoulou A, Braliou G, Bogiatzi S, Dimou N, Bangalore S, Bagos P. Identification of gene expression profiles in myocardial infarction: a systematic review and meta-analysis. BMC Med Genomics 2018; 11:109. [PMID: 30482209 PMCID: PMC6260684 DOI: 10.1186/s12920-018-0427-x] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Received: 09/20/2018] [Accepted: 11/07/2018] [Indexed: 12/20/2022]
Abstract
BACKGROUND Myocardial infarction (MI) is a multifactorial disease with complex pathogenesis, mainly the result of the interplay of genetic and environmental risk factors. The regulation of thrombosis, inflammation, and cholesterol and lipid metabolism are the main factors proposed thus far to be involved in the pathogenesis of MI. Traditional risk-estimation tools depend largely on conventional risk factors, but there is a need to identify novel biochemical and genetic markers. The aim of the study is to identify differentially expressed genes that are consistently associated with the incidence of myocardial infarction and that could potentially be incorporated into traditional cardiovascular disease risk-factor models. METHODS The biomedical literature and gene expression databases, PubMed and GEO, respectively, were searched following the PRISMA guidelines. The key inclusion criteria were gene expression data derived from case-control studies on MI patients from blood samples. Gene expression datasets regarding the effect of medicinal drugs on MI were excluded. The t-test was applied to gene expression data from case-control studies in MI patients. RESULTS A total of 162 articles and 174 gene expression datasets were retrieved. Of those, 4 gene expression datasets met the inclusion criteria, containing data on 31,180 loci in 93 MI patients and 89 healthy individuals. Collectively, 626 differentially expressed genes (DEGs) were detected in MI patients as compared to non-affected individuals at an FDR q-value = 0.01. Of those, 88 genes/gene products were interconnected in an interaction network. In total, 15 genes were identified as hubs of the network. CONCLUSIONS Functional enrichment analyses revealed that the DEGs are mainly involved in inflammatory/wound healing, RNA processing/transport mechanisms and a yet not fully characterized pathway implicated in RNA transport and nuclear pore proteins. The overlap between the DEGs identified in this study and the genes identified through genetic-association studies is minimal. These data could be useful in future studies on the molecular mechanisms of MI as well as diagnostic and prognostic markers.
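The FDR q-value threshold mentioned in the abstract above can be illustrated with a short sketch. The abstract does not say which FDR procedure was used; the Benjamini-Hochberg adjustment is assumed here, and the function name `benjamini_hochberg` and the p-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted q-values in the original input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, so each q-value is the minimum of
    # p * m / rank over its own rank and all higher ranks (monotonicity).
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


# Hypothetical per-gene p-values from case-control t-tests.
pvals = [0.0004, 0.03, 0.2, 0.0001]
qvals = benjamini_hochberg(pvals)
# Genes whose q-value is at or below the chosen threshold are reported as DEGs.
deg_indices = [i for i, q in enumerate(qvals) if q <= 0.01]
```

Filtering on q-values instead of raw p-values controls the expected share of false positives among the reported genes, which matters when tens of thousands of loci are tested at once.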
Affiliation(s)
- Panagiota Kontou
- Department of Computer Science and Biomedical Informatics, University of Thessaly, 35131, Lamia, Greece
- Athanasia Pavlopoulou
- Izmir Biomedicine and Genome Institute, Dokuz Eylül University Health Campus, 35340, Izmir, Turkey
- Georgia Braliou
- Department of Computer Science and Biomedical Informatics, University of Thessaly, 35131, Lamia, Greece
- Spyridoula Bogiatzi
- Department of Computer Science and Biomedical Informatics, University of Thessaly, 35131, Lamia, Greece
- Niki Dimou
- Department of Hygiene and Epidemiology, University of Ioannina School of Medicine, Stavros Niarchos Av, 45110, Ioannina, Greece
- Sripal Bangalore
- School of Medicine, New York University, New York, NY 10016, USA
- Pantelis Bagos
- Department of Computer Science and Biomedical Informatics, University of Thessaly, 35131, Lamia, Greece; Lamia, University of Thessaly, Papasiopoulou 2-4, 35131, Lamia, Greece
6
Kontou PI, Pavlopoulou A, Bagos PG. Methods of Analysis and Meta-Analysis for Identifying Differentially Expressed Genes. Methods Mol Biol 2018; 1793:183-210. [PMID: 29876898 DOI: 10.1007/978-1-4939-7868-7_12] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Indexed: 02/06/2023]
Abstract
Microarray approaches are widely used high-throughput techniques to assess simultaneously the expression of thousands of genes under certain conditions and study the effects of certain treatments, diseases, and developmental stages. The traditional way to perform such experiments is to design oligonucleotide hybridization probes that correspond to specific genes and then measure the expression of the genes in order to determine which of them are up- or down-regulated compared to a condition that is used as a control. Hitherto, individual experiments cannot capture the bigger picture of how a biological system works and, therefore, data integration from multiple experimental studies and external data repositories is necessary to understand the function of genes and their expression patterns under certain conditions. Therefore, the development of methods for handling, integrating, comparing, interpreting and visualizing microarray data is necessary. The selection of an appropriate method for analysing microarray datasets is not an easy task. In this chapter, we provide an overview of the various methods developed for microarray data analysis, as well as suggestions for choosing the appropriate method for microarray meta-analysis.
Affiliation(s)
- Panagiota I Kontou
- Department of Computer Science and Biomedical Informatics, University of Thessaly, Lamia, Greece
- Athanasia Pavlopoulou
- Department of Computer Science and Biomedical Informatics, University of Thessaly, Lamia, Greece; International Biomedicine and Genome Institute (iBG-Izmir), Dokuz Eylul University, Izmir, 35340, Turkey
- Pantelis G Bagos
- Department of Computer Science and Biomedical Informatics, University of Thessaly, Lamia, Greece
7
Wei Z, Wang X, Conlon EM. Parallel Markov chain Monte Carlo for Bayesian dynamic item response models in educational testing. Stat (Int Stat Inst) 2017. [DOI: 10.1002/sta4.164] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/08/2022]
Affiliation(s)
- Zheng Wei
- Department of Mathematics and Statistics; University of Maine; Orono 04469 ME USA
- Xiaojing Wang
- Department of Statistics; University of Connecticut; Storrs 06269 CT USA
- Erin Marie Conlon
- Department of Mathematics and Statistics; University of Massachusetts; Amherst 01003 MA USA
8
Li N, McCall MN, Wu Z. Establishing Informative Prior for Gene Expression Variance from Public Databases. Statistics in Biosciences 2017. [DOI: 10.1007/s12561-016-9172-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/24/2022]
9
Wang T, Zhang L, Tian P, Tian S. Identification of differentially-expressed genes between early-stage adenocarcinoma and squamous cell carcinoma lung cancer using meta-analysis methods. Oncol Lett 2017; 13:3314-3322. [PMID: 28521438 DOI: 10.3892/ol.2017.5838] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Received: 05/27/2015] [Accepted: 10/06/2016] [Indexed: 01/04/2023]
Abstract
Lung adenocarcinoma (AC) and squamous cell lung carcinoma (SCC) are two major subtypes of non-small cell lung cancer (NSCLC). Previous studies have demonstrated that fundamental differences exist in the underlying mechanisms of tumor development, growth and invasion between these subtypes. The investigation of differentially-expressed genes (DEGs) between these two NSCLC subtypes is useful for determining and understanding such differences. The present study aimed to identify those DEGs using meta-analysis and the data from four microarray experiments, consisting of 164 AC and 161 SCC samples. Raw gene expression values were converted into the probability of expression (POE) representing the differentially-expressed probability of a gene and expression barcode values representing its expression status. The results indicated that when applying a meta-analysis using barcode values, heterogeneity in genes across studies was less severe than when applying a meta-analysis using POE values. DEGs in each meta-analysis method overlapped substantially (P=1.3×10⁻⁴), but the barcode method yielded a lower global false discovery rate. Based on this and several other performance statistics, it was concluded that the barcode approach outperformed the POE method. Finally, using those DEGs, ontology and pathway analyses were conducted. A number of genes and enriched pathways were found to be closely associated with NSCLC.
Affiliation(s)
- Tianjiao Wang
- School of Life Science, Jilin University, Changchun, Jilin 130012, P.R. China
- Lei Zhang
- School of Life Science, Jilin University, Changchun, Jilin 130012, P.R. China; Department of Neurology, The Second Hospital of Jilin University, Changchun, Jilin 130041, P.R. China
- Pu Tian
- School of Life Science, Jilin University, Changchun, Jilin 130012, P.R. China
- Suyan Tian
- Division of Clinical Epidemiology, First Hospital of Jilin University, Changchun, Jilin 130021, P.R. China
10
Kavakiotis I, Xochelli A, Agathangelidis A, Tsoumakas G, Maglaveras N, Stamatopoulos K, Hadzidimitriou A, Vlahavas I, Chouvarda I. Integrating multiple immunogenetic data sources for feature extraction and mining somatic hypermutation patterns: the case of "towards analysis" in chronic lymphocytic leukaemia. BMC Bioinformatics 2016; 17 Suppl 5:173. [PMID: 27295298 PMCID: PMC4905615 DOI: 10.1186/s12859-016-1044-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 05/29/2023]
Abstract
BACKGROUND Somatic Hypermutation (SHM) refers to the introduction of mutations within rearranged V(D)J genes, a process that increases the diversity of Immunoglobulins (IGs). The analysis of SHM has offered critical insight into the physiology and pathology of B cells, leading to strong prognostication markers for clinical outcome in chronic lymphocytic leukaemia (CLL), the most frequent adult B-cell malignancy. In this paper we present a methodology for integrating multiple immunogenetic and clinicobiological data sources in order to extract features and create high quality datasets for SHM analysis in IG receptors of CLL patients. This dataset is used as the basis for a higher level integration procedure, inspired by social choice theory. This is applied in the Towards Analysis, our attempt to investigate the potential ontogenetic transformation of genes belonging to specific stereotyped CLL subsets towards other genes or gene families, through SHM. RESULTS The data integration process, followed by feature extraction, resulted in the generation of a dataset containing information about mutations occurring through SHM. The Towards analysis, performed on the integrated dataset by applying voting techniques, revealed the distinct behaviour of subset #201 compared to other subsets, as regards SHM-related movements among gene clans, both in allele-conserved and non-conserved gene areas. With respect to movement between genes, a high percentage of movement towards pseudogenes was found in all CLL subsets. CONCLUSIONS This data integration and feature extraction process can set the basis for exploratory analysis or a fully automated computational data mining approach on many as yet unanswered, clinically relevant biological questions.
Affiliation(s)
- Ioannis Kavakiotis
- Department of Informatics, Aristotle University of Thessaloniki, Thessaloniki, Greece
- Aliki Xochelli
- Institute of Applied Biosciences, CERTH, Thessaloniki, Greece; Department of Immunology, Genetics and Pathology, Science for Life Laboratory, Uppsala University, Uppsala, Sweden
- Andreas Agathangelidis
- Division of Molecular Oncology and Department of Onco-Hematology, San Raffaele Scientific Institute, Milan, Italy
- Grigorios Tsoumakas
- Department of Informatics, Aristotle University of Thessaloniki, Thessaloniki, Greece
- Nicos Maglaveras
- Institute of Applied Biosciences, CERTH, Thessaloniki, Greece; Lab of Computing and Medical Informatics, Medical School, Aristotle University of Thessaloniki, Thessaloniki, Greece
- Kostas Stamatopoulos
- Institute of Applied Biosciences, CERTH, Thessaloniki, Greece; Department of Immunology, Genetics and Pathology, Science for Life Laboratory, Uppsala University, Uppsala, Sweden
- Anastasia Hadzidimitriou
- Institute of Applied Biosciences, CERTH, Thessaloniki, Greece; Department of Immunology, Genetics and Pathology, Science for Life Laboratory, Uppsala University, Uppsala, Sweden
- Ioannis Vlahavas
- Department of Informatics, Aristotle University of Thessaloniki, Thessaloniki, Greece
- Ioanna Chouvarda
- Institute of Applied Biosciences, CERTH, Thessaloniki, Greece; Lab of Computing and Medical Informatics, Medical School, Aristotle University of Thessaloniki, Thessaloniki, Greece
11
Li B, Sun Z, He Q, Zhu Y, Qin ZS. Bayesian inference with historical data-based informative priors improves detection of differentially expressed genes. Bioinformatics 2016; 32:682-9. [PMID: 26519502 DOI: 10.1093/bioinformatics/btv631] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Received: 04/15/2015] [Accepted: 10/26/2015] [Indexed: 12/13/2022]
Abstract
MOTIVATION Modern high-throughput biotechnologies such as microarrays are capable of producing a massive amount of information for each sample. However, in a typical high-throughput experiment, only a limited number of samples are assayed, giving rise to the classical 'large p, small n' problem. On the other hand, rapid propagation of these high-throughput technologies has resulted in a substantial collection of data, often generated on the same platform and using the same protocol. It is highly desirable to utilize the existing data when performing analysis and inference on a new dataset. RESULTS Utilizing existing data can be carried out in a straightforward fashion under the Bayesian framework, in which the repository of historical data can be exploited to build informative priors that are then used in new data analysis. In this work, using microarray data, we investigate the feasibility and effectiveness of deriving informative priors from historical data and using them in the problem of detecting differentially expressed genes. Through simulation and real data analysis, we show that the proposed strategy significantly outperforms existing methods including the popular and state-of-the-art Bayesian hierarchical model-based approaches. Our work illustrates the feasibility and benefits of exploiting the increasingly available genomics big data in statistical inference and presents a promising practical strategy for dealing with the 'large p, small n' problem. AVAILABILITY AND IMPLEMENTATION Our method is implemented in R package IPBT, which is freely available from https://github.com/benliemory/IPBT CONTACT: yuzhu@purdue.edu; zhaohui.qin@emory.edu SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Ben Li
- Department of Biostatistics and Bioinformatics, Rollins School of Public Health, Emory University, Atlanta, GA 30322, USA
- Zhaonan Sun
- Department of Statistics, Purdue University, West Lafayette, IN 47906, USA
- Qing He
- Department of Biostatistics and Bioinformatics, Rollins School of Public Health, Emory University, Atlanta, GA 30322, USA
- Yu Zhu
- Department of Statistics, Purdue University, West Lafayette, IN 47906, USA
- Zhaohui S Qin
- Department of Biostatistics and Bioinformatics, Rollins School of Public Health, Emory University, Atlanta, GA 30322, USA; Department of Biomedical Informatics, Emory University School of Medicine, Atlanta, GA 30322, USA
12
Bergon A, Belzeaux R, Comte M, Pelletier F, Hervé M, Gardiner EJ, Beveridge NJ, Liu B, Carr V, Scott RJ, Kelly B, Cairns MJ, Kumarasinghe N, Schall U, Blin O, Boucraut J, Tooney PA, Fakra E, Ibrahim EC. CX3CR1 is dysregulated in blood and brain from schizophrenia patients. Schizophr Res 2015; 168:434-43. [PMID: 26285829 DOI: 10.1016/j.schres.2015.08.010] [Citation(s) in RCA: 44] [Impact Index Per Article: 4.9] [Received: 05/25/2015] [Revised: 08/05/2015] [Accepted: 08/06/2015] [Indexed: 12/31/2022]
Abstract
The molecular mechanisms underlying schizophrenia remain largely unknown. Although schizophrenia is a mental disorder, there is increasing evidence to indicate that inflammatory processes driven by diverse environmental factors play a significant role in its development. With gene expression studies having been conducted across a variety of sample types, e.g., blood and postmortem brain, it is possible to investigate convergent signatures that may reveal interactions between the immune and nervous systems in schizophrenia pathophysiology. We conducted two meta-analyses of schizophrenia microarray gene expression data (N=474) and non-psychiatric control (N=485) data from postmortem brain and blood. Then, we assessed whether significantly dysregulated genes in schizophrenia could be shared between blood and brain. To validate our findings, we selected a top gene candidate and analyzed its expression by RT-qPCR in a cohort of schizophrenia subjects stabilized by atypical antipsychotic monotherapy (N=29) and matched controls (N=31). Meta-analyses highlighted inflammation as the major biological process associated with schizophrenia and showed that the chemokine receptor CX3CR1 was significantly down-regulated in schizophrenia. This differential expression was also confirmed in our validation cohort. Given both the recent data demonstrating selective CX3CR1 expression in subsets of neuroimmune cells, as well as behavioral and neuropathological observations of CX3CR1 deficiency in mouse models, our results of reduced CX3CR1 expression add further support for a role played by monocyte/microglia in the neurodevelopment of schizophrenia.
Affiliation(s)
- Aurélie Bergon
- INSERM, TAGC UMR_S 1090, 13288 Marseille Cedex 09, France; Aix Marseille Université, TAGC UMR_S 1090, 13288 Marseille Cedex 09, France
- Raoul Belzeaux
- Aix Marseille Université, CNRS, CRN2M UMR 7286, 13344 Marseille Cedex 15, France; FondaMental, Fondation de Recherche et de Soins en Santé Mentale, 94000 Créteil, France; AP-HM, Hôpital Sainte Marguerite, Pôle de Psychiatrie Universitaire Solaris, 13009 Marseille, France
- Magali Comte
- Aix-Marseille Université, CNRS, Institut de Neurosciences de la Timone UMR 7289, 13005 Marseille, France
- Florence Pelletier
- Aix Marseille Université, CNRS, CRN2M UMR 7286, 13344 Marseille Cedex 15, France; FondaMental, Fondation de Recherche et de Soins en Santé Mentale, 94000 Créteil, France
- Mylène Hervé
- Aix Marseille Université, CNRS, CRN2M UMR 7286, 13344 Marseille Cedex 15, France; FondaMental, Fondation de Recherche et de Soins en Santé Mentale, 94000 Créteil, France
- Erin J Gardiner
- School of Biomedical Sciences and Pharmacy and School of Medicine and Public Health, Faculty of Health, The University of Newcastle, University Drive, Callaghan, NSW 2308 Australia; Centre for Translational Neuroscience and Mental Health, The University of Newcastle, Callaghan, NSW 2308 Australia; Hunter Medical Research Institute, New Lambton Heights, NSW 2305, Australia; Schizophrenia Research Institute, Darlinghurst, NSW 2010 Australia
- Natalie J Beveridge
- School of Biomedical Sciences and Pharmacy and School of Medicine and Public Health, Faculty of Health, The University of Newcastle, University Drive, Callaghan, NSW 2308 Australia; Centre for Translational Neuroscience and Mental Health, The University of Newcastle, Callaghan, NSW 2308 Australia; Hunter Medical Research Institute, New Lambton Heights, NSW 2305, Australia; Schizophrenia Research Institute, Darlinghurst, NSW 2010 Australia
- Bing Liu
- School of Biomedical Sciences and Pharmacy and School of Medicine and Public Health, Faculty of Health, The University of Newcastle, University Drive, Callaghan, NSW 2308 Australia; Centre for Translational Neuroscience and Mental Health, The University of Newcastle, Callaghan, NSW 2308 Australia; Kids Cancer Alliance, Cancer Institute NSW, Sydney, Australia
- Vaughan Carr
- Schizophrenia Research Institute, Darlinghurst, NSW 2010 Australia; School of Psychiatry, University of New South Wales, Randwick, NSW 2301, Australia; Department of Psychiatry, Monash University, Clayton, VIC 3168, Australia
- Rodney J Scott
- School of Biomedical Sciences and Pharmacy and School of Medicine and Public Health, Faculty of Health, The University of Newcastle, University Drive, Callaghan, NSW 2308 Australia; Centre for Translational Neuroscience and Mental Health, The University of Newcastle, Callaghan, NSW 2308 Australia; Hunter Medical Research Institute, New Lambton Heights, NSW 2305, Australia; Schizophrenia Research Institute, Darlinghurst, NSW 2010 Australia
- Brian Kelly
- School of Biomedical Sciences and Pharmacy and School of Medicine and Public Health, Faculty of Health, The University of Newcastle, University Drive, Callaghan, NSW 2308 Australia; Centre for Translational Neuroscience and Mental Health, The University of Newcastle, Callaghan, NSW 2308 Australia; Hunter Medical Research Institute, New Lambton Heights, NSW 2305, Australia
- Murray J Cairns
- School of Biomedical Sciences and Pharmacy and School of Medicine and Public Health, Faculty of Health, The University of Newcastle, University Drive, Callaghan, NSW 2308 Australia; Centre for Translational Neuroscience and Mental Health, The University of Newcastle, Callaghan, NSW 2308 Australia; Hunter Medical Research Institute, New Lambton Heights, NSW 2305, Australia; Schizophrenia Research Institute, Darlinghurst, NSW 2010 Australia
- Nishantha Kumarasinghe
- School of Biomedical Sciences and Pharmacy and School of Medicine and Public Health, Faculty of Health, The University of Newcastle, University Drive, Callaghan, NSW 2308 Australia; Centre for Translational Neuroscience and Mental Health, The University of Newcastle, Callaghan, NSW 2308 Australia; Hunter Medical Research Institute, New Lambton Heights, NSW 2305, Australia; Schizophrenia Research Institute, Darlinghurst, NSW 2010 Australia; University of Sri Jayewardenepura, Nugegoda, Sri Lanka; National Institute of Mental Health, Angoda, Sri Lanka
- Ulrich Schall
- School of Biomedical Sciences and Pharmacy and School of Medicine and Public Health, Faculty of Health, The University of Newcastle, University Drive, Callaghan, NSW 2308 Australia; Centre for Translational Neuroscience and Mental Health, The University of Newcastle, Callaghan, NSW 2308 Australia; Hunter Medical Research Institute, New Lambton Heights, NSW 2305, Australia; Schizophrenia Research Institute, Darlinghurst, NSW 2010 Australia
- Olivier Blin
- CIC-UPCET et Pharmacologie Clinique, Hôpital de la Timone, 13005 Marseille, France
- José Boucraut
- Aix Marseille Université, CNRS, CRN2M UMR 7286, 13344 Marseille Cedex 15, France; FondaMental, Fondation de Recherche et de Soins en Santé Mentale, 94000 Créteil, France
- Paul A Tooney
- School of Biomedical Sciences and Pharmacy and School of Medicine and Public Health, Faculty of Health, The University of Newcastle, University Drive, Callaghan, NSW 2308 Australia; Centre for Translational Neuroscience and Mental Health, The University of Newcastle, Callaghan, NSW 2308 Australia; Hunter Medical Research Institute, New Lambton Heights, NSW 2305, Australia; Schizophrenia Research Institute, Darlinghurst, NSW 2010 Australia
- Eric Fakra
- Aix-Marseille Université, CNRS, Institut de Neurosciences de la Timone UMR 7289, 13005 Marseille, France; CHU de Saint-Etienne, Pôle de Psychiatrie, 42100 Saint-Etienne, France
- El Chérif Ibrahim
- Aix Marseille Université, CNRS, CRN2M UMR 7286, 13344 Marseille Cedex 15, France; FondaMental, Fondation de Recherche et de Soins en Santé Mentale, 94000 Créteil, France.
13
Feng F, Kepler TB. Bayesian Estimation of the Active Concentration and Affinity Constants Using Surface Plasmon Resonance Technology. PLoS One 2015; 10:e0130812. [PMID: 26098764 PMCID: PMC4476803 DOI: 10.1371/journal.pone.0130812] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 12/25/2014] [Accepted: 05/25/2015] [Indexed: 11/19/2022]
Abstract
Surface plasmon resonance (SPR) has previously been employed to measure the active concentration of analyte in addition to the kinetic rate constants in molecular binding reactions. Those approaches, however, have a few restrictions. In this work, a Bayesian approach is developed to determine both active concentration and affinity constants using SPR technology. With the appropriate prior probabilities on the parameters and a derived likelihood function, a Markov Chain Monte Carlo (MCMC) algorithm is applied to compute the posterior probability densities of both the active concentration and kinetic rate constants based on the collected SPR data. Compared with previous approaches, ours exploits information from the entire duration of the process, including both association and dissociation phases, under partial mass transport conditions; it does not depend on calibration data; and it does not require multiple injections of analyte at varying flow rates. Finally, the method is validated by analyzing both simulated and experimental datasets. A software package implementing our approach has been developed with a user-friendly interface and made freely available.
Affiliation(s)
- Feng Feng
- Department of Microbiology, Boston University School of Medicine, Boston, Massachusetts, 02118, United States of America
- Thomas B. Kepler
- Department of Microbiology, Boston University School of Medicine, Boston, Massachusetts, 02118, United States of America
- Department of Mathematics & Statistics, Boston University, Boston, Massachusetts, 02118, United States of America

14
Zollinger A, Davison AC, Goldstein DR. Meta-analysis of incomplete microarray studies. Biostatistics 2015; 16:686-700. [PMID: 25987649 DOI: 10.1093/biostatistics/kxv014] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 12/19/2013] [Accepted: 03/12/2015] [Indexed: 12/18/2022] Open
Abstract
Meta-analysis of microarray studies to produce an overall gene list is relatively straightforward when complete data are available. When some studies lack information (providing only a ranked list of genes, for example), it is common to reduce all studies to ranked lists prior to combining them. Since this entails a loss of information, we consider a hierarchical Bayes approach to meta-analysis using different types of information from different studies: the full data matrix, summary statistics, or ranks. The model uses an informative prior for the parameter of interest to aid the detection of differentially expressed genes. Simulations show that the new approach can give substantial power gains compared with classical meta-analysis and list aggregation methods. A meta-analysis of 11 published studies with different data types identifies genes known to be involved in ovarian cancer and shows significant enrichment.
Affiliation(s)
- Alix Zollinger
- Ecole Polytechnique Fédérale de Lausanne, EPFL-FSB-MATHAA-STAT, Station 8, 1015 Lausanne, Switzerland
- Anthony C Davison
- Ecole Polytechnique Fédérale de Lausanne, EPFL-FSB-MATHAA-STAT, Station 8, 1015 Lausanne, Switzerland
- Darlene R Goldstein
- Ecole Polytechnique Fédérale de Lausanne, EPFL-FSB-MATHAA-STAT, Station 8, 1015 Lausanne, Switzerland

15
Parker JD, Torchin ME, Hufbauer RA, Lemoine NP, Alba C, Blumenthal DM, Bossdorf O, Byers JE, Dunn AM, Heckman RW, Hejda M, Jarošík V, Kanarek AR, Martin LB, Perkins SE, Pyšek P, Schierenbeck K, Schlöder C, van Klinken R, Vaughn KJ, Williams W, Wolfe LM. Do invasive species perform better in their new ranges? Ecology 2013; 94:985-94. [DOI: 10.1890/12-1810.1] [Citation(s) in RCA: 183] [Impact Index Per Article: 16.6] [Indexed: 11/18/2022]

16
Conlon EM, Postier BL, Methé BA, Nevin KP, Lovley DR. A Bayesian model for pooling gene expression studies that incorporates co-regulation information. PLoS One 2012; 7:e52137. [PMID: 23284902 PMCID: PMC3532429 DOI: 10.1371/journal.pone.0052137] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 09/12/2012] [Accepted: 11/13/2012] [Indexed: 12/01/2022] Open
Abstract
Current Bayesian microarray models that pool multiple studies assume gene expression is independent of other genes. However, in prokaryotic organisms, genes are arranged in units that are co-regulated (called operons). Here, we introduce a new Bayesian model for pooling gene expression studies that incorporates operon information into the model. Our Bayesian model borrows information from other genes within the same operon to improve estimation of gene expression. The model produces the gene-specific posterior probability of differential expression, which is the basis for inference. We found in simulations and in biological studies that incorporating co-regulation information improves upon the independence model. We assume that each study contains two experimental conditions: a treatment and control. We note that there exist environmental conditions for which genes that are supposed to be transcribed together lose their operon structure, and that our model is best carried out for known operon structures.
Affiliation(s)
- Erin M Conlon
- Department of Mathematics and Statistics, University of Massachusetts, Amherst, MA, USA.

17
Wang Y, Hu YL, Cao J, He M. Bioinformatic screening of key genes expressed in both human and mouse hepatocellular carcinoma. Shijie Huaren Xiaohua Zazhi 2012; 20:1012-1017. [DOI: 10.11569/wcjd.v20.i12.1012] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/06/2023] Open
Abstract
AIM: To mine and analyze large amounts of data generated by gene chips and screen key genes expressed in both human and mouse hepatocellular carcinoma using bioinformatic methods.
METHODS: Through literature search and collection, nine sets of gene chip data that meet the criteria were downloaded from GEO database. The data were standardized by using Bioconductor and R version 2.10.1. The original Affymetrix data were normalized, background corrected, standardized and log2 transformed using the RMA algorithm. Excel's TTEST function was used to calculate the significance of each gene. DAVID was used for gene ID conversion and a table was established for samples and the corresponding gene expression data. A meta-analysis was then performed to screen genes expressed in both human and mouse hepatocellular carcinoma. KEGG pathways were then enriched.
RESULTS: A total of 52 genes were found to be expressed in both human and mouse hepatocellular carcinoma. Five of them were up-regulated, while four were down-regulated. Seven KEGG pathways were enriched, of which the glycine, serine and threonine metabolic pathways and the axon guidance pathway have previously been reported to be associated with the development of hepatocellular carcinoma.
CONCLUSION: Bioinformatic tools allow us to identify key genes and pathways that are closely related to the development of hepatocellular carcinoma in both humans and mice.

18
Tseng GC, Ghosh D, Feingold E. Comprehensive literature review and statistical considerations for microarray meta-analysis. Nucleic Acids Res 2012; 40:3785-99. [PMID: 22262733 PMCID: PMC3351145 DOI: 10.1093/nar/gkr1265] [Citation(s) in RCA: 266] [Impact Index Per Article: 22.2] [Indexed: 01/18/2023] Open
Abstract
With the rapid advances of various high-throughput technologies, generation of ‘-omics’ data is commonplace in almost every biomedical field. Effective data management and analytical approaches are essential to fully decipher the biological knowledge contained in the tremendous amount of experimental data. Meta-analysis, a set of statistical tools for combining multiple studies of a related hypothesis, has become popular in genomic research. Here, we perform a systematic search from PubMed and manual collection to obtain 620 genomic meta-analysis papers, of which 333 microarray meta-analysis papers are summarized as the basis of this paper and the other 249 GWAS meta-analysis papers are discussed in the next companion paper. The review in the present paper focuses on various biological purposes of microarray meta-analysis, databases and software and related statistical procedures. Statistical considerations of such an analysis are further scrutinized and illustrated by a case study. Finally, several open questions are listed and discussed.
Affiliation(s)
- George C Tseng
- Department of Biostatistics, University of Pittsburgh, Pittsburgh, PA, USA.

19
Fierro AC, Vandenbussche F, Engelen K, Van de Peer Y, Marchal K. Meta Analysis of Gene Expression Data within and Across Species. Curr Genomics 2011; 9:525-34. [PMID: 19516959 PMCID: PMC2694560 DOI: 10.2174/138920208786847935] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.2] [Received: 06/28/2008] [Revised: 07/07/2008] [Accepted: 07/18/2008] [Indexed: 01/15/2023] Open
Abstract
Since the second half of the 1990s, a large number of genome-wide analyses have been described that study gene expression at the transcript level. To this end, two major strategies have been adopted, a first one relying on hybridization techniques such as microarrays, and a second one based on sequencing techniques such as serial analysis of gene expression (SAGE), cDNA-AFLP, and analysis based on expressed sequence tags (ESTs). Despite both types of profiling experiments becoming routine techniques in many research groups, their application remains costly and laborious. As a result, the number of conditions profiled in individual studies is still relatively small and usually varies from only two to few hundreds of samples for the largest experiments. More and more, scientific journals require the deposit of these high throughput experiments in public databases upon publication. Mining the information present in these databases offers molecular biologists the possibility to view their own small-scale analysis in the light of what is already available. However, so far, the richness of the public information remains largely unexploited. Several obstacles such as the correct association between ESTs and microarray probes with the corresponding gene transcript, the incompleteness and inconsistency in the annotation of experimental conditions, and the lack of standardized experimental protocols to generate gene expression data, all impede the successful mining of these data. Here, we review the potential and difficulties of combining publicly available expression data from respectively EST analyses and microarray experiments. With examples from literature, we show how meta-analysis of expression profiling experiments can be used to study expression behavior in a single organism or between organisms, across a wide range of experimental conditions. We also provide an overview of the methods and tools that can aid molecular biologists in exploiting these public data.
Affiliation(s)
- Ana C Fierro
- Department of Microbial and Molecular Systems, Katholieke Universiteit Leuven, Kasteelpark Arenberg 20, 3001 Leuven, Belgium

20
Li J, Tseng GC. An adaptively weighted statistic for detecting differential gene expression when combining multiple transcriptomic studies. Ann Appl Stat 2011. [DOI: 10.1214/10-aoas393] [Citation(s) in RCA: 84] [Impact Index Per Article: 6.5] [Indexed: 11/19/2022]

21
Feng F, Sales AP, Kepler TB. A Bayesian approach for estimating calibration curves and unknown concentrations in immunoassays. Bioinformatics 2010; 27:707-12. [PMID: 21149344 DOI: 10.1093/bioinformatics/btq686] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.0] [Indexed: 11/12/2022]
Abstract
MOTIVATION: Immunoassays are primary diagnostic and research tools throughout the medical and life sciences. The common approach to the processing of immunoassay data involves estimation of the calibration curve followed by inversion of the calibration function to read off the concentration estimates. This approach, however, does not lend itself easily to acceptable estimation of confidence limits on the estimated concentrations. Such estimates must account for uncertainty in the calibration curve as well as uncertainty in the target measurement. Even point estimates can be problematic: because of the non-linearity of calibration curves and error heteroscedasticity, the neglect of components of measurement error can produce significant bias.
METHODS: We have developed a Bayesian approach for the estimation of concentrations from immunoassay data that treats the propagation of measurement error appropriately. The method uses Markov Chain Monte Carlo (MCMC) to approximate the posterior distribution of the target concentrations and numerically compute the relevant summary statistics. Software implementing the method is freely available for public use.
RESULTS: The new method was tested on both simulated and experimental datasets with different measurement error models. The method outperformed the common inverse method on samples with large measurement errors. Even in cases with extreme measurements where the common inverse method failed, our approach always generated reasonable estimates for the target concentrations.
AVAILABILITY: Project name: Baecs; Project home page: www.computationalimmunology.org/utilities/; Operating systems: Linux, MacOS X and Windows; Programming language: C++; License: Free for Academic Use.
Affiliation(s)
- Feng Feng
- Center for Computational Immunology, Department of Biostatistics and Bioinformatics, Duke University Medical Center, Durham, NC 27705, USA.

22
Nguyen TT, Almon RR, Dubois DC, Jusko WJ, Androulakis IP. Comparative analysis of acute and chronic corticosteroid pharmacogenomic effects in rat liver: transcriptional dynamics and regulatory structures. BMC Bioinformatics 2010; 11:515. [PMID: 20946642 PMCID: PMC2973961 DOI: 10.1186/1471-2105-11-515] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Received: 02/04/2010] [Accepted: 10/14/2010] [Indexed: 12/11/2022] Open
Abstract
BACKGROUND: Comprehensively understanding corticosteroid pharmacogenomic effects is an essential step towards an insight into the underlying molecular mechanisms for both beneficial and detrimental clinical effects. Nevertheless, even in a single tissue different methods of corticosteroid administration can induce different patterns of expression and regulatory control structures. Therefore, rich in vivo datasets of pharmacological time-series with two dosing regimens sampled from rat liver are examined for temporal patterns of changes in gene expression and their regulatory commonalities.
RESULTS: The study addresses two issues, including (1) identifying significant transcriptional modules coupled with dynamic expression patterns and (2) predicting relevant common transcriptional controls to better understand the underlying mechanisms of corticosteroid adverse effects. Following the orientation of meta-analysis, an extended computational approach that explores the concept of agreement matrix from consensus clustering has been proposed with the aims of identifying gene clusters that share common expression patterns across multiple dosing regimens as well as handling challenges in the analysis of microarray data from heterogeneous sources, e.g. different platforms and time-grids in this study. Six significant transcriptional modules coupled with typical patterns of expression have been identified. Functional analysis reveals that virtually all enriched functions (gene ontologies, pathways) in these modules are shown to be related to metabolic processes, implying the importance of these modules in adverse effects under the administration of corticosteroids. Relevant putative transcriptional regulators (e.g. RXRF, FKHD, SP1F) are also predicted to provide another source of information towards better understanding the complexities of expression patterns and the underlying regulatory mechanisms of those modules.
CONCLUSIONS: We have proposed a framework to identify significant coexpressed clusters of genes across multiple conditions experimented from different microarray platforms, time-grids, and also tissues if applicable. Analysis on rich in vivo datasets of corticosteroid time-series yielded significant insights into the pharmacogenomic effects of corticosteroids, especially the relevance to metabolic side-effects. This has been illustrated through enriched metabolic functions in those transcriptional modules and the presence of GRE binding motifs in those enriched pathways, providing significant modules for further analysis on pharmacogenomic corticosteroid effects.
Affiliation(s)
- Tung T Nguyen
- BioMaPS Institute for Quantitative Biology, Rutgers University, Piscataway, New Jersey, USA

23

24
Gholami AM, Fellenberg K. Cross-species common regulatory network inference without requirement for prior gene affiliation. Bioinformatics 2010; 26:1082-90. [PMID: 20200011 DOI: 10.1093/bioinformatics/btq096] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.0] [Indexed: 11/13/2022]
Abstract
MOTIVATION: Cross-species meta-analyses of microarray data usually require prior affiliation of genes based on orthology information that often relies on sequence similarity.
RESULTS: We present an algorithm merging microarray datasets on the basis of co-expression alone, without any requirement for orthology information to affiliate genes. Combining existing methods such as co-inertia analysis, back-transformation, Hungarian matching and majority voting in an iterative non-greedy hill-climbing approach, it affiliates arrays and genes at the same time, maximizing the co-structure between the datasets. To introduce the method, we demonstrate its performance on two closely and two distantly related datasets of different experimental context and produced on different platforms. Each pair stems from two different species. The resulting cross-species dynamic Bayesian gene networks improve on the networks inferred from each dataset alone by yielding more significant network motifs, as well as more of the interactions already recorded in KEGG and other databases. Also, it is shown that our algorithm converges on the optimal number of nodes for network inference. Being readily extendable to more than two datasets, it provides the opportunity to infer extensive gene regulatory networks.
AVAILABILITY AND IMPLEMENTATION: Source code (MATLAB and R) freely available for download at http://www.mchips.org/supplements/moghaddasi_source.tgz.
Affiliation(s)
- Amin Moghaddas Gholami
- Chair of Proteomics and Bioanalytics, Center for Integrated Protein Sciences Munich (CIPSM), Technische Universität München, Emil Erlenmeyer Forum 5, 85354 Freising, Germany

25
Abstract
DNA microarray profiles are plagued by having a large number of variables but a small number of samples, and are notorious for their low signal-to-noise ratio in clinical applications. There is therefore a growing need for meta-analysis techniques that yield more valid and informative results than each experiment analyzed separately. By pooling the power of several studies in a single analysis, meta-analysis of cancer gene-profiling data increases the statistical power to detect differentially expressed genes and allows assessment of heterogeneity. OrderedList is one such method, proposed specifically for meta-analysis of cancer gene expression data. Its advantage over other methods is that it relies not on strong differential expression effects in a single study but on genes that are consistently regulated across multiple studies. This chapter introduces the R implementation of this methodology on real data sets to identify biomarkers for lung adenocarcinoma.
Affiliation(s)
- Xinan Yang
- Division of Bioinformatics, State Key Laboratory of Bioelectronics (Chien-Shiung Wu Laboratory), Southeast University, Nanjing, China.

26
Wang K, Narayanan M, Zhong H, Tompa M, Schadt EE, Zhu J. Meta-analysis of inter-species liver co-expression networks elucidates traits associated with common human diseases. PLoS Comput Biol 2009; 5:e1000616. [PMID: 20019805 PMCID: PMC2787626 DOI: 10.1371/journal.pcbi.1000616] [Citation(s) in RCA: 43] [Impact Index Per Article: 2.9] [Received: 04/03/2009] [Accepted: 11/16/2009] [Indexed: 12/02/2022] Open
Abstract
Co-expression networks are routinely used to study human diseases like obesity and diabetes. Systematic comparison of these networks between species has the potential to elucidate common mechanisms that are conserved between human and rodent species, as well as those that are species-specific characterizing evolutionary plasticity. We developed a semi-parametric meta-analysis approach for combining gene-gene co-expression relationships across expression profile datasets from multiple species. The simulation results showed that the semi-parametric method is robust against noise. When applied to human, mouse, and rat liver co-expression networks, our method out-performed existing methods in identifying gene pairs with coherent biological functions. We identified a network conserved across species that highlighted cell-cell signaling, cell-adhesion and sterol biosynthesis as main biological processes represented in genome-wide association study candidate gene sets for blood lipid levels. We further developed a heterogeneity statistic to test for network differences among multiple datasets, and demonstrated that genes with species-specific interactions tend to be under positive selection throughout evolution. Finally, we identified a human-specific sub-network regulated by RXRG, which has been validated to play a different role in hyperlipidemia and Type 2 diabetes between human and mouse. Taken together, our approach represents a novel step forward in integrating gene co-expression networks from multiple large scale datasets to leverage not only common information but also differences that are dataset-specific. Two important aspects of drug development are drug target identification and biomarker discovery for early disease detection, disease progression, drug efficacy and drug toxicity, etc. Recently, many single nucleotide polymorphisms (SNPs) associated with human diseases are discovered through large genome-wide association studies (GWAS). 
However, it is still largely unclear how these candidate SNPs may cause human diseases. The ultimate aim of this paper is to put these GWAS candidate SNPs and their associated genes into a network context to understand their mechanism of action in human diseases. In addition to large-scale human data sets that are often heterogeneous in terms of genetic and environmental factors, many high quality data sets in rodents exist and are frequently used to model human diseases. To leverage such information, we developed a method for combining and contrasting gene networks between human and rodents, specifically to elucidate how GWAS candidate SNPs may contribute to human diseases. By identifying mechanisms that are conserved or divergent between human and rodents, we can also predict which disease causal genes can be studied using rodent models and which ones may not.
Affiliation(s)
- Kai Wang
- Department of Genetics, Rosetta Inpharmatics, Seattle, Washington, United States of America
- Manikandan Narayanan
- Department of Genetics, Rosetta Inpharmatics, Seattle, Washington, United States of America
- Hua Zhong
- Department of Genetics, Rosetta Inpharmatics, Seattle, Washington, United States of America
- Martin Tompa
- Department of Genetics, Rosetta Inpharmatics, Seattle, Washington, United States of America
- Department of Computer Sciences, University of Washington, Seattle, Washington, United States of America
- Eric E. Schadt
- Department of Genetics, Rosetta Inpharmatics, Seattle, Washington, United States of America
- Jun Zhu
- Department of Genetics, Rosetta Inpharmatics, Seattle, Washington, United States of America

27
Lu S, Li J, Song C, Shen K, Tseng GC. Biomarker detection in the integration of multiple multi-class genomic studies. Bioinformatics 2009; 26:333-40. [PMID: 19965884 DOI: 10.1093/bioinformatics/btp669] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.9] [Indexed: 11/13/2022]
Abstract
MOTIVATION: Systematic information integration of multiple-related microarray studies has become an important issue as the technology becomes mature and prevalent in the past decade. The aggregated information provides more robust and accurate biomarker detection. So far, published meta-analysis methods for this purpose mostly consider two-class comparison. Methods for combining multi-class studies and considering expression pattern concordance are rarely explored.
RESULTS: In this article, we develop three integration methods for biomarker detection in multiple multi-class microarray studies: ANOVA-maxP, min-MCC and OW-min-MCC. We first consider a natural extension of combining P-values from the traditional ANOVA model. Since P-values from ANOVA do not guarantee to reflect the concordant expression pattern information across studies, we propose a multi-class correlation (MCC) measure to specifically seek for biomarkers of concordant inter-class patterns across a pair of studies. For both ANOVA and MCC approaches, we use extreme order statistics to identify biomarkers differentially expressed (DE) in all studies (i.e. ANOVA-maxP and min-MCC). The min-MCC method is further extended to identify biomarkers DE in partial studies by incorporating a recently developed optimally weighted (OW) technique (OW-min-MCC). All methods are evaluated by simulation studies and by three meta-analysis applications to multi-tissue mouse metabolism datasets, multi-condition mouse trauma datasets and multi-malignant-condition human prostate cancer datasets. The results show complementary strength of the three methods for different biological purposes.
AVAILABILITY: http://www.biostat.pitt.edu/bioinfo/.
SUPPLEMENTARY INFORMATION: Supplementary data is available at Bioinformatics online.
Affiliation(s)
- Shuya Lu
- Department of Biostatistics, University of Pittsburgh, Pittsburgh, PA, USA
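The maxP idea named in the abstract above (using the largest per-study P-value so that a gene is flagged only when it is differentially expressed in all studies) can be illustrated with a minimal Python sketch. It assumes independent studies and P-values that are uniform under the null, in which case the probability that the maximum of k P-values is at most t equals t**k. The function name `max_p_combined` and the example P-values are hypothetical, and the paper's own inference procedure may differ.

```python
def max_p_combined(p_values):
    """Combine independent per-study p-values with the maxP rule.

    The statistic is the largest p-value. Assumption: the k inputs are
    independent and uniform under the null, so P(max <= t) = t**k.
    """
    k = len(p_values)
    if k == 0:
        raise ValueError("need at least one p-value")
    if any(not 0.0 <= p <= 1.0 for p in p_values):
        raise ValueError("p-values must lie in [0, 1]")
    return max(p_values) ** k


# Hypothetical per-study ANOVA p-values for two genes across three studies.
consistent_gene = [0.001, 0.004, 0.010]    # small in every study
single_study_gene = [0.0001, 0.40, 0.75]   # strong in one study only

print(max_p_combined(consistent_gene))
print(max_p_combined(single_study_gene))
```

Because the statistic depends only on the weakest study, a gene with one very strong study and two null studies is not favoured, which matches the stated goal of detecting genes that are differentially expressed in all studies.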

28
Scharpf RB, Tjelmeland H, Parmigiani G, Nobel AB. A Bayesian model for cross-study differential gene expression. J Am Stat Assoc 2009; 104:1295-1310. [PMID: 21127725 DOI: 10.1198/jasa.2009.ap07611] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.7] [Indexed: 12/29/2022]
Abstract
In this paper we define a hierarchical Bayesian model for microarray expression data collected from several studies and use it to identify genes that show differential expression between two conditions. Key features include shrinkage across both genes and studies, and flexible modeling that allows for interactions between platforms and the estimated effect, as well as concordant and discordant differential expression across studies. We evaluated the performance of our model in a comprehensive fashion, using both artificial data, and a "split-study" validation approach that provides an agnostic assessment of the model's behavior not only under the null hypothesis, but also under a realistic alternative. The simulation results from the artificial data demonstrate the advantages of the Bayesian model. The 1 - AUC values for the Bayesian model are roughly half of the corresponding values for a direct combination of t- and SAM-statistics. Furthermore, the simulations provide guidelines for when the Bayesian model is most likely to be useful. Most noticeably, in small studies the Bayesian model generally outperforms other methods when evaluated by AUC, FDR, and MDR across a range of simulation parameters, and this difference diminishes for larger sample sizes in the individual studies. The split-study validation illustrates appropriate shrinkage of the Bayesian model in the absence of platform-, sample-, and annotation-differences that otherwise complicate experimental data analyses. Finally, we fit our model to four breast cancer studies employing different technologies (cDNA and Affymetrix) to estimate differential expression in estrogen receptor positive tumors versus negative ones. Software and data for reproducing our analysis are publicly available.
Affiliation(s)
- Robert B Scharpf
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD 21205

29
Conlon EM, Postier BL, Methé BA, Nevin KP, Lovley DR. Hierarchical Bayesian meta-analysis models for cross-platform microarray studies. J Appl Stat 2009. [DOI: 10.1080/02664760802562480] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 10/20/2022]

30
Marot G, Foulley JL, Mayer CD, Jaffrézic F. Moderated effect size and P-value combinations for microarray meta-analyses. Bioinformatics 2009; 25:2692-9. [PMID: 19628502 DOI: 10.1093/bioinformatics/btp444] [Citation(s) in RCA: 116] [Impact Index Per Article: 7.7] [Indexed: 11/13/2022] Open
Abstract
MOTIVATION: With the proliferation of microarray experiments and their availability in the public domain, the use of meta-analysis methods to combine results from different studies increases. In microarray experiments, where the sample size is often limited, meta-analysis offers the possibility to considerably increase the statistical power and give more accurate results.
RESULTS: A moderated effect size combination method was proposed and compared with other meta-analysis approaches. All methods were applied to real publicly available datasets on prostate cancer, and were compared in an extensive simulation study for various amounts of inter-study variability. Although the proposed moderated effect size combination improved already existing effect size approaches, the P-value combination was found to provide a better sensitivity and a better gene ranking than the other meta-analysis methods, while effect size methods were more conservative.
AVAILABILITY: An R package metaMA is available on the CRAN.
Affiliation(s)
- Guillemette Marot
- INRA, UMR 1313 Génétique Animale et Biologie Intégrative, Jouy-en-Josas, F-78350, France.
|
31
|
Wren JD. A global meta-analysis of microarray expression data to predict unknown gene functions and estimate the literature-data divide. Bioinformatics 2009; 25:1694-701. [PMID: 19447786 DOI: 10.1093/bioinformatics/btp290]
Abstract
MOTIVATION Approximately 9334 (37%) of human genes have no publications documenting their function and, for those that are published, the number of publications per gene is highly skewed. Furthermore, for reasons not clear, the entry of new gene names into the literature has slowed in recent years. If we are to better understand human/mammalian biology and complete the catalog of human gene function, it is important to finish predicting putative functions for these genes based upon existing experimental evidence. RESULTS A global meta-analysis (GMA) of all publicly available GEO two-channel human microarray datasets (3551 experiments total) was conducted to identify genes with recurrent, reproducible patterns of co-regulation across different conditions. Patterns of co-expression were divided into parallel (i.e. genes are up and down-regulated together) and anti-parallel. Several ranking methods to predict a gene's function based on its top 20 co-expressed gene pairs were compared. In the best method, 34% of predicted Gene Ontology (GO) categories matched exactly with the known GO categories for approximately 5000 genes analyzed versus only 3% for random gene sets. Only 2.4% of co-expressed gene pairs were found as co-occurring gene pairs in MEDLINE. CONCLUSIONS Via a GO enrichment analysis, genes co-expressed in parallel with the query gene were frequently associated with the same GO categories, whereas anti-parallel genes were not. Combining parallel and anti-parallel genes for analysis resulted in fewer significant GO categories, suggesting they are best analyzed separately. Expression databases contain much unexpected genetic knowledge that has not yet been reported in the literature. A total of 1642 Human genes with unknown function were differentially expressed in at least 30 experiments. AVAILABILITY Data matrix available upon request.
Affiliation(s)
- Jonathan D Wren
- Arthritis and Immunology Research Program, Oklahoma Medical Research Foundation;, 825 N.E. 13th Street, Oklahoma City, OK 73104-5005, USA.
|
32
|
Ma S, Huang J. Regularized gene selection in cancer microarray meta-analysis. BMC Bioinformatics 2009; 10:1. [PMID: 19118496 PMCID: PMC2631520 DOI: 10.1186/1471-2105-10-1] [Received: 09/18/2008] [Accepted: 01/01/2009]
Abstract
BACKGROUND In cancer studies, it is common that multiple microarray experiments are conducted to measure the same clinical outcome and expressions of the same set of genes. An important goal of such experiments is to identify a subset of genes that can potentially serve as predictive markers for cancer development and progression. Analyses of individual experiments may lead to unreliable gene selection results because of the small sample sizes. Meta analysis can be used to pool multiple experiments, increase statistical power, and achieve more reliable gene selection. The meta analysis of cancer microarray data is challenging because of the high dimensionality of gene expressions and the differences in experimental settings amongst different experiments. RESULTS We propose a Meta Threshold Gradient Descent Regularization (MTGDR) approach for gene selection in the meta analysis of cancer microarray data. The MTGDR has many advantages over existing approaches. It allows different experiments to have different experimental settings. It can account for the joint effects of multiple genes on cancer, and it can select the same set of cancer-associated genes across multiple experiments. Simulation studies and analyses of multiple pancreatic and liver cancer experiments demonstrate the superior performance of the MTGDR. CONCLUSION The MTGDR provides an effective way of analyzing multiple cancer microarray studies and selecting reliable cancer-associated genes.
Affiliation(s)
- Shuangge Ma
- Department of Epidemiology and Public Health, Yale University, New Haven, CT 06520, USA.
|
33
|
Blangiardo M, Richardson S. A Bayesian calibration model for combining different pre-processing methods in Affymetrix chips. BMC Bioinformatics 2008; 9:512. [PMID: 19046434 PMCID: PMC2639433 DOI: 10.1186/1471-2105-9-512] [Received: 05/12/2008] [Accepted: 12/01/2008]
Abstract
BACKGROUND In gene expression studies a key role is played by the so called "pre-processing", a series of steps designed to extract the signal and account for the sources of variability due to the technology used rather than to biological differences between the RNA samples. At the moment there is no commonly agreed gold standard pre-processing method and each researcher has the responsibility to choose one method, incurring the risk of false positive and false negative features arising from the particular method chosen. RESULTS We propose a Bayesian calibration model that makes use of the information provided by several pre-processing methods and we show that this model gives a better assessment of the 'true' unknown differential expression between two conditions. We demonstrate how to estimate the posterior distribution of the differential expression values of interest from the combined information. CONCLUSION On simulated data and on the spike-in Latin Square dataset from Affymetrix the Bayesian calibration model proves to have more power than each pre-processing method. Its biological interest is demonstrated through an experimental example on publicly available data.
Affiliation(s)
- Marta Blangiardo
- Centre for Biostatistics, Imperial College, St Mary's Campus, Norfolk Place, London, UK.
|
34
|
Meta-analysis of genome-wide expression patterns associated with behavioral maturation in honey bees. BMC Genomics 2008; 9:503. [PMID: 18950506 PMCID: PMC2582039 DOI: 10.1186/1471-2164-9-503] [Received: 07/08/2008] [Accepted: 10/24/2008]
Abstract
Background The information from multiple microarray experiments can be integrated in an objective manner via meta-analysis. However, multiple meta-analysis approaches are available and their relative strengths have not been directly compared using experimental data in the context of different gene expression scenarios and studies with different degrees of relationship. This study investigates the complementary advantages of meta-analysis approaches to integrate information across studies, and further mine the transcriptome for genes that are associated with complex processes such as behavioral maturation in honey bees. Behavioral maturation and division of labor in honey bees are related to changes in the expression of hundreds of genes in the brain. The information from various microarray studies comparing the expression of genes at different maturation stages in honey bee brains was integrated using complementary meta-analysis approaches. Results Comparison of lists of genes with significant differential expression across studies failed to identify genes with consistent patterns of expression that were below the selected significance threshold, or identified genes with significant yet inconsistent patterns. The meta-analytical framework supported the identification of genes with consistent overall expression patterns and eliminated genes that exhibited contradictory expression patterns across studies. Sample-level meta-analysis of normalized gene-expression can detect more differentially expressed genes than the study-level meta-analysis of estimates for genes that were well described by similar model parameter estimates across studies and had small variation across studies. Furthermore, study-level meta-analysis was well suited for genes that exhibit consistent patterns across studies, genes that had substantial variation across studies, and genes that did not conform to the assumptions of the sample-level meta-analysis. 
Meta-analyses confirmed previously reported genes and helped identify genes (e.g. Tomosyn, Chitinase 5, Adar, Innexin 2, Transferrin 1, Sick, Oatp26F) and Gene Ontology categories (e.g. purine nucleotide binding) not previously associated with maturation in honey bees. Conclusion This study demonstrated that a combination of meta-analytical approaches best addresses the highly dimensional nature of genome-wide microarray studies. As expected, the integration of gene expression information from microarray studies using meta-analysis enhanced the characterization of the transcriptome of complex biological processes.
|
35
|
Liang Y, Kelemen A. Bayesian models and meta analysis for multiple tissue gene expression data following corticosteroid administration. BMC Bioinformatics 2008; 9:354. [PMID: 18755028 PMCID: PMC2579308 DOI: 10.1186/1471-2105-9-354] [Received: 01/02/2008] [Accepted: 08/28/2008]
Abstract
Background This paper addresses key biological problems and statistical issues in the analysis of large gene expression data sets that describe systemic temporal response cascades to therapeutic doses in multiple tissues (liver, skeletal muscle, and kidney) from the same animals. Affymetrix U34A time-course gene expression data were obtained from three tissues: kidney, liver, and muscle. Our goals are to assess the concordance of genes across tissues, to identify the genes that are commonly differentially expressed over time, and to examine the reproducibility of the findings by integrating results from multiple tissues through meta-analysis. The integration is intended to increase the power to detect genes differentially expressed over time and to characterize how the three tissues differ in their response to the drug. Results and conclusion A Bayesian categorical model for estimating the proportion of the 'call' is used to pre-screen genes. A hierarchical Bayesian mixture model is then developed to identify differentially expressed genes across time and dynamic clusters. The deviance information criterion is applied to determine the number of components for model comparison and selection. The Bayesian mixture model produces the gene-specific posterior probability of differential/non-differential expression and the 95% credible interval, which form the basis for our further Bayesian meta-inference. Meta-analysis is performed to identify commonly expressed genes from multiple tissues that may serve as ideal targets for novel treatment strategies and to integrate the results across separate studies. We found genes commonly expressed in the three tissues; however, the up-, down-, or non-regulation of these common genes differs across time points. Moreover, the most differentially expressed genes were found in the liver, followed by kidney and then muscle.
Affiliation(s)
- Yulan Liang
- Department of Organizational Systems and Adult Health, University of Maryland, 655 W, Lombard Street, Baltimore, MD 21201-1579, USA.
|
36
|
Combining transcriptional datasets using the generalized singular value decomposition. BMC Bioinformatics 2008; 9:335. [PMID: 18687147 PMCID: PMC2562393 DOI: 10.1186/1471-2105-9-335] [Received: 01/02/2008] [Accepted: 08/08/2008]
Abstract
Background Both microarrays and quantitative real-time PCR are convenient tools for studying the transcriptional levels of genes. The former is preferable for large-scale studies, while the latter is a more targeted technique. Because of platform-dependent systematic effects, simple comparisons or merging of datasets obtained by these technologies are difficult, even though they may often be desirable. These difficulties are exacerbated if there is only partial overlap between the experimental conditions and genes probed in the two datasets. Results We show here that the generalized singular value decomposition provides a practical tool for merging a small, targeted dataset obtained by quantitative real-time PCR of specific genes with a much larger microarray dataset. The technique permits, for the first time, the identification of genes present in only one dataset that are co-expressed with a target gene present exclusively in the other dataset, even when experimental conditions for the two datasets are not identical. With the rapidly increasing number of publicly available large-scale microarray datasets, the latter is frequently the case. The method enables us to discover putative candidate genes involved in the biosynthesis of the (1,3;1,4)-β-D-glucan polysaccharide found in plant cell walls. Conclusion We show that the generalized singular value decomposition provides a viable tool for a combined analysis of two gene expression datasets with only partial overlap of both gene sets and experimental conditions. We illustrate how the decomposition can be optimized self-consistently by using a judicious choice of genes to define it. The ability of the technique to seamlessly define a concept of "co-expression" across both datasets provides an avenue for meaningful data integration. 
We believe that it will prove to be particularly useful for exploiting large, publicly available, microarray datasets for species with unsequenced genomes by complementing them with more limited in-house expression measurements.
|
37
|
Sivaganesan M, Seifring S, Varma M, Haugland RA, Shanks OC. A Bayesian method for calculating real-time quantitative PCR calibration curves using absolute plasmid DNA standards. BMC Bioinformatics 2008; 9:120. [PMID: 18298858 PMCID: PMC2292693 DOI: 10.1186/1471-2105-9-120] [Received: 08/15/2007] [Accepted: 02/25/2008]
Abstract
BACKGROUND In real-time quantitative PCR studies using absolute plasmid DNA standards, a calibration curve is developed to estimate an unknown DNA concentration. However, potential differences in the amplification performance of plasmid DNA compared to genomic DNA standards are often ignored in calibration calculations and in some cases impossible to characterize. A flexible statistical method that can account for uncertainty between plasmid and genomic DNA targets, replicate testing, and experiment-to-experiment variability is needed to estimate calibration curve parameters such as intercept and slope. Here we report the use of a Bayesian approach to generate calibration curves for the enumeration of target DNA from genomic DNA samples using absolute plasmid DNA standards. RESULTS Instead of the two traditional methods (classical and inverse), Markov chain Monte Carlo (MCMC) estimation was used to generate single, master, and modified calibration curves. The mean and the percentiles of the posterior distribution were used as point and interval estimates of unknown parameters such as intercepts, slopes and DNA concentrations. The software WinBUGS was used to perform all simulations and to generate the posterior distributions of all the unknown parameters of interest. CONCLUSION The Bayesian approach defined in this study allowed for the estimation of DNA concentrations from environmental samples using absolute standard curves generated by real-time qPCR. The approach accounted for uncertainty from multiple sources such as experiment-to-experiment variation, variability between replicate measurements, as well as uncertainty introduced when employing calibration curves generated from absolute plasmid DNA standards.
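For orientation, the "classical" calibration approach that this abstract contrasts with its Bayesian method can be sketched as an ordinary least-squares fit of the measured signal against log10 of the standard concentration, followed by inversion of the fitted line for an unknown sample. This is a minimal sketch, not the authors' WinBUGS model: the use of threshold cycle (Ct) as the response, the function names, and the standard-curve values are all assumptions made for illustration, and the fit yields point estimates only, without the uncertainty propagation the paper describes.

```python
def fit_calibration_curve(log10_copies, ct_values):
    """Ordinary least squares for Ct = intercept + slope * log10(copies)."""
    n = len(log10_copies)
    mean_x = sum(log10_copies) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in log10_copies)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(log10_copies, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def estimate_log10_copies(ct, intercept, slope):
    """Classical calibration: invert the fitted line for an unknown sample."""
    return (ct - intercept) / slope


def amplification_efficiency(slope):
    """Standard qPCR efficiency formula; a slope near -3.32 gives about 1.0 (100%)."""
    return 10 ** (-1.0 / slope) - 1.0


if __name__ == "__main__":
    # Invented plasmid standards: log10 copy numbers and their Ct readings.
    standards_log10 = [1.0, 2.0, 3.0, 4.0, 5.0]
    standards_ct = [36.68, 33.36, 30.04, 26.72, 23.40]
    intercept, slope = fit_calibration_curve(standards_log10, standards_ct)
    print(intercept, slope)
    print(estimate_log10_copies(28.0, intercept, slope))
    print(amplification_efficiency(slope))
```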
Affiliation(s)
- Mano Sivaganesan
- U.S. Environmental Protection Agency, Office of Research and Development, National Risk Management Research Laboratory, 26 West Martin Luther King Drive, Cincinnati, OH 45268, USA.
|
38
|
Conlon EM. A Bayesian mixture model for metaanalysis of microarray studies. Funct Integr Genomics 2007; 8:43-53. [PMID: 17879102 DOI: 10.1007/s10142-007-0058-3] [Received: 06/22/2007] [Revised: 08/10/2007] [Accepted: 08/11/2007]
Abstract
The increased availability of microarray data has been calling for statistical methods to integrate findings across studies. A common goal of microarray analysis is to determine differentially expressed genes between two conditions, such as treatment vs control. A recent Bayesian metaanalysis model used a prior distribution for the mean log-expression ratios that was a mixture of two normal distributions. This model centered the prior distribution of differential expression at zero, and separated genes into two groups only: expressed and nonexpressed. Here, we introduce a Bayesian three-component truncated normal mixture prior model that more flexibly assigns prior distributions to the differentially expressed genes and produces three groups of genes: up and downregulated, and nonexpressed. We found in simulations of two and five studies that the three-component model outperformed the two-component model using three comparison measures. When analyzing biological data of Bacillus subtilis, we found that the three-component model discovered more genes and omitted fewer genes for the same levels of posterior probability of differential expression than the two-component model, and discovered more genes for fixed thresholds of Bayesian false discovery. We assumed that the data sets were produced from the same microarray platform and were prescaled.
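The three-group idea in this abstract (downregulated, nonexpressed, upregulated) can be illustrated with a deliberately simplified mixture: a zero-centred normal for the null group and normals truncated to positive or negative values for the two regulated groups. The sketch below computes the probability that a given mean log-expression ratio belongs to each component when all mixture parameters are fixed. It is not the hierarchical model of the paper, which places this mixture as a prior and estimates everything jointly; the function names and every parameter value are assumptions chosen for illustration.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def truncated_normal_pdf(x, mu, sigma, positive=True):
    """Density of N(mu, sigma^2) truncated to x > 0 (positive) or x < 0."""
    if (positive and x <= 0) or (not positive and x >= 0):
        return 0.0
    mass_above_zero = 1.0 - STD_NORMAL.cdf((0.0 - mu) / sigma)
    mass = mass_above_zero if positive else 1.0 - mass_above_zero
    # Renormalise so the truncated density integrates to one.
    return NormalDist(mu, sigma).pdf(x) / mass


def component_probabilities(x, weights, null_sd, up, down):
    """Membership probabilities; weights = (p_down, p_null, p_up), up/down = (mu, sigma)."""
    p_down, p_null, p_up = weights
    parts = {
        "down": p_down * truncated_normal_pdf(x, *down, positive=False),
        "null": p_null * NormalDist(0.0, null_sd).pdf(x),
        "up": p_up * truncated_normal_pdf(x, *up, positive=True),
    }
    total = sum(parts.values())
    return {name: value / total for name, value in parts.items()}


if __name__ == "__main__":
    # Hypothetical parameters: most genes null, regulated groups centred at +/-1.5.
    probs = component_probabilities(
        2.0, weights=(0.1, 0.8, 0.1), null_sd=0.3, up=(1.5, 1.0), down=(-1.5, 1.0)
    )
    print(probs)
```

Because the regulated components are truncated at zero, a positive ratio can never be assigned to the downregulated group, which is what separates this layout from a two-component prior centred at zero.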
Affiliation(s)
- Erin M Conlon
- Department of Mathematics and Statistics, University of Massachusetts, 710 North Pleasant Street, Amherst, MA 01003-9305, USA.
|