1
Lagani V, Karozou AD, Gomez-Cabrero D, Silberberg G, Tsamardinos I. A comparative evaluation of data-merging and meta-analysis methods for reconstructing gene-gene interactions. BMC Bioinformatics 2016; 17 Suppl 5:194. [PMID: 27294826 PMCID: PMC4905611 DOI: 10.1186/s12859-016-1038-1] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Indexed: 12/16/2022]
Abstract
BACKGROUND We address the problem of integratively analyzing multiple gene expression microarray datasets in order to reconstruct gene-gene interaction networks. Integrating multiple datasets is generally believed to provide increased statistical power and to lead to a better characterization of the system under study. However, the presence of systematic variation across different studies makes network reverse-engineering tasks particularly challenging. We contrast two approaches that have been frequently used in the literature for addressing systematic biases: meta-analysis methods, which first calculate appropriate statistics on the individual datasets and then summarize them, and data-merging methods, which directly analyze the pooled data after removing any biases. This comparative evaluation is performed on both synthetic and real data, the latter consisting of two manually curated microarray compendia comprising several E. coli and Yeast studies, respectively. Furthermore, the reconstruction of the regulatory network of the transcription factor Ikaros in human Peripheral Blood Mononuclear Cells (PBMCs) is presented as a case study.
RESULTS The meta-analysis and data-merging methods included in our experiments performed comparably on both synthetic and real data. Furthermore, both approaches outperformed (a) the naïve solution of merging data together while ignoring possible biases, and (b) the results that are expected when only one of the available datasets is analyzed in isolation. Using correlation statistics proved to be more effective than using p-values for correctly ranking candidate interactions. The results from the PBMC case study indicate that the findings of the present study generalize to different types of network reconstruction algorithms.
CONCLUSIONS Ignoring the systematic variations that differentiate heterogeneous studies can produce results that are statistically indistinguishable from random guessing. Meta-analysis and data-merging methods have proved equally effective in addressing this issue, and thus researchers may safely select the approach that best suits their specific application.
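The contrast drawn in this abstract can be illustrated with a minimal sketch. The abstract does not name the specific methods compared, so everything below is an assumption made for illustration: the meta-analysis side is represented by a sample-size-weighted average of Fisher z-transformed Pearson correlations, the data-merging side by per-dataset mean-centering before pooling, and the function names and the simulated study offsets are hypothetical.

```python
import numpy as np


def fisher_meta_correlation(datasets, i, j):
    """Meta-analysis stand-in: average per-dataset correlations on Fisher's z scale."""
    zs, weights = [], []
    for X in datasets:
        r = np.corrcoef(X[:, i], X[:, j])[0, 1]
        zs.append(np.arctanh(r))          # Fisher z-transform of the correlation
        weights.append(X.shape[0] - 3)    # inverse of the approximate variance of z
    return np.tanh(np.average(zs, weights=weights))


def merged_correlation(datasets, i, j, center=True):
    """Data-merging stand-in: pool samples, optionally centering each gene per dataset.

    center=False corresponds to the naive merge that ignores study-level biases.
    """
    parts = [X - X.mean(axis=0) if center else X for X in datasets]
    pooled = np.vstack(parts)
    return np.corrcoef(pooled[:, i], pooled[:, j])[0, 1]


def simulate(n_datasets=4, n_samples=50, rho=0.6, shift_scale=5.0, seed=0):
    """Simulate two correlated genes measured in several studies.

    Each study adds its own random offset to each gene (an assumed, simplistic
    model of systematic between-study variation).
    """
    rng = np.random.default_rng(seed)
    cov = np.array([[1.0, rho], [rho, 1.0]])
    datasets = []
    for _ in range(n_datasets):
        X = rng.multivariate_normal(np.zeros(2), cov, size=n_samples)
        X = X + rng.normal(scale=shift_scale, size=2)
        datasets.append(X)
    return datasets


if __name__ == "__main__":
    data = simulate()
    print("meta-analysis  :", fisher_meta_correlation(data, 0, 1))
    print("centered merge :", merged_correlation(data, 0, 1, center=True))
    print("naive merge    :", merged_correlation(data, 0, 1, center=False))
```

In this toy setting the first two estimates should land near the simulated correlation, while the naive merge depends on the random study offsets and can be far from it. This only mirrors the qualitative point of the abstract and does not reproduce the paper's evaluation.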
Affiliation(s)
- Vincenzo Lagani
  - Institute of Computer Science, Foundation for Research and Technology – Hellas, Heraklion, Greece
  - Computer Science Department, University of Crete, Heraklion, Greece
- Argyro D. Karozou
  - Institute of Computer Science, Foundation for Research and Technology – Hellas, Heraklion, Greece
- David Gomez-Cabrero
  - Unit of Computational Medicine, Department of Medicine, Karolinska Institutet, 171 77 Stockholm, Sweden
  - Center for Molecular Medicine, Karolinska Institutet, 171 77 Stockholm, Sweden
  - Unit of Clinical Epidemiology, Department of Medicine, Karolinska University Hospital, L8, 17176 Stockholm, Sweden
  - Science for Life Laboratory, 17121 Solna, Sweden
- Gilad Silberberg
  - Unit of Computational Medicine, Department of Medicine, Karolinska Institutet, 171 77 Stockholm, Sweden
  - Center for Molecular Medicine, Karolinska Institutet, 171 77 Stockholm, Sweden
  - Unit of Clinical Epidemiology, Department of Medicine, Karolinska University Hospital, L8, 17176 Stockholm, Sweden
  - Science for Life Laboratory, 17121 Solna, Sweden
- Ioannis Tsamardinos
  - Institute of Computer Science, Foundation for Research and Technology – Hellas, Heraklion, Greece
  - Computer Science Department, University of Crete, Heraklion, Greece
2
Guo M, Wang H, Potter SS, Whitsett JA, Xu Y. SINCERA: A Pipeline for Single-Cell RNA-Seq Profiling Analysis. PLoS Comput Biol 2015; 11:e1004575. [PMID: 26600239 PMCID: PMC4658017 DOI: 10.1371/journal.pcbi.1004575] [Citation(s) in RCA: 215] [Impact Index Per Article: 23.9] [Received: 04/14/2015] [Accepted: 09/30/2015] [Indexed: 01/15/2023]
Abstract
A major challenge in developmental biology is to understand the genetic and cellular processes/programs driving organ formation and differentiation of the diverse cell types that comprise the embryo. While recent studies using single cell transcriptome analysis illustrate the power to measure and understand cellular heterogeneity in complex biological systems, processing large amounts of RNA-seq data from heterogeneous cell populations creates the need for readily accessible tools for the analysis of single-cell RNA-seq (scRNA-seq) profiles. This study presents a generally applicable analytic pipeline (SINCERA: a computational pipeline for SINgle CEll RNA-seq profiling Analysis) for processing scRNA-seq data from a whole organ or sorted cells. The pipeline supports: 1) the distinction and identification of major cell types; 2) the identification of cell type specific gene signatures; and 3) the determination of driving forces of given cell types. We applied this pipeline to the RNA-seq analysis of single cells isolated from embryonic mouse lung at E16.5. Through the pipeline analysis, we distinguished major cell types of fetal mouse lung, including epithelial, endothelial, smooth muscle, pericyte, and fibroblast-like cell types, and identified cell type specific gene signatures, bioprocesses, and key regulators. SINCERA is implemented in R, licensed under the GNU General Public License v3, and freely available from the CCHMC PBGE website, https://research.cchmc.org/pbge/sincera.html.
Affiliation(s)
- Minzhe Guo
  - The Perinatal Institute, Section of Neonatology, Perinatal and Pulmonary Biology, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, United States of America
  - Department of Electrical Engineering and Computing Systems, College of Engineering and Applied Science, University of Cincinnati, Cincinnati, Ohio, United States of America
- Hui Wang
  - The Perinatal Institute, Section of Neonatology, Perinatal and Pulmonary Biology, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, United States of America
- S. Steven Potter
  - Division of Developmental Biology, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio, United States of America
- Jeffrey A. Whitsett
  - The Perinatal Institute, Section of Neonatology, Perinatal and Pulmonary Biology, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, United States of America
- Yan Xu
  - The Perinatal Institute, Section of Neonatology, Perinatal and Pulmonary Biology, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, United States of America
  - Division of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio, United States of America
3
Rawat N, Kiran SP, Du D, Gmitter FG, Deng Z. Comprehensive meta-analysis, co-expression, and miRNA nested network analysis identifies gene candidates in citrus against Huanglongbing disease. BMC Plant Biology 2015; 15:184. [PMID: 26215595 PMCID: PMC4517500 DOI: 10.1186/s12870-015-0568-4] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.9] [Received: 04/09/2015] [Accepted: 07/07/2015] [Indexed: 05/20/2023]
Abstract
BACKGROUND Huanglongbing (HLB), the most devastating disease of citrus, is associated with infection by Candidatus Liberibacter asiaticus (CaLas) and is vectored by the Asian citrus psyllid (ACP). Recently, the molecular basis of citrus-HLB interactions has been examined using transcriptome analyses, and these analyses have identified many probe sets and pathways modulated by CaLas infection among different citrus cultivars. However, lack of consistency among reported findings indicates that an integrative approach is needed. This study was designed to identify the candidate probe sets in citrus-HLB interactions using meta-analysis and gene co-expression network modelling.
RESULTS Twenty-two publicly available transcriptome studies on citrus-HLB interactions, comprising 18 susceptible (S) datasets and four resistant (R) datasets, were investigated using Limma and RankProd methods of meta-analysis. A combined list of 7,412 differentially expressed probe sets was generated using a Teradata in-house Structured Query Language (SQL) script. We identified the 65 most common probe sets modulated in HLB disease among different tissues from the S and R datasets. Gene ontology analysis of these probe sets suggested that carbohydrate metabolism, nutrient transport, and biotic stress were the core pathways that were modulated in citrus by CaLas infection and HLB development. We also identified R-specific probe sets, which encoded leucine-rich repeat proteins, chitinase, constitutive disease resistance (CDR), miraculins, and lectins. Weighted gene co-expression network analysis (WGCNA) was conducted on 3,499 probe sets, and 21 modules with major hub probe sets were identified. Further, a miRNA nested network was created to examine gene regulation of the 3,499 target probe sets. Results suggest that csi-miR167 and csi-miR396 could affect ion transporters and defence response pathways, respectively.
CONCLUSION Most of the potential candidate hub probe sets were co-expressed with gibberellin pathway (GA)-related probe sets, implying a role of GA signalling in HLB resistance. Our findings contribute to the integration of existing citrus-HLB transcriptome data that will help to elucidate the holistic picture of the citrus-HLB interaction. The citrus probe sets identified in this analysis signify a robust set of HLB-responsive candidates that are useful for further validation.
Affiliation(s)
- Nidhi Rawat
  - University of Florida, Institute of Food and Agricultural Sciences, Gulf Coast Research and Education Center, Wimauma, FL, 33598, USA.
- Sandhya P Kiran
  - Ocimum BioSolutions, Banjara Hills Road No. 1, VI Floor Reliance Classic, Hyderabad, 500039, India.
- Dongliang Du
  - University of Florida, Institute of Food and Agricultural Sciences, Citrus Research and Education Center, Lake Alfred, FL, 33850, USA.
- Fred G Gmitter
  - University of Florida, Institute of Food and Agricultural Sciences, Citrus Research and Education Center, Lake Alfred, FL, 33850, USA.
- Zhanao Deng
  - University of Florida, Institute of Food and Agricultural Sciences, Gulf Coast Research and Education Center, Wimauma, FL, 33598, USA.
4
Synergistic regulatory networks mediated by microRNAs and transcription factors under drought, heat and salt stresses in Oryza Sativa spp. Gene 2015; 555:127-39. [DOI: 10.1016/j.gene.2014.10.054] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.6] [Received: 03/20/2014] [Revised: 09/12/2014] [Accepted: 10/26/2014] [Indexed: 01/16/2023]
5
Bazil JN, Stamm KD, Li X, Thiagarajan R, Nelson TJ, Tomita-Mitchell A, Beard DA. The inferred cardiogenic gene regulatory network in the mammalian heart. PLoS One 2014; 9:e100842. [PMID: 24971943 PMCID: PMC4074065 DOI: 10.1371/journal.pone.0100842] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Received: 11/07/2013] [Accepted: 05/31/2014] [Indexed: 12/22/2022]
Abstract
Cardiac development is a complex, multiscale process encompassing cell fate adoption, differentiation and morphogenesis. To elucidate pathways underlying this process, a recently developed algorithm to reverse engineer gene regulatory networks was applied to time-course microarray data obtained from the developing mouse heart. Approximately 200 genes of interest were input into the algorithm to generate putative network topologies that are capable of explaining the experimental data via model simulation. To cull specious network interactions, thousands of putative networks are merged and filtered to generate scale-free, hierarchical networks that are statistically significant and biologically relevant. The networks are validated with known gene interactions and used to predict regulatory pathways important for the developing mammalian heart. Area under the precision-recall curve and receiver operator characteristic curve are 9% and 58%, respectively. Of the top 10 ranked predicted interactions, 4 have already been validated. The algorithm is further tested using a network enriched with known interactions and another depleted of them. The inferred networks contained more interactions for the enriched network versus the depleted network. In all test cases, maximum performance of the algorithm was achieved when the purely data-driven method of network inference was combined with a data-independent, functional-based association method. Lastly, the network generated from the list of approximately 200 genes of interest was expanded using gene-profile uniqueness metrics to include approximately 900 additional known mouse genes and to form the most likely cardiogenic gene regulatory network. The resultant network supports known regulatory interactions and contains several novel cardiogenic regulatory interactions. The method outlined herein provides an informative approach to network inference and leads to clear testable hypotheses related to gene regulation.
Affiliation(s)
- Jason N. Bazil
  - Department of Molecular and Integrative Physiology, University of Michigan, Ann Arbor, Michigan, United States of America
- Karl D. Stamm
  - Biotechnology and Bioengineering Center, Medical College of Wisconsin, Milwaukee, Wisconsin, United States of America
- Xing Li
  - Division of Biomedical Statistics and Informatics, Department of Health Sciences Research, Mayo Clinic, Rochester, Minnesota, United States of America
- Raghuram Thiagarajan
  - Department of Molecular and Integrative Physiology, University of Michigan, Ann Arbor, Michigan, United States of America
- Timothy J. Nelson
  - Departments of Medicine, Molecular Pharmacology and Experimental Therapeutics, and Mayo Clinic Center for Regenerative Medicine, Rochester, Minnesota, United States of America
- Aoy Tomita-Mitchell
  - Biotechnology and Bioengineering Center, Medical College of Wisconsin, Milwaukee, Wisconsin, United States of America
- Daniel A. Beard
  - Department of Molecular and Integrative Physiology, University of Michigan, Ann Arbor, Michigan, United States of America
6
DeGNServer: deciphering genome-scale gene networks through high performance reverse engineering analysis. BioMed Research International 2013; 2013:856325. [PMID: 24328032 PMCID: PMC3847961 DOI: 10.1155/2013/856325] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.6] [Received: 08/22/2013] [Accepted: 10/01/2013] [Indexed: 12/23/2022]
Abstract
Analysis of genome-scale gene networks (GNs) using large-scale gene expression data provides unprecedented opportunities to uncover gene interactions and regulatory networks involved in various biological processes and developmental programs, leading to accelerated discovery of novel knowledge of various biological processes, pathways and systems. The widely used context likelihood of relatedness (CLR) method, which scores the similarity of gene pairs using mutual information (MI), is one of the more accurate methods currently available for inferring GNs. However, the MI-based reverse engineering method can achieve satisfactory performance only when the sample size exceeds one hundred. This in turn limits its application to GN construction from expression data sets with small sample sizes. We developed a high performance web server, DeGNServer, to reverse engineer and decipher genome-scale networks. It extends the CLR method by integrating different correlation methods that are suitable for analyzing data sets ranging from moderate to large scale, such as expression profiles with tens to hundreds of microarray hybridizations, and implements all analysis algorithms using parallel computing techniques to infer gene-gene associations at extraordinary speed. In addition, we integrated the SNBuilder and GeNa algorithms for subnetwork extraction and functional module discovery. DeGNServer is publicly and freely available online.
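The CLR idea mentioned in this abstract can be sketched as follows. This is not DeGNServer's code: the function name `clr_scores` is hypothetical, and absolute Pearson correlation is used as the pairwise similarity purely as an assumed stand-in for the correlation measures the server supports (the original CLR formulation uses mutual information). The background-correction step follows the commonly described CLR scheme, in which each pairwise similarity is converted to a z-score against the similarity distribution of each of the two genes, negative z-scores are clipped to zero, and the two are combined.

```python
import numpy as np


def clr_scores(expr):
    """CLR-style scores from an expression matrix (rows: samples, columns: genes).

    Similarity is |Pearson r| (an assumption; CLR proper uses mutual information).
    """
    sim = np.abs(np.corrcoef(expr, rowvar=False))
    # Exclude self-similarity from each gene's background distribution.
    sim[np.diag_indices_from(sim)] = np.nan
    mu = np.nanmean(sim, axis=1, keepdims=True)
    sd = np.nanstd(sim, axis=1, keepdims=True)
    # z[i, j]: how unusual sim[i, j] is relative to gene i's other similarities.
    z = np.maximum(0.0, (sim - mu) / sd)
    # Combine both genes' points of view into one symmetric score.
    score = np.sqrt(z ** 2 + z.T ** 2)
    np.fill_diagonal(score, 0.0)
    return score


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    expr = rng.normal(size=(200, 10))
    # Make gene 1 a noisy copy of gene 0 so the pair stands out.
    expr[:, 1] = expr[:, 0] + 0.1 * rng.normal(size=200)
    scores = clr_scores(expr)
    print("top partner of gene 0:", int(np.argmax(scores[0])))
```

The per-gene background correction is what distinguishes this from simply ranking raw correlations: a gene that is moderately correlated with everything contributes lower scores than a gene with one outstanding partner.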
7
Bhargava A, Clabaugh I, To JP, Maxwell BB, Chiang YH, Schaller GE, Loraine A, Kieber JJ. Identification of cytokinin-responsive genes using microarray meta-analysis and RNA-Seq in Arabidopsis. Plant Physiology 2013; 162:272-94. [PMID: 23524861 PMCID: PMC3641208 DOI: 10.1104/pp.113.217026] [Citation(s) in RCA: 159] [Impact Index Per Article: 14.5] [Received: 02/27/2013] [Accepted: 03/21/2013] [Indexed: 05/17/2023]
Abstract
Cytokinins are N(6)-substituted adenine derivatives that play diverse roles in plant growth and development. We sought to define a robust set of genes regulated by cytokinin as well as to query the response of genes not represented on microarrays. To this end, we performed a meta-analysis of microarray data from a variety of cytokinin-treated samples and used RNA-seq to examine cytokinin-regulated gene expression in Arabidopsis (Arabidopsis thaliana). Microarray meta-analysis using 13 microarray experiments combined with empirically defined filtering criteria identified a set of 226 genes differentially regulated by cytokinin, a subset of which has previously been validated by other methods. RNA-seq validated about 73% of the up-regulated genes identified by this meta-analysis. In silico promoter analysis indicated an overrepresentation of type-B Arabidopsis response regulator binding elements, consistent with the role of type-B Arabidopsis response regulators as primary mediators of cytokinin-responsive gene expression. RNA-seq analysis identified 73 cytokinin-regulated genes that were not represented on the ATH1 microarray. Representative genes were verified using quantitative reverse transcription-polymerase chain reaction and NanoString analysis. Analysis of the genes identified reveals a substantial effect of cytokinin on genes encoding proteins involved in secondary metabolism, particularly those acting in flavonoid and phenylpropanoid biosynthesis, as well as in the regulation of redox state of the cell, particularly a set of glutaredoxin genes. Novel splicing events were found in members of some gene families that are known to play a role in cytokinin signaling or metabolism. The genes identified in this analysis represent a robust set of cytokinin-responsive genes that are useful in the analysis of cytokinin function in plants.