1
Chen L, Liu L, Yang G, Li X, Dai X, Xue L, Yin T. Expression Quantitative Trait Locus of Wood Formation-Related Genes in Salix suchowensis. Int J Mol Sci 2023; 25:247. [PMID: 38203430 PMCID: PMC10778782 DOI: 10.3390/ijms25010247] [Received: 11/14/2023] [Revised: 12/20/2023] [Accepted: 12/21/2023] [Indexed: 01/12/2024]
Abstract
Shrub willows are widely planted for landscaping, soil remediation, and biomass production because of their rapid growth. Identifying the genes that regulate wood formation would provide clues for genetically engineering willows with improved growth traits on marginal lands. Here, we conducted an expression quantitative trait locus (eQTL) analysis, using a full-sibling F1 population of Salix suchowensis, to explore the genetic mechanisms underlying wood formation. Based on variants identified by simplified genome sequencing and gene expression data from RNA sequencing, 16,487 eQTL blocks controlling 5505 genes were identified, including 2148 cis-eQTLs and 16,480 trans-eQTLs. eQTL hotspots were identified based on eQTL frequency in genomic windows, revealing one hotspot controlling genes involved in wood formation regulation. Regulatory networks were further constructed, resulting in the identification of key regulatory genes, including three transcription factors (JAZ1, HAT22, MYB36) and CLV1, BAM1, CYCB2;4, and CDKB2;1, associated with the proliferation and differentiation activity of cambium cells. The enrichment of genes in plant hormone pathways indicates that these pathways play critical roles in the regulation of wood formation. Our analyses lay significant groundwork for a comprehensive understanding of the regulatory network of wood formation in S. suchowensis.
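The hotspot step described above (counting eQTLs in genomic windows) can be sketched as follows. This is a minimal illustration, assuming eQTL positions are given as (chromosome, base-pair) pairs; the window size and the `min_count` cut-off are hypothetical parameters, not values from the study, and a real analysis would usually calibrate the threshold (for example, by permutation) rather than fix it by hand.

```python
from collections import Counter


def eqtl_hotspots(eqtl_positions, window_size, min_count):
    """Return genomic windows that contain at least `min_count` eQTLs.

    eqtl_positions: iterable of (chromosome, base-pair position) tuples.
    window_size:    window length in base pairs (hypothetical parameter).
    min_count:      hotspot cut-off (hypothetical parameter).
    Result maps (chromosome, window start, window end) -> eQTL count.
    """
    # Assign each eQTL to a non-overlapping window and count eQTLs per window.
    counts = Counter((chrom, pos // window_size) for chrom, pos in eqtl_positions)
    return {
        (chrom, idx * window_size, (idx + 1) * window_size): n
        for (chrom, idx), n in counts.items()
        if n >= min_count
    }


# Toy data: four eQTLs fall in the first 100-kb window of chr01.
positions = [("chr01", 5_000), ("chr01", 42_000), ("chr01", 99_999),
             ("chr01", 10_500), ("chr01", 250_000), ("chr02", 7_000)]
print(eqtl_hotspots(positions, window_size=100_000, min_count=3))
# → {('chr01', 0, 100000): 4}
```

Fixed, non-overlapping windows keep the sketch simple; sliding windows would smooth boundary effects at the cost of counting each eQTL more than once.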
Affiliation(s)
- Liangjiao Xue
- State Key Laboratory of Tree Genetics and Breeding, Jiangsu Key Laboratory for Poplar Germplasm Enhancement and Variety Improvement, Co-Innovation Center for Sustainable Forestry in Southern China, Nanjing Forestry University, Nanjing 210037, China
- Tongming Yin
- State Key Laboratory of Tree Genetics and Breeding, Jiangsu Key Laboratory for Poplar Germplasm Enhancement and Variety Improvement, Co-Innovation Center for Sustainable Forestry in Southern China, Nanjing Forestry University, Nanjing 210037, China
2
Mahilkar A, Nagendra P, Venkataraman P, Deshmukh S, Saini S. Rapid evolution of pre-zygotic reproductive barriers in allopatric populations. Microbiol Spectr 2023; 11:e0195023. [PMID: 37787555 PMCID: PMC10714765 DOI: 10.1128/spectrum.01950-23] [Received: 05/09/2023] [Accepted: 08/14/2023] [Indexed: 10/04/2023]
Abstract
IMPORTANCE When a population diversifies into two or more species, the process is known as speciation. In sexually reproducing microorganisms, which barriers arise first: pre-mating or post-mating? In this work, we quantify the relative strengths of these barriers and demonstrate that pre-mating barriers arise first in allopatrically evolving populations of the yeast Saccharomyces cerevisiae. These defects arise because of altered mating kinetics between the participating groups. Thus, our work provides an understanding of how adaptive changes can lead to diversification among microbial populations.
Affiliation(s)
- Anjali Mahilkar
- Department of Chemical Engineering, Indian Institute of Technology Bombay, Mumbai, Powai, Maharashtra, India
- Prachitha Nagendra
- Department of Chemical Engineering, Indian Institute of Technology Bombay, Mumbai, Powai, Maharashtra, India
- Pavithra Venkataraman
- Department of Chemical Engineering, Indian Institute of Technology Bombay, Mumbai, Powai, Maharashtra, India
- Saniya Deshmukh
- Department of Chemical Engineering, Indian Institute of Technology Bombay, Mumbai, Powai, Maharashtra, India
- Supreet Saini
- Department of Chemical Engineering, Indian Institute of Technology Bombay, Mumbai, Powai, Maharashtra, India
3
Zhou F, He K, Ni Y. Individualized causal discovery with latent trajectory embedded Bayesian networks. Biometrics 2023; 79:3191-3202. [PMID: 36807295 DOI: 10.1111/biom.13843] [Received: 12/13/2021] [Revised: 12/08/2022] [Accepted: 02/06/2023] [Indexed: 02/20/2023]
Abstract
Bayesian networks have been widely used to generate causal hypotheses from multivariate data. Despite their popularity, the vast majority of existing causal discovery approaches make the strong assumption of a (partially) homogeneous sampling scheme. However, this assumption can be seriously violated, causing significant biases when the underlying population is inherently heterogeneous. To this end, we propose a novel causal Bayesian network model, termed BN-LTE, that embeds heterogeneous samples onto a low-dimensional manifold and builds Bayesian networks conditional on the embedding. This new framework allows more precise network inference by improving the estimation resolution from the population level to the observation level. Moreover, although causal Bayesian networks are in general not identifiable from purely observational, cross-sectional data because of Markov equivalence, by exploiting causal effect heterogeneity we prove that the proposed BN-LTE is uniquely identifiable under relatively mild assumptions. Through extensive experiments, we demonstrate the superior performance of BN-LTE in causal structure learning as well as in inferring observation-specific gene regulatory networks from observational data.
Affiliation(s)
- Fangting Zhou
- Institute of Statistics and Big Data, Renmin University of China, Beijing, China
- Department of Statistics, Texas A&M University, College Station, Texas, USA
- Kejun He
- Institute of Statistics and Big Data, Renmin University of China, Beijing, China
- Yang Ni
- Department of Statistics, Texas A&M University, College Station, Texas, USA
4
Fang WQ, Wu YL, Hwang MJ. A Noise-Tolerating Gene Association Network Uncovering an Oncogenic Regulatory Motif in Lymphoma Transcriptomics. Life (Basel) 2023; 13:1331. [PMID: 37374114 DOI: 10.3390/life13061331] [Received: 04/18/2023] [Revised: 05/24/2023] [Accepted: 05/26/2023] [Indexed: 06/29/2023]
Abstract
In cancer genomics research, gene expression provides clues to the gene regulation that underlies patients' survival risk. Gene expression, however, fluctuates because of internal and external noise, which makes it problematic to use for inferring gene associations and, hence, regulatory mechanisms. Here, we develop a new regression approach to model gene association networks while accounting for uncertain biological noise. In a series of simulation experiments with varying levels of biological noise, the new method was robust and performed better than conventional regression methods, as judged by several statistical measures of unbiasedness, consistency, and accuracy. Applying it to infer gene associations in germinal-center B cells led to the discovery of a three-by-two regulatory motif of gene expression and a three-gene prognostic signature for diffuse large B-cell lymphoma.
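The abstract does not give the model itself, but the problem it targets is a classical one: when a predictor gene's measured expression carries noise, ordinary least squares underestimates the association (attenuation bias). The sketch below, on simulated data, shows that bias and a simple regression-calibration correction that assumes the noise variance is known. It is a generic errors-in-variables illustration, not the authors' noise-tolerating method.

```python
import random
import statistics


def ols_slope(x, y):
    """Ordinary least-squares slope of y regressed on x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


rng = random.Random(0)
n, true_slope, noise_var = 20_000, 2.0, 1.0  # noise_var is assumed known

# Unobserved "true" expression of a regulator and a target gene driven by it.
x_true = [rng.gauss(0.0, 1.0) for _ in range(n)]
y = [true_slope * a + rng.gauss(0.0, 0.5) for a in x_true]
# Observed regulator expression carries extra noise.
x_obs = [a + rng.gauss(0.0, noise_var ** 0.5) for a in x_true]

naive = ols_slope(x_obs, y)  # attenuated: roughly 2.0 * 1 / (1 + 1) = 1.0 here
# Reliability ratio var(x_true) / var(x_obs), estimated from the observed variance.
reliability = (statistics.variance(x_obs) - noise_var) / statistics.variance(x_obs)
corrected = naive / reliability
print(round(naive, 2), round(corrected, 2))
```

The correction only works because the noise variance is supplied; when it is uncertain, as the abstract stresses, a method has to model that uncertainty instead.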
Affiliation(s)
- Wei-Quan Fang
- Institute of Biomedical Sciences, Academia Sinica, Taipei 115, Taiwan
- Division of New Drug, Center for Drug Evaluation, Taipei 115, Taiwan
- Yu-Le Wu
- Institute of Biomedical Sciences, Academia Sinica, Taipei 115, Taiwan
- Ming-Jing Hwang
- Institute of Biomedical Sciences, Academia Sinica, Taipei 115, Taiwan
5
Zhou X, Cai X. Joint eQTL mapping and inference of gene regulatory network improves power of detecting both cis- and trans-eQTLs. Bioinformatics 2021; 38:149-156. [PMID: 34487140 PMCID: PMC8696109 DOI: 10.1093/bioinformatics/btab609] [Received: 04/23/2020] [Revised: 07/19/2021] [Accepted: 08/25/2021] [Indexed: 02/03/2023]
Abstract
MOTIVATION Genetic variations in expression quantitative trait loci (eQTLs) play a critical role in the development of complex traits and diseases. Two main factors affect the statistical power of detecting eQTLs: (i) the relatively small sample sizes available, and (ii) the heavy multiple-testing burden due to the very large number of variants to be tested. The latter issue is particularly severe when one tries to identify trans-eQTLs that are far away from the genes they influence. If co-expressed genes are exploited jointly in eQTL mapping, the effective sample size can be increased. Furthermore, using the structure of the gene regulatory network (GRN) may help to identify trans-eQTLs without increasing the multiple-testing burden. RESULTS In this article, we use the structural equation model (SEM) to model both the GRN and the effect of eQTLs on gene expression, and then develop a novel algorithm, named sparse SEM for eQTL mapping (SSEMQ), to conduct joint eQTL mapping and GRN inference. The SEM can exploit co-expressed genes jointly in eQTL mapping and also use the GRN to determine trans-eQTLs. Computer simulations demonstrate that SSEMQ significantly outperforms nine existing eQTL mapping methods. SSEMQ is further used to analyze two real datasets of human breast and whole blood tissues, yielding a number of cis- and trans-eQTLs. AVAILABILITY AND IMPLEMENTATION R package ssemQr is available at https://github.com/Ivis4ml/ssemQr.git. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
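The SEM idea can be made concrete with a small simulation. Assuming the common linear form y = By + Fx + e (B: gene-to-gene regulatory effects; F: eQTL-to-gene effects; x: genotypes; e: noise), expression satisfies (I - B)y = Fx + e. The sketch below, with made-up matrices, shows how a variant acting in cis on one gene shifts other genes through the network, which is why GRN structure can help flag trans-eQTLs. It does not implement the SSEMQ estimator or the ssemQr package.

```python
def solve(a, b):
    """Solve a @ y = b by Gaussian elimination with partial pivoting (small systems)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    y = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        y[i] = (m[i][n] - sum(m[i][j] * y[j] for j in range(i + 1, n))) / m[i][i]
    return y


def sem_expression(B, F, x, e):
    """Expression y satisfying y = B y + F x + e, i.e. (I - B) y = F x + e."""
    p = len(B)
    i_minus_b = [[(1.0 if i == j else 0.0) - B[i][j] for j in range(p)] for i in range(p)]
    rhs = [sum(F[i][k] * x[k] for k in range(len(x))) + e[i] for i in range(p)]
    return solve(i_minus_b, rhs)


# Hypothetical 3-gene network with feedback: gene0 -> gene1 -> gene2 -> gene0.
B = [[0.0, 0.0, 0.3],
     [0.8, 0.0, 0.0],
     [0.0, -0.5, 0.0]]
# Each gene has one cis-eQTL (diagonal F); genotypes are coded 0/1/2.
F = [[1.0, 0.0, 0.0],
     [0.0, 0.7, 0.0],
     [0.0, 0.0, 0.4]]
e = [0.1, -0.2, 0.0]

y_ref = sem_expression(B, F, [0, 0, 1], e)
y_alt = sem_expression(B, F, [2, 0, 1], e)  # change only gene0's cis-eQTL
# gene1 and gene2 have no direct effect from variant 0, yet both shift via the network.
print([round(b - a, 3) for a, b in zip(y_ref, y_alt)])
# → [1.786, 1.429, -0.714]
```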
Affiliation(s)
- Xin Zhou
- Department of Electrical and Computer Engineering, University of Miami, Coral Gables, FL 33146, USA
6
Ha MJ, Sun W. Estimation of high-dimensional directed acyclic graphs with surrogate intervention. Biostatistics 2020; 21:659-675. [PMID: 30596892 DOI: 10.1093/biostatistics/kxy080] [Received: 12/05/2017] [Revised: 11/18/2018] [Accepted: 11/25/2018] [Indexed: 11/15/2022]
Abstract
Directed acyclic graphs (DAGs) have been used to describe causal relationships between variables. The standard method for determining such relations uses interventional data. For complex systems with high-dimensional data, however, such interventional data are often not available. It is therefore desirable to estimate causal structure from observational data without subjecting variables to interventions. Observational data can be used to estimate the skeleton of a DAG and the directions of a limited number of edges. We develop a Bayesian framework to estimate a DAG using surrogate interventional data, where the interventions are applied to a set of external variables and are thus considered surrogate interventions on the variables of interest. Our work is motivated by expression quantitative trait locus (eQTL) studies, where the variables of interest are the expression levels of genes, the external variables are DNA variants, and interventions are applied to DNA variants when a randomly selected DNA allele is passed to a child from either parent. Our method, surrogate intervention recovery of a DAG (sirDAG), first constructs a DAG skeleton using penalized regressions and subsequent partial correlation tests, and then estimates the posterior probabilities of all edge directions after incorporating DNA variant data. We demonstrate the utility of sirDAG by simulation and by an application to an eQTL study of 550 breast cancer patients.
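To illustrate the partial-correlation step, the sketch below computes a first-order partial correlation from pairwise correlations and tests it with Fisher's z-transform. The abstract does not specify which partial correlation test sirDAG uses, so the Fisher z-test here is an assumption (a standard choice), the correlation values are made up, and n = 550 merely echoes the study size.

```python
import math
from statistics import NormalDist


def partial_corr(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y given z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


def fisher_z_pvalue(r, n, k):
    """Two-sided p-value for H0: partial correlation = 0.

    n: sample size; k: number of conditioning variables.
    """
    stat = math.sqrt(n - k - 3) * abs(math.atanh(r))  # Fisher z-transform
    return 2.0 * (1.0 - NormalDist().cdf(stat))


# Hypothetical correlations among two genes (x, y) and a third gene z.
# Case 1: x and y are correlated only through z, so the partial correlation vanishes
# and the x-y edge would be removed from the skeleton.
r1 = partial_corr(r_xy=0.48, r_xz=0.6, r_yz=0.8)
# Case 2: a direct x-y association remains after conditioning on z.
r2 = partial_corr(r_xy=0.70, r_xz=0.6, r_yz=0.8)
print(round(abs(r1), 3), fisher_z_pvalue(r1, 550, 1))
print(round(r2, 3), fisher_z_pvalue(r2, 550, 1))
```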
Affiliation(s)
- Min Jin Ha
- Department of Biostatistics, The University of Texas MD Anderson Cancer Center, 1515 Holcombe Boulevard, Houston, TX, USA
- Wei Sun
- Program in Biostatistics and Bioinformatics, Fred Hutchinson Cancer Research Center, 1100 Fairview Ave N, Seattle, WA USA
7
Zheng H, Zeng Z, Wen H, Wang P, Huang C, Huang P, Chen Q, Gong D, Qiu X. Application of Genome-Wide Association Studies in Coronary Artery Disease. Curr Pharm Des 2020; 25:4274-4286. [PMID: 31692429 DOI: 10.2174/1381612825666191105125148] [Received: 09/18/2019] [Accepted: 10/30/2019] [Indexed: 01/10/2023]
Abstract
Coronary artery disease (CAD) is a complex disease caused by a combination of environmental and genetic factors. It is one of the leading causes of death and disability worldwide. Much research has focused on the genetic mechanisms of CAD. In recent years, genome-wide association studies (GWAS) have developed rapidly around the world. Researchers worldwide have successfully discovered a series of CAD susceptibility genes and loci, moving CAD research into a new stage. This paper briefly summarizes the important progress made by GWAS of CAD in recent years, and then analyzes the challenges GWAS currently face and future research trends, with the aims of promoting the translation of genetic research results into clinical practice and guiding further exploration of the genetic mechanisms of CAD.
Affiliation(s)
- Huilei Zheng
- Department of Medical Examination & Health Management, First Affiliated Hospital of Guangxi Medical University, Nanning, Guangxi, China
- Guangxi Key Laboratory of Precision Medicine in Cardio-cerebrovascular Diseases Control and Prevention, Nanning, Guangxi, China
- Guangxi Clinical Research Center for Cardio-cerebrovascular Diseases, Nanning, Guangxi, China
- Zhiyu Zeng
- Guangxi Key Laboratory of Precision Medicine in Cardio-cerebrovascular Diseases Control and Prevention, Nanning, Guangxi, China
- Guangxi Clinical Research Center for Cardio-cerebrovascular Diseases, Nanning, Guangxi, China
- Elderly Cardiology Ward, First Affiliated Hospital of Guangxi Medical University, Nanning, Guangxi, China
- Hong Wen
- Guangxi Key Laboratory of Precision Medicine in Cardio-cerebrovascular Diseases Control and Prevention, Nanning, Guangxi, China
- Guangxi Clinical Research Center for Cardio-cerebrovascular Diseases, Nanning, Guangxi, China
- Elderly Comprehensive Ward, First Affiliated Hospital of Guangxi Medical University, Nanning, Guangxi, China
- Peng Wang
- Department of Medical Examination & Health Management, First Affiliated Hospital of Guangxi Medical University, Nanning, Guangxi, China
- Chunxia Huang
- Department of Medical Examination & Health Management, First Affiliated Hospital of Guangxi Medical University, Nanning, Guangxi, China
- Ping Huang
- Department of Medical Examination & Health Management, First Affiliated Hospital of Guangxi Medical University, Nanning, Guangxi, China
- Qingyun Chen
- Department of Medical Examination & Health Management, First Affiliated Hospital of Guangxi Medical University, Nanning, Guangxi, China
- Danping Gong
- Guangxi Key Laboratory of Precision Medicine in Cardio-cerebrovascular Diseases Control and Prevention, Nanning, Guangxi, China
- Guangxi Clinical Research Center for Cardio-cerebrovascular Diseases, Nanning, Guangxi, China
- Elderly Cardiology Ward, First Affiliated Hospital of Guangxi Medical University, Nanning, Guangxi, China
- Xiaoling Qiu
- Department of Population Health Science, Duke University School of Medicine, Durham, NC 27708, United States
- Guangxi Key Laboratory of Precision Medicine in Cardio-cerebrovascular Diseases Control and Prevention, Nanning, Guangxi, China
- Guangxi Clinical Research Center for Cardio-cerebrovascular Diseases, Nanning, Guangxi, China
8
Matthews ML, Wang JP, Sederoff R, Chiang VL, Williams CM. Modeling cross-regulatory influences on monolignol transcripts and proteins under single and combinatorial gene knockdowns in Populus trichocarpa. PLoS Comput Biol 2020; 16:e1007197. [PMID: 32275650 PMCID: PMC7147730 DOI: 10.1371/journal.pcbi.1007197] [Received: 06/14/2019] [Accepted: 02/27/2020] [Indexed: 11/18/2022]
Abstract
Accurate manipulation of metabolites in monolignol biosynthesis is a key step for controlling lignin content, structure, and other wood properties important to the bioenergy and biomaterial industries. A crucial component of this strategy is predicting how single and combinatorial knockdowns of monolignol-specific gene transcripts influence the abundance of monolignol proteins, which drive monolignol biosynthesis. Computational models have been developed to estimate protein abundances from transcript perturbations of monolignol-specific genes. The accuracy of these models, however, is limited by their inability to capture indirect regulatory influences on other pathway genes. Here, we examine how these indirect influences manifest in transgenic transcript and protein abundances, identifying putative indirect regulatory influences that occur when one or more specific monolignol pathway genes are perturbed. We created a computational model using sparse maximum likelihood to estimate the resulting monolignol transcript and protein abundances in transgenic Populus trichocarpa based on targeted knockdowns of specific monolignol genes. Using in silico simulations of this model and root mean square error, we showed that our model estimated transcript and protein abundances more accurately than previous models when individual genes and families of monolignol genes were perturbed. We leveraged insight from the network structure inferred by our model to identify potential genes, including PtrHCT, PtrCAD, and Ptr4CL, involved in post-transcriptional and/or post-translational regulation. Our model provides a useful computational tool for exploring the cascading impact of single and combinatorial modifications of monolignol-specific genes on lignin and other wood properties.
Affiliation(s)
- Megan L. Matthews
- Department of Electrical and Computer Engineering, North Carolina State University, Raleigh, North Carolina, United States of America
- Jack P. Wang
- State Key Laboratory of Tree Genetics and Breeding, Northeast Forestry University, Harbin, China
- Department of Forestry and Environmental Resources, Forest Biotechnology Group, North Carolina State University, Raleigh, North Carolina, United States of America
- Ronald Sederoff
- Department of Forestry and Environmental Resources, Forest Biotechnology Group, North Carolina State University, Raleigh, North Carolina, United States of America
- Vincent L. Chiang
- State Key Laboratory of Tree Genetics and Breeding, Northeast Forestry University, Harbin, China
- Department of Forestry and Environmental Resources, Forest Biotechnology Group, North Carolina State University, Raleigh, North Carolina, United States of America
- Department of Forest Biomaterials, North Carolina State University, Raleigh, North Carolina, United States of America
- Cranos M. Williams
- Department of Electrical and Computer Engineering, North Carolina State University, Raleigh, North Carolina, United States of America
9
Li Y, Liu D, Li T, Zhu Y. Bayesian differential analysis of gene regulatory networks exploiting genetic perturbations. BMC Bioinformatics 2020; 21:12. [PMID: 31918656 PMCID: PMC6953167 DOI: 10.1186/s12859-019-3314-3] [Received: 05/07/2019] [Accepted: 12/12/2019] [Indexed: 11/10/2022]
Abstract
BACKGROUND Gene regulatory networks (GRNs) can be inferred from both gene expression data and genetic perturbations. Under different conditions, the gene data of the same gene set may differ, resulting in different GRNs. Detecting structural differences between GRNs under different conditions is of great significance for understanding gene functions and biological mechanisms. RESULTS In this paper, we propose a Bayesian Fused algorithm to jointly infer differential structures of GRNs under two different conditions. The algorithm is developed for GRNs modeled with structural equation models (SEMs), which makes it possible to incorporate genetic perturbations into the models to improve inference accuracy, so we name it BFDSEM. Unlike naive approaches that infer the two GRNs separately and then identify the difference between them, we first re-parameterize the two SEMs to form an integrated model that takes full advantage of both groups of gene data, and then solve the re-parameterized model with a novel Bayesian fused prior, following the criterion that both the separate GRNs and the differential GRN are sparse. CONCLUSIONS Computer simulations were run on synthetic data to compare BFDSEM with two state-of-the-art joint inference algorithms: FSSEM and ReDNet. The results demonstrate that the performance of BFDSEM is comparable to that of FSSEM and generally better than that of ReDNet. BFDSEM was also applied to a real data set of lung cancer and adjacent normal tissues; the resulting normal GRN and differential GRN are consistent with results reported in the previous literature. An open-source program implementing BFDSEM is freely available in Additional file 1.
Affiliation(s)
- Yan Li
- College of Computer Science and Technology, Jilin University, Changchun, 130012 China
- Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, 130012 China
- Dayou Liu
- College of Computer Science and Technology, Jilin University, Changchun, 130012 China
- Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, 130012 China
- Tengfei Li
- College of Computer Science and Technology, Jilin University, Changchun, 130012 China
- Yungang Zhu
- College of Computer Science and Technology, Jilin University, Changchun, 130012 China
- Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, 130012 China
10
Zhou X, Cai X. Inference of differential gene regulatory networks based on gene expression and genetic perturbation data. Bioinformatics 2020; 36:197-204. [PMID: 31263873 PMCID: PMC6956787 DOI: 10.1093/bioinformatics/btz529] [Received: 09/03/2018] [Revised: 06/09/2019] [Accepted: 06/28/2019] [Indexed: 11/14/2022]
Abstract
MOTIVATION Gene regulatory networks (GRNs) of the same organism can differ under different conditions, although the overall network structure may be similar. Understanding the differences between GRNs under different conditions is important for understanding condition-specific gene regulation. When gene expression and other relevant data under two different conditions are available, an existing network inference algorithm can be used to estimate the two GRNs separately and then identify the difference between them. However, such an approach does not exploit the similarity between the two GRNs and may sacrifice inference accuracy. RESULTS In this paper, we model GRNs with the structural equation model (SEM), which can integrate gene expression and genetic perturbation data, and develop an algorithm named fused sparse SEM (FSSEM) to jointly infer GRNs under two conditions and then identify the difference between them. Computer simulations demonstrate that the FSSEM algorithm outperforms approaches that estimate the two GRNs separately. Applying FSSEM to a lung cancer dataset and a gastric cancer dataset yielded differential GRNs between cancer and normal tissues, and the genes with the largest network degrees in these GRNs have been reported to be implicated in tumorigenesis. The FSSEM algorithm provides a valuable tool for jointly inferring two GRNs and identifying the differential GRN under two conditions. AVAILABILITY AND IMPLEMENTATION The R package fssemR implementing the FSSEM algorithm is available at https://github.com/Ivis4ml/fssemR.git. It is also available on CRAN. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
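A fused sparse penalty of the kind the FSSEM name describes has one term that keeps each network sparse and another that keeps the two networks similar, so that only sizeable differences survive. A common way to handle such a penalty inside an iterative solver is its proximal operator, which for two conditions has a closed form per edge: shrink the between-condition difference (fusion), then soft-threshold each entry (sparsity). The sketch below implements only that building block, with hypothetical inputs and penalty weights `lam` and `rho`; it is not the fssemR implementation and omits the SEM likelihood entirely.

```python
def soft(v, t):
    """Soft-thresholding operator: sign(v) * max(|v| - t, 0)."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def fused_prox(a1, a2, lam, rho):
    """Proximal operator of lam*(|b1| + |b2|) + rho*|b1 - b2| evaluated at (a1, a2).

    Step 1 shrinks the difference between the two conditions (fusion);
    step 2 soft-thresholds each entry (sparsity).
    """
    mean = (a1 + a2) / 2.0
    half_diff = soft(a1 - a2, 2.0 * rho) / 2.0
    return soft(mean + half_diff, lam), soft(mean - half_diff, lam)


def fused_prox_matrix(A1, A2, lam, rho):
    """Apply fused_prox entry-wise to two edge-weight matrices."""
    B1 = [[0.0] * len(row) for row in A1]
    B2 = [[0.0] * len(row) for row in A1]
    for i, row in enumerate(A1):
        for j in range(len(row)):
            B1[i][j], B2[i][j] = fused_prox(A1[i][j], A2[i][j], lam, rho)
    return B1, B2


def differential_edges(B1, B2, tol=1e-12):
    """Entries (i, j) where the two estimated networks differ."""
    return [(i, j) for i, row in enumerate(B1) for j in range(len(row))
            if abs(B1[i][j] - B2[i][j]) > tol]


# Hypothetical unpenalized estimates for two conditions (rows: target, cols: regulator).
A_normal = [[0.0, 1.0], [0.05, 0.0]]
A_tumour = [[0.0, 0.9], [0.6, 0.0]]
B_normal, B_tumour = fused_prox_matrix(A_normal, A_tumour, lam=0.1, rho=0.1)
print(differential_edges(B_normal, B_tumour))
# → [(1, 0)]
```

The small 1.0 versus 0.9 difference is fused away, while the 0.05 versus 0.6 edge stays different, which is the behaviour a differential-network penalty is meant to produce.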
Affiliation(s)
- Xin Zhou
- Department of Electrical and Computer Engineering, University of Miami, FL 33146, USA
- Xiaodong Cai
- Department of Electrical and Computer Engineering, University of Miami, FL 33146, USA
11
Inferring Gene Regulatory Networks from a Population of Yeast Segregants. Sci Rep 2019; 9:1197. [PMID: 30718595 PMCID: PMC6361976 DOI: 10.1038/s41598-018-37667-4] [Received: 07/31/2018] [Accepted: 11/30/2018] [Indexed: 12/14/2022]
Abstract
Constructing gene regulatory networks is crucial to unraveling the genetic architecture of complex traits and to understanding the mechanisms of diseases. On the basis of gene expression and single nucleotide polymorphism data in the yeast Saccharomyces cerevisiae, we constructed gene regulatory networks using a two-stage penalized least squares method. At the first stage, a large system of structural equations was established via optimal prediction of a set of surrogate variables; at the second stage, regulatory effects were consistently selected. Using this approach, we identified subnetworks that were enriched in gene ontology categories, revealing directional regulatory mechanisms controlling these biological pathways. Our mapping and analysis of expression quantitative trait loci uncovered a known alteration of gene expression within a biological pathway that results in regulatory effects on companion pathway genes in the phosphocholine network. In addition, we identified nodes in these gene ontology-enriched subnetworks that are coordinately controlled by transcription factors driven by trans-acting expression quantitative trait loci. Altogether, integrating documented transcription factor regulatory associations with subnetworks defined by a system of structural equations using quantitative trait loci data is an effective means to delineate the transcriptional control of biological pathways.
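The two-stage idea can be illustrated with its simplest unpenalized relative, two-stage least squares, on a single regulator-target pair. Stage 1 predicts the regulator's expression from its eQTL genotype (the surrogate variable); stage 2 regresses the target on that prediction, which is free of an unobserved factor that biases the naive regression. The penalties needed for large systems of equations are omitted and all data are simulated, so this is a sketch of the principle only, not the published method.

```python
import random
import statistics


def slope(x, y):
    """Ordinary least-squares slope of y regressed on x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    return (sum((a - mx) * (b - my) for a, b in zip(x, y))
            / sum((a - mx) ** 2 for a in x))


rng = random.Random(1)
n = 5_000
eqtl = [rng.choice([0, 1, 2]) for _ in range(n)]  # cis-eQTL genotype of gene A
hidden = [rng.gauss(0.0, 1.0) for _ in range(n)]  # unobserved factor affecting A and B
gene_a = [g + h + rng.gauss(0.0, 0.5) for g, h in zip(eqtl, hidden)]
# True regulatory effect of gene A on gene B is 0.5.
gene_b = [0.5 * a + h + rng.gauss(0.0, 0.5) for a, h in zip(gene_a, hidden)]

naive = slope(gene_a, gene_b)  # biased upward by the hidden factor

# Stage 1: predict gene A from its eQTL genotype (the surrogate variable).
g_hat = slope(eqtl, gene_a)
mean_eqtl, mean_a = statistics.fmean(eqtl), statistics.fmean(gene_a)
gene_a_hat = [mean_a + g_hat * (g - mean_eqtl) for g in eqtl]

# Stage 2: regress gene B on the predicted expression of gene A.
two_stage = slope(gene_a_hat, gene_b)
print(round(naive, 2), round(two_stage, 2))
```

The genotype works as a surrogate because it is randomized at meiosis and so is independent of the hidden factor, the same logic that lets eQTL data orient regulatory edges.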
12
Jurman G, Filosi M, Visintainer R, Riccadonna S, Furlanello C. Stability in GRN Inference. Methods Mol Biol 2019; 1883:323-346. [PMID: 30547407 DOI: 10.1007/978-1-4939-8882-2_14] [Indexed: 12/18/2022]
Abstract
Reconstructing a gene regulatory network from one or more sets of omics measurements has been a major task of computational biology over the last 20 years. Despite an overwhelming number of algorithms proposed to solve the network inference problem, either in the general scenario or in ad hoc tailored situations, assessing the stability of reconstruction remains largely uncharted territory, and exploratory studies have mainly tackled theoretical aspects. We introduce here empirical stability, which is induced by the variability of reconstruction as a function of data subsampling. By evaluating differences between networks inferred from different subsets of the same data, we obtain quantitative indicators of the robustness of the algorithm, of the noise level affecting the data, and, overall, of the reliability of the reconstructed graph. We show that empirical stability can be used whenever no ground truth is available to compute a direct measure of the similarity between the inferred structure and the true network. The main ingredient is a suite of indicators, called NetSI, providing statistics of the distances between graphs generated by a given algorithm fed with different data subsets, where the chosen metric is the Hamming-Ipsen-Mikhailov (HIM) distance, which evaluates the dissimilarity of graph topologies with shared nodes. In practice, the NetSI family is demonstrated here on synthetic and high-throughput datasets, inferring graphs at different resolution levels (topology, direction, weight) and showing how the stability indicators can be used to compare quantitatively the stability of different reconstruction algorithms.
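The subsampling idea behind empirical stability can be sketched as: infer a network on many random subsets of the samples, then summarize the pairwise distances between the inferred graphs. This is a simplified illustration, not NetSI itself: it uses only the normalised Hamming distance (the HIM distance also includes a spectral Ipsen-Mikhailov component), and the thresholded-correlation inference rule, the `threshold`, `fraction`, and `n_rounds` parameters, and the toy data are all hypothetical.

```python
import itertools
import random
import statistics


def pearson(a, b):
    """Pearson correlation of two equal-length lists."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / (saa * sbb) ** 0.5


def infer_network(samples, threshold):
    """Toy inference rule (hypothetical): undirected edge i-j if |r_ij| >= threshold.

    samples: list of rows, one per sample, each a list of gene expression values.
    """
    p = len(samples[0])
    cols = [[row[j] for row in samples] for j in range(p)]
    return {(i, j) for i, j in itertools.combinations(range(p), 2)
            if abs(pearson(cols[i], cols[j])) >= threshold}


def hamming(edges_a, edges_b, p):
    """Normalised Hamming distance between two undirected graphs on p nodes."""
    return len(edges_a ^ edges_b) / (p * (p - 1) / 2)


def empirical_stability(samples, threshold, fraction, n_rounds, seed=0):
    """Mean and s.d. of pairwise distances between networks inferred on subsamples."""
    rng = random.Random(seed)
    p = len(samples[0])
    k = int(fraction * len(samples))
    nets = [infer_network(rng.sample(samples, k), threshold) for _ in range(n_rounds)]
    dists = [hamming(a, b, p) for a, b in itertools.combinations(nets, 2)]
    return statistics.fmean(dists), statistics.pstdev(dists)


# Toy data: gene 1 tracks gene 0; genes 2 and 3 are independent noise.
rng = random.Random(42)
data = []
for _ in range(200):
    g0 = rng.gauss(0, 1)
    data.append([g0, g0 + rng.gauss(0, 0.3), rng.gauss(0, 1), rng.gauss(0, 1)])
mean_d, sd_d = empirical_stability(data, threshold=0.3, fraction=0.5, n_rounds=10)
print(round(mean_d, 3), round(sd_d, 3))
```

A mean distance near 0 indicates that the inference rule gives nearly the same graph on every subsample; larger values flag an unstable algorithm or noisy data.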
Affiliation(s)
- Roberto Visintainer
- The Microsoft Research - University of Trento Centre for Computational and Systems Biology (COSBI), Rovereto, Italy
13
Condition-adaptive fused graphical lasso (CFGL): An adaptive procedure for inferring condition-specific gene co-expression network. PLoS Comput Biol 2018; 14:e1006436. [PMID: 30240439 PMCID: PMC6173447 DOI: 10.1371/journal.pcbi.1006436] [Received: 03/20/2018] [Revised: 10/05/2018] [Accepted: 08/15/2018] [Indexed: 12/14/2022]
Abstract
Co-expression network analysis provides useful information for studying gene regulation in biological processes. Examining condition-specific patterns of co-expression can provide insights into the underlying cellular processes activated in a particular condition. One challenge in this type of analysis is that the sample sizes in each condition are usually small, making the statistical inference of co-expression patterns highly underpowered. A joint network construction that borrows information from related structures across conditions has the potential to improve the power of the analysis. One possible approach to constructing the co-expression network is to use the Gaussian graphical model. Though several methods are available for joint estimation of multiple graphical models, they do not fully account for the heterogeneity between samples and between co-expression patterns introduced by condition specificity. Here we develop the condition-adaptive fused graphical lasso (CFGL), a data-driven approach to incorporate condition specificity in the estimation of co-expression networks. We show that this method improves the accuracy with which networks are learned. The application of this method on a rat multi-tissue dataset and The Cancer Genome Atlas (TCGA) breast cancer dataset provides interesting biological insights. In both analyses, we identify numerous modules enriched for Gene Ontology functions and observe that the modules that are upregulated in a particular condition are often involved in condition-specific activities. Interestingly, we observe that the genes strongly associated with survival time in the TCGA dataset are less likely to be network hubs, suggesting that genes associated with cancer progression are likely to govern specific functions or execute final biological functions in pathways, rather than regulating a large number of biological processes. 
Additionally, we observed that the tumor-specific hub genes tend to have few shared edges with normal tissue, revealing tumor-specific regulatory mechanisms. Gene co-expression networks provide insights into the mechanism of cellular activity and gene regulation. Condition-specific mechanisms may be identified by constructing and comparing co-expression networks of multiple conditions. We propose a novel statistical method to jointly construct co-expression networks for gene expression profiles from multiple conditions. By using a data-driven approach to capture condition-specific co-expression patterns, this method is effective in identifying both co-expression patterns that are specific to a condition and those that are common across conditions. The application of this method to real datasets reveals interesting biological insights.
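As background for the CFGL abstract: joint estimators of this family build on the fused graphical lasso (FGL) penalty, which shrinks each condition's precision-matrix entries toward zero and toward each other. The Python sketch below only evaluates that penalty for given precision matrices; it does not estimate anything, and the optional `fuse_weights` argument is a hypothetical illustration of condition-adaptive fusion, not the weighting rule defined by CFGL.

```python
from typing import List, Optional

Matrix = List[List[float]]


def fused_glasso_penalty(
    thetas: List[Matrix],
    lam1: float,
    lam2: float,
    fuse_weights: Optional[Matrix] = None,
) -> float:
    """Fused graphical lasso penalty over K precision matrices:

    lam1 * sum_k sum_{i != j} |theta_k[i][j]|
    + lam2 * sum_{k < k'} sum_{i, j} w[i][j] * |theta_k[i][j] - theta_k'[i][j]|

    fuse_weights is a hypothetical per-entry weight; the default of 1
    everywhere gives the ordinary fused graphical lasso.
    """
    p = len(thetas[0])
    w = fuse_weights or [[1.0] * p for _ in range(p)]
    # Sparsity term: shrinks off-diagonal entries (edges) in every condition.
    sparsity = sum(
        abs(theta[i][j])
        for theta in thetas
        for i in range(p)
        for j in range(p)
        if i != j
    )
    # Fusion term: pulls the same entry toward agreement across conditions.
    fusion = 0.0
    for a in range(len(thetas)):
        for b in range(a + 1, len(thetas)):
            for i in range(p):
                for j in range(p):
                    fusion += w[i][j] * abs(thetas[a][i][j] - thetas[b][i][j])
    return lam1 * sparsity + lam2 * fusion


# Toy precision matrices for two hypothetical conditions.
tumor = [[1.0, 0.5], [0.5, 1.0]]
normal = [[1.0, 0.0], [0.0, 1.0]]
print(fused_glasso_penalty([tumor, normal], lam1=0.1, lam2=0.2))
```

Setting every fusion weight to zero removes the cross-condition coupling, so each condition is penalized as a separate graphical lasso.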
14
Zamanighomi M, Zamanian M, Kimber M, Wang Z. Gene Regulatory Network Inference from Perturbed Time-Series Expression Data via Ordered Dynamical Expansion of Non-Steady State Actors. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2018; 15:1093-1106. [PMID: 26701893 DOI: 10.1109/tcbb.2015.2509992] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/05/2023]
Abstract
The reconstruction of gene regulatory networks from gene expression data has been the subject of intense research activity. A variety of models and methods have been developed to address different aspects of this important problem. However, these techniques are narrowly focused on particular biological and experimental platforms, and require experimental data that are typically unavailable and difficult to ascertain. The more recent availability of higher-throughput sequencing platforms, combined with more precise modes of genetic perturbation, presents an opportunity to formulate more robust and comprehensive approaches to gene network inference. Here, we propose a step-wise framework for identifying gene-gene regulatory interactions that expand from a known point of genetic or chemical perturbation using time series gene expression data. This novel approach sequentially identifies non-steady state genes post-perturbation and incorporates them into a growing series of low-complexity optimization problems. The governing ordinary differential equations of this model are rooted in the biophysics of stochastic molecular events that underlie gene regulation, delineating roles for both protein and RNA-mediated gene regulation. We show the successful application of our core algorithms for network inference using simulated and real datasets.
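The step-wise strategy, first detecting which genes leave steady state after a perturbation and then expanding the network in that order, can be illustrated with a toy ordering function. This is a hypothetical sketch: time point 0 is assumed to be the pre-perturbation steady state and a fixed tolerance `tol` decides departure; it does not implement the paper's ODE model or its optimization problems.

```python
from typing import Dict, List, Tuple


def departure_order(
    series: Dict[str, List[float]], tol: float
) -> List[Tuple[str, int]]:
    """Order genes by the first time point at which they leave baseline.

    series maps a gene name to its expression at time points 0..T-1, with
    index 0 assumed to be the pre-perturbation steady state. A gene counts
    as non-steady once |x(t) - x(0)| > tol; genes that never depart are
    left out.
    """
    departures = []
    for gene, values in series.items():
        baseline = values[0]
        for t, x in enumerate(values[1:], start=1):
            if abs(x - baseline) > tol:
                departures.append((gene, t))
                break
    # Earlier responders come first, mimicking an ordered network expansion.
    departures.sort(key=lambda item: (item[1], item[0]))
    return departures
```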
15
Ni Y, Müller P, Zhu Y, Ji Y. Heterogeneous reciprocal graphical models. Biometrics 2017; 74:606-615. [PMID: 29023632 DOI: 10.1111/biom.12791] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Received: 12/01/2016] [Revised: 07/01/2017] [Accepted: 09/01/2017] [Indexed: 12/27/2022]
Abstract
We develop novel hierarchical reciprocal graphical models to infer gene networks from heterogeneous data. In the case of data that can be naturally divided into known groups, we propose to connect graphs by introducing a hierarchical prior across group-specific graphs, including a correlation on edge strengths across graphs. Thresholding priors are applied to induce sparsity of the estimated networks. In the case of unknown groups, we cluster subjects into subpopulations and jointly estimate cluster-specific gene networks, again using similar hierarchical priors across clusters. We illustrate the proposed approach by simulation studies and three applications with multiplatform genomic data for multiple cancers.
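A thresholding prior induces sparsity by mapping an edge strength to exactly zero whenever its magnitude falls below a threshold. The sketch below shows only that mapping with a fixed threshold; in a Bayesian fit the threshold would itself carry a prior and be sampled, and the function name is illustrative.

```python
from typing import List

Matrix = List[List[float]]


def threshold_edges(strengths: Matrix, t: float) -> Matrix:
    """Hard-threshold edge strengths: keep beta only when |beta| > t.

    Entries at or below the threshold become exact zeros, i.e. absent edges.
    """
    return [[b if abs(b) > t else 0.0 for b in row] for row in strengths]


print(threshold_edges([[0.0, 0.05], [-0.4, 0.2]], t=0.1))
```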
Affiliation(s)
- Yang Ni
- Department of Statistics and Data Sciences, The University of Texas at Austin, Texas, U.S.A
- Peter Müller
- Department of Mathematics, The University of Texas at Austin, Texas, U.S.A
- Yitan Zhu
- Program for Computational Genomics and Medicine, NorthShore University HealthSystem, Illinois, U.S.A
- Yuan Ji
- Program for Computational Genomics and Medicine, NorthShore University HealthSystem, Illinois, U.S.A.
- Department of Public Health Sciences, The University of Chicago, Illinois, U.S.A
16
Ju JH, Shenoy SA, Crystal RG, Mezey JG. An independent component analysis confounding factor correction framework for identifying broad impact expression quantitative trait loci. PLoS Comput Biol 2017; 13:e1005537. [PMID: 28505156 PMCID: PMC5448815 DOI: 10.1371/journal.pcbi.1005537] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Received: 11/22/2016] [Revised: 05/30/2017] [Accepted: 04/28/2017] [Indexed: 11/19/2022]
Abstract
Genome-wide expression Quantitative Trait Loci (eQTL) studies in humans have provided numerous insights into the genetics of both gene expression and complex diseases. While the majority of eQTL identified in genome-wide analyses impact a single gene, eQTL that impact many genes are particularly valuable for network modeling and disease analysis. To enable the identification of such broad impact eQTL, we introduce CONFETI: Confounding Factor Estimation Through Independent component analysis. CONFETI is designed to address two conflicting issues when searching for broad impact eQTL: the need to account for non-genetic confounding factors that can lower the power of the analysis or produce broad impact eQTL false positives, and the tendency of methods that account for confounding factors to model broad impact eQTL as non-genetic variation. The key advance of the CONFETI framework is the use of Independent Component Analysis (ICA) to identify variation likely caused by broad impact eQTL when constructing the sample covariance matrix used for the random effect in a mixed model. We show that CONFETI has better performance than other mixed model confounding factor methods when considering broad impact eQTL recovery from synthetic data. We also used the CONFETI framework and these same confounding factor methods to identify eQTL that replicate between matched twin pair datasets in the Multiple Tissue Human Expression Resource (MuTHER), the Depression Genes Networks study (DGN), the Netherlands Study of Depression and Anxiety (NESDA), and multiple tissue types in the Genotype-Tissue Expression (GTEx) consortium. These analyses identified both cis-eQTL and trans-eQTL impacting individual genes, and CONFETI had better or comparable performance to other mixed model confounding factor analysis methods when identifying such eQTL. In these analyses, we were able to identify and replicate a few broad impact eQTL although the overall number was small even when applying CONFETI. 
In light of these results, we discuss the broad impact eQTL that have been previously reported from the analysis of human data and suggest that considerable caution should be exercised when making biological inferences based on these reported eQTL.
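In a mixed model of this kind, the random effect is given a sample-by-sample covariance matrix built from factors that capture non-genetic variation. The sketch below shows only that last step, under stated assumptions: the component scores are taken as already computed and filtered (the ICA decomposition and the choice of which components to exclude as likely broad impact eQTL signal happen elsewhere), and the scaling K = S S^T / m is an illustrative choice, not CONFETI's code.

```python
from typing import List

Matrix = List[List[float]]


def sample_covariance_from_components(scores: Matrix) -> Matrix:
    """Build an n x n sample covariance K = S S^T / m from component scores.

    scores is an n x m matrix (n samples, m retained components). Each
    component (column) is centered before forming K.
    """
    n, m = len(scores), len(scores[0])
    means = [sum(row[c] for row in scores) / n for c in range(m)]
    centered = [[row[c] - means[c] for c in range(m)] for row in scores]
    return [
        [sum(centered[i][c] * centered[j][c] for c in range(m)) / m for j in range(n)]
        for i in range(n)
    ]
```

The resulting K would then serve as the covariance of the random effect when each gene's expression is tested against a marker.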
Affiliation(s)
- Jin Hyun Ju
- Department of Genetic Medicine, Weill Cornell Medical College, New York, NY, United States of America
- Institute for Computational Biomedicine, Weill Cornell Medical College, New York, NY, United States of America
- Sushila A. Shenoy
- Department of Genetic Medicine, Weill Cornell Medical College, New York, NY, United States of America
- Ronald G. Crystal
- Department of Genetic Medicine, Weill Cornell Medical College, New York, NY, United States of America
- Jason G. Mezey
- Department of Genetic Medicine, Weill Cornell Medical College, New York, NY, United States of America
- Institute for Computational Biomedicine, Weill Cornell Medical College, New York, NY, United States of America
- Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, NY, United States of America
17
Zhai J, Hsu CH, Daye ZJ. Ridle for sparse regression with mandatory covariates with application to the genetic assessment of histologic grades of breast cancer. BMC Med Res Methodol 2017; 17:12. [PMID: 28122498 PMCID: PMC5267467 DOI: 10.1186/s12874-017-0291-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 06/19/2016] [Accepted: 01/06/2017] [Indexed: 12/13/2022]
Abstract
Background: Many questions in statistical genomics can be formulated in terms of variable selection of candidate biological factors for modeling a trait or quantity of interest. Often, in these applications, additional covariates describing clinical, demographic or experimental effects must be included a priori as mandatory covariates while allowing the selection of a large number of candidate or optional variables. As genomic studies routinely require mandatory covariates, it is of interest to propose principled methods of variable selection that can incorporate mandatory covariates. Methods: In this article, we propose the ridge-lasso hybrid estimator (ridle), a new penalized regression method that simultaneously estimates coefficients of mandatory covariates while allowing selection for others. The ridle provides a principled approach to mitigate effects of multicollinearity among the mandatory covariates and possible dependency between mandatory and optional variables. We provide detailed empirical and theoretical studies to evaluate our method. In addition, we develop an efficient algorithm for the ridle. Software, based on efficient Fortran code with R-language wrappers, is publicly and freely available at https://sites.google.com/site/zhongyindaye/software. Results: The ridle is useful when mandatory predictors are known to be significant due to prior knowledge or must be kept for additional analysis. Both theoretical and comprehensive simulation studies have shown the ridle to be advantageous when mandatory covariates are correlated with the irrelevant optional predictors or are highly correlated among themselves. A microarray gene expression analysis of the histologic grades of breast cancer identified 24 genes, 2 of which are selected only by the ridle among current methods and are found to be associated with tumor grade. 
Conclusions: In this article, we proposed the ridle as a principled sparse regression method for the selection of optional variables while incorporating mandatory ones. Results suggest that the ridle is advantageous when mandatory covariates are correlated with the irrelevant optional predictors or are highly correlated among themselves. Electronic supplementary material: The online version of this article (doi:10.1186/s12874-017-0291-y) contains supplementary material, which is available to authorized users.
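The abstract does not spell out the ridle objective, so the coordinate-descent sketch below assumes a simple ridge-lasso hybrid: a ridge penalty on the mandatory coefficients (index set `mandatory`) and a lasso penalty on the optional ones. It illustrates why mandatory covariates stay in the model while optional ones can be dropped; it is not the authors' Fortran/R implementation.

```python
from typing import List, Set


def soft_threshold(z: float, lam: float) -> float:
    """Soft-thresholding operator used by lasso coordinate updates."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def ridge_lasso_cd(
    X: List[List[float]],
    y: List[float],
    mandatory: Set[int],
    lam_ridge: float,
    lam_lasso: float,
    n_iter: int = 200,
) -> List[float]:
    """Coordinate descent for the assumed objective

    0.5*||y - X b||^2 + 0.5*lam_ridge*sum_{j in M} b_j^2 + lam_lasso*sum_{j not in M} |b_j|

    Mandatory coefficients are only shrunk (never zeroed by the penalty);
    optional coefficients can be set exactly to zero. Assumes no all-zero column.
    """
    n, p = len(X), len(X[0])
    b = [0.0] * p
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            # Partial residual with predictor j's contribution removed.
            r = [y[i] - sum(X[i][k] * b[k] for k in range(p) if k != j) for i in range(n)]
            z = sum(X[i][j] * r[i] for i in range(n))
            if j in mandatory:
                b[j] = z / (col_sq[j] + lam_ridge)  # closed-form ridge update
            else:
                b[j] = soft_threshold(z, lam_lasso) / col_sq[j]  # lasso update
    return b
```

With orthogonal toy predictors, `ridge_lasso_cd([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]], [2.0, 0.0, 2.0, 0.0], {0}, 0.0, 0.5)` keeps the mandatory coefficient at 2 and sets the unrelated optional coefficient to 0.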
Affiliation(s)
- Jing Zhai
- Epidemiology and Biostatistics Department, University of Arizona, Tucson, USA
- Chiu-Hsieh Hsu
- Epidemiology and Biostatistics Department, University of Arizona, Tucson, USA
- Z John Daye
- Epidemiology and Biostatistics Department, University of Arizona, Tucson, USA.
18
Richardson S, Tseng GC, Sun W. Statistical Methods in Integrative Genomics. ANNUAL REVIEW OF STATISTICS AND ITS APPLICATION 2016; 3:181-209. [PMID: 27482531 PMCID: PMC4963036 DOI: 10.1146/annurev-statistics-041715-033506] [Citation(s) in RCA: 55] [Impact Index Per Article: 6.9] [Indexed: 05/28/2023]
Abstract
Statistical methods in integrative genomics aim to answer important biology questions by jointly analyzing multiple types of genomic data (vertical integration) or aggregating the same type of data across multiple studies (horizontal integration). In this article, we introduce different types of genomic data and data resources, and then review statistical methods of integrative genomics, with emphasis on the motivation and rationale of these methods. We conclude with some summary points and future research directions.
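As one concrete instance of horizontal integration (aggregating the same type of data across studies), Fisher's method combines per-study p-values for the same gene into a single p-value. This classical technique is shown only for illustration; the review covers many methods and is not the source of this code. The p-values are assumed to come from independent studies and to lie in (0, 1].

```python
import math
from typing import List


def fisher_combined_pvalue(pvalues: List[float]) -> float:
    """Combine independent p-values from k studies with Fisher's method.

    X = -2 * sum(ln p_i) follows a chi-square distribution with 2k degrees
    of freedom under the joint null. For even degrees of freedom the upper
    tail has the closed form exp(-x/2) * sum_{i<k} (x/2)^i / i!.
    """
    k = len(pvalues)
    x = -2.0 * sum(math.log(p) for p in pvalues)
    half = x / 2.0
    term, total = 1.0, 0.0
    for i in range(k):
        if i > 0:
            term *= half / i  # builds (x/2)^i / i! incrementally
        total += term
    return math.exp(-half) * total


print(fisher_combined_pvalue([0.05, 0.05]))
```

Two studies that each give p = 0.05 for the same gene combine to a smaller p-value than either alone, which is the point of pooling evidence.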
Affiliation(s)
- Sylvia Richardson
- MRC Biostatistics Unit, Cambridge Institute of Public Health, University of Cambridge, CB2 0SR, United Kingdom
- George C. Tseng
- Department of Biostatistics, University of Pittsburgh, Pittsburgh, PA 15261
- Wei Sun
- Department of Biostatistics, Department of Genetics, University of North Carolina, Chapel Hill, NC 27599
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, Washington 27516
19
Mohammadi M, Sharifi Noghabi H, Abed Hodtani G, Rajabi Mashhadi H. Robust and stable gene selection via Maximum–Minimum Correntropy Criterion. Genomics 2016; 107:83-87. [DOI: 10.1016/j.ygeno.2015.12.006] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.9] [Received: 09/14/2015] [Revised: 12/13/2015] [Accepted: 12/23/2015] [Indexed: 11/17/2022]
20
Yazdani A, Yazdani A, Samiei A, Boerwinkle E. Generating a robust statistical causal structure over 13 cardiovascular disease risk factors using genomics data. J Biomed Inform 2016; 60:114-9. [PMID: 26827624 DOI: 10.1016/j.jbi.2016.01.012] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.3] [Received: 07/27/2015] [Revised: 01/19/2016] [Accepted: 01/22/2016] [Indexed: 10/22/2022]
Abstract
Understanding causal relationships among large numbers of variables is a fundamental goal of biomedical sciences and can be facilitated by Directed Acyclic Graphs (DAGs) where directed edges between nodes represent the influence of components of the system on each other. In an observational setting, some of the directions are often unidentifiable because of Markov equivalency. Additional exogenous information, such as expert knowledge or genotype data can help establish directionality among the endogenous variables. In this study, we use the method of principal component analysis to extract information across the genome in order to generate a robust statistical causal network among phenotypes, the variables of primary interest. The method is applied to 590,020 SNP genotypes measured on 1596 individuals to generate the statistical causal network of 13 cardiovascular disease risk factor phenotypes. First, principal component analysis was used to capture information across the genome. The principal components were then used to identify a robust causal network structure, GDAG, among the phenotypes. Analyzing a robust causal network over risk factors reveals the flow of information in direct and alternative paths, as well as determining predictors and good targets for intervention. For example, the analysis identified BMI as influencing multiple other risk factor phenotypes and as a good target for intervention to lower disease risk.
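The first step described in the abstract summarizes genome-wide genotypes with principal components. A minimal standard-library sketch of the leading component by power iteration follows. In practice the input would be an individuals-by-SNPs dosage matrix far too large for this approach, so a library SVD would be used, and the subsequent GDAG causal-structure step is not shown.

```python
import math
from typing import List

Matrix = List[List[float]]


def first_principal_component(data: Matrix, n_iter: int = 100) -> List[float]:
    """Leading principal axis of a samples x features matrix by power iteration.

    Columns are centered, the feature covariance matrix is formed explicitly,
    and repeated multiplication converges to its top eigenvector (assuming
    the leading eigenvalue is strictly larger than the others and n >= 2).
    """
    n, p = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(p)]
    X = [[row[j] - means[j] for j in range(p)] for row in data]
    cov = [
        [sum(X[i][a] * X[i][b] for i in range(n)) / (n - 1) for b in range(p)]
        for a in range(p)
    ]
    v = [1.0] * p
    for _ in range(n_iter):
        w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]  # renormalize to avoid overflow/underflow
    return v
```

Projecting each individual's centered genotypes onto this vector gives that individual's score on the first component.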
Affiliation(s)
- Azam Yazdani
- Human Genetics Center, UTHealth School of Public Health, 1200 Pressler Street, Suite E-447, Houston, TX 77030, United States.
- Akram Yazdani
- Human Genetics Center, UTHealth School of Public Health, 1200 Pressler Street, Suite E-447, Houston, TX 77030, United States
- Ahmad Samiei
- Department of Software Systematic, D-14482 Potsdam, Germany
- Eric Boerwinkle
- Human Genetics Center, UTHealth School of Public Health, 1200 Pressler Street, Suite E-447, Houston, TX 77030, United States
21
Petralia F, Wang P, Yang J, Tu Z. Integrative random forest for gene regulatory network inference. Bioinformatics 2015; 31:i197-205. [PMID: 26072483 PMCID: PMC4542785 DOI: 10.1093/bioinformatics/btv268] [Citation(s) in RCA: 100] [Impact Index Per Article: 11.1] [Indexed: 12/17/2022]
Abstract
Motivation: Gene regulatory network (GRN) inference based on genomic data is one of the most actively pursued computational biological problems. Because different types of biological data usually provide complementary information regarding the underlying GRN, a model that integrates big data of diverse types is expected to increase both the power and accuracy of GRN inference. Towards this goal, we propose a novel algorithm named iRafNet: integrative random forest for gene regulatory network inference. Results: iRafNet is a flexible, unified integrative framework that allows information from heterogeneous data, such as protein–protein interactions, transcription factor (TF)-DNA-binding, gene knock-down, to be jointly considered for GRN inference. Using test data from the DREAM4 and DREAM5 challenges, we demonstrate that iRafNet outperforms the original random forest based network inference algorithm (GENIE3), and is highly comparable to the community learning approach. We apply iRafNet to construct GRN in Saccharomyces cerevisiae and demonstrate that it improves the performance in predicting TF-target gene regulations and provides additional functional insights to the predicted gene regulations. Availability and implementation: The R code of iRafNet implementation and a tutorial are available at: http://research.mssm.edu/tulab/software/irafnet.html Contact: zhidong.tu@mssm.edu Supplementary information: Supplementary data are available at Bioinformatics online.
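One way to let prior evidence steer a random forest, which as we understand it is the idea behind iRafNet, is to sample candidate regulators at each split with probability proportional to weights derived from supporting data (for example TF-DNA binding) rather than uniformly. The sketch below shows only that weighted sampling step; the weights, regulator names and function are hypothetical, and tree building is omitted.

```python
import random
from typing import Dict, List, Optional


def weighted_candidate_sample(
    weights: Dict[str, float], k: int, rng: Optional[random.Random] = None
) -> List[str]:
    """Draw up to k distinct candidate regulators, each draw proportional to weight.

    Regulators with zero weight are never drawn; if fewer than k regulators
    have positive weight, all of them are returned.
    """
    rng = rng or random.Random()
    pool = {g: w for g, w in weights.items() if w > 0}
    chosen: List[str] = []
    while pool and len(chosen) < k:
        names = list(pool)
        pick = rng.choices(names, weights=[pool[n] for n in names], k=1)[0]
        chosen.append(pick)
        del pool[pick]  # sample without replacement
    return chosen


# Hypothetical prior weights: TF1 has strong binding evidence, TF2 none.
print(weighted_candidate_sample({"TF1": 5.0, "TF2": 0.0, "TF3": 1.0}, k=2, rng=random.Random(0)))
```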
Affiliation(s)
- Francesca Petralia
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Pei Wang
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Jialiang Yang
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Zhidong Tu
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
22
Mohamed Salleh FH, Arif SM, Zainudin S, Firdaus-Raih M. Reconstructing gene regulatory networks from knock-out data using Gaussian Noise Model and Pearson Correlation Coefficient. Comput Biol Chem 2015; 59 Pt B:3-14. [DOI: 10.1016/j.compbiolchem.2015.04.012] [Citation(s) in RCA: 40] [Impact Index Per Article: 4.4] [Received: 09/09/2014] [Revised: 04/16/2015] [Accepted: 04/27/2015] [Indexed: 11/26/2022]
23
Fear JM, Arbeitman MN, Salomon MP, Dalton JE, Tower J, Nuzhdin SV, McIntyre LM. The Wright stuff: reimagining path analysis reveals novel components of the sex determination hierarchy in Drosophila melanogaster. BMC SYSTEMS BIOLOGY 2015; 9:53. [PMID: 26335107 PMCID: PMC4558766 DOI: 10.1186/s12918-015-0200-0] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Received: 03/19/2015] [Accepted: 08/20/2015] [Indexed: 11/10/2022]
Abstract
BACKGROUND: The Drosophila sex determination hierarchy is a classic example of a transcriptional regulatory hierarchy, with sex-specific isoforms regulating morphology and behavior. We use a structural equation modeling approach, leveraging natural genetic variation from two studies on Drosophila female head tissues (the DSPR collection, 596 F1-hybrids from crosses between DSPR sub-populations, and the CEGS population, 75 F1-hybrids from crosses between DGRP/Winters lines and a reference strain w1118), to expand understanding of the sex hierarchy gene regulatory network (GRN). This approach is completely generalizable to any natural population, including humans. RESULTS: We expanded the sex hierarchy GRN by adding novel links among genes, including a link from fruitless (fru) to Sex-lethal (Sxl) identified in both populations. This link is further supported by the presence of fru binding sites in the Sxl locus. 754 candidate genes were added to the pathway, including the splicing factors male-specific lethal 2 and Rm62 as downstream targets of Sxl, which are well-supported links in males. Independent studies of doublesex and transformer mutants support many additions, including evidence for a link between the sex hierarchy and metabolism via Insulin-like receptor. CONCLUSIONS: The genes added in the CEGS population were enriched for genes with sex-biased splicing and components of the spliceosome. A common goal of molecular biologists is to expand understanding about regulatory interactions among genes. Using natural alleles, we can not only identify novel relationships but also, with supervised approaches, order genes into a regulatory hierarchy. Combining these results with independent large-effect mutation studies allows clear candidates for detailed molecular follow-up to emerge.
Affiliation(s)
- Justin M Fear
- Department of Molecular Genetics and Microbiology, University of Florida, CGRC Room 116, PO Box 100266, Gainesville, FL 32610-0266, USA.
- Matthew P Salomon
- Molecular and Computational Biology, University of California, Los Angeles, CA, USA.
- Justin E Dalton
- Biomedical Science, Florida State University, Tallahassee, FL, USA.
- John Tower
- Molecular and Computational Biology, University of California, Los Angeles, CA, USA.
- Sergey V Nuzhdin
- Molecular and Computational Biology, University of California, Los Angeles, CA, USA.
- Lauren M McIntyre
- Department of Molecular Genetics and Microbiology, University of Florida, CGRC Room 116, PO Box 100266, Gainesville, FL 32610-0266, USA.
24
Bayesian network reconstruction using systems genetics data: comparison of MCMC methods. Genetics 2015; 199:973-89. [PMID: 25631319 PMCID: PMC4391572 DOI: 10.1534/genetics.114.172619] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.1] [Received: 12/14/2014] [Accepted: 01/26/2015] [Indexed: 12/23/2022]
Abstract
Reconstructing biological networks using high-throughput technologies has the potential to produce condition-specific interactomes. But are these reconstructed networks a reliable source of biological interactions? Do some network inference methods offer dramatically improved performance on certain types of networks? To facilitate the use of network inference methods in systems biology, we report a large-scale simulation study comparing the ability of Markov chain Monte Carlo (MCMC) samplers to reverse engineer Bayesian networks. The MCMC samplers we investigated included foundational and state-of-the-art Metropolis-Hastings and Gibbs sampling approaches, as well as novel samplers we have designed. To enable a comprehensive comparison, we simulated gene expression and genetics data from known network structures under a range of biologically plausible scenarios. We examine the overall quality of network inference via different methods, as well as how their performance is affected by network characteristics. Our simulations reveal that network size, edge density, and strength of gene-to-gene signaling are major parameters that differentiate the performance of various samplers. Specifically, more recent samplers including our novel methods outperform traditional samplers for highly interconnected large networks with strong gene-to-gene signaling. Our newly developed samplers show comparable or superior performance to the top existing methods. Moreover, this performance gain is strongest in networks with biologically oriented topology, which indicates that our novel samplers are suitable for inferring biological networks. The performance of MCMC samplers in this simulation framework can guide the choice of methods for network reconstruction using systems genetics data.
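To make the compared samplers concrete, here is a generic structure Metropolis-Hastings sampler over directed acyclic graphs with single-edge toggle proposals, the kind of foundational sampler such comparisons use as a baseline. It is a sketch, not one of the paper's novel samplers: the log-score function (for example a Bayesian network marginal likelihood) is assumed to be supplied by the caller.

```python
import math
import random
from typing import Callable, FrozenSet, List, Tuple

Edge = Tuple[int, int]
Graph = FrozenSet[Edge]


def is_acyclic(n: int, edges: Graph) -> bool:
    """Kahn's algorithm: a directed graph is a DAG iff every node can be removed."""
    indeg = [0] * n
    children: List[List[int]] = [[] for _ in range(n)]
    for a, b in edges:
        children[a].append(b)
        indeg[b] += 1
    queue = [v for v in range(n) if indeg[v] == 0]
    removed = 0
    while queue:
        v = queue.pop()
        removed += 1
        for c in children[v]:
            indeg[c] -= 1
            if indeg[c] == 0:
                queue.append(c)
    return removed == n


def structure_mh(
    n: int,
    log_score: Callable[[Graph], float],
    steps: int,
    rng: random.Random,
) -> List[Graph]:
    """Metropolis-Hastings over DAGs on n nodes using single-edge toggles.

    Toggling edge (a, b) for a uniformly chosen ordered pair is a symmetric
    proposal, so the acceptance ratio is exp(score_new - score_old).
    Proposals that create a cycle have zero target probability and are rejected.
    """
    current: Graph = frozenset()
    samples = []
    for _ in range(steps):
        a, b = rng.sample(range(n), 2)
        proposal = current ^ {(a, b)}  # add the edge if absent, remove if present
        if is_acyclic(n, proposal):
            delta = log_score(proposal) - log_score(current)
            if delta >= 0 or rng.random() < math.exp(delta):
                current = proposal
        samples.append(current)
    return samples
```

A toy score such as `lambda g: 2.0 * len(g & {(0, 1), (1, 2)}) - len(g - {(0, 1), (1, 2)})` rewards two target edges and penalizes all others; `structure_mh(3, score, 500, random.Random(1))` then returns 500 sampled graphs, all acyclic by construction.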
25
Abraham G, Bhalala OG, de Bakker PIW, Ripatti S, Inouye M. Towards a molecular systems model of coronary artery disease. Curr Cardiol Rep 2015; 16:488. [PMID: 24743898 PMCID: PMC4050311 DOI: 10.1007/s11886-014-0488-1] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.8] [Indexed: 01/05/2023]
Abstract
Coronary artery disease (CAD) is a complex disease driven by myriad interactions of genetics and environmental factors. Traditionally, studies have analyzed only 1 disease factor at a time, providing useful but limited understanding of the underlying etiology. Recent advances in cost-effective and high-throughput technologies, such as single nucleotide polymorphism (SNP) genotyping, exome/genome/RNA sequencing, gene expression microarrays, and metabolomics assays have enabled the collection of millions of data points in many thousands of individuals. In order to make sense of such 'omics' data, effective analytical methods are needed. We review and highlight some of the main results in this area, focusing on integrative approaches that consider multiple modalities simultaneously. Such analyses have the potential to uncover the genetic basis of CAD, produce genomic risk scores (GRS) for disease prediction, disentangle the complex interactions underlying disease, and predict response to treatment.
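In its simplest additive form, a genomic risk score is a weighted sum of an individual's risk-allele dosages, with weights typically taken from GWAS effect sizes. The sketch below assumes that form; the SNP identifiers and weights in the usage line are made up, and real scores involve further choices (SNP selection, handling of linkage disequilibrium, imputation of missing genotypes) that are not shown.

```python
from typing import Dict


def genomic_risk_score(dosages: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of risk-allele dosages (0, 1 or 2) over SNPs that have a weight.

    SNPs missing from an individual's genotypes contribute nothing here; real
    pipelines usually impute or mean-fill them instead.
    """
    return sum(w * dosages.get(snp, 0.0) for snp, w in weights.items())


# Hypothetical SNP identifiers and per-allele weights.
print(genomic_risk_score({"rs_a": 2, "rs_b": 1}, {"rs_a": 0.2, "rs_b": -0.1}))
```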
Affiliation(s)
- Gad Abraham
- Medical Systems Biology, Department of Pathology and Department of Microbiology & Immunology, The University of Melbourne, Parkville, Victoria, 3010, Australia
26
Wang H, Paulo J, Kruijer W, Boer M, Jansen H, Tikunov Y, Usadel B, van Heusden S, Bovy A, van Eeuwijk F. Genotype–phenotype modeling considering intermediate level of biological variation: a case study involving sensory traits, metabolites and QTLs in ripe tomatoes. MOLECULAR BIOSYSTEMS 2015; 11:3101-10. [DOI: 10.1039/c5mb00477b] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Indexed: 11/21/2022]
Abstract
We integrate Gaussian graphical modelling and causal inference to infer dependency networks from multilevel phenotypic and omics data.
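In a Gaussian graphical model, two variables are joined by an edge exactly when their partial correlation given all other variables is nonzero, and partial correlations can be read off the inverse covariance (precision) matrix. A small standard-library sketch for dense, well-conditioned matrices; the helper names are illustrative.

```python
import math
from typing import List

Matrix = List[List[float]]


def invert(m: Matrix) -> Matrix:
    """Gauss-Jordan inversion with partial pivoting (small dense matrices only)."""
    n = len(m)
    a = [row[:] + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        pv = a[col][col]
        a[col] = [x / pv for x in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [x - f * y for x, y in zip(a[r], a[col])]
    return [row[n:] for row in a]


def partial_correlations(cov: Matrix) -> Matrix:
    """rho_ij = -theta_ij / sqrt(theta_ii * theta_jj), with theta = inverse covariance."""
    theta = invert(cov)
    n = len(theta)
    return [
        [1.0 if i == j else -theta[i][j] / math.sqrt(theta[i][i] * theta[j][j]) for j in range(n)]
        for i in range(n)
    ]


# A three-variable chain: 1 and 3 are correlated only through 2.
chain = [[1.0, 0.5, 0.25], [0.5, 1.0, 0.5], [0.25, 0.5, 1.0]]
print(partial_correlations(chain))
```

For this chain the partial correlation between the two end variables is zero up to rounding, so a Gaussian graphical model draws no edge between them even though their marginal correlation is 0.25.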
Affiliation(s)
- Huange Wang
- Biometris, Wageningen University and Research Centre, 6700AA Wageningen, The Netherlands
- Joao Paulo
- Biometris, Wageningen University and Research Centre, 6700AA Wageningen, The Netherlands
- Willem Kruijer
- Biometris, Wageningen University and Research Centre, 6700AA Wageningen, The Netherlands
- Martin Boer
- Biometris, Wageningen University and Research Centre, 6700AA Wageningen, The Netherlands
- Hans Jansen
- Biometris, Wageningen University and Research Centre, 6700AA Wageningen, The Netherlands
- Yury Tikunov
- Plant Research International, 6700AJ Wageningen, The Netherlands
- Björn Usadel
- Institute for Biology I, RWTH Aachen University, 52074 Aachen, Germany
- Arnaud Bovy
- Plant Research International, 6700AJ Wageningen, The Netherlands
- Fred van Eeuwijk
- Biometris, Wageningen University and Research Centre, 6700AA Wageningen, The Netherlands
27
Wang H, van Eeuwijk FA. A new method to infer causal phenotype networks using QTL and phenotypic information. PLoS One 2014; 9:e103997. [PMID: 25144184 PMCID: PMC4140682 DOI: 10.1371/journal.pone.0103997] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.1] [Received: 01/31/2014] [Accepted: 07/06/2014] [Indexed: 11/25/2022]
Abstract
In the context of genetics and breeding research on multiple phenotypic traits, reconstructing the directional or causal structure between phenotypic traits is a prerequisite for quantifying the effects of genetic interventions on the traits. Current approaches mainly exploit the genetic effects at quantitative trait loci (QTLs) to learn about causal relationships among phenotypic traits. A requirement for using these approaches is that at least one unique QTL has been identified for each trait studied. However, in practice, especially for molecular phenotypes such as metabolites, this prerequisite is often not met due to limited sample sizes, high noise levels and small QTL effects. Here, we present a novel heuristic search algorithm called the QTL+phenotype supervised orientation (QPSO) algorithm to infer causal directions for edges in undirected phenotype networks. The two main advantages of this algorithm are: first, it does not require QTLs for each and every trait; second, it takes into account associated phenotypic interactions in addition to detected QTLs when orienting undirected edges between traits. We evaluate and compare the performance of QPSO with another state-of-the-art approach, the QTL-directed dependency graph (QDG) algorithm. Simulation results show that our method has broader applicability and leads to more accurate overall orientations. We also illustrate our method with a real-life example involving 24 metabolites and a few major QTLs measured on an association panel of 93 tomato cultivars. Matlab source code implementing the proposed algorithm is freely available upon request.
Affiliation(s)
- Huange Wang
- Biometris, Department of Plant Sciences, Wageningen University, Wageningen, The Netherlands
- Fred A. van Eeuwijk
- Biometris, Department of Plant Sciences, Wageningen University, Wageningen, The Netherlands
- Centre for BioSystems Genomics, Wageningen, The Netherlands
- Netherlands Metabolomics Centre, Leiden, The Netherlands
28
Inference of SNP-gene regulatory networks by integrating gene expressions and genetic perturbations. BIOMED RESEARCH INTERNATIONAL 2014; 2014:629697. [PMID: 25136606 PMCID: PMC4127230 DOI: 10.1155/2014/629697] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 01/28/2014] [Accepted: 05/09/2014] [Indexed: 11/18/2022]
Abstract
To elucidate the overall relationships between gene expression and genetic perturbations, we propose a method to infer a gene regulatory network in which single nucleotide polymorphisms (SNPs) act as regulators of genes. In most inference methods for such SNP-gene regulatory networks (SGRNs), SNP-gene pairs are given in advance by separately performing expression quantitative trait loci (eQTL) mapping. In this paper, we propose an SGRN inference method that requires no predefined eQTL information, assuming that each gene is regulated by at most one SNP. To evaluate its performance, the proposed method was applied to random data generated from synthetic networks and parameters. There are three main contributions. First, the proposed method provides both gene regulatory inference and eQTL identification. Second, the experimental results demonstrate that integrating multiple methods can produce competitive performance. Lastly, the proposed method was applied to psychiatric disorder data to explore how it works with real data.
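The at-most-one-SNP assumption can be illustrated with a greedy marginal screen: for each gene, keep the single SNP whose genotypes correlate most strongly with its expression, or none if no correlation clears a cutoff. This hypothetical baseline is for intuition only; per the abstract, the paper infers SNP assignments and gene-gene regulation jointly, without predefined eQTL.

```python
import math
from typing import Dict, List, Optional


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation; assumes both vectors are non-constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def assign_regulating_snp(
    genotypes: Dict[str, List[float]],
    expression: Dict[str, List[float]],
    min_abs_r: float,
) -> Dict[str, Optional[str]]:
    """Assign at most one SNP per gene: the one with the largest |r| above min_abs_r."""
    result: Dict[str, Optional[str]] = {}
    for gene, expr in expression.items():
        best, best_r = None, min_abs_r
        for snp, geno in genotypes.items():
            r = abs(pearson(geno, expr))
            if r > best_r:
                best, best_r = snp, r
        result[gene] = best  # None means no SNP passed the cutoff
    return result
```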
29
Wang J, Yu H, Weng X, Xie W, Xu C, Li X, Xiao J, Zhang Q. An expression quantitative trait loci-guided co-expression analysis for constructing regulatory network using a rice recombinant inbred line population. JOURNAL OF EXPERIMENTAL BOTANY 2014; 65:1069-79. [PMID: 24420573 PMCID: PMC3935569 DOI: 10.1093/jxb/ert464] [Citation(s) in RCA: 52] [Impact Index Per Article: 5.2] [Indexed: 05/18/2023]
Abstract
The ability to reveal the regulatory architecture of genes at the whole-genome level by constructing a regulatory network is critical for understanding the biological processes and developmental programmes of organisms. Here, we conducted an eQTL-guided function-related co-expression analysis to identify the putative regulators and construct gene regulatory network. We performed an eQTL analysis of 210 recombinant inbred lines (RILs) derived from a cross between two indica rice lines, Zhenshan 97 and Minghui 63, the parents of an elite hybrid, using data obtained by hybridizing RNA samples of flag leaves at the heading stage with Affymetrix whole-genome arrays. Making use of an ultrahigh-density single-nucleotide polymorphism bin map constructed by population sequencing, 13 647 eQTLs for 10 725 e-traits were detected, comprising 5079 cis-eQTLs (37.2%) and 8568 trans-eQTLs (62.8%). The analysis revealed 138 trans-eQTL hotspots, each of which apparently regulates the expression variations of many genes. Co-expression analysis of functionally related genes within the framework of regulator-target relationships outlined by the eQTLs led to the identification of putative regulators in the system. The usefulness of the strategy was demonstrated with the genes known to be involved in flowering. We also applied this strategy to the analysis of QTLs for yield traits, which also suggested likely candidate genes. eQTL-guided co-expression analysis may provide a promising solution for outlining a framework for the complex regulatory network of an organism.
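The cis/trans split reported above depends on how "local" is defined. As a minimal sketch, assume an eQTL is called cis when its peak lies on the same chromosome as the gene it regulates and within a fixed distance of it, and trans otherwise. The 1 Mb window and the EQTL record type are hypothetical; the study itself worked from an SNP bin map, and its exact criterion may differ.

```python
from dataclasses import dataclass


@dataclass
class EQTL:
    trait: str       # e-trait: the gene whose expression was mapped
    qtl_chrom: str   # chromosome of the eQTL peak
    qtl_pos: int     # eQTL peak position (bp)
    gene_chrom: str  # chromosome of the gene itself
    gene_pos: int    # gene position (bp)


def classify(e, window_bp=1_000_000):
    """Return 'cis' if the eQTL maps within window_bp of its own gene, else 'trans'.

    window_bp is a hypothetical cutoff used only for illustration.
    """
    if e.qtl_chrom == e.gene_chrom and abs(e.qtl_pos - e.gene_pos) <= window_bp:
        return "cis"
    return "trans"


def cis_trans_fractions(eqtls, window_bp=1_000_000):
    """Fractions of cis and trans calls among a list of eQTLs."""
    labels = [classify(e, window_bp) for e in eqtls]
    n = len(labels)
    return labels.count("cis") / n, labels.count("trans") / n
```

The reported proportions follow from the counts alone: 5079 + 8568 = 13647, round(100 * 5079 / 13647, 1) evaluates to 37.2, and round(100 * 8568 / 13647, 1) evaluates to 62.8.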
Affiliation(s)
- Qifa Zhang
- * To whom correspondence should be addressed.
30
Filosi M, Visintainer R, Riccadonna S, Jurman G, Furlanello C. Stability indicators in network reconstruction. PLoS One 2014; 9:e89815. [PMID: 24587057 PMCID: PMC3937450 DOI: 10.1371/journal.pone.0089815] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.5] [Received: 08/26/2013] [Accepted: 01/27/2014] [Indexed: 11/20/2022]
Abstract
The number of available algorithms to infer a biological network from a dataset of high-throughput measurements is overwhelming and keeps growing. However, evaluating their performance is infeasible unless a ‘gold standard’ is available to measure how close the reconstructed network is to the ground truth. One alternative measure of performance is the stability of these predictions under data resampling. We introduce NetSI, a family of Network Stability Indicators, to assess quantitatively the stability of a reconstructed network in terms of inference variability due to data subsampling. In order to evaluate network stability, the main NetSI methods use a global/local network metric in combination with a resampling (bootstrap or cross-validation) procedure. In addition, we provide two normalized variability scores over data resampling to measure edge weight stability and node degree stability, and then introduce a stability ranking for edges and nodes. A complete implementation of the NetSI indicators, including the Hamming-Ipsen-Mikhailov (HIM) network distance adopted in this paper, is available with the R package nettools. We demonstrate the use of the NetSI family by measuring network stability on four datasets against alternative network reconstruction methods. First, the effect of sample size on stability of inferred networks is studied in a gold standard framework on yeast-like data from the Gene Net Weaver simulator. We also consider the impact of varying modularity on a set of structurally different networks (50 nodes, from 2 to 10 modules), and then of complex feature covariance structure, showing the different behaviours of standard reconstruction methods based on Pearson correlation, Maximum Information Coefficient (MIC) and False Discovery Rate (FDR) strategy.
Finally, we demonstrate a strong combined effect of different reconstruction methods and phenotype subgroups on a hepatocellular carcinoma miRNA microarray dataset (240 subjects), and we validate the analysis on a second dataset (166 subjects) with good reproducibility.
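The resampling idea behind these indicators can be sketched in a few lines. The version below is a simplified stand-in, not the nettools implementation: it infers an absolute-Pearson-correlation network, bootstraps the samples, and summarizes each edge by the mean and standard deviation of its weight across resamples. It does not compute the HIM distance or the paper's normalized variability scores, and all function names are hypothetical.

```python
import random
import statistics
from itertools import combinations
from math import sqrt


def pearson(a, b):
    """Pearson correlation; returns 0.0 when either vector is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / sqrt(va * vb)


def correlation_network(samples):
    """Weighted network {(i, j): |r|} for every gene pair i < j.

    samples: list of rows (one per sample), each a list of gene values.
    """
    n_genes = len(samples[0])
    cols = [[row[g] for row in samples] for g in range(n_genes)]
    return {
        (i, j): abs(pearson(cols[i], cols[j]))
        for i, j in combinations(range(n_genes), 2)
    }


def edge_stability(samples, n_boot=200, seed=0):
    """Bootstrap the samples and return {edge: (mean, stdev)} of edge weights.

    A small stdev relative to the mean suggests the edge is stable under
    resampling. n_boot must be at least 2 for statistics.stdev.
    """
    rng = random.Random(seed)
    weights = {}
    for _ in range(n_boot):
        # Resample rows with replacement, keeping the original sample size.
        boot = [rng.choice(samples) for _ in samples]
        for edge, w in correlation_network(boot).items():
            weights.setdefault(edge, []).append(w)
    return {e: (statistics.mean(ws), statistics.stdev(ws)) for e, ws in weights.items()}
```

Ranking edges by their bootstrap standard deviation gives a crude stability ordering in the spirit of the edge ranking described above; swapping in a different inference rule (for example MIC instead of Pearson correlation) only requires replacing correlation_network.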
Affiliation(s)
- Michele Filosi
- MPBA/Center for Information and Communication Technology, Fondazione Bruno Kessler, Trento, Italy
- CIBIO, University of Trento, Trento, Italy
- Roberto Visintainer
- MPBA/Center for Information and Communication Technology, Fondazione Bruno Kessler, Trento, Italy
- Samantha Riccadonna
- Department of Computational Biology, Research and Innovation Centre, Fondazione Edmund Mach (FEM), San Michele all'Adige, Italy
- Giuseppe Jurman
- MPBA/Center for Information and Communication Technology, Fondazione Bruno Kessler, Trento, Italy
- * E-mail:
- Cesare Furlanello
- MPBA/Center for Information and Communication Technology, Fondazione Bruno Kessler, Trento, Italy
31
Zhang L, Kim S. Learning gene networks under SNP perturbations using eQTL datasets. PLoS Comput Biol 2014; 10:e1003420. [PMID: 24586125 PMCID: PMC3937098 DOI: 10.1371/journal.pcbi.1003420] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.5] [Received: 07/23/2013] [Accepted: 11/18/2013] [Indexed: 11/23/2022]
Abstract
The standard approach for identifying gene networks is based on experimental perturbations of gene regulatory systems, such as gene knock-out experiments, followed by genome-wide profiling of differential gene expression. However, this approach is significantly limited in that it is not possible to perturb more than one or two genes simultaneously to discover complex gene interactions or to distinguish between direct and indirect downstream regulation of the differentially expressed genes. As an alternative, genetical genomics studies have been proposed to treat naturally occurring genetic variants as potential perturbants of the gene regulatory system and to recover gene networks via analysis of population gene-expression and genotype data. Despite the many advantages of genetical genomics data analysis, the computational challenge of decoding the effects of multifactorial genetic perturbations simultaneously from data has prevented its widespread application. In this article, we propose a statistical framework for learning gene networks that overcomes the limitations of experimental perturbation methods and addresses the challenges of genetical genomics analysis. We introduce a new statistical model, called a sparse conditional Gaussian graphical model, and describe an efficient learning algorithm that simultaneously decodes the perturbations of the gene regulatory system by a large number of SNPs to identify a gene network along with the expression quantitative trait loci (eQTLs) that perturb this network. While our statistical model captures direct genetic perturbations of the gene network, by performing inference on the probabilistic graphical model we obtain detailed characterizations of how the direct SNP perturbation effects propagate through the gene network to perturb other genes indirectly. We demonstrate our statistical method using HapMap-simulated and yeast eQTL datasets.
In particular, the yeast gene network identified computationally by our method under SNP perturbations is well supported by the results from experimental perturbation studies related to DNA replication stress response.
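The sparse conditional Gaussian graphical model named in this abstract can be written compactly. The block below uses the standard parameterization of such a model in our own notation (y for the expression of q genes, x for the genotypes at p SNPs, Λ for gene-gene parameters, Θ for SNP-gene parameters); the paper's symbols and exact objective may differ, so treat it as a sketch.

```latex
% y: expression of q genes; x: genotypes at p SNPs (notation assumed)
% Lambda (q x q, positive definite): gene-gene network
% Theta  (p x q): direct SNP-to-gene perturbations
p(\mathbf{y} \mid \mathbf{x})
  = \frac{1}{Z(\mathbf{x})}
    \exp\!\Bigl(-\tfrac{1}{2}\,\mathbf{y}^{\top}\Lambda\,\mathbf{y}
                - \mathbf{x}^{\top}\Theta\,\mathbf{y}\Bigr)
\quad\Longleftrightarrow\quad
\mathbf{y} \mid \mathbf{x}
  \sim \mathcal{N}\bigl(-\Lambda^{-1}\Theta^{\top}\mathbf{x},\; \Lambda^{-1}\bigr)

% Overall (direct + indirect) SNP effects on expression:
\mathbb{E}[\mathbf{y} \mid \mathbf{x}] = B^{\top}\mathbf{x},
\qquad B = -\,\Theta\,\Lambda^{-1}

% Sparse estimation (generic L1-penalized form):
\min_{\Lambda \succ 0,\;\Theta}\;
  -\log L(\Lambda, \Theta)
  + \lambda_{\Lambda}\lVert\Lambda\rVert_{1}
  + \lambda_{\Theta}\lVert\Theta\rVert_{1}
```

Completing the square in the exponent gives the Gaussian form on the right. A nonzero entry Λ_jk links genes j and k in the network, a nonzero Θ_ij marks SNP i as directly perturbing gene j, and the generally dense matrix B = -ΘΛ^{-1} describes how those direct perturbations propagate through the network to affect other genes indirectly.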
Affiliation(s)
- Lingxue Zhang
- Lane Center for Computational Biology, School of Computer Science, Carnegie Mellon University, Pittsburgh, Pennsylvania, United States of America
- Seyoung Kim
- Lane Center for Computational Biology, School of Computer Science, Carnegie Mellon University, Pittsburgh, Pennsylvania, United States of America
32
Dong Z, Song T, Yuan C. Inference of gene regulatory networks from genetic perturbations with linear regression model. PLoS One 2013; 8:e83263. [PMID: 24376676 PMCID: PMC3871530 DOI: 10.1371/journal.pone.0083263] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.2] [Received: 04/30/2013] [Accepted: 11/01/2013] [Indexed: 11/19/2022]
Abstract
Using both genetic perturbation data and gene expression data is an effective strategy for inferring regulatory networks, as it improves the accuracy with which regulatory relationships among genes are detected. Based on both types of data, genetic regulatory networks can be accurately modeled by Structural Equation Modeling (SEM). In this paper, a linear regression (LR) model is formulated based on the SEM, and a novel iterative scheme using Bayesian inference is proposed to estimate the parameters of the LR model (LRBI). Comparative evaluations of LRBI against two other algorithms, the Adaptive Lasso (AL-Based) and the Sparsity-aware Maximum Likelihood (SML), are also presented. Simulations show that LRBI performs significantly better than AL-Based and outperforms SML in terms of detection power. Applying the LRBI algorithm to experimental data, we inferred the interactions in a network of 35 yeast genes. An open-source program of the LRBI algorithm is freely available upon request.
Affiliation(s)
- Zijian Dong
- School of Electronic Engineering, Huaihai Institute of Technology, Lianyungang, Jiangsu, China; School of Information Science and Engineering, Southeast University, Nanjing, Jiangsu, China
- Tiecheng Song
- School of Information Science and Engineering, Southeast University, Nanjing, Jiangsu, China
- Chuang Yuan
- Department of Health Technology and Informatics, The Hong Kong Polytechnic University, Hong Kong, China
33
Peng CH, Jiang YZ, Tai AS, Liu CB, Peng SC, Liao CT, Yen TC, Hsieh WP. Causal inference of gene regulation with subnetwork assembly from genetical genomics data. Nucleic Acids Res 2013; 42:2803-19. [PMID: 24322297 PMCID: PMC3950678 DOI: 10.1093/nar/gkt1277] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.4] [Indexed: 12/15/2022]
Abstract
Deciphering the causal networks of gene interactions is critical for identifying disease pathways and disease-causing genes. We introduce a method to reconstruct causal networks based on exploring phenotype-specific modules in the human interactome and including the expression quantitative trait loci (eQTLs) that underlie the joint expression variation of each module. Closely associated eQTLs help anchor the orientation of the network. To overcome the inherent computational complexity of causal network reconstruction, we first deduce the local causality of individual subnetworks using the selected eQTLs and module transcripts. These subnetworks are then integrated to infer a global causal network using a random-field ranking method, which was motivated by animal sociology. We demonstrate how effectively the inferred causality restores the regulatory structure of the networks that mediate lymph node metastasis in oral cancer. Network rewiring clearly characterizes the dynamic regulatory systems of distinct disease states. This study is the first to associate an RXRB-causal network with increased risks of nodal metastasis, tumor relapse, distant metastases and poor survival for oral cancer. Thus, identifying crucial upstream drivers of a signal cascade can facilitate the discovery of potential biomarkers and effective therapeutic targets.
Affiliation(s)
- Chien-Hua Peng
- Departments of Resource Center for Clinical Research, Chang Gung Memorial Hospital, Taoyuan 33305, Taiwan, Republic of China, Institute of Statistics, National Tsing Hua University, Hsinchu 30013, Taiwan, Republic of China, Nuclear Medicine and Molecular Imaging Center, Chang Gung Memorial Hospital, Taoyuan 33305, Taiwan, Republic of China and Department of Otorhinolaryngology, Head and Neck Surgery, Chang Gung Memorial Hospital, Taoyuan 33305, Taiwan, Republic of China
34
Cai X, Bazerque JA, Giannakis GB. Inference of gene regulatory networks with sparse structural equation models exploiting genetic perturbations. PLoS Comput Biol 2013; 9:e1003068. [PMID: 23717196 PMCID: PMC3662697 DOI: 10.1371/journal.pcbi.1003068] [Citation(s) in RCA: 48] [Impact Index Per Article: 4.4] [Received: 04/17/2012] [Accepted: 03/28/2013] [Indexed: 12/22/2022]
Abstract
Integrating genetic perturbations with gene expression data not only improves accuracy of regulatory network topology inference, but also enables learning of causal regulatory relations between genes. Although a number of methods have been developed to integrate both types of data, efficient and powerful algorithms are still needed. In this paper, sparse structural equation models (SEMs) are employed to integrate both gene expression data and cis-expression quantitative trait loci (cis-eQTL), for modeling gene regulatory networks in accordance with biological evidence that genes regulate or are regulated by a small number of genes. A systematic inference method named sparsity-aware maximum likelihood (SML) is developed for SEM estimation. Using simulated directed acyclic or cyclic networks, the SML performance is compared with that of two state-of-the-art algorithms: the adaptive Lasso (AL) based scheme, and the QTL-directed dependency graph (QDG) method. Computer simulations demonstrate that the novel SML algorithm offers significantly better performance than the AL-based and QDG algorithms across all sample sizes from 100 to 1,000, in terms of detection power and false discovery rate, in all the cases tested that include acyclic or cyclic networks of 10, 30 and 300 genes. The SML method is further applied to infer a network of 39 human genes that are related to the immune function and are chosen to have a reliable eQTL per gene. The resulting network consists of 9 genes and 13 edges. Most of the edges represent interactions reasonably expected from experimental evidence, while the remainder may indicate new interactions. The sparse SEM and efficient SML algorithm provide an effective means of exploiting both gene expression and perturbation data to infer gene regulatory networks. An open-source computer program implementing the SML algorithm is freely available upon request.
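The structural equation model behind this approach (and behind the LR formulation in the previous entry) can be summarized in a standard linear form. The block below is a sketch in our own notation, assuming N genes that each have at least one cis-eQTL; the paper's symbols, noise model and penalty may differ.

```latex
% y: N x 1 expression levels of N genes in one sample (notation assumed)
% x: genotypes of the cis-eQTLs; F: maps each cis-eQTL only to its own gene
% B: N x N, B_{ii} = 0; B_{ij} \neq 0 means gene j regulates gene i
\mathbf{y} = B\,\mathbf{y} + F\,\mathbf{x} + \boldsymbol{\varepsilon}

% Moving By to the left, (I - B)y = Fx + e; if (I - B) is invertible:
\mathbf{y} = (I - B)^{-1}\bigl(F\,\mathbf{x} + \boldsymbol{\varepsilon}\bigr)
```

Because each cis-eQTL enters the model only through its own gene, any association between that eQTL and a different gene has to pass through B; this exclusion restriction is what allows the direction of gene-gene edges to be estimated. A sparsity-inducing penalty on B (for instance an L1 term) encodes the assumption that each gene regulates or is regulated by only a few others; the specific penalty and maximization procedure used by SML are not reproduced here.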
Affiliation(s)
- Xiaodong Cai
- Department of Electrical and Computer Engineering, University of Miami, Coral Gables, FL, USA.
35
Abstract
Omics, including genomics, proteomics, and metabolomics, enable us to explain symbioses in terms of the underlying molecules and their interactions. The central task is to transform molecular catalogs of genes, metabolites, etc., into a dynamic understanding of symbiosis function. We review four exemplars of omics studies that achieve this goal, through defined biological questions relating to metabolic integration and regulation of animal-microbial symbioses, the genetic autonomy of bacterial symbionts, and symbiotic protection of animal hosts from pathogens. As omic datasets become increasingly complex, computationally sophisticated downstream analyses are essential to reveal interactions not evident from visual inspection of the data. We discuss two approaches, phylogenomics and transcriptional clustering, that can divide the primary output of omics studies (long lists of factors) into manageable subsets, and we describe how they have been applied to analyze large datasets and generate testable hypotheses.
Affiliation(s)
- J Chaston
- Department of Entomology, Comstock Hall, Cornell University, Ithaca, New York 14853, USA
36
Logsdon BA, Hoffman GE, Mezey JG. Mouse obesity network reconstruction with a variational Bayes algorithm to employ aggressive false positive control. BMC Bioinformatics 2012; 13:53. [PMID: 22471599 PMCID: PMC3338387 DOI: 10.1186/1471-2105-13-53] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.8] [Received: 10/14/2011] [Accepted: 04/02/2012] [Indexed: 12/16/2022]
Abstract
BACKGROUND We propose a novel variational Bayes network reconstruction algorithm to extract the most relevant disease factors from high-throughput genomic data-sets. Our algorithm is the only scalable method for regularized network recovery that employs Bayesian model averaging and that can internally estimate an appropriate level of sparsity to ensure few false positives enter the model without the need for cross-validation or a model selection criterion. We use our algorithm to characterize the effect of genetic markers and liver gene expression traits on mouse obesity related phenotypes, including weight, cholesterol, glucose, and free fatty acid levels, in an experiment previously used for discovery and validation of network connections: an F2 intercross between the C57BL/6J and C3H/HeJ mouse strains, where apolipoprotein E is null on the background. RESULTS We identified eleven genes, Gch1, Zfp69, Dlgap1, Gna14, Yy1, Gabarapl1, Folr2, Fdft1, Cnr2, Slc24a3, and Ccl19, and a quantitative trait locus directly connected to weight, glucose, cholesterol, or free fatty acid levels in our network. None of these genes were identified by other network analyses of this mouse intercross data-set, but all have been previously associated with obesity or related pathologies in independent studies. In addition, through both simulations and data analysis we demonstrate that our algorithm achieves better power and type I error control than other network recovery algorithms that use the lasso and have bounds on type I error control. CONCLUSIONS Our final network contains 118 previously associated and novel genes affecting weight, cholesterol, glucose, and free fatty acid levels that are excellent obesity risk candidates.
Affiliation(s)
- Benjamin A Logsdon
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, Washington, USA
- Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York, USA
- Gabriel E Hoffman
- Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York, USA
- Jason G Mezey
- Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York, USA
- Department of Genetic Medicine, Weill Cornell Medical College, New York, New York, USA
37
Environmental and genetic perturbations reveal different networks of metabolic regulation. Mol Syst Biol 2011; 7:563. [PMID: 22186737 PMCID: PMC3738848 DOI: 10.1038/msb.2011.96] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.7] [Received: 07/25/2011] [Accepted: 10/25/2011] [Indexed: 11/12/2022]
Abstract
Measurement of metabolic and physiological parameters in replicated crosses of Drosophila melanogaster inbred lines reveals that environmental and genetic perturbations uncover substantially different networks of metabolic regulation.
We collected extensive data on enzyme activities and physiological parameters from replicated crosses of D. melanogaster inbred lines. We implemented a multivariate hierarchical Bayesian model to separately assess genetic and environmental covariation among system components and infer metabolic regulatory networks. Networks revealed by both environmental and genetic perturbations are similar among populations and between sexes. Environmental and genetic networks differ substantially, suggesting that environmental changes and mutations would have different systemic effects even when their primary targets are the same.
Progress in systems biology depends on accurate descriptions of biological networks. Connections in a regulatory network are identified as correlations of gene expression across a set of environmental or genetic perturbations. To use this information to predict system behavior, we must test how the nature of perturbations affects topologies of networks they reveal. To probe this question, we focused on metabolism of Drosophila melanogaster. Our source of perturbations is a set of crosses among 92 wild-derived lines from five populations, replicated in a manner permitting separate assessment of the effects of genetic variation and environmental fluctuation. We directly assayed activities of enzymes and levels of metabolites. Using a multivariate Bayesian model, we estimated covariance among metabolic parameters and built fine-grained probabilistic models of network topology. The environmental and genetic co-regulation networks are substantially the same among five populations. However, genetic and environmental perturbations reveal qualitative differences in metabolic regulation, suggesting that environmental shifts, such as diet modifications, produce different systemic effects than genetic changes, even if the primary targets are the same.
38
Yang B, Navarro N, Noguera J, Muñoz M, Guo T, Yang K, Ma J, Folch J, Huang L, Pérez-Enciso M. Building phenotype networks to improve QTL detection: a comparative analysis of fatty acid and fat traits in pigs. J Anim Breed Genet 2011; 128:329-43. [DOI: 10.1111/j.1439-0388.2011.00928.x] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.7] [Indexed: 11/27/2022]