1
Fei Y, Yu H, Wu Y, Gong S. The causal relationship between immune cells and ankylosing spondylitis: a bidirectional Mendelian randomization study. Arthritis Res Ther 2024; 26:24. [PMID: 38229175 PMCID: PMC10790477 DOI: 10.1186/s13075-024-03266-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/24/2023] [Accepted: 01/09/2024] [Indexed: 01/18/2024]
Abstract
BACKGROUND Ankylosing spondylitis (AS) is one of several disorders known as seronegative spinal arthritis (SpA), the origin of which is unknown. Existing epidemiological data show that inflammatory and immunological factors are important in the development of AS. Previous research on the connection between immunological inflammation and AS, however, has shown inconclusive results. METHODS To evaluate the causal association between immunological characteristics and AS, a bidirectional, two-sample Mendelian randomization (MR) approach was performed in this study. We investigated the causal connection between 731 immune cell traits (immunophenotypes) and AS risk using large, publicly available genome-wide association studies. RESULTS After FDR correction, two immunophenotypes were found to be significantly associated with AS risk: CD14− CD16+ monocyte (OR, 0.669; 95% CI, 0.544 ~ 0.823; P = 1.46 × 10⁻⁴; P_FDR = 0.043) and CD33dim HLA DR+ CD11b+ (OR, 0.589; 95% CI, 0.446 ~ 0.780; P = 2.12 × 10⁻⁴; P_FDR = 0.043). AS had statistically significant effects on six immune traits: CD8 on HLA DR+ CD8+ T cell (OR, 1.029; 95% CI, 1.015 ~ 1.043; P = 4.46 × 10⁻⁵; P_FDR = 0.014), IgD on IgD+ CD24+ B cell (OR, 0.973; 95% CI, 0.960 ~ 0.987; P = 1.2 × 10⁻⁴; P_FDR = 0.021), IgD on IgD+ CD38− unswitched memory B cell (OR, 0.962; 95% CI, 0.945 ~ 0.980; P = 3.02 × 10⁻⁵; P_FDR = 0.014), CD8+ natural killer T %lymphocyte (OR, 0.973; 95% CI, 0.959 ~ 0.987; P = 1.92 × 10⁻⁴; P_FDR = 0.021), CD8+ natural killer T %T cell (OR, 0.973; 95% CI, 0.959 ~ 0.987; P = 1.65 × 10⁻⁴; P_FDR = 0.021). CONCLUSION Our findings extend genetic research into the intimate link between immune cells and AS, which can help guide future clinical and basic research.
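The abstract above reports P_FDR values but does not name the correction procedure. As a minimal sketch, assuming the widely used Benjamini-Hochberg step-up procedure (an assumption, not something the abstract confirms), FDR-adjusted p-values can be computed as follows. The function name and the example p-values are hypothetical and are not taken from the study.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order.

    Assumption: the BH step-up procedure; the abstract does not say
    which FDR method the authors actually used.
    """
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for four tests (illustration only).
raw = [0.01, 0.04, 0.03, 0.005]
adjusted = benjamini_hochberg(raw)
significant = [q < 0.05 for q in adjusted]
```

A trait would then be reported as significant when its adjusted value falls below the chosen threshold, such as 0.05.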
Affiliation(s)
- Yuchang Fei
  - Department of Integrated Chinese and Western Medicine, The First People's Hospital of Jiashan, Jiashan Hospital Affiliated of Jiaxing University, Jiaxing, Zhejiang, China
- Huan Yu
  - The Department of Traditional Chinese Medicine, The First Affiliated Hospital of Ningbo University, Ningbo, Zhejiang, China
- Yulun Wu
  - Center for Rehabilitation Medicine, Rehabilitation & Sports Medicine Research Institute of Zhejiang Province, Department of Rehabilitation Medicine, Zhejiang Provincial People's Hospital, Affiliated People's Hospital, Hangzhou Medical College, Hangzhou, Zhejiang, China
- Shanshan Gong
  - Department of Gastroenterology, The Third Affiliated Hospital of Zhejiang Chinese Medical University, Hangzhou, Zhejiang, China
2
Lovis C, Zhang K, Li C, Jiang X, Kim Y. Scalable Causal Structure Learning: Scoping Review of Traditional and Deep Learning Algorithms and New Opportunities in Biomedicine. JMIR Med Inform 2023; 11:e38266. [PMID: 36649070 PMCID: PMC9890349 DOI: 10.2196/38266] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/25/2022] [Revised: 08/30/2022] [Accepted: 09/18/2022] [Indexed: 02/04/2023]
Abstract
BACKGROUND Causal structure learning refers to a process of identifying causal structures from observational data, and it can have multiple applications in biomedicine and health care. OBJECTIVE This paper provides a practical review and tutorial on scalable causal structure learning models with examples of real-world data to help health care audiences understand and apply them. METHODS We reviewed traditional (combinatorial and score-based) methods for causal structure discovery and machine learning-based schemes. Various traditional approaches have been studied to tackle this problem, the most important among these being the PC algorithm, named after Peter Spirtes and Clark Glymour. This was followed by analyzing the literature on score-based methods, which are computationally faster. Owing to the continuous constraint on acyclicity, there are new deep learning approaches to the problem in addition to traditional and score-based methods. Such methods can also offer scalability, particularly when there is a large amount of data involving multiple variables. Using our own evaluation metrics and experiments on linear, nonlinear, and benchmark Sachs data, we aimed to highlight the various advantages and disadvantages associated with these methods for the health care community. We also highlighted recent developments in biomedicine where causal structure learning can be applied to discover structures such as gene networks, brain connectivity networks, and those in cancer epidemiology. RESULTS We compared the performance of traditional and machine learning-based algorithms for causal discovery over some benchmark data sets. Directed Acyclic Graph-Graph Neural Network has the lowest structural Hamming distance (19) and false positive rate (0.13) based on the Sachs data set, whereas Greedy Equivalence Search and Max-Min Hill Climbing have the best false discovery rate (0.68) and true positive rate (0.56), respectively.
CONCLUSIONS Machine learning-based approaches, including deep learning, have many advantages over traditional approaches, such as scalability, including a greater number of variables, and potentially being applied in a wide range of biomedical applications, such as genetics, if sufficient data are available. Furthermore, these models are more flexible than traditional models and are poised to positively affect many applications in the future.
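The graph-recovery metrics named in the abstract above (structural Hamming distance, true positive rate, false discovery rate) can be illustrated with a small sketch. It assumes one common convention: a missing edge, an extra edge, and a reversed edge each count once toward the distance, and only correctly oriented edges count as true positives. The review's own metric definitions may differ, and the function name and toy graphs are hypothetical.

```python
def graph_metrics(true_edges, learned_edges):
    """Compare two DAGs given as collections of (parent, child) tuples.

    Assumed convention: missing, extra, and reversed edges each add 1
    to the structural Hamming distance (SHD).
    """
    true_edges, learned_edges = set(true_edges), set(learned_edges)

    # Undirected skeletons: which pairs of nodes are adjacent at all.
    true_skeleton = {frozenset(e) for e in true_edges}
    learned_skeleton = {frozenset(e) for e in learned_edges}

    # Adjacencies present in exactly one graph (missing or extra edges).
    missing_or_extra = len(true_skeleton ^ learned_skeleton)
    # Shared adjacencies whose orientation is flipped.
    reversed_edges = sum(
        1 for (a, b) in learned_edges
        if (b, a) in true_edges and (a, b) not in true_edges
    )
    shd = missing_or_extra + reversed_edges

    true_positives = len(true_edges & learned_edges)
    tpr = true_positives / len(true_edges) if true_edges else 0.0
    fdr = (
        (len(learned_edges) - true_positives) / len(learned_edges)
        if learned_edges else 0.0
    )
    return {"shd": shd, "tpr": tpr, "fdr": fdr}


# Toy example: one correct edge, one reversed edge, one extra, one missing.
truth = [("A", "B"), ("B", "C"), ("C", "D")]
learned = [("A", "B"), ("C", "B"), ("A", "D")]
metrics = graph_metrics(truth, learned)
```

Lower distance and false discovery rate and higher true positive rate indicate a learned graph closer to the reference.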
Affiliation(s)
- Kai Zhang
  - School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX, United States
- Can Li
  - School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX, United States
- Xiaoqian Jiang
  - School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX, United States
- Yejin Kim
  - School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX, United States
3
Fan Z, Kernan KF, Sriram A, Benos PV, Canna SW, Carcillo JA, Kim S, Park HJ. Deep neural networks with knockoff features identify nonlinear causal relations and estimate effect sizes in complex biological systems. Gigascience 2022; 12:giad044. [PMID: 37395630 PMCID: PMC10316696 DOI: 10.1093/gigascience/giad044] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/15/2022] [Revised: 01/31/2023] [Accepted: 05/29/2023] [Indexed: 07/04/2023]
Abstract
BACKGROUND Learning the causal structure helps identify risk factors, disease mechanisms, and candidate therapeutics for complex diseases. However, although complex biological systems are characterized by nonlinear associations, existing bioinformatic methods of causal inference cannot identify the nonlinear relationships and estimate their effect size. RESULTS To overcome these limitations, we developed the first computational method that explicitly learns nonlinear causal relations and estimates the effect size using a deep neural network approach coupled with the knockoff framework, named causal directed acyclic graphs using deep learning variable selection (DAG-deepVASE). Using simulation data of diverse scenarios and identifying known and novel causal relations in molecular and clinical data of various diseases, we demonstrated that DAG-deepVASE consistently outperforms existing methods in identifying true and known causal relations. In the analyses, we also illustrate how identifying nonlinear causal relations and estimating their effect size help understand the complex disease pathobiology, which is not possible using other methods. CONCLUSIONS With these advantages, the application of DAG-deepVASE can help identify driver genes and therapeutic agents in biomedical studies and clinical trials.
Affiliation(s)
- Zhenjiang Fan
  - Department of Computer Science, University of Pittsburgh, Pittsburgh, PA 15213, USA
- Kate F Kernan
  - Division of Pediatric Critical Care Medicine, Department of Critical Care Medicine, Children's Hospital of Pittsburgh, Center for Critical Care Nephrology and Clinical Research Investigation and Systems Modeling of Acute Illness Center, University of Pittsburgh, Pittsburgh, PA 15260, USA
- Aditya Sriram
  - Department of Human Genetics, University of Pittsburgh, Pittsburgh, PA 15213, USA
- Panayiotis V Benos
  - Department of Epidemiology, University of Florida, Gainesville, FL 32610, USA
- Scott W Canna
  - Pediatric Rheumatology, The Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Joseph A Carcillo
  - Division of Pediatric Critical Care Medicine, Department of Critical Care Medicine, Children's Hospital of Pittsburgh, Center for Critical Care Nephrology and Clinical Research Investigation and Systems Modeling of Acute Illness Center, University of Pittsburgh, Pittsburgh, PA 15260, USA
- Soyeon Kim
  - Division of Pediatric Pulmonary Medicine, Children's Hospital of Pittsburgh, Pittsburgh, PA 15224, USA
  - Department of Pediatrics, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15224, USA
- Hyun Jung Park
  - Department of Human Genetics, University of Pittsburgh, Pittsburgh, PA 15213, USA
4
Bankier S, Michoel T. eQTLs as causal instruments for the reconstruction of hormone linked gene networks. Front Endocrinol (Lausanne) 2022; 13:949061. [PMID: 36060942 PMCID: PMC9428692 DOI: 10.3389/fendo.2022.949061] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/20/2022] [Accepted: 07/25/2022] [Indexed: 11/17/2022]
Abstract
Hormones act within highly dynamic systems, and much of the phenotypic response to variation in hormone levels is mediated by changes in gene expression. The increase in the number and power of large genetic association studies has led to the identification of hormone linked genetic variants. However, the biological mechanisms underpinning the majority of these loci are poorly understood. The advent of affordable, high throughput next generation sequencing and readily available transcriptomic databases has shown that many of these genetic variants also associate with variation in gene expression levels as expression Quantitative Trait Loci (eQTLs). In addition to further dissecting complex genetic variation, eQTLs have been applied as tools for causal inference. Many hormone networks are driven by transcription factors, and many of these genes can be linked to eQTLs. In this mini-review, we demonstrate how causal inference and gene networks can be used to describe the impact of hormone linked genetic variation upon the transcriptome within an endocrinology context.
5
Li L, Huang L, Yang A, Feng X, Mo Z, Zhang H, Yang X. Causal Relationship Between Complement C3, C4, and Nonalcoholic Fatty Liver Disease: Bidirectional Mendelian Randomization Analysis. Phenomics (Cham, Switzerland) 2021; 1:211-221. [PMID: 36939807 PMCID: PMC9590569 DOI: 10.1007/s43657-021-00023-0] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 10/25/2020] [Revised: 08/07/2021] [Accepted: 08/18/2021] [Indexed: 02/07/2023]
Abstract
The complement system is activated during the development of nonalcoholic fatty liver disease (NAFLD). We aimed to evaluate the causal relationship between serum C3 and C4 levels and NAFLD. After applying exclusion criteria, a total of 1600 Chinese Han men from the Fangchenggang Area Male Health and Examination Survey cohort were enrolled in the cross-sectional analysis, while 572 participants were included in the longitudinal analysis (average follow-up of 4 years). We performed a bidirectional Mendelian randomization (MR) analysis using two C3-related, eight C4-related and three NAFLD-related gene loci as instrumental variables to evaluate the causal associations between C3, C4, and NAFLD risk in the cross-sectional analysis. Each SD increase in C3 levels was significantly associated with a higher risk of NAFLD (OR = 1.65, 95% CI 1.40, 1.94) in the cross-sectional analysis, while C4 was not (OR = 1.04, 95% CI 0.89, 1.21). Longitudinal analysis produced similar results (HR_C3 = 1.20, 95% CI 1.02, 1.42; HR_C4 = 1.10, 95% CI 0.94, 1.28). In MR analysis, there were no causal relationships for genetically determined C3 levels and NAFLD risk using unweighted or weighted GRS_C3 (βE_unweighted = -0.019, 95% CI -0.019, -0.019, p = 0.202; βE_weighted = -0.019, 95% CI -0.019, -0.019, p = 0.322). Conversely, serum C3 levels were significantly affected by the genetically determined NAFLD (βE_unweighted = 0.020, 95% CI 0.020, 0.020, p = 0.004; βE_weighted = 0.021, 95% CI 0.020, 0.021, p = 0.004). Neither the direction from C4 to NAFLD nor the one from NAFLD to C4 showed significant association. Our results support that the change in serum C3 levels but not C4 levels might be caused by NAFLD in Chinese Han men. Supplementary Information: The online version contains supplementary material available at 10.1007/s43657-021-00023-0.
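The abstract above uses unweighted and weighted genetic risk scores (GRS) as instruments without spelling out how they are built. As a minimal sketch under the usual definition, the unweighted score sums the risk-allele counts across loci, and the weighted score multiplies each count by that locus's per-allele effect size before summing. The function name, allele counts, and effect sizes below are hypothetical and are not taken from the study.

```python
def genetic_risk_scores(allele_counts, effect_sizes):
    """Return (unweighted, weighted) genetic risk scores for one person.

    allele_counts: risk-allele count (0, 1, or 2) at each locus.
    effect_sizes:  per-allele effect estimate for each locus, in the
                   same order (hypothetical values in this sketch).
    """
    if len(allele_counts) != len(effect_sizes):
        raise ValueError("one effect size is needed per locus")
    # Unweighted score: every risk allele contributes equally.
    unweighted = sum(allele_counts)
    # Weighted score: each allele contributes its estimated effect.
    weighted = sum(b * c for b, c in zip(effect_sizes, allele_counts))
    return unweighted, weighted


# Hypothetical individual genotyped at two loci.
unweighted, weighted = genetic_risk_scores([1, 2], [0.10, 0.25])
```

In a study of this kind, the resulting score would then serve as the instrumental variable in the MR regression.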
Affiliation(s)
- Longman Li
  - Center for Genomic and Personalized Medicine, Guangxi Key Laboratory for Genomic and Personalized Medicine, Guangxi Collaborative Innovation Center for Genomic and Personalized Medicine, Guangxi Medical University, Nanning, 530021 Guangxi China
  - Nanhu Zhuxi Community Healthcare Center, Qingxiu District, Nanning, 530021 Guangxi China
  - Department of Urology, Institute of Urology and Nephrology, The First Affiliated Hospital of Guangxi Medical University, Nanning, 530021 Guangxi China
- Lulu Huang
  - Center for Genomic and Personalized Medicine, Guangxi Key Laboratory for Genomic and Personalized Medicine, Guangxi Collaborative Innovation Center for Genomic and Personalized Medicine, Guangxi Medical University, Nanning, 530021 Guangxi China
- Aimin Yang
  - School of Public Health, The University of Hong Kong, Hong Kong SAR, 999077 China
- Xiuming Feng
  - Center for Genomic and Personalized Medicine, Guangxi Key Laboratory for Genomic and Personalized Medicine, Guangxi Collaborative Innovation Center for Genomic and Personalized Medicine, Guangxi Medical University, Nanning, 530021 Guangxi China
  - Department of Occupational Health and Environmental Health, School of Public Health, Guangxi Medical University, Nanning, 530021 Guangxi China
- Zengnan Mo
  - Center for Genomic and Personalized Medicine, Guangxi Key Laboratory for Genomic and Personalized Medicine, Guangxi Collaborative Innovation Center for Genomic and Personalized Medicine, Guangxi Medical University, Nanning, 530021 Guangxi China
  - Department of Urology, Institute of Urology and Nephrology, The First Affiliated Hospital of Guangxi Medical University, Nanning, 530021 Guangxi China
- Haiying Zhang
  - Center for Genomic and Personalized Medicine, Guangxi Key Laboratory for Genomic and Personalized Medicine, Guangxi Collaborative Innovation Center for Genomic and Personalized Medicine, Guangxi Medical University, Nanning, 530021 Guangxi China
  - Department of Occupational Health and Environmental Health, School of Public Health, Guangxi Medical University, Nanning, 530021 Guangxi China
- Xiaobo Yang
  - Center for Genomic and Personalized Medicine, Guangxi Key Laboratory for Genomic and Personalized Medicine, Guangxi Collaborative Innovation Center for Genomic and Personalized Medicine, Guangxi Medical University, Nanning, 530021 Guangxi China
  - Department of Occupational Health and Environmental Health, School of Public Health, Guangxi Medical University, Nanning, 530021 Guangxi China
  - Department of Public Health, School of Medicine, Guangxi University of Science and Technology, Liuzhou, 545006 Guangxi China
6
Ha MJ, Sun W. Estimation of high-dimensional directed acyclic graphs with surrogate intervention. Biostatistics 2020; 21:659-675. [PMID: 30596892 DOI: 10.1093/biostatistics/kxy080] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/05/2017] [Revised: 11/18/2018] [Accepted: 11/25/2018] [Indexed: 11/15/2022]
Abstract
Directed acyclic graphs (DAGs) have been used to describe causal relationships between variables. The standard method for determining such relations uses interventional data. For complex systems with high-dimensional data, however, such interventional data are often not available. Therefore, it is desirable to estimate causal structure from observational data without subjecting variables to interventions. Observational data can be used to estimate the skeleton of a DAG and the directions of a limited number of edges. We develop a Bayesian framework to estimate a DAG using surrogate interventional data, where the interventions are applied to a set of external variables, and thus such interventions are considered to be surrogate interventions on the variables of interest. Our work is motivated by expression quantitative trait locus (eQTL) studies, where the variables of interest are the expression of genes, the external variables are DNA variations, and interventions are applied to DNA variants during the process of a randomly selected DNA allele being passed to a child from either parent. Our method, surrogate intervention recovery of a DAG ($\texttt{sirDAG}$), first constructs a DAG skeleton using penalized regressions and the subsequent partial correlation tests, and then estimates the posterior probabilities of all the edge directions after incorporating DNA variant data. We demonstrate the utilities of $\texttt{sirDAG}$ by simulation and an application to an eQTL study for 550 breast cancer patients.
Affiliation(s)
- Min Jin Ha
  - Department of Biostatistics, The University of Texas MD Anderson Cancer Center, 1515 Holcombe Boulevard, Houston, TX, USA
- Wei Sun
  - Program in Biostatistics and Bioinformatics, Fred Hutchinson Cancer Research Center, 1100 Fairview Ave N, Seattle, WA, USA
7
Li L, Huang L, Huang S, Luo X, Zhang H, Mo Z, Wu T, Yang X. Non-linear association of serum molybdenum and linear association of serum zinc with nonalcoholic fatty liver disease: Multiple-exposure and Mendelian randomization approach. The Science of the Total Environment 2020; 720:137655. [PMID: 32146412 DOI: 10.1016/j.scitotenv.2020.137655] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Received: 12/06/2019] [Revised: 02/27/2020] [Accepted: 02/29/2020] [Indexed: 06/10/2023]
Abstract
An imbalance in metal homeostasis is closely associated with nonalcoholic fatty liver disease (NAFLD). A total of 1594 and 566 Chinese Han men were enrolled in cross-sectional and longitudinal analyses, respectively. We measured the serum concentrations of 22 metals by ICP-MS. Traditional regression and LASSO regression were each used to construct multiple-metal models. We performed Mendelian randomization (MR) analysis to confirm the causal relationship between NAFLD and metals using three NAFLD-related SNPs as instrumental variables. After adjustment in the six-metal model, only depressed molybdenum and elevated zinc were associated with a higher NAFLD risk, in both cross-sectional and longitudinal analyses. In the twelve-metal model, similar results were still observed. Moreover, dose-response relationships were non-linear for molybdenum and positively linear for zinc with NAFLD risk. In MR analysis, no causal associations were found from NAFLD to molybdenum and zinc. Our results support that serum molybdenum levels were non-linearly associated with NAFLD risk in Chinese men, whereas serum zinc levels showed a positively linear association. Moreover, MR analysis indicated that the changes in serum molybdenum and zinc levels might not be caused by NAFLD, further confirming our findings in cross-sectional and longitudinal analyses.
Affiliation(s)
- Longman Li
  - Department of Occupational Health and Environmental Health, School of Public Health, Guangxi Medical University, Nanning, Guangxi, China; Center for Genomic and Personalized Medicine, Guangxi Medical University, Nanning, Guangxi, China
- Lulu Huang
  - Department of Occupational Health and Environmental Health, School of Public Health, Guangxi Medical University, Nanning, Guangxi, China; Center for Genomic and Personalized Medicine, Guangxi Medical University, Nanning, Guangxi, China
- Sifang Huang
  - Department of Occupational Health and Environmental Health, School of Public Health, Guangxi Medical University, Nanning, Guangxi, China; Center for Genomic and Personalized Medicine, Guangxi Medical University, Nanning, Guangxi, China
- Xiaoyu Luo
  - Department of Occupational Health and Environmental Health, School of Public Health, Guangxi Medical University, Nanning, Guangxi, China; Center for Genomic and Personalized Medicine, Guangxi Medical University, Nanning, Guangxi, China
- Haiying Zhang
  - Department of Occupational Health and Environmental Health, School of Public Health, Guangxi Medical University, Nanning, Guangxi, China; Center for Genomic and Personalized Medicine, Guangxi Medical University, Nanning, Guangxi, China
- Zengnan Mo
  - Center for Genomic and Personalized Medicine, Guangxi Medical University, Nanning, Guangxi, China; Institute of Urology and Nephrology, First Affiliated Hospital of Guangxi Medical University, Nanning, Guangxi, China
- Tangchun Wu
  - Department of Occupational Health and Environmental Health, School of Public Health, Guangxi Medical University, Nanning, Guangxi, China; Department of Occupational and Environmental Health, Key Laboratory of Environment and Health, Ministry of Education and State Key Laboratory of Environmental Health (Incubating), School of Public Health, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, Hubei, China
- Xiaobo Yang
  - Department of Occupational Health and Environmental Health, School of Public Health, Guangxi Medical University, Nanning, Guangxi, China; Center for Genomic and Personalized Medicine, Guangxi Medical University, Nanning, Guangxi, China
8
Wang L, Audenaert P, Michoel T. High-Dimensional Bayesian Network Inference From Systems Genetics Data Using Genetic Node Ordering. Front Genet 2019; 10:1196. [PMID: 31921278 PMCID: PMC6933017 DOI: 10.3389/fgene.2019.01196] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 07/03/2019] [Accepted: 10/29/2019] [Indexed: 11/23/2022]
Abstract
Studying the impact of genetic variation on gene regulatory networks is essential to understand the biological mechanisms by which genetic variation causes variation in phenotypes. Bayesian networks provide an elegant statistical approach for multi-trait genetic mapping and modelling causal trait relationships. However, inferring Bayesian gene networks from high-dimensional genetics and genomics data is challenging, because the number of possible networks scales super-exponentially with the number of nodes, and the computational cost of conventional Bayesian network inference methods quickly becomes prohibitive. We propose an alternative method to infer high-quality Bayesian gene networks that easily scales to thousands of genes. Our method first reconstructs a node ordering by conducting pairwise causal inference tests between genes, which then allows a Bayesian network to be inferred via a series of independent variable selection problems, one for each gene. We demonstrate using simulated and real systems genetics data that this results in a Bayesian network with equal, and sometimes better, likelihood than the conventional methods, while having a significantly higher overlap with ground-truth networks and being orders of magnitude faster. Moreover, our method allows for a unified false discovery rate control across genes and individual edges, and thus a rigorous and easily interpretable way for tuning the sparsity level of the inferred network. Bayesian network inference using pairwise node ordering is a highly efficient approach for reconstructing gene regulatory networks when prior information for the inclusion of edges exists or can be inferred from the available data.
Affiliation(s)
- Lingfei Wang
  - Division of Genetics and Genomics, The Roslin Institute, The University of Edinburgh, Easter Bush Campus, Midlothian, United Kingdom
  - Broad Institute of Harvard and MIT, Cambridge, MA, United States
  - Department of Molecular Biology, Massachusetts General Hospital, Boston, MA, United States
- Pieter Audenaert
  - IDLab, Ghent University – imec, Ghent, Belgium
  - Bioinformatics Institute Ghent, Ghent University, Ghent, Belgium
- Tom Michoel
  - Division of Genetics and Genomics, The Roslin Institute, The University of Edinburgh, Easter Bush Campus, Midlothian, United Kingdom
  - Computational Biology Unit, Department of Informatics, University of Bergen, Bergen, Norway
9
Bustos-Korts D, Malosetti M, Chenu K, Chapman S, Boer MP, Zheng B, van Eeuwijk FA. From QTLs to Adaptation Landscapes: Using Genotype-To-Phenotype Models to Characterize G×E Over Time. Front Plant Sci 2019; 10:1540. [PMID: 31867027 PMCID: PMC6904366 DOI: 10.3389/fpls.2019.01540] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 06/28/2019] [Accepted: 11/04/2019] [Indexed: 05/18/2023]
Abstract
Genotype by environment interaction (G×E) for the target trait, e.g. yield, is an emerging property of agricultural systems and results from the interplay between a hierarchy of secondary traits involving the capture and allocation of environmental resources during the growing season. This hierarchy of secondary traits ranges from basic traits that correspond to response mechanisms/sensitivities, to intermediate traits that integrate a larger number of processes over time and therefore show a larger amount of G×E. Traits underlying yield differ in their contribution to adaptation across environmental conditions and have different levels of G×E. Here, we provide a framework to study the performance of genotype to phenotype (G2P) modeling approaches. We generate and analyze response surfaces, or adaptation landscapes, for yield and yield related traits, emphasizing the organization of the traits in a hierarchy and their development and interactions over time. We use the crop growth model APSIM-wheat with genotype-dependent parameters as a tool to simulate non-linear trait responses over time with complex trait dependencies and apply it to wheat crops in Australia. For biological realism, APSIM parameters were given a genetic basis of 300 QTLs sampled from a gamma distribution whose shape and rate parameters were estimated from real wheat data. In the simulations, the hierarchical organization of the traits and their interactions over time cause G×E for yield even when underlying traits do not show G×E. Insight into how G×E arises during growth and development helps to improve the accuracy of phenotype predictions within and across environments and to optimize trial networks. We produced a tangible simulated adaptation landscape for yield that we first investigated for its biological credibility by statistical models for G×E that incorporate genotypic and environmental covariables. 
Subsequently, the simulated trait data were used to evaluate statistical genotype-to-phenotype models for multiple traits and environments and to characterize relationships between traits over time and across environments, as a way to identify traits that could be useful to select for specific adaptation. Designed appropriately, these types of simulated landscapes might also serve as a basis to train other methodologies, such as deep learning, in order to transfer such network models to real-world situations.
Affiliation(s)
- Marcos Malosetti
  - Biometris, Wageningen University and Research Centre, Wageningen, Netherlands
- Karine Chenu
  - Queensland Alliance for Agriculture and Food Innovation, The University of Queensland, Toowoomba, QLD, Australia
- Scott Chapman
  - Agriculture and Food, CSIRO, Queensland Bioscience Precinct, St Lucia, QLD, Australia
  - School of Agriculture and Food Sciences, The University of Queensland, Gatton, QLD, Australia
- Martin P. Boer
  - Biometris, Wageningen University and Research Centre, Wageningen, Netherlands
- Bangyou Zheng
  - Agriculture and Food, CSIRO, Queensland Bioscience Precinct, St Lucia, QLD, Australia
- Fred A. van Eeuwijk
  - Biometris, Wageningen University and Research Centre, Wageningen, Netherlands
10
Jiang D, Armour CR, Hu C, Mei M, Tian C, Sharpton TJ, Jiang Y. Microbiome Multi-Omics Network Analysis: Statistical Considerations, Limitations, and Opportunities. Front Genet 2019; 10:995. [PMID: 31781153 PMCID: PMC6857202 DOI: 10.3389/fgene.2019.00995] [Citation(s) in RCA: 83] [Impact Index Per Article: 16.6] [Received: 02/13/2019] [Accepted: 09/18/2019] [Indexed: 12/21/2022]
Abstract
The advent of large-scale microbiome studies affords newfound analytical opportunities to understand how these communities of microbes operate and relate to their environment. However, the analytical methodology needed to model microbiome data and integrate them with other data constructs remains nascent. This emergent analytical toolset frequently ports over techniques developed in other multi-omics investigations, especially the growing array of statistical and computational techniques for integrating and representing data through networks. While network analysis has emerged as a powerful approach to modeling microbiome data, oftentimes by integrating these data with other types of omics data to discern their functional linkages, it is not always evident if the statistical details of the approach being applied are consistent with the assumptions of microbiome data or how they impact data interpretation. In this review, we overview some of the most important network methods for integrative analysis, with an emphasis on methods that have been applied or have great potential to be applied to the analysis of multi-omics integration of microbiome data. We compare advantages and disadvantages of various statistical tools, assess their applicability to microbiome data, and discuss their biological interpretability. We also highlight on-going statistical challenges and opportunities for integrative network analysis of microbiome data.
Affiliation(s)
- Duo Jiang
- Department of Statistics, Oregon State University, Corvallis, OR, United States
| | - Courtney R Armour
- Department of Microbiology, Oregon State University, Corvallis, OR, United States
| | - Chenxiao Hu
- Department of Statistics, Oregon State University, Corvallis, OR, United States
| | - Meng Mei
- Department of Statistics, Oregon State University, Corvallis, OR, United States
| | - Chuan Tian
- Department of Statistics, Oregon State University, Corvallis, OR, United States
| | - Thomas J Sharpton
- Department of Statistics, Oregon State University, Corvallis, OR, United States
- Department of Microbiology, Oregon State University, Corvallis, OR, United States
| | - Yuan Jiang
- Department of Statistics, Oregon State University, Corvallis, OR, United States
|
11
|
Rojo C, Zhang Q, Keleş S. iFunMed: Integrative functional mediation analysis of GWAS and eQTL studies. Genet Epidemiol 2019; 43:742-760. [PMID: 31328826 DOI: 10.1002/gepi.22217] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/21/2019] [Revised: 04/17/2019] [Accepted: 05/07/2019] [Indexed: 11/08/2022]
Abstract
Genome-wide association studies (GWAS) have successfully identified thousands of genetic variants contributing to disease and other phenotypes. However, significant obstacles hamper our ability to elucidate causal variants, identify genes affected by causal variants, and characterize the mechanisms by which genotypes influence phenotypes. The increasing availability of genome-wide functional annotation data is providing unique opportunities to incorporate prior information into the analysis of GWAS to better understand the impact of variants on disease etiology. Although there have been many advances in incorporating prior information into prioritization of trait-associated variants in GWAS, functional annotation data have played a secondary role in the joint analysis of GWAS and molecular (i.e., expression) quantitative trait loci (eQTL) data in assessing evidence for association. To address this, we develop a novel mediation framework, iFunMed, to integrate GWAS and eQTL data with the utilization of publicly available functional annotation data. iFunMed extends the scope of standard mediation analysis by incorporating information from multiple genetic variants at a time and leveraging variant-level summary statistics. Data-driven computational experiments convey how informative annotations improve single-nucleotide polymorphism (SNP) selection performance while emphasizing robustness of iFunMed to noninformative annotations. Application to Framingham Heart Study data indicates that iFunMed is able to boost detection of SNPs with mediation effects that can be attributed to regulatory mechanisms.
Affiliation(s)
- Constanza Rojo
- Department of Statistics, University of Wisconsin-Madison, Madison, Wisconsin
| | - Qi Zhang
- Department of Statistics, University of Nebraska-Lincoln, Lincoln, Nebraska
| | - Sündüz Keleş
- Department of Statistics, University of Wisconsin-Madison, Madison, Wisconsin.,Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, Wisconsin
|
12
|
Yu H, Blair RH. Integration of probabilistic regulatory networks into constraint-based models of metabolism with applications to Alzheimer's disease. BMC Bioinformatics 2019; 20:386. [PMID: 31291905 PMCID: PMC6617954 DOI: 10.1186/s12859-019-2872-8] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/17/2019] [Accepted: 05/02/2019] [Indexed: 01/08/2023] Open
Abstract
Background Mathematical models of biological networks can provide important predictions and insights into complex disease. Constraint-based models of cellular metabolism and probabilistic models of gene regulatory networks are two distinct areas that have progressed rapidly in parallel over the past decade. In principle, gene regulatory networks and metabolic networks underlie the same complex phenotypes and diseases. However, systematic integration of these two model systems remains a fundamental challenge. Results In this work, we address this challenge by fusing probabilistic models of gene regulatory networks into constraint-based models of metabolism. The novel approach utilizes probabilistic reasoning in BN models of regulatory networks, which serves as the “glue” that enables a natural interface between the two systems. Probabilistic reasoning is used to predict and quantify system-wide effects of perturbation to the regulatory network in the form of constraints for flux variability analysis. In this setting, both regulatory and metabolic networks inherently account for uncertainty. Applications leverage constraint-based metabolic models of brain metabolism and gene regulatory networks parameterized by gene expression data from the hippocampus to investigate the role of the HIF-1 pathway in Alzheimer’s disease. Integrated models support HIF-1A as an effective target to reduce the effects of hypoxia in Alzheimer’s disease. However, HIF-1A activation is far less effective in shifting metabolism when compared to brain metabolism in healthy controls. Conclusions The direct integration of probabilistic regulatory networks into constraint-based models of metabolism provides novel insights into how perturbations in the regulatory network may influence metabolic states. Predictive modeling of enzymatic activity can be facilitated using probabilistic reasoning, thereby extending the predictive capacity of the network. This framework for model integration is generalizable to other systems.
Affiliation(s)
- Han Yu
- State University of New York at Buffalo, 3435 Main Street, Buffalo, 14214, US
|
13
|
Rezaei Tabar V, Zareifard H, Salimi S, Plewczynski D. Learning directed acyclic graphs by determination of candidate causes for discrete variables. J STAT COMPUT SIM 2019. [DOI: 10.1080/00949655.2019.1604709] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/27/2022]
Affiliation(s)
- Vahid Rezaei Tabar
- Department of Statistics, Faculty of Mathematical Sciences and Computer, Allameh Tabataba'i University, Tehran, Iran
- Laboratory of Functional and Structural Genomics, Center of New Technologies, University of Warsaw, Warsaw, Poland
| | | | - Selva Salimi
- Department of Information Technology Management, Faculty of Management, Kharazmi University, Tehran, Iran
| | - Dariusz Plewczynski
- Laboratory of Functional and Structural Genomics, Center of New Technologies, University of Warsaw, Warsaw, Poland
- Faculty of Mathematics and Information Science, Warsaw University of Technology, Warsaw, Poland
|
14
|
Causal phenotypic networks for egg traits in an F2 chicken population. Mol Genet Genomics 2019; 294:1455-1462. [PMID: 31240383 DOI: 10.1007/s00438-019-01588-2] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/19/2019] [Accepted: 06/17/2019] [Indexed: 12/24/2022]
Abstract
Traditional single-trait genetic analyses, such as quantitative trait locus (QTL) and genome-wide association studies (GWAS), have been used to understand genotype-phenotype relationships for egg traits in chickens. Even though these techniques can detect potential genes of major effect, they cannot reveal cryptic causal relationships among QTLs and phenotypes. Thus, to better understand the relationships involving multiple genes and phenotypes of interest, other data analysis techniques must be used. Here, we utilized a QTL-directed dependency graph (QDG) mapping approach for a joint analysis of chicken egg traits, so that functional relationships and potential causal effects between them could be investigated. The QDG mapping identified a total of 17 QTLs affecting 24 egg traits that formed three independent networks of phenotypic trait groups (eggshell color, egg production, and size and weight of egg components), clearly distinguishing direct and indirect effects of QTLs towards correlated traits. For example, the network of size and weight of egg components contained 13 QTLs and 18 traits that are densely connected to each other. This indicates complex relationships between genotype and phenotype involving both direct and indirect effects of QTLs on the studied traits. Most of the QTLs were commonly identified by both the traditional (single-trait) mapping and the QDG approach. The network analysis, however, offers additional insight regarding the source and characterization of pleiotropy affecting egg traits. As such, the QDG analysis provides a substantial step forward, revealing cryptic relationships among QTLs and phenotypes, especially regarding direct and indirect QTL effects as well as potential causal relationships between traits, which can be used, for example, to optimize management practices and breeding strategies for the improvement of the traits.
|
15
|
Glymour C, Zhang K, Spirtes P. Review of Causal Discovery Methods Based on Graphical Models. Front Genet 2019; 10:524. [PMID: 31214249 PMCID: PMC6558187 DOI: 10.3389/fgene.2019.00524] [Citation(s) in RCA: 130] [Impact Index Per Article: 26.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/07/2018] [Accepted: 05/13/2019] [Indexed: 12/11/2022] Open
Abstract
A fundamental task in various disciplines of science, including biology, is to find underlying causal relations and make use of them. Causal relations can be seen if interventions are properly applied; however, in many cases they are difficult or even impossible to conduct. It is then necessary to discover causal relations by analyzing statistical properties of purely observational data, which is known as causal discovery or causal structure search. This paper aims to give an introduction to and a brief review of the computational methods for causal discovery that were developed in the past three decades, including constraint-based and score-based methods and those based on functional causal models, supplemented by some illustrations and applications.
Affiliation(s)
- Clark Glymour
- Department of Philosophy, Carnegie Mellon University, Pittsburgh, PA, United States
| | - Kun Zhang
- Department of Philosophy, Carnegie Mellon University, Pittsburgh, PA, United States
| | - Peter Spirtes
- Department of Philosophy, Carnegie Mellon University, Pittsburgh, PA, United States
|
16
|
Tasaki S, Gaiteri C, Mostafavi S, Yu L, Wang Y, De Jager PL, Bennett DA. Multi-omic Directed Networks Describe Features of Gene Regulation in Aged Brains and Expand the Set of Genes Driving Cognitive Decline. Front Genet 2018; 9:294. [PMID: 30140277 PMCID: PMC6095043 DOI: 10.3389/fgene.2018.00294] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/18/2018] [Accepted: 07/13/2018] [Indexed: 01/10/2023] Open
Abstract
Multiple aspects of molecular regulation, including genetics, epigenetics, and mRNA collectively influence the development of age-related neurologic diseases. Therefore, with the ultimate goal of understanding molecular systems associated with cognitive decline, we infer directed interactions among regulatory elements in the local regulatory vicinity of individual genes based on brain multi-omics data from 413 individuals. These local regulatory networks (LRNs) capture the influences of genetics and epigenetics on gene expression in older adults. LRNs were confirmed through correspondence to known transcription biophysics. To relate LRNs to age-related neurologic diseases, we then incorporate common neuropathologies and measures of cognitive decline into this framework. This step identifies a specific set of largely neuronal genes, such as STAU1 and SEMA3F, predicted to control cognitive decline in older adults. These predictions are validated in separate cohorts by comparison to genetic associations for general cognition. LRNs are shared through www.molecular.network on the Rush Alzheimer’s Disease Center Resource Sharing Hub (www.radc.rush.edu).
Affiliation(s)
- Shinya Tasaki
- Rush Alzheimer's Disease Center, Rush University Medical Center, Chicago, IL, United States
| | - Chris Gaiteri
- Rush Alzheimer's Disease Center, Rush University Medical Center, Chicago, IL, United States
| | - Sara Mostafavi
- Department of Statistics, Department of Medical Genetics, University of British Columbia, Vancouver, BC, Canada
| | - Lei Yu
- Rush Alzheimer's Disease Center, Rush University Medical Center, Chicago, IL, United States
| | - Yanling Wang
- Rush Alzheimer's Disease Center, Rush University Medical Center, Chicago, IL, United States
| | - Philip L De Jager
- Center for Translational and Computational Neuroimmunology, Department of Neurology, Columbia University Medical Center, New York, NY, United States.,Cell Circuits Program, Broad Institute, Cambridge, MA, United States
| | - David A Bennett
- Rush Alzheimer's Disease Center, Rush University Medical Center, Chicago, IL, United States
|
17
|
Abstract
The majority of gene loci that have been associated with type 2 diabetes play a role in pancreatic islet function. To evaluate the role of islet gene expression in the etiology of diabetes, we sensitized a genetically diverse mouse population with a Western diet high in fat (45% kcal) and sucrose (34%) and carried out genome-wide association mapping of diabetes-related phenotypes. We quantified mRNA abundance in the islets and identified 18,820 expression QTL. We applied mediation analysis to identify candidate causal driver genes at loci that affect the abundance of numerous transcripts. These include two genes previously associated with monogenic diabetes (PDX1 and HNF4A), as well as three genes with nominal association with diabetes-related traits in humans (FAM83E, IL6ST, and SAT2). We grouped transcripts into gene modules and mapped regulatory loci for modules enriched with transcripts specific for α-cells, and another specific for δ-cells. However, no single module enriched for β-cell-specific transcripts, suggesting heterogeneity of gene expression patterns within the β-cell population. A module enriched in transcripts associated with branched-chain amino acid metabolism was the most strongly correlated with physiological traits that reflect insulin resistance. Although the mice in this study were not overtly diabetic, the analysis of pancreatic islet gene expression under dietary-induced stress enabled us to identify correlated variation in groups of genes that are functionally linked to diabetes-associated physiological traits. Our analysis suggests an expected degree of concordance between diabetes-associated loci in the mouse and those found in human populations, and demonstrates how the mouse can provide evidence to support nominal associations found in human genome-wide association mapping.
|
18
|
Lepik K, Annilo T, Kukuškina V, Kisand K, Kutalik Z, Peterson P, Peterson H. C-reactive protein upregulates the whole blood expression of CD59 - an integrative analysis. PLoS Comput Biol 2017; 13:e1005766. [PMID: 28922377 PMCID: PMC5609773 DOI: 10.1371/journal.pcbi.1005766] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/03/2017] [Revised: 09/22/2017] [Accepted: 09/01/2017] [Indexed: 12/21/2022] Open
Abstract
Elevated C-reactive protein (CRP) concentrations in the blood are associated with acute and chronic infections and inflammation. Nevertheless, the functional role of increased CRP in multiple bacterial and viral infections as well as in chronic inflammatory diseases remains unclear. Here, we studied the relationship between CRP and gene expression levels in the blood in 491 individuals from the Estonian Biobank cohort, to elucidate the role of CRP in these inflammatory mechanisms. As a result, we identified a set of 1,614 genes associated with changes in CRP levels with a high proportion of interferon-stimulated genes. Further, we performed likelihood-based causality model selection and Mendelian randomization analysis to discover causal links between CRP and the expression of CRP-associated genes. Strikingly, our computational analysis and cell culture stimulation assays revealed increased CRP levels to drive the expression of complement regulatory protein CD59, suggesting CRP to have a critical role in protecting blood cells from the adverse effects of the immune defence system. Our results show the benefit of integrative analysis approaches in hypothesis-free uncovering of causal relationships between traits. Chronic inflammation is associated with chronic diseases, morbidity and mortality while lower base inflammation levels are thought to be predictive of healthy aging. Thus, to pursue a long and healthy lifespan, it is essential to understand the inflammatory regulatory mechanisms. To that end, we studied the functional role of C-reactive protein (CRP)–an inflammatory biomarker that is used to measure cardiovascular risk in clinical practice. There is evidence for a strong genetic component of elevated CRP levels but it is still unclear if it has a direct impact on the processes that lead to inflammatory diseases. 
In order to elucidate the function of CRP in the blood, we used statistical methods for causal inference to infer causal relationships between changes in CRP and gene expression levels. Our statistical analysis and cell culture experiments suggest that CRP drives the expression of complement regulatory protein CD59. Thus, CRP can have a functional role in protecting human blood cells from the adverse effects of the immune defence system.
Affiliation(s)
- Kaido Lepik
- Institute of Computer Science, University of Tartu, Tartu, Estonia
- Institute of Social and Preventive Medicine, Lausanne University Hospital, Lausanne, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
| | - Tarmo Annilo
- Estonian Genome Center, University of Tartu, Tartu, Estonia
| | | | | | - Kai Kisand
- Molecular Pathology, Institute of Biomedical and Translational Medicine, University of Tartu, Tartu, Estonia
| | - Zoltán Kutalik
- Institute of Social and Preventive Medicine, Lausanne University Hospital, Lausanne, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
| | - Pärt Peterson
- Molecular Pathology, Institute of Biomedical and Translational Medicine, University of Tartu, Tartu, Estonia
| | - Hedi Peterson
- Institute of Computer Science, University of Tartu, Tartu, Estonia
- Quretec Ltd, Tartu, Estonia
|
19
|
Bayesian Networks Illustrate Genomic and Residual Trait Connections in Maize (Zea mays L.). G3-GENES GENOMES GENETICS 2017. [PMID: 28637811 PMCID: PMC5555481 DOI: 10.1534/g3.117.044263] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Indexed: 12/12/2022]
Abstract
Relationships among traits were investigated on the genomic and residual levels using novel methodology. This included inference on these relationships via Bayesian networks and an assessment of the networks with structural equation models. The methodology employed three steps. First, a Bayesian multiple-trait Gaussian model was fitted to the data to decompose phenotypic values into their genomic and residual components. Second, genomic and residual network structures among traits were learned from estimates of these two components. Network learning was performed using six different algorithmic settings for comparison, of which two were score-based and four were constraint-based approaches. Third, structural equation model analyses ranked the networks in terms of goodness of fit and predictive ability, and compared them with the standard multiple-trait fully recursive network. The methodology was applied to experimental data representing the European heterotic maize pools Dent and Flint (Zea mays L.). Inferences on genomic and residual trait connections were depicted separately as directed acyclic graphs. These graphs provide information beyond mere pairwise genetic or residual associations between traits, illustrating for example conditional independencies and hinting at potential causal links among traits. Network analysis suggested some genetic correlations as potentially spurious. Genomic and residual networks were compared between Dent and Flint.
|
20
|
Abstract
High-throughput technologies have revolutionized medical research. The advent of genotyping arrays enabled large-scale genome-wide association studies and methods for examining global transcript levels, which gave rise to the field of “integrative genetics”. Other omics technologies, such as proteomics and metabolomics, are now often incorporated into the everyday methodology of biological researchers. In this review, we provide an overview of such omics technologies and focus on methods for their integration across multiple omics layers. As compared to studies of a single omics type, multi-omics offers the opportunity to understand the flow of information that underlies disease.
Affiliation(s)
- Yehudit Hasin
- Department of Medicine, University of California, 10833 Le Conte Avenue, A2-237 CHS, Los Angeles, CA, 90095, USA.,Department of Human Genetics, University of California, 10833 Le Conte Avenue, A2-237 CHS, Los Angeles, CA, 90095, USA
| | - Marcus Seldin
- Department of Medicine, University of California, 10833 Le Conte Avenue, A2-237 CHS, Los Angeles, CA, 90095, USA
| | - Aldons Lusis
- Department of Medicine, University of California, 10833 Le Conte Avenue, A2-237 CHS, Los Angeles, CA, 90095, USA.,Department of Microbiology, Immunology and Molecular Genetics, 10833 Le Conte Avenue, A2-237 CHS, Los Angeles, CA, 90095, USA.,Department of Human Genetics, University of California, 10833 Le Conte Avenue, A2-237 CHS, Los Angeles, CA, 90095, USA
|
21
|
Wang P, Rahman M, Jin L, Xiong M. A new statistical framework for genetic pleiotropic analysis of high dimensional phenotype data. BMC Genomics 2016; 17:881. [PMID: 27821073 PMCID: PMC5100198 DOI: 10.1186/s12864-016-3169-1] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/10/2015] [Accepted: 10/18/2016] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND The widely used genetic pleiotropic analyses of multiple phenotypes are often designed for examining the relationship between common variants and a few phenotypes. They are not suited for both high dimensional phenotypes and high dimensional genotype (next-generation sequencing) data. To overcome limitations of the traditional genetic pleiotropic analysis of multiple phenotypes, we develop sparse structural equation models (SEMs) as a general framework for a new paradigm of genetic analysis of multiple phenotypes. To incorporate both common and rare variants into the analysis, we extend the traditional multivariate SEMs to sparse functional SEMs. To deal with high dimensional phenotype and genotype data, we employ functional data analysis and the alternating direction method of multipliers (ADMM) techniques to reduce data dimension and improve computational efficiency. RESULTS Using large scale simulations we showed that the proposed methods have higher power to detect true causal genetic pleiotropic structure than other existing methods. Simulations also demonstrate that the gene-based pleiotropic analysis has higher power than the single variant-based pleiotropic analysis. The proposed method is applied to exome sequence data from the NHLBI's Exome Sequencing Project (ESP) with 11 phenotypes, which identifies a network with 137 genes connected to 11 phenotypes and 341 edges. Among them, 114 genes showed pleiotropic genetic effects and 45 genes were reported to be associated with phenotypes in the analysis or other cardiovascular disease (CVD) related phenotypes in the literature. CONCLUSIONS Our proposed sparse functional SEMs can incorporate both common and rare variants into the analysis and the ADMM algorithm can efficiently solve the penalized SEMs. Using this model we can jointly infer genetic architecture and causal phenotype network structure, and decompose the genetic effect into direct, indirect and total effect.
Affiliation(s)
- Panpan Wang
- Human Genetics Center, Department of Biostatistics, University of Texas School of Public Health, Houston, TX, 77030, USA.,State Key Laboratory of Genetic Engineering and Ministry of Education Key Laboratory of Contemporary Anthropology, Collaborative Innovation Center for Genetics and Development, School of Life Sciences and Institutes of Biomedical Sciences, Fudan University, Shanghai, 200433, China
| | - Mohammad Rahman
- Human Genetics Center, Department of Biostatistics, University of Texas School of Public Health, Houston, TX, 77030, USA
| | - Li Jin
- State Key Laboratory of Genetic Engineering and Ministry of Education Key Laboratory of Contemporary Anthropology, Collaborative Innovation Center for Genetics and Development, School of Life Sciences and Institutes of Biomedical Sciences, Fudan University, Shanghai, 200433, China.
| | - Momiao Xiong
- Human Genetics Center, Department of Biostatistics, University of Texas School of Public Health, Houston, TX, 77030, USA.,Human Genetics Center, The University of Texas Health Science Center at Houston, P.O. Box 20186, Houston, TX, 77225, USA
|
22
|
Han SW, Chen G, Cheon MS, Zhong H. Estimation of Directed Acyclic Graphs Through Two-stage Adaptive Lasso for Gene Network Inference. J Am Stat Assoc 2016; 111:1004-1019. [PMID: 28239216 DOI: 10.1080/01621459.2016.1142880] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/22/2022]
Abstract
Graphical models are a popular approach to find dependence and conditional independence relationships between gene expressions. Directed acyclic graphs (DAGs) are a special class of directed graphical models, where all the edges are directed edges and contain no directed cycles. The DAGs are well known models for discovering causal relationships between genes in gene regulatory networks. However, estimating DAGs without assuming known ordering is challenging due to high dimensionality, the acyclic constraints, and the presence of equivalence class from observational data. To overcome these challenges, we propose a two-stage adaptive Lasso approach, called NS-DIST, which performs neighborhood selection (NS) in stage 1, and then estimates DAGs by the Discrete Improving Search with Tabu (DIST) algorithm within the selected neighborhood. Simulation studies are presented to demonstrate the effectiveness of the method and its computational efficiency. Two real data examples are used to demonstrate the practical usage of our method for gene regulatory network inference.
Affiliation(s)
- Sung Won Han
- Division of Biostatistics, Departments of Population Health, New York University, New York, NY, USA, 10016
| | - Gong Chen
- Pharmaceutical Sciences, Pharma Early Research and Development, Roche Innovation Center New York, New York, NY, USA
| | - Myun-Seok Cheon
- School of Industrial and System Engineering, Georgia Institute of Technology, Atlanta, GA, USA, 30332
| | - Hua Zhong
- Division of Biostatistics, Departments of Population Health, New York University, New York, NY, USA, 10016
|
23
|
Richardson S, Tseng GC, Sun W. Statistical Methods in Integrative Genomics. ANNUAL REVIEW OF STATISTICS AND ITS APPLICATION 2016; 3:181-209. [PMID: 27482531 PMCID: PMC4963036 DOI: 10.1146/annurev-statistics-041715-033506] [Citation(s) in RCA: 55] [Impact Index Per Article: 6.9] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/28/2023]
Abstract
Statistical methods in integrative genomics aim to answer important biology questions by jointly analyzing multiple types of genomic data (vertical integration) or aggregating the same type of data across multiple studies (horizontal integration). In this article, we introduce different types of genomic data and data resources, and then review statistical methods of integrative genomics, with emphasis on the motivation and rationale of these methods. We conclude with some summary points and future research directions.
Affiliation(s)
- Sylvia Richardson
- MRC Biostatistics Unit, Cambridge Institute of Public Health, University of Cambridge, CB2 0SR, United Kingdom
| | - George C. Tseng
- Department of Biostatistics, University of Pittsburgh, Pittsburgh, PA 15261
| | - Wei Sun
- Department of Biostatistics, Department of Genetics, University of North Carolina, Chapel Hill, NC 27599
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, Washington 27516
|
24
|
Haycock PC, Burgess S, Wade KH, Bowden J, Relton C, Davey Smith G. Best (but oft-forgotten) practices: the design, analysis, and interpretation of Mendelian randomization studies. Am J Clin Nutr 2016; 103:965-78. [PMID: 26961927 PMCID: PMC4807699 DOI: 10.3945/ajcn.115.118216] [Citation(s) in RCA: 341] [Impact Index Per Article: 42.6] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/25/2015] [Accepted: 02/02/2016] [Indexed: 01/14/2023] Open
Abstract
Mendelian randomization (MR) is an increasingly important tool for appraising causality in observational epidemiology. The technique exploits the principle that genotypes are not generally susceptible to reverse causation bias and confounding, reflecting their fixed nature and Mendel’s first and second laws of inheritance. The approach is, however, subject to important limitations and assumptions that, if unaddressed or compounded by poor study design, can lead to erroneous conclusions. Nevertheless, the advent of 2-sample approaches (in which exposure and outcome are measured in separate samples) and the increasing availability of open-access data from large consortia of genome-wide association studies and population biobanks mean that the approach is likely to become routine practice in evidence synthesis and causal inference research. In this article we provide an overview of the design, analysis, and interpretation of MR studies, with a special emphasis on assumptions and limitations. We also consider different analytic strategies for strengthening causal inference. Although impossible to prove causality with any single approach, MR is a highly cost-effective strategy for prioritizing intervention targets for disease prevention and for strengthening the evidence base for public health policy.
Affiliation(s)
- Philip C Haycock
- Medical Research Council (MRC) Integrative Epidemiology Unit, University of Bristol, Bristol, United Kingdom; and
| | | | - Kaitlin H Wade
- Medical Research Council (MRC) Integrative Epidemiology Unit, University of Bristol, Bristol, United Kingdom; and
| | - Jack Bowden
- Medical Research Council (MRC) Integrative Epidemiology Unit, University of Bristol, Bristol, United Kingdom; and
- MRC Biostatistics Unit, University of Cambridge, United Kingdom
| | - Caroline Relton
- Medical Research Council (MRC) Integrative Epidemiology Unit, University of Bristol, Bristol, United Kingdom; and
| | - George Davey Smith
- Medical Research Council (MRC) Integrative Epidemiology Unit, University of Bristol, Bristol, United Kingdom; and
|
25
|
Moharil J, May P, Gaile DP, Blair RH. Belief propagation in genotype-phenotype networks. Stat Appl Genet Mol Biol 2016; 15:39-53. [PMID: 26910752 DOI: 10.1515/sagmb-2015-0058] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/15/2022]
Abstract
Graphical models have proven to be a valuable tool for connecting genotypes and phenotypes. Structural learning of phenotype-genotype networks has received considerable attention in the post-genome era. In recent years, a dozen different methods have emerged for network inference, which leverage natural variation that arises in certain genetic populations. The structure of the network itself can be used to form hypotheses based on the inferred direct and indirect network relationships, but represents a premature endpoint to the graphical analyses. In this work, we extend this endpoint. We examine the unexplored problem of perturbing a given network structure, and quantifying the system-wide effects on the network in a node-wise manner. The perturbation is achieved through the setting of values of phenotype node(s), which may reflect an inhibition or activation, and propagating this information through the entire network. We leverage belief propagation methods in Conditional Gaussian Bayesian Networks (CG-BNs), in order to absorb and propagate phenotypic evidence through the network. We show that the modeling assumptions adopted for genotype-phenotype networks represent an important sub-class of CG-BNs, which possess properties that ensure exact inference in the propagation scheme. The system-wide effects of the perturbation are quantified in a node-wise manner through the comparison of perturbed and unperturbed marginal distributions using a symmetric Kullback-Leibler divergence. Applications to kidney and skin cancer expression quantitative trait loci (eQTL) data from different Mus musculus populations are presented. System-wide effects in the network were predicted and visualized across a spectrum of evidence. Sub-pathways and regions of the network responded in concert, suggesting co-regulation and coordination throughout the network in response to phenotypic changes.
We demonstrate how these predicted system-wide effects can be examined in connection with estimated class probabilities for covariates of interest, e.g. cancer status. Despite the uncertainty in the network structure, we demonstrate the system-wide predictions are stable across an ensemble of highly likely networks. A software package, geneNetBP, which implements our approach, was developed in the R programming language.
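The node-wise comparison of perturbed and unperturbed marginals described in this abstract can be illustrated with the standard closed form for two univariate Gaussians. This is a minimal sketch under that assumption only: the abstract does not state the exact formula used, the function names are hypothetical, and nothing here reproduces the geneNetBP API.

```python
import math


def gaussian_kl(mean_p, sd_p, mean_q, sd_q):
    """KL(P || Q) for univariate Gaussians P = N(mean_p, sd_p^2), Q = N(mean_q, sd_q^2)."""
    if sd_p <= 0 or sd_q <= 0:
        raise ValueError("standard deviations must be positive")
    return (
        math.log(sd_q / sd_p)
        + (sd_p ** 2 + (mean_p - mean_q) ** 2) / (2.0 * sd_q ** 2)
        - 0.5
    )


def symmetric_kl(mean_p, sd_p, mean_q, sd_q):
    """Symmetric divergence: KL(P || Q) + KL(Q || P)."""
    return gaussian_kl(mean_p, sd_p, mean_q, sd_q) + gaussian_kl(
        mean_q, sd_q, mean_p, sd_p
    )


if __name__ == "__main__":
    # Hypothetical marginal of one node before and after absorbing evidence.
    print(symmetric_kl(0.0, 1.0, 1.0, 1.0))
```

Some authors define the symmetric version as the average rather than the sum of the two directed divergences; the sum is used here purely as an illustrative choice.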
|
26
|
Yazdani A, Yazdani A, Samiei A, Boerwinkle E. Generating a robust statistical causal structure over 13 cardiovascular disease risk factors using genomics data. J Biomed Inform 2016; 60:114-9. [PMID: 26827624 DOI: 10.1016/j.jbi.2016.01.012] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/27/2015] [Revised: 01/19/2016] [Accepted: 01/22/2016] [Indexed: 10/22/2022]
Abstract
Understanding causal relationships among large numbers of variables is a fundamental goal of biomedical sciences and can be facilitated by Directed Acyclic Graphs (DAGs) where directed edges between nodes represent the influence of components of the system on each other. In an observational setting, some of the directions are often unidentifiable because of Markov equivalence. Additional exogenous information, such as expert knowledge or genotype data, can help establish directionality among the endogenous variables. In this study, we use the method of principal component analysis to extract information across the genome in order to generate a robust statistical causal network among phenotypes, the variables of primary interest. The method is applied to 590,020 SNP genotypes measured on 1596 individuals to generate the statistical causal network of 13 cardiovascular disease risk factor phenotypes. First, principal component analysis was used to capture information across the genome. The principal components were then used to identify a robust causal network structure, GDAG, among the phenotypes. Analyzing a robust causal network over risk factors reveals the flow of information in direct and alternative paths, as well as determining predictors and good targets for intervention. For example, the analysis identified BMI as influencing multiple other risk factor phenotypes and a good target for intervention to lower disease risk.
Affiliation(s)
- Azam Yazdani
- Human Genetics Center, UTHealth School of Public Health, 1200 Pressler Street, Suite E-447, Houston, TX 77030, United States.
| | - Akram Yazdani
- Human Genetics Center, UTHealth School of Public Health, 1200 Pressler Street, Suite E-447, Houston, TX 77030, United States
| | - Ahmad Samiei
- Department of Software Systematic, D-14482 Potsdam, Germany
| | - Eric Boerwinkle
- Human Genetics Center, UTHealth School of Public Health, 1200 Pressler Street, Suite E-447, Houston, TX 77030, United States
|
27
|
|
28
|
Peñagaricano F, Valente BD, Steibel JP, Bates RO, Ernst CW, Khatib H, Rosa GJM. Exploring causal networks underlying fat deposition and muscularity in pigs through the integration of phenotypic, genotypic and transcriptomic data. BMC SYSTEMS BIOLOGY 2015; 9:58. [PMID: 26376630 PMCID: PMC4574162 DOI: 10.1186/s12918-015-0207-6] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 05/04/2015] [Accepted: 09/04/2015] [Indexed: 12/23/2022]
Abstract
BACKGROUND Joint modeling and analysis of phenotypic, genotypic and transcriptomic data have the potential to uncover the genetic control of gene activity and phenotypic variation, as well as shed light on the manner and extent of connectedness among these variables. Current studies mainly report associations, i.e. undirected connections among variables without causal interpretation. Knowledge regarding causal relationships among genes and phenotypes can be used to predict the behavior of complex systems, as well as to optimize management practices and selection strategies. Here, we performed a multistep procedure for inferring causal networks underlying carcass fat deposition and muscularity in pigs using multi-omics data obtained from an F2 Duroc x Pietrain resource pig population. RESULTS We initially explored marginal associations between genotypes and phenotypic and expression traits through whole-genome scans, and then, in genomic regions with multiple significant hits, we assessed gene-phenotype network reconstruction using causal structural learning algorithms. One genomic region on SSC6 showed significant associations with three relevant phenotypes, off-midline 10th-rib backfat thickness, loin muscle weight, and average intramuscular fat percentage, and also with the expression of seven genes, including ZNF24, SSX2IP, and AKR7A2. The inferred network indicated that the genotype affects the three phenotypes mainly through the expression of several genes. Among the phenotypes, fat deposition traits negatively affected loin muscle weight. CONCLUSIONS Our findings shed light on the antagonistic relationship between carcass fat deposition and lean meat content in pigs. In addition, the procedure described in this study has the potential to unravel gene-phenotype networks underlying complex phenotypes.
Affiliation(s)
- Francisco Peñagaricano
- Department of Animal Sciences, University of Wisconsin-Madison, Madison, WI, 53706, USA.
- Present Address: Department of Animal Sciences, and University of Florida Genetics Institute, University of Florida, Gainesville, FL, 326111, USA.
| | - Bruno D Valente
- Department of Animal Sciences, University of Wisconsin-Madison, Madison, WI, 53706, USA.
- Dairy Science, University of Wisconsin-Madison, Madison, WI, 53706, USA.
| | - Juan P Steibel
- Department of Animal Science, Michigan State University, East Lansing, MI, 48824, USA.
| | - Ronald O Bates
- Department of Animal Science, Michigan State University, East Lansing, MI, 48824, USA.
| | - Catherine W Ernst
- Department of Animal Science, Michigan State University, East Lansing, MI, 48824, USA.
| | - Hasan Khatib
- Department of Animal Sciences, University of Wisconsin-Madison, Madison, WI, 53706, USA.
| | - Guilherme J M Rosa
- Department of Animal Sciences, University of Wisconsin-Madison, Madison, WI, 53706, USA.
- Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI, 53706, USA.
|
29
|
Fear JM, Arbeitman MN, Salomon MP, Dalton JE, Tower J, Nuzhdin SV, McIntyre LM. The Wright stuff: reimagining path analysis reveals novel components of the sex determination hierarchy in Drosophila melanogaster. BMC SYSTEMS BIOLOGY 2015; 9:53. [PMID: 26335107 PMCID: PMC4558766 DOI: 10.1186/s12918-015-0200-0] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 03/19/2015] [Accepted: 08/20/2015] [Indexed: 11/10/2022]
Abstract
BACKGROUND The Drosophila sex determination hierarchy is a classic example of a transcriptional regulatory hierarchy, with sex-specific isoforms regulating morphology and behavior. We use a structural equation modeling approach, leveraging natural genetic variation from two studies on Drosophila female head tissues--DSPR collection (596 F1-hybrids from crosses between DSPR sub-populations) and CEGS population (75 F1-hybrids from crosses between DGRP/Winters lines to a reference strain w1118)--to expand understanding of the sex hierarchy gene regulatory network (GRN). This approach is completely generalizable to any natural population, including humans. RESULTS We expanded the sex hierarchy GRN adding novel links among genes, including a link from fruitless (fru) to Sex-lethal (Sxl) identified in both populations. This link is further supported by the presence of fru binding sites in the Sxl locus. 754 candidate genes were added to the pathway, including the splicing factors male-specific lethal 2 and Rm62 as downstream targets of Sxl which are well-supported links in males. Independent studies of doublesex and transformer mutants support many additions, including evidence for a link between the sex hierarchy and metabolism, via Insulin-like receptor. CONCLUSIONS The genes added in the CEGS population were enriched for genes with sex-biased splicing and components of the spliceosome. A common goal of molecular biologists is to expand understanding about regulatory interactions among genes. Using natural alleles we can not only identify novel relationships, but using supervised approaches can order genes into a regulatory hierarchy. Combining these results with independent large effect mutation studies, allows clear candidates for detailed molecular follow-up to emerge.
Affiliation(s)
- Justin M Fear
- Department of Molecular Genetics and Microbiology, University of Florida, CGRC Room 116, PO Box 100266, FL 32610-0266, Gainesville, FL, USA.
| | | | - Matthew P Salomon
- Molecular and Computational Biology, University of California, Los Angeles, CA, USA.
| | - Justin E Dalton
- Biomedical Science, Florida State University, Tallahassee, FL, USA.
| | - John Tower
- Molecular and Computational Biology, University of California, Los Angeles, CA, USA.
| | - Sergey V Nuzhdin
- Molecular and Computational Biology, University of California, Los Angeles, CA, USA.
| | - Lauren M McIntyre
- Department of Molecular Genetics and Microbiology, University of Florida, CGRC Room 116, PO Box 100266, FL 32610-0266, Gainesville, FL, USA.
|
30
|
Ziegler A, Mwambi H, König IR. Mendelian Randomization versus Path Models: Making Causal Inferences in Genetic Epidemiology. Hum Hered 2015. [PMID: 26201704 DOI: 10.1159/000381338] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/19/2022] Open
Abstract
OBJECTIVE The term Mendelian randomization is popular in the current literature. The first aim of this work is to describe the idea of Mendelian randomization studies and the assumptions required for drawing valid conclusions. The second aim is to contrast Mendelian randomization and path modeling when different 'omics' levels are considered jointly. METHODS We define Mendelian randomization as introduced by Katan in 1986, and review its crucial assumptions. We introduce path models as the relevant additional component to the current use of Mendelian randomization studies in 'omics'. Real data examples for the association between lipid levels and coronary artery disease illustrate the use of path models. RESULTS Numerous assumptions underlie Mendelian randomization, and they are difficult to fulfill in applications. Path models are suitable for investigating causality, and they should not be confused with Mendelian randomization. In many applications, path modeling would be the appropriate analysis in addition to a simple Mendelian randomization analysis. CONCLUSIONS Mendelian randomization and path models use different concepts for causal inference. Path modeling but not simple Mendelian randomization analysis is well suited to study causality with different levels of 'omics' data.
Affiliation(s)
- Andreas Ziegler
- Institut für Medizinische Biometrie und Statistik, Universität zu Lübeck, Universitätsklinikum Schleswig-Holstein, Campus Lübeck, Lübeck, Germany
|
31
|
Using molecular genetic information to infer causality in observational data: Mendelian randomisation. Curr Opin Behav Sci 2015. [DOI: 10.1016/j.cobeha.2014.08.002] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/24/2022]
|
32
|
Lovell JT, Mullen JL, Lowry DB, Awole K, Richards JH, Sen S, Verslues PE, Juenger TE, McKay JK. Exploiting Differential Gene Expression and Epistasis to Discover Candidate Genes for Drought-Associated QTLs in Arabidopsis thaliana. THE PLANT CELL 2015; 27:969-83. [PMID: 25873386 PMCID: PMC4558705 DOI: 10.1105/tpc.15.00122] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 02/08/2015] [Revised: 03/13/2015] [Accepted: 04/01/2015] [Indexed: 05/09/2023]
Abstract
Soil water availability represents one of the most important selective agents for plants in nature and the single greatest abiotic determinant of agricultural productivity, yet the genetic bases of drought acclimation responses remain poorly understood. Here, we developed a systems-genetic approach to characterize quantitative trait loci (QTLs), physiological traits and genes that affect responses to soil moisture deficit in the TSUxKAS mapping population of Arabidopsis thaliana. To determine the effects of candidate genes underlying QTLs, we analyzed gene expression as a covariate within the QTL model in an effort to mechanistically link markers, RNA expression, and the phenotype. This strategy produced ranked lists of candidate genes for several drought-associated traits, including water use efficiency, growth, abscisic acid concentration (ABA), and proline concentration. As a proof of concept, we recovered known causal loci for several QTLs. For other traits, including ABA, we identified novel loci not previously associated with drought. Furthermore, we documented natural variation at two key steps in proline metabolism and demonstrated that the mitochondrial genome differentially affects genomic QTLs to influence proline accumulation. These findings demonstrate that linking genome, transcriptome, and phenotype data holds great promise to extend the utility of genetic mapping, even when QTL effects are modest or complex.
Affiliation(s)
- John T Lovell
- Department of Integrative Biology, University of Texas, Austin, Texas 78712; Department of BioAgricultural Sciences and Pest Management, Colorado State University, Fort Collins, Colorado 80523
| | - Jack L Mullen
- Department of BioAgricultural Sciences and Pest Management, Colorado State University, Fort Collins, Colorado 80523
| | - David B Lowry
- Department of Plant Biology, Michigan State University, East Lansing, Michigan 48824
| | - Kedija Awole
- Department of BioAgricultural Sciences and Pest Management, Colorado State University, Fort Collins, Colorado 80523
| | - James H Richards
- Department of Land, Air, and Water Resources, University of California, Davis, California 95616
| | - Saunak Sen
- Department of Epidemiology and Biostatistics, University of California, San Francisco, California 94143
| | - Paul E Verslues
- Institute of Plant and Microbial Biology, Academia Sinica, Taipei 115, Taiwan
| | - Thomas E Juenger
- Department of Integrative Biology, University of Texas, Austin, Texas 78712; Institute of Cellular and Molecular Biology, University of Texas, Austin, Texas 78712
| | - John K McKay
- Department of BioAgricultural Sciences and Pest Management, Colorado State University, Fort Collins, Colorado 80523
|
33
|
Oren Y, Nachshon A, Frishberg A, Wilentzik R, Gat-Viks I. Linking traits based on their shared molecular mechanisms. eLife 2015; 4. [PMID: 25781485 PMCID: PMC4362207 DOI: 10.7554/elife.04346] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/03/2014] [Accepted: 02/20/2015] [Indexed: 12/29/2022] Open
Abstract
There is growing recognition that co-morbidity and co-occurrence of disease traits are often determined by shared genetic and molecular mechanisms. In most cases, however, the specific mechanisms that lead to such trait-trait relationships are yet unknown. Here we present an analysis of a broad spectrum of behavioral and physiological traits together with gene-expression measurements across genetically diverse mouse strains. We develop an unbiased methodology that constructs potentially overlapping groups of traits and resolves their underlying combination of genetic loci and molecular mechanisms. For example, our method predicts that genetic variation in the Klf7 gene may influence gene transcripts in bone marrow-derived myeloid cells, which in turn affect 17 behavioral traits following morphine injection; this predicted effect of Klf7 is consistent with an in vitro perturbation of Klf7 in bone marrow cells. Our analysis demonstrates the utility of studying hidden causative mechanisms that lead to relationships between complex traits.
Affiliation(s)
- Yael Oren
- Department of Cell Research and Immunology, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel
| | - Aharon Nachshon
- Department of Cell Research and Immunology, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel
| | - Amit Frishberg
- Department of Cell Research and Immunology, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel
| | - Roni Wilentzik
- Department of Cell Research and Immunology, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel
| | - Irit Gat-Viks
- Department of Cell Research and Immunology, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel
|
34
|
Bayesian network reconstruction using systems genetics data: comparison of MCMC methods. Genetics 2015; 199:973-89. [PMID: 25631319 PMCID: PMC4391572 DOI: 10.1534/genetics.114.172619] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/14/2014] [Accepted: 01/26/2015] [Indexed: 12/23/2022] Open
Abstract
Reconstructing biological networks using high-throughput technologies has the potential to produce condition-specific interactomes. But are these reconstructed networks a reliable source of biological interactions? Do some network inference methods offer dramatically improved performance on certain types of networks? To facilitate the use of network inference methods in systems biology, we report a large-scale simulation study comparing the ability of Markov chain Monte Carlo (MCMC) samplers to reverse engineer Bayesian networks. The MCMC samplers we investigated included foundational and state-of-the-art Metropolis-Hastings and Gibbs sampling approaches, as well as novel samplers we have designed. To enable a comprehensive comparison, we simulated gene expression and genetics data from known network structures under a range of biologically plausible scenarios. We examine the overall quality of network inference via different methods, as well as how their performance is affected by network characteristics. Our simulations reveal that network size, edge density, and strength of gene-to-gene signaling are major parameters that differentiate the performance of various samplers. Specifically, more recent samplers including our novel methods outperform traditional samplers for highly interconnected large networks with strong gene-to-gene signaling. Our newly developed samplers show comparable or superior performance to the top existing methods. Moreover, this performance gain is strongest in networks with biologically oriented topology, which indicates that our novel samplers are suitable for inferring biological networks. The performance of MCMC samplers in this simulation framework can guide the choice of methods for network reconstruction using systems genetics data.
|
35
|
Fondi M, Liò P. Multi-omics and metabolic modelling pipelines: challenges and tools for systems microbiology. Microbiol Res 2015; 171:52-64. [PMID: 25644953 DOI: 10.1016/j.micres.2015.01.003] [Citation(s) in RCA: 100] [Impact Index Per Article: 11.1] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/03/2014] [Revised: 01/02/2015] [Accepted: 01/03/2015] [Indexed: 12/27/2022]
Abstract
Integrated -omics approaches are quickly spreading across microbiology research labs, leading to (i) the possibility of detecting previously hidden features of microbial cells like multi-scale spatial organization and (ii) tracing molecular components across multiple cellular functional states. This promises to reduce the knowledge gap between genotype and phenotype and poses new challenges for computational microbiologists. We underline how the capability to unravel the complexity of microbial life will strongly depend on the integration of the huge and diverse amount of information that can be derived today from -omics experiments. In this work, we present opportunities and challenges of multi-omics data integration in current systems biology pipelines. We here discuss which layers of biological information are important for biotechnological and clinical purposes, with a special focus on bacterial metabolism and modelling procedures. A general review of the most recent computational tools for performing large-scale dataset integration is also presented, together with a possible framework to guide the design of systems biology experiments by microbiologists.
Affiliation(s)
- Marco Fondi
- Florence Computational Biology Group (ComBo), University of Florence, Via Madonna del Piano 6, Sesto Fiorentino, Florence 50019, Italy; Laboratory of Microbial and Molecular Evolution, Department of Biology, University of Florence, Via Madonna del Piano 6, Sesto Fiorentino, Florence 50019, Italy.
| | - Pietro Liò
- University of Cambridge, Computer Laboratory, 15 JJ Thomson Avenue, CB3 0FD Cambridge, UK
|
36
|
Wang H, Paulo J, Kruijer W, Boer M, Jansen H, Tikunov Y, Usadel B, van Heusden S, Bovy A, van Eeuwijk F. Genotype–phenotype modeling considering intermediate level of biological variation: a case study involving sensory traits, metabolites and QTLs in ripe tomatoes. MOLECULAR BIOSYSTEMS 2015; 11:3101-10. [DOI: 10.1039/c5mb00477b] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/21/2022]
Abstract
We integrate Gaussian graphical modelling and causal inference to infer dependency networks from multilevel phenotypic and omics data.
Affiliation(s)
- Huange Wang
- Biometris
- Wageningen University and Research Centre
- 6700AA Wageningen
- The Netherlands
| | - Joao Paulo
- Biometris
- Wageningen University and Research Centre
- 6700AA Wageningen
- The Netherlands
| | - Willem Kruijer
- Biometris
- Wageningen University and Research Centre
- 6700AA Wageningen
- The Netherlands
| | - Martin Boer
- Biometris
- Wageningen University and Research Centre
- 6700AA Wageningen
- The Netherlands
| | - Hans Jansen
- Biometris
- Wageningen University and Research Centre
- 6700AA Wageningen
- The Netherlands
| | - Yury Tikunov
- Plant Research International
- 6700AJ Wageningen
- The Netherlands
| | - Björn Usadel
- Institute for Biology I
- RWTH Aachen University
- 52074 Aachen
- Germany
| | | | - Arnaud Bovy
- Plant Research International
- 6700AJ Wageningen
- The Netherlands
| | - Fred van Eeuwijk
- Biometris
- Wageningen University and Research Centre
- 6700AA Wageningen
- The Netherlands
|
37
|
Abstract
Expression quantitative trait loci (eQTL) mapping constitutes a challenging problem due to, among other reasons, the high-dimensional multivariate nature of gene-expression traits. Next to the expression heterogeneity produced by confounding factors and other sources of unwanted variation, indirect effects spread throughout genes as a result of genetic, molecular, and environmental perturbations. From a multivariate perspective one would like to adjust for the effect of all of these factors to end up with a network of direct associations connecting the path from genotype to phenotype. In this article we approach this challenge with mixed graphical Markov models, higher-order conditional independences, and q-order correlation graphs. These models show that additive genetic effects propagate through the network as function of gene-gene correlations. Our estimation of the eQTL network underlying a well-studied yeast data set leads to a sparse structure with more direct genetic and regulatory associations that enable a straightforward comparison of the genetic control of gene expression across chromosomes. Interestingly, it also reveals that eQTLs explain most of the expression variability of network hub genes.
|
38
|
Davey Smith G, Hemani G. Mendelian randomization: genetic anchors for causal inference in epidemiological studies. Hum Mol Genet 2014; 23:R89-98. [PMID: 25064373 PMCID: PMC4170722 DOI: 10.1093/hmg/ddu328] [Citation(s) in RCA: 2003] [Impact Index Per Article: 200.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/14/2014] [Revised: 06/19/2014] [Accepted: 06/20/2014] [Indexed: 12/13/2022] Open
Abstract
Observational epidemiological studies are prone to confounding, reverse causation and various biases and have generated findings that have proved to be unreliable indicators of the causal effects of modifiable exposures on disease outcomes. Mendelian randomization (MR) is a method that utilizes genetic variants that are robustly associated with such modifiable exposures to generate more reliable evidence regarding which interventions should produce health benefits. The approach is being widely applied, and various ways to strengthen inference given the known potential limitations of MR are now available. Developments of MR, including two-sample MR, bidirectional MR, network MR, two-step MR, factorial MR and multiphenotype MR, are outlined in this review. The integration of genetic information into population-based epidemiological studies presents translational opportunities, which capitalize on the investment in genomic discovery research.
Affiliation(s)
- George Davey Smith
- MRC Integrative Epidemiology Unit (IEU) at the University of Bristol, School of Social and Community Medicine, Bristol, UK
| | - Gibran Hemani
- MRC Integrative Epidemiology Unit (IEU) at the University of Bristol, School of Social and Community Medicine, Bristol, UK
|
39
|
Wang H, van Eeuwijk FA. A new method to infer causal phenotype networks using QTL and phenotypic information. PLoS One 2014; 9:e103997. [PMID: 25144184 PMCID: PMC4140682 DOI: 10.1371/journal.pone.0103997] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/31/2014] [Accepted: 07/06/2014] [Indexed: 11/25/2022] Open
Abstract
In the context of genetics and breeding research on multiple phenotypic traits, reconstructing the directional or causal structure between phenotypic traits is a prerequisite for quantifying the effects of genetic interventions on the traits. Current approaches mainly exploit the genetic effects at quantitative trait loci (QTLs) to learn about causal relationships among phenotypic traits. A requirement for using these approaches is that at least one unique QTL has been identified for each trait studied. However, in practice, especially for molecular phenotypes such as metabolites, this prerequisite is often not met due to limited sample sizes, high noise levels and small QTL effects. Here, we present a novel heuristic search algorithm called the QTL+phenotype supervised orientation (QPSO) algorithm to infer causal directions for edges in undirected phenotype networks. The two main advantages of this algorithm are: first, it does not require QTLs for each and every trait; second, it takes into account associated phenotypic interactions in addition to detected QTLs when orienting undirected edges between traits. We evaluate and compare the performance of QPSO with another state-of-the-art approach, the QTL-directed dependency graph (QDG) algorithm. Simulation results show that our method has broader applicability and leads to more accurate overall orientations. We also illustrate our method with a real-life example involving 24 metabolites and a few major QTLs measured on an association panel of 93 tomato cultivars. Matlab source code implementing the proposed algorithm is freely available upon request.
Affiliation(s)
- Huange Wang
- Biometris, Department of Plant Sciences, Wageningen University, Wageningen, The Netherlands
- Fred A. van Eeuwijk
- Biometris, Department of Plant Sciences, Wageningen University, Wageningen, The Netherlands
- Centre for BioSystems Genomics, Wageningen, The Netherlands
- Netherlands Metabolomics Centre, Leiden, The Netherlands
|
40
|
Zhang L, Kim S. Learning gene networks under SNP perturbations using eQTL datasets. PLoS Comput Biol 2014; 10:e1003420. [PMID: 24586125 PMCID: PMC3937098 DOI: 10.1371/journal.pcbi.1003420] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.5] [Received: 07/23/2013] [Accepted: 11/18/2013] [Indexed: 11/23/2022]
Abstract
The standard approach for identifying gene networks is based on experimental perturbations of gene regulatory systems such as gene knock-out experiments, followed by a genome-wide profiling of differential gene expressions. However, this approach is significantly limited in that it is not possible to perturb more than one or two genes simultaneously to discover complex gene interactions or to distinguish between direct and indirect downstream regulations of the differentially-expressed genes. As an alternative, genetical genomics study has been proposed to treat naturally-occurring genetic variants as potential perturbants of gene regulatory system and to recover gene networks via analysis of population gene-expression and genotype data. Despite many advantages of genetical genomics data analysis, the computational challenge that the effects of multifactorial genetic perturbations should be decoded simultaneously from data has prevented a widespread application of genetical genomics analysis. In this article, we propose a statistical framework for learning gene networks that overcomes the limitations of experimental perturbation methods and addresses the challenges of genetical genomics analysis. We introduce a new statistical model, called a sparse conditional Gaussian graphical model, and describe an efficient learning algorithm that simultaneously decodes the perturbations of gene regulatory system by a large number of SNPs to identify a gene network along with expression quantitative trait loci (eQTLs) that perturb this network. While our statistical model captures direct genetic perturbations of gene network, by performing inference on the probabilistic graphical model, we obtain detailed characterizations of how the direct SNP perturbation effects propagate through the gene network to perturb other genes indirectly. We demonstrate our statistical method using HapMap-simulated and yeast eQTL datasets. In particular, the yeast gene network identified computationally by our method under SNP perturbations is well supported by the results from experimental perturbation studies related to DNA replication stress response.
Affiliation(s)
- Lingxue Zhang
- Lane Center for Computational Biology, School of Computer Science, Carnegie Mellon University, Pittsburgh, Pennsylvania, United States of America
- Seyoung Kim
- Lane Center for Computational Biology, School of Computer Science, Carnegie Mellon University, Pittsburgh, Pennsylvania, United States of America
|
41
|
Dong Z, Song T, Yuan C. Inference of gene regulatory networks from genetic perturbations with linear regression model. PLoS One 2013; 8:e83263. [PMID: 24376676 PMCID: PMC3871530 DOI: 10.1371/journal.pone.0083263] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.2] [Received: 04/30/2013] [Accepted: 11/01/2013] [Indexed: 11/19/2022]
Abstract
Using both genetic perturbation data and gene expression data to infer regulatory networks is an effective strategy for improving the detection accuracy of the regulatory relationships among genes. Based on both types of data, genetic regulatory networks can be accurately modeled by structural equation modeling (SEM). In this paper, a linear regression (LR) model is formulated based on the SEM, and a novel iterative scheme using Bayesian inference is proposed to estimate the parameters of the LR model (LRBI). Comparative evaluations of LRBI with two other algorithms, the Adaptive Lasso (AL-Based) and the Sparsity-aware Maximum Likelihood (SML), are also presented. Simulations show that LRBI performs significantly better than AL-Based and outperforms SML in terms of power of detection. Applying the LRBI algorithm to experimental data, we inferred the interactions in a network of 35 yeast genes. An open-source program of the LRBI algorithm is freely available upon request.
Affiliation(s)
- Zijian Dong
- School of Electronic Engineering, Huaihai Institute of Technology, Lianyungang, Jiangsu, China; School of Information Science and Engineering, Southeast University, Nanjing, Jiangsu, China
- Tiecheng Song
- School of Information Science and Engineering, Southeast University, Nanjing, Jiangsu, China
- Chuang Yuan
- Department of Health Technology and Informatics, The Hong Kong Polytechnic University, Hong Kong, China
|
42
|
Peng CH, Jiang YZ, Tai AS, Liu CB, Peng SC, Liao CT, Yen TC, Hsieh WP. Causal inference of gene regulation with subnetwork assembly from genetical genomics data. Nucleic Acids Res 2013; 42:2803-19. [PMID: 24322297 PMCID: PMC3950678 DOI: 10.1093/nar/gkt1277] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.4] [Indexed: 12/15/2022]
Abstract
Deciphering the causal networks of gene interactions is critical for identifying disease pathways and disease-causing genes. We introduce a method to reconstruct causal networks based on exploring phenotype-specific modules in the human interactome and including the expression quantitative trait loci (eQTLs) that underlie the joint expression variation of each module. Closely associated eQTLs help anchor the orientation of the network. To overcome the inherent computational complexity of causal network reconstruction, we first deduce the local causality of individual subnetworks using the selected eQTLs and module transcripts. These subnetworks are then integrated to infer a global causal network using a random-field ranking method, which was motivated by animal sociology. We demonstrate how effectively the inferred causality restores the regulatory structure of the networks that mediate lymph node metastasis in oral cancer. Network rewiring clearly characterizes the dynamic regulatory systems of distinct disease states. This study is the first to associate an RXRB-causal network with increased risks of nodal metastasis, tumor relapse, distant metastases and poor survival for oral cancer. Thus, identifying crucial upstream drivers of a signal cascade can facilitate the discovery of potential biomarkers and effective therapeutic targets.
Affiliation(s)
- Chien-Hua Peng
- Departments of Resource Center for Clinical Research, Chang Gung Memorial Hospital, Taoyuan 33305, Taiwan, Republic of China, Institute of Statistics, National Tsing Hua University, Hsinchu 30013, Taiwan, Republic of China, Nuclear Medicine and Molecular Imaging Center, Chang Gung Memorial Hospital, Taoyuan 33305, Taiwan, Republic of China and Department of Otorhinolaryngology, Head and Neck Surgery, Chang Gung Memorial Hospital, Taoyuan 33305, Taiwan, Republic of China
|
43
|
Cai X, Bazerque JA, Giannakis GB. Inference of gene regulatory networks with sparse structural equation models exploiting genetic perturbations. PLoS Comput Biol 2013; 9:e1003068. [PMID: 23717196 PMCID: PMC3662697 DOI: 10.1371/journal.pcbi.1003068] [Citation(s) in RCA: 48] [Impact Index Per Article: 4.4] [Received: 04/17/2012] [Accepted: 03/28/2013] [Indexed: 12/22/2022]
Abstract
Integrating genetic perturbations with gene expression data not only improves the accuracy of regulatory network topology inference, but also enables learning of causal regulatory relations between genes. Although a number of methods have been developed to integrate both types of data, the need for efficient and powerful algorithms remains. In this paper, sparse structural equation models (SEMs) are employed to integrate both gene expression data and cis-expression quantitative trait loci (cis-eQTL), for modeling gene regulatory networks in accordance with biological evidence about genes regulating or being regulated by a small number of genes. A systematic inference method named sparsity-aware maximum likelihood (SML) is developed for SEM estimation. Using simulated directed acyclic or cyclic networks, the SML performance is compared with that of two state-of-the-art algorithms: the adaptive Lasso (AL) based scheme, and the QTL-directed dependency graph (QDG) method. Computer simulations demonstrate that the novel SML algorithm offers significantly better performance than the AL-based and QDG algorithms across all sample sizes from 100 to 1,000, in terms of detection power and false discovery rate, in all the cases tested that include acyclic or cyclic networks of 10, 30 and 300 genes. The SML method is further applied to infer a network of 39 human genes that are related to the immune function and are chosen to have a reliable eQTL per gene. The resulting network consists of 9 genes and 13 edges. Most of the edges represent interactions reasonably expected from experimental evidence, while the remaining edges may indicate the emergence of new interactions. The sparse SEM and efficient SML algorithm provide an effective means of exploiting both gene expression and perturbation data to infer gene regulatory networks. An open-source computer program implementing the SML algorithm is freely available upon request.
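For orientation, a sparse SEM of the kind this abstract describes can be written in a generic form. The notation below is an assumption made for illustration (the abstract gives no equations): y_i is the expression vector of sample i, x_i the genotypes of its cis-eQTLs, B the gene-to-gene effects, F the eQTL effects, and e_i Gaussian noise with variance sigma^2. The second line is the standard l1-penalized Gaussian likelihood for such a model, not necessarily the exact SML objective.

```latex
% Generic sparse SEM for N samples (notation assumed, not taken from the paper)
\begin{align}
\mathbf{y}_i &= \mathbf{B}\,\mathbf{y}_i + \mathbf{F}\,\mathbf{x}_i + \boldsymbol{\mu} + \mathbf{e}_i,
  \qquad i = 1,\dots,N, \quad B_{jj} = 0, \\
(\hat{\mathbf{B}}, \hat{\mathbf{F}}) &= \arg\max_{\mathbf{B},\,\mathbf{F}}\;
  N \log\lvert\det(\mathbf{I}-\mathbf{B})\rvert
  - \frac{1}{2\sigma^{2}} \sum_{i=1}^{N}
    \lVert \mathbf{y}_i - \mathbf{B}\mathbf{y}_i - \mathbf{F}\mathbf{x}_i - \boldsymbol{\mu} \rVert_2^{2}
  - \lambda \lVert \mathbf{B} \rVert_{1}.
\end{align}
```

The log-determinant term is the Jacobian of the map from noise to expression, and the l1 penalty encodes the assumption that each gene is regulated by only a few others.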
Affiliation(s)
- Xiaodong Cai
- Department of Electrical and Computer Engineering, University of Miami, Coral Gables, FL, USA.
|
44
|
Abstract
Current efforts in systems genetics have focused on the development of statistical approaches that aim to disentangle causal relationships among molecular phenotypes in segregating populations. Reverse engineering of transcriptional networks plays a key role in the understanding of gene regulation. However, transcriptional regulation is only one possible mechanism, as methylation, phosphorylation, direct protein-protein interaction, transcription factor binding, etc., can also contribute to gene regulation. These additional modes of regulation can be interpreted as unobserved variables in the transcriptional gene network and can potentially affect its reconstruction accuracy. We develop tests of causal direction for a pair of phenotypes that may be embedded in a more complicated but unobserved network by extending Vuong's selection tests for misspecified models. Our tests provide a significance level, which is unavailable for the widely used AIC and BIC criteria. We evaluate the performance of our tests against the AIC, BIC, and a recently published causality inference test in simulation studies. We compare the precision of causal calls using biologically validated causal relationships extracted from a database of 247 knockout experiments in yeast. Our model selection tests are more precise, showing greatly reduced false-positive rates compared to the alternative approaches. In practice, this is a useful feature since follow-up studies tend to be time consuming and expensive and, hence, it is important for the experimentalist to have causal predictions with low false-positive rates.
|
45
|
Shah SH, Kraus WE, Newgard CB. Metabolomic profiling for the identification of novel biomarkers and mechanisms related to common cardiovascular diseases: form and function. Circulation 2012; 126:1110-20. [PMID: 22927473 DOI: 10.1161/circulationaha.111.060368] [Citation(s) in RCA: 267] [Impact Index Per Article: 22.3] [Indexed: 12/12/2022]
Affiliation(s)
- Svati H Shah
- Sarah W. Stedman Nutrition and Metabolism Center, Duke University Medical Center, Duke Independence Park Facility, 4321 Medical Park Drive, Durham, NC 27704, USA.
|
46
|
Bello N, Stevenson J, Tempelman R. Invited review: Milk production and reproductive performance: Modern interdisciplinary insights into an enduring axiom. J Dairy Sci 2012; 95:5461-75. [DOI: 10.3168/jds.2012-5564] [Citation(s) in RCA: 40] [Impact Index Per Article: 3.3] [Received: 03/23/2012] [Accepted: 06/05/2012] [Indexed: 11/19/2022]
|
47
|
Nuzhdin SV, Friesen ML, McIntyre LM. Genotype-phenotype mapping in a post-GWAS world. Trends Genet 2012; 28:421-6. [PMID: 22818580 DOI: 10.1016/j.tig.2012.06.003] [Citation(s) in RCA: 52] [Impact Index Per Article: 4.3] [Received: 04/23/2012] [Revised: 05/22/2012] [Accepted: 06/18/2012] [Indexed: 01/18/2023]
Abstract
Understanding how metabolic reactions, cell signaling, and developmental pathways translate the genome of an organism into its phenotype is a grand challenge in biology. Genome-wide association studies (GWAS) statistically connect genotypes to phenotypes, without any recourse to known molecular interactions, whereas a molecular biology approach directly ties gene function to phenotype through gene regulatory networks (GRNs). Using natural variation in allele-specific expression, GWAS and GRN approaches can be merged into a single framework via structural equation modeling (SEM). This approach leverages the myriad of polymorphisms in natural populations to elucidate and quantitate the molecular pathways that underlie phenotypic variation. The SEM framework can be used to quantitate a GRN, evaluate its consistency across environments or sexes, identify the differences in GRNs between species, and annotate GRNs de novo in non-model organisms.
Affiliation(s)
- Sergey V Nuzhdin
- University of Southern California, Program in Molecular and Computational Biology, Department of Biology, Los Angeles, CA 90089, USA.
|
48
|
Edwards D, Wang L, Sørensen P. Network-enabled gene expression analysis. BMC Bioinformatics 2012; 13:167. [PMID: 22799258 PMCID: PMC3556136 DOI: 10.1186/1471-2105-13-167] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.2] [Received: 11/16/2011] [Accepted: 06/28/2012] [Indexed: 01/17/2023]
Abstract
BACKGROUND Although genome-scale expression experiments are performed routinely in biomedical research, methods of analysis remain simplistic and their interpretation challenging. The conventional approach is to compare the expression of each gene, one at a time, between treatment groups. This implicitly treats the gene expression levels as independent, but they are in fact highly interdependent, and exploiting this enables substantial power gains to be realized. RESULTS We assume that information on the dependence structure between the expression levels of a set of genes is available in the form of a Bayesian network (directed acyclic graph), derived from external resources. We show how to analyze gene expression data conditional on this network. Genes whose expression is directly affected by treatment may be identified using tests for the independence of each gene and treatment, conditional on the parents of the gene in the network. We apply this approach to two datasets: one from a hepatotoxicity study in rats using a PPAR pathway, and the other from a study of the effects of smoking on the epithelial transcriptome, using a global transcription factor network. CONCLUSIONS The proposed method is straightforward, simple to implement, gives rise to substantial power gains, and may assist in relating the experimental results to the underlying biology.
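The conditional-independence idea in this abstract can be illustrated with a minimal sketch. It assumes a single parent gene, linear-Gaussian dependence, and a partial-correlation test with Fisher's z-transform. The abstract does not name the test statistic, so that choice, the function names, and the simulated data are all illustrative assumptions rather than the authors' implementation.

```python
import math
import random


def pearson(x, y):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr_test(gene, treatment, parent):
    """Test gene _||_ treatment | parent (hypothetical helper, one parent only).

    Returns the partial correlation and a two-sided p-value from
    Fisher's z-transform with one conditioning variable.
    """
    r_gt = pearson(gene, treatment)
    r_gp = pearson(gene, parent)
    r_tp = pearson(treatment, parent)
    r = (r_gt - r_gp * r_tp) / math.sqrt((1 - r_gp ** 2) * (1 - r_tp ** 2))
    z = math.atanh(r) * math.sqrt(len(gene) - 1 - 3)  # n - |Z| - 3, with |Z| = 1
    p = math.erfc(abs(z) / math.sqrt(2))              # two-sided normal p-value
    return r, p


# Simulated data (assumption): treatment -> parent -> gene, with an extra
# direct treatment effect on the "direct" gene only.
rng = random.Random(0)
n = 400
treatment = [i % 2 for i in range(n)]
parent = [1.0 * t + rng.gauss(0, 1) for t in treatment]
indirect = [0.8 * p + rng.gauss(0, 1) for p in parent]
direct = [0.8 * p + 1.5 * t + rng.gauss(0, 1) for p, t in zip(parent, treatment)]

r_ind, p_ind = partial_corr_test(indirect, treatment, parent)
r_dir, p_dir = partial_corr_test(direct, treatment, parent)
print(f"indirect gene: partial r = {r_ind:.3f}, p = {p_ind:.3g}")
print(f"direct gene:   partial r = {r_dir:.3f}, p = {p_dir:.3g}")
```

A marginal comparison between treatment groups would flag both genes as differentially expressed. Conditioning on the parent leaves a strong signal only for the gene that treatment affects directly.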
Affiliation(s)
- David Edwards
- Department of Molecular Biology and Genetics, Aarhus University, Blichers Allé 20, 8830 Tjele, Denmark
- Lei Wang
- Department of Molecular Biology and Genetics, Aarhus University, Blichers Allé 20, 8830 Tjele, Denmark
- Peter Sørensen
- Department of Molecular Biology and Genetics, Aarhus University, Blichers Allé 20, 8830 Tjele, Denmark
|
49
|
Blair RH, Trichler DL, Gaille DP. Mathematical and statistical modeling in cancer systems biology. Front Physiol 2012; 3:227. [PMID: 22754537 PMCID: PMC3385354 DOI: 10.3389/fphys.2012.00227] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.6] [Received: 03/30/2012] [Accepted: 06/05/2012] [Indexed: 11/13/2022]
Abstract
Cancer is a major health problem with high mortality rates. In the post-genome era, investigators have access to massive amounts of rapidly accumulating high-throughput data in publicly available databases, some of which are exclusively devoted to housing cancer data. However, data interpretation efforts have not kept pace with data collection, and gained knowledge is not necessarily translating into better diagnoses and treatments. A fundamental problem is to integrate and interpret data to further our understanding in cancer systems biology. Viewing cancer as a network provides insights into the complex mechanisms underlying the disease. Mathematical and statistical models provide an avenue for cancer network modeling. In this article, we review two widely used modeling paradigms: deterministic metabolic models and statistical graphical models. The strength of these approaches lies in their flexibility and predictive power. Once a model has been validated, it can be used to make predictions and generate hypotheses. We describe a number of diverse applications to cancer biology, including the system-wide effects of drug treatments, disease prognosis, tumor classification, forecasting treatment outcomes, and survival predictions.
Affiliation(s)
- Rachael Hageman Blair
- Department of Biostatistics, State University of New York at Buffalo, Buffalo, NY, USA
- David L. Trichler
- Department of Biostatistics, State University of New York at Buffalo, Buffalo, NY, USA
- Department of Biostatistics, University of Toronto, Toronto, ON, Canada
- Daniel P. Gaille
- Department of Biostatistics, State University of New York at Buffalo, Buffalo, NY, USA
|
50
|
|