1
|
Wang S, Myers AJ, Irvine EB, Wang C, Maiello P, Rodgers MA, Tomko J, Kracinovsky K, Borish HJ, Chao MC, Mugahid D, Darrah PA, Seder RA, Roederer M, Scanga CA, Lin PL, Alter G, Fortune SM, Flynn JL, Lauffenburger DA. Markov field network model of multi-modal data predicts effects of immune system perturbations on intravenous BCG vaccination in macaques. Cell Syst 2024:S2405-4712(24)00298-9. [PMID: 39504969 DOI: 10.1016/j.cels.2024.10.001] [Received: 12/15/2023] [Revised: 07/09/2024] [Accepted: 10/09/2024] [Indexed: 11/08/2024]
Abstract
Analysis of multi-modal datasets can identify multi-scale interactions underlying biological systems but can be beset by spurious connections due to indirect impacts propagating through an unmapped biological network. For example, studies in macaques have shown that Bacillus Calmette-Guerin (BCG) vaccination by an intravenous route protects against tuberculosis, correlating with changes across various immune data modes. To eliminate spurious correlations and identify critical immune interactions in a public multi-modal dataset (systems serology, cytokines, and cytometry) of vaccinated macaques, we applied Markov fields (MFs), a data-driven approach that explains vaccine efficacy and immune correlations via multivariate network paths, without requiring large numbers of samples (i.e., macaques) relative to multivariate features. We find that integrating multiple data modes with MFs helps remove spurious connections. Finally, we used the MF to predict outcomes of perturbations at various immune nodes, including an experimentally validated B cell depletion that induced network-wide shifts without reducing vaccine protection.
Affiliation(s)
- Shu Wang
- Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA 02142, USA
| | - Amy J Myers
- Department of Microbiology and Molecular Genetics, University of Pittsburgh School of Medicine and Center for Vaccine Research, University of Pittsburgh, Pittsburgh, PA 15261, USA
| | - Edward B Irvine
- Ragon Institute of Massachusetts General Hospital, MIT and Harvard, Cambridge, MA 02139, USA; Department of Immunology and Infectious Diseases, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA
| | - Chuangqi Wang
- Department of Immunology and Microbiology, University of Colorado, Anschutz Medical Campus, Aurora, CO 80045, USA
| | - Pauline Maiello
- Department of Microbiology and Molecular Genetics, University of Pittsburgh School of Medicine and Center for Vaccine Research, University of Pittsburgh, Pittsburgh, PA 15261, USA
| | - Mark A Rodgers
- Department of Microbiology and Molecular Genetics, University of Pittsburgh School of Medicine and Center for Vaccine Research, University of Pittsburgh, Pittsburgh, PA 15261, USA
| | - Jaime Tomko
- Department of Microbiology and Molecular Genetics, University of Pittsburgh School of Medicine and Center for Vaccine Research, University of Pittsburgh, Pittsburgh, PA 15261, USA
| | - Kara Kracinovsky
- Department of Microbiology and Molecular Genetics, University of Pittsburgh School of Medicine and Center for Vaccine Research, University of Pittsburgh, Pittsburgh, PA 15261, USA
| | - H Jacob Borish
- Department of Microbiology and Molecular Genetics, University of Pittsburgh School of Medicine and Center for Vaccine Research, University of Pittsburgh, Pittsburgh, PA 15261, USA
| | - Michael C Chao
- Department of Immunology and Infectious Diseases, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA
| | - Douaa Mugahid
- Department of Immunology and Infectious Diseases, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA
| | - Patricia A Darrah
- Vaccine Research Center, National Institute of Allergy and Infectious Diseases (NIAID), National Institutes of Health (NIH), Bethesda, MD 20814, USA
| | - Robert A Seder
- Vaccine Research Center, National Institute of Allergy and Infectious Diseases (NIAID), National Institutes of Health (NIH), Bethesda, MD 20814, USA
| | - Mario Roederer
- Vaccine Research Center, National Institute of Allergy and Infectious Diseases (NIAID), National Institutes of Health (NIH), Bethesda, MD 20814, USA
| | - Charles A Scanga
- Department of Microbiology and Molecular Genetics, University of Pittsburgh School of Medicine and Center for Vaccine Research, University of Pittsburgh, Pittsburgh, PA 15261, USA
| | - Philana Ling Lin
- Department of Pediatrics, University of Pittsburgh School of Medicine, UPMC Children's Hospital of Pittsburgh, and Center for Vaccine Research, University of Pittsburgh, Pittsburgh, PA 15620, USA
| | - Galit Alter
- Ragon Institute of Massachusetts General Hospital, MIT and Harvard, Cambridge, MA 02139, USA
| | - Sarah M Fortune
- Ragon Institute of Massachusetts General Hospital, MIT and Harvard, Cambridge, MA 02139, USA; Department of Immunology and Infectious Diseases, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA
| | - JoAnne L Flynn
- Department of Microbiology and Molecular Genetics, University of Pittsburgh School of Medicine and Center for Vaccine Research, University of Pittsburgh, Pittsburgh, PA 15261, USA
| | - Douglas A Lauffenburger
- Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA 02142, USA.
| |
|
2
|
Xi X, Ruffieux H. A modeling framework for detecting and leveraging node-level information in Bayesian network inference. Biostatistics 2024:kxae021. [PMID: 38916966 DOI: 10.1093/biostatistics/kxae021] [Received: 09/06/2023] [Revised: 03/11/2024] [Accepted: 06/02/2024] [Indexed: 06/27/2024]
Abstract
Bayesian graphical models are powerful tools to infer complex relationships in high dimension, yet are often fraught with computational and statistical challenges. If exploited in a principled way, the increasing information collected alongside the data of primary interest constitutes an opportunity to mitigate these difficulties by guiding the detection of dependence structures. For instance, gene network inference may be informed by the use of publicly available summary statistics on the regulation of genes by genetic variants. Here we present a novel Gaussian graphical modeling framework to identify and leverage information on the centrality of nodes in conditional independence graphs. Specifically, we consider a fully joint hierarchical model to simultaneously infer (i) sparse precision matrices and (ii) the relevance of node-level information for uncovering the sought-after network structure. We encode such information as candidate auxiliary variables using a spike-and-slab submodel on the propensity of nodes to be hubs, which allows hypothesis-free selection and interpretation of a sparse subset of relevant variables. As efficient exploration of large posterior spaces is needed for real-world applications, we develop a variational expectation conditional maximization algorithm that scales inference to hundreds of samples, nodes and auxiliary variables. We illustrate and exploit the advantages of our approach in simulations and in a gene network study which identifies hub genes involved in biological pathways relevant to immune-mediated diseases.
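As a rough illustration of the objects this abstract refers to, the sketch below reads a conditional independence graph off a given precision matrix and counts each node's edges as a crude measure of hubness. It is not the authors' hierarchical spike-and-slab model or their variational algorithm. The matrix `omega` and the helper names `partial_correlations` and `node_degrees` are hypothetical, chosen only for this example.

```python
import math

def partial_correlations(precision):
    """Map a precision matrix to partial correlations:
    rho_ij = -omega_ij / sqrt(omega_ii * omega_jj) for i != j."""
    p = len(precision)
    rho = [[1.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(p):
            if i != j:
                rho[i][j] = -precision[i][j] / math.sqrt(
                    precision[i][i] * precision[j][j]
                )
    return rho

def node_degrees(precision, tol=1e-8):
    """Count the nonzero off-diagonal entries per node (edges in the graph)."""
    p = len(precision)
    return [
        sum(1 for j in range(p) if j != i and abs(precision[i][j]) > tol)
        for i in range(p)
    ]

# Hypothetical 4-node precision matrix in which node 0 is a hub:
# it is conditionally dependent on nodes 1, 2 and 3, which are
# conditionally independent of one another.
omega = [
    [2.0, -0.5, -0.5, -0.5],
    [-0.5, 1.0, 0.0, 0.0],
    [-0.5, 0.0, 1.0, 0.0],
    [-0.5, 0.0, 0.0, 1.0],
]

print(node_degrees(omega))
print(partial_correlations(omega)[0][1])
```

A zero off-diagonal entry in the precision matrix corresponds to a missing edge, so sparsity in the estimated matrix translates directly into a sparse graph.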
Affiliation(s)
- Xiaoyue Xi
- MRC Biostatistics Unit, University of Cambridge, East Forvie Building, Forvie Site, Robinson Way, Cambridge CB2 0SR, United Kingdom
| | - Hélène Ruffieux
- MRC Biostatistics Unit, University of Cambridge, East Forvie Building, Forvie Site, Robinson Way, Cambridge CB2 0SR, United Kingdom
| |
|
3
|
Wang H, Qiu Y, Guo H, Yin Y, Liu P. Information-incorporated gene network construction with FDR control. Bioinformatics 2024; 40:btae125. [PMID: 38430463 PMCID: PMC10937901 DOI: 10.1093/bioinformatics/btae125] [Received: 10/20/2023] [Revised: 02/08/2024] [Accepted: 02/29/2024] [Indexed: 03/03/2024]
Abstract
MOTIVATION Large-scale gene expression studies allow gene network construction to uncover associations among genes. To study direct associations among genes, partial correlation-based networks are preferred over marginal correlations. However, FDR control for partial correlation-based network construction is not well-studied. In addition, currently available partial correlation-based methods cannot take existing biological knowledge to help network construction while controlling FDR. RESULTS In this paper, we propose a method called Partial Correlation Graph with Information Incorporation (PCGII). PCGII estimates partial correlations between each pair of genes by regularized node-wise regression that can incorporate prior knowledge while controlling the effects of all other genes. It handles high-dimensional data where the number of genes can be much larger than the sample size and controls FDR at the same time. We compare PCGII with several existing approaches through extensive simulation studies and demonstrate that PCGII has better FDR control and higher power. We apply PCGII to a plant gene expression dataset where it recovers confirmed regulatory relationships and a hub node, as well as several direct associations that shed light on potential functional relationships in the system. We also introduce a method to supplement observed data with a pseudogene to apply PCGII when no prior information is available, which also allows checking FDR control and power for real data analysis. AVAILABILITY AND IMPLEMENTATION R package is freely available for download at https://cran.r-project.org/package=PCGII.
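The preference for partial over marginal correlations can be seen in a small worked example. The sketch below uses the textbook first-order partial correlation formula on three variables. It is not PCGII, which relies on regularized node-wise regression and FDR control. The correlation values and the function name `partial_corr` are assumptions made for illustration.

```python
import math

def partial_corr(r_ab, r_ac, r_bc):
    """First-order partial correlation of a and b given c:
    (r_ab - r_ac * r_bc) / sqrt((1 - r_ac^2) * (1 - r_bc^2))."""
    return (r_ab - r_ac * r_bc) / math.sqrt((1 - r_ac ** 2) * (1 - r_bc ** 2))

# Hypothetical chain X -> Y -> Z: X and Z are marginally correlated
# only because both are associated with Y.
r_xy, r_yz = 0.8, 0.5
r_xz = r_xy * r_yz  # marginal correlation implied by the chain

# Indirect association: vanishes once Y is controlled for.
print(partial_corr(r_xz, r_xy, r_yz))
# Direct association: remains after controlling for Z.
print(partial_corr(r_xy, r_xz, r_yz))
```

A marginal correlation network would draw an X-Z edge here, whereas a partial correlation network would keep only the X-Y and Y-Z edges.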
Affiliation(s)
- Hao Wang
- Department of Statistics, Iowa State University, Ames, IA 50010, United States
| | - Yumou Qiu
- Department of Statistics, Iowa State University, Ames, IA 50010, United States
| | - Hongqing Guo
- Department of Genetics, Development and Cell Biology, Iowa State University, Ames, IA 50010, United States
| | - Yanhai Yin
- Department of Genetics, Development and Cell Biology, Iowa State University, Ames, IA 50010, United States
| | - Peng Liu
- Department of Statistics, Iowa State University, Ames, IA 50010, United States
| |
|
4
|
Buck L, Schmidt T, Feist M, Schwarzfischer P, Kube D, Oefner PJ, Zacharias HU, Altenbuchinger M, Dettmer K, Gronwald W, Spang R. Anomaly detection in mixed high-dimensional molecular data. Bioinformatics 2023; 39:btad501. [PMID: 37584673 PMCID: PMC10457663 DOI: 10.1093/bioinformatics/btad501] [Received: 02/09/2023] [Revised: 07/21/2023] [Accepted: 08/14/2023] [Indexed: 08/17/2023]
Abstract
MOTIVATION Mixed molecular data combines continuous and categorical features of the same samples, such as OMICS profiles with genotypes, diagnoses, or patient sex. Like all high-dimensional molecular data, it is prone to incorrect values that can stem from various sources for example the technical limitations of the measurement devices, errors in the sample preparation, or contamination. Most anomaly detection algorithms identify complete samples as outliers or anomalies. However, in most cases, not all measurements of those samples are erroneous but only a few one-dimensional features within the samples are incorrect. These one-dimensional data errors are continuous measurements that are either located outside or inside the normal ranges of their features but in both cases show atypical values given all other continuous and categorical features in the sample. Additionally, categorical anomalies can occur for example when the genotype or diagnosis was submitted wrongly. RESULTS We introduce ADMIRE (Anomaly Detection using MIxed gRaphical modEls), a novel approach for the detection and correction of anomalies in mixed high-dimensional data. Hereby, we focus on the detection of single (one-dimensional) data errors in the categorical and continuous features of a sample. For that the joint distribution of continuous and categorical features is learned by mixed graphical models, anomalies are detected by the difference between measured and model-based estimations and are corrected using imputation. We evaluated ADMIRE in simulation and by screening for anomalies in one of our own metabolic datasets. In simulation experiments, ADMIRE outperformed the state-of-the-art methods of Local Outlier Factor, stray, and Isolation Forest. AVAILABILITY AND IMPLEMENTATION All data and code is available at https://github.com/spang-lab/adadmire. ADMIRE is implemented in a Python package called adadmire which can be found at https://pypi.org/project/adadmire.
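The idea of flagging a single value that lies inside its feature's normal range but is atypical given the other features can be sketched with a much simpler stand-in model. The example below is not ADMIRE and does not use the adadmire package. It assumes a single continuous predictor, an ordinary least squares fit in place of a mixed graphical model, and a robust z-score threshold of 3. The data and the function names `fit_line` and `flag_anomalies` are hypothetical.

```python
from statistics import mean, median

def fit_line(x, y):
    """Ordinary least squares fit of y = slope * x + intercept."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx

def flag_anomalies(x, y, threshold=3.0):
    """Return indices whose residual is extreme on a robust (MAD-based) scale."""
    slope, intercept = fit_line(x, y)
    residuals = [b - (slope * a + intercept) for a, b in zip(x, y)]
    center = median(residuals)
    mad = median([abs(r - center) for r in residuals])
    scale = 1.4826 * mad  # MAD rescaled to match a normal standard deviation
    if scale == 0:
        return []
    return [
        i for i, r in enumerate(residuals)
        if abs(r - center) / scale > threshold
    ]

# Hypothetical data with y close to 2 * x, except at index 2, where
# y = 12.0 is well inside the observed range of y but atypical for x = 3.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 3.9, 12.0, 8.1, 9.9, 12.0, 14.1, 16.0]

print(flag_anomalies(x, y))
```

A range check on y alone would miss this value, since 12.0 also occurs legitimately at x = 6. Only the comparison between the measured value and the model-based estimate exposes it.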
Affiliation(s)
- Lena Buck
- Department of Statistical Bioinformatics, University of Regensburg, 93040 Regensburg, Germany
| | - Tobias Schmidt
- Department of Statistical Bioinformatics, University of Regensburg, 93040 Regensburg, Germany
| | - Maren Feist
- Department of Hematology and Medical Oncology, University Medicine Göttingen, 37075 Göttingen, Germany
| | | | - Dieter Kube
- Department of Hematology and Medical Oncology, University Medicine Göttingen, 37075 Göttingen, Germany
| | - Peter J Oefner
- Institute of Functional Genomics, University of Regensburg, 93040 Regensburg, Germany
| | - Helena U Zacharias
- Peter L. Reichertz Institute for Medical Informatics of TU Braunschweig and Hannover Medical School, Hannover Medical School, 30625 Hannover, Germany
| | - Michael Altenbuchinger
- Department of Medical Bioinformatics, University Medical Center Göttingen, 37075 Göttingen, Germany
| | - Katja Dettmer
- Institute of Functional Genomics, University of Regensburg, 93040 Regensburg, Germany
| | - Wolfram Gronwald
- Institute of Functional Genomics, University of Regensburg, 93040 Regensburg, Germany
| | - Rainer Spang
- Department of Statistical Bioinformatics, University of Regensburg, 93040 Regensburg, Germany
| |
|
5
|
Seal S, Li Q, Basner EB, Saba LM, Kechris K. RCFGL: Rapid Condition adaptive Fused Graphical Lasso and application to modeling brain region co-expression networks. PLoS Comput Biol 2023; 19:e1010758. [PMID: 36607897 PMCID: PMC9821764 DOI: 10.1371/journal.pcbi.1010758] [Received: 02/09/2022] [Accepted: 11/24/2022] [Indexed: 01/07/2023]
Abstract
Inferring gene co-expression networks is a useful process for understanding gene regulation and pathway activity. The networks are usually undirected graphs where genes are represented as nodes and an edge represents a significant co-expression relationship. When expression data of multiple (p) genes in multiple (K) conditions (e.g., treatments, tissues, strains) are available, joint estimation of networks harnessing shared information across them can significantly increase the power of analysis. In addition, examining condition-specific patterns of co-expression can provide insights into the underlying cellular processes activated in a particular condition. Condition adaptive fused graphical lasso (CFGL) is an existing method that incorporates condition specificity in a fused graphical lasso (FGL) model for estimating multiple co-expression networks. However, with computational complexity of O(p²K log K), the current implementation of CFGL is prohibitively slow even for a moderate number of genes and can only be used for a maximum of three conditions. In this paper, we propose a faster alternative of CFGL named rapid condition adaptive fused graphical lasso (RCFGL). In RCFGL, we incorporate the condition specificity into another popular model for joint network estimation, known as fused multiple graphical lasso (FMGL). We use a more efficient algorithm in the iterative steps compared to CFGL, enabling faster computation with complexity of O(p²K) and making it easily generalizable for more than three conditions. We also present a novel screening rule to determine if the full network estimation problem can be broken down into estimation of smaller disjoint sub-networks, thereby reducing the complexity further. We demonstrate the computational advantage and superior performance of our method compared to two non-condition adaptive methods, FGL and FMGL, and one condition adaptive method, CFGL in both simulation study and real data analysis. We used RCFGL to jointly estimate the gene co-expression networks in different brain regions (conditions) using a cohort of heterogeneous stock rats. We also provide an accommodating C and Python based package that implements RCFGL.
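To make the notion of "fusing" networks across conditions concrete, the sketch below evaluates the penalty term as it is commonly written for the fused graphical lasso: an L1 sparsity term on off-diagonal entries plus an L1 term on entrywise differences between every pair of condition-specific precision matrices. This is an assumption about the standard FGL form only. The condition-adaptive weights of CFGL and RCFGL and the optimization itself are not modeled, and the matrices, tuning values and function name `fused_lasso_penalty` are hypothetical.

```python
from itertools import combinations

def fused_lasso_penalty(thetas, lam1, lam2):
    """Pairwise fused lasso penalty over K precision matrices (lists of lists).

    lam1 weights the sparsity term (off-diagonal entries only);
    lam2 weights the fusion term (differences between every pair of conditions).
    """
    p = len(thetas[0])
    sparsity = sum(
        abs(theta[i][j])
        for theta in thetas
        for i in range(p)
        for j in range(p)
        if i != j
    )
    fusion = sum(
        abs(a[i][j] - b[i][j])
        for a, b in combinations(thetas, 2)
        for i in range(p)
        for j in range(p)
    )
    return lam1 * sparsity + lam2 * fusion

# Hypothetical precision matrices for two conditions (p = 2 genes).
theta_1 = [[1.0, 0.2], [0.2, 1.0]]
theta_2 = [[1.0, 0.5], [0.5, 1.5]]

print(fused_lasso_penalty([theta_1, theta_2], lam1=0.1, lam2=0.2))
```

Increasing `lam2` pushes the estimated networks toward each other, while `lam1` controls how many edges survive in each network. With identical matrices the fusion term is zero and only the sparsity term remains.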
Affiliation(s)
- Souvik Seal
- Department of Biostatistics and Informatics, Colorado School of Public Health, University of Colorado Anschutz Medical Campus, Aurora, Colorado, United States of America
| | - Qunhua Li
- Department of Statistics, Pennsylvania State University, University Park, Pennsylvania, United States of America
| | - Elle Butler Basner
- Department of Statistics, Pennsylvania State University, University Park, Pennsylvania, United States of America
| | - Laura M. Saba
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of Colorado Anschutz Medical Campus, Aurora, Colorado, United States of America
| | - Katerina Kechris
- Department of Biostatistics and Informatics, Colorado School of Public Health, University of Colorado Anschutz Medical Campus, Aurora, Colorado, United States of America
| |
|
6
|
Galindez G, Sadegh S, Baumbach J, Kacprowski T, List M. Network-based approaches for modeling disease regulation and progression. Comput Struct Biotechnol J 2022; 21:780-795. [PMID: 36698974 PMCID: PMC9841310 DOI: 10.1016/j.csbj.2022.12.022] [Received: 08/15/2022] [Revised: 12/14/2022] [Accepted: 12/14/2022] [Indexed: 12/23/2022]
Abstract
Molecular interaction networks lay the foundation for studying how biological functions are controlled by the complex interplay of genes and proteins. Investigating perturbed processes using biological networks has been instrumental in uncovering mechanisms that underlie complex disease phenotypes. Rapid advances in omics technologies have prompted the generation of high-throughput datasets, enabling large-scale, network-based analyses. Consequently, various modeling techniques, including network enrichment, differential network extraction, and network inference, have proven to be useful for gaining new mechanistic insights. We provide an overview of recent network-based methods and their core ideas to facilitate the discovery of disease modules or candidate mechanisms. Knowledge generated from these computational efforts will benefit biomedical research, especially drug development and precision medicine. We further discuss current challenges and provide perspectives in the field, highlighting the need for more integrative and dynamic network approaches to model disease development and progression.
Affiliation(s)
- Gihanna Galindez
- Division Data Science in Biomedicine, Peter L. Reichertz Institute for Medical Informatics of Technische Universität Braunschweig and Hannover Medical School, Braunschweig, Germany
- Braunschweig Integrated Centre of Systems Biology (BRICS), TU Braunschweig, Braunschweig, Germany
| | - Sepideh Sadegh
- Chair of Experimental Bioinformatics, TUM School of Life Sciences Weihenstephan, Technical University of Munich, Freising, Germany
- Institute for Computational Systems Biology, University of Hamburg, Hamburg, Germany
| | - Jan Baumbach
- Institute for Computational Systems Biology, University of Hamburg, Hamburg, Germany
- Department of Mathematics and Computer Science, University of Southern Denmark, Odense, Denmark
| | - Tim Kacprowski
- Division Data Science in Biomedicine, Peter L. Reichertz Institute for Medical Informatics of Technische Universität Braunschweig and Hannover Medical School, Braunschweig, Germany
- Braunschweig Integrated Centre of Systems Biology (BRICS), TU Braunschweig, Braunschweig, Germany
| | - Markus List
- Chair of Experimental Bioinformatics, TUM School of Life Sciences Weihenstephan, Technical University of Munich, Freising, Germany
| |
|
7
|
Frank B, Ally M, Brekke B, Zetterberg H, Blennow K, Sugarman MA, Ashton NJ, Karikari TK, Tripodis Y, Martin B, Palmisano JN, Steinberg EG, Simkina I, Turk KW, Budson AE, O’Connor MK, Au R, Goldstein LE, Jun GR, Kowall NW, Stein TD, McKee AC, Killiany R, Qiu WQ, Stern RA, Mez J, Alosco ML. Plasma p-tau 181 shows stronger network association to Alzheimer's disease dementia than neurofilament light and total tau. Alzheimers Dement 2022; 18:1523-1536. [PMID: 34854549 PMCID: PMC9160800 DOI: 10.1002/alz.12508] [Received: 03/23/2021] [Revised: 07/07/2021] [Accepted: 09/22/2021] [Indexed: 01/29/2023]
Abstract
INTRODUCTION We examined the ability of plasma hyperphosphorylated tau (p-tau)181 to detect cognitive impairment due to Alzheimer's disease (AD) independently and in combination with plasma total tau (t-tau) and neurofilament light (NfL). METHODS Plasma samples were analyzed using the Simoa platform for 235 participants with normal cognition (NC), 181 with mild cognitive impairment due to AD (MCI), and 153 with AD dementia. Statistical approaches included multinomial regression and Gaussian graphical models (GGMs) to assess a network of plasma biomarkers, neuropsychological tests, and demographic variables. RESULTS Plasma p-tau181 discriminated AD dementia from NC, but not MCI, and correlated with dementia severity and worse neuropsychological test performance. Plasma NfL similarly discriminated diagnostic groups. Unlike plasma NfL or t-tau, p-tau181 had a direct association with cognitive diagnosis in a bootstrapped GGM. DISCUSSION These results support plasma p-tau181 for the detection of AD dementia and the use of blood-based biomarkers for optimal disease detection.
Affiliation(s)
- Brandon Frank
- Boston University Alzheimer’s Disease Center and CTE Center, Boston University School of Medicine, Boston, Massachusetts, USA
- U.S. Department of Veteran Affairs, VA Bedford Healthcare System, Bedford, Massachusetts, USA
| | - Madeline Ally
- Boston University Alzheimer’s Disease Center and CTE Center, Boston University School of Medicine, Boston, Massachusetts, USA
| | - Bailee Brekke
- Boston University Alzheimer’s Disease Center and CTE Center, Boston University School of Medicine, Boston, Massachusetts, USA
| | - Henrik Zetterberg
- Department of Neurodegenerative Disease, UCL Institute of Neurology, London, UK
- UK Dementia Research Institute at UCL, London, UK
- Clinical Neurochemistry Laboratory, Sahlgrenska University Hospital, Mölndal, Sweden
- Department of Psychiatry and Neurochemistry, Institute of Neuroscience and Physiology, Sahlgrenska Academy at the University of Gothenburg, Gothenburg, Sweden
| | - Kaj Blennow
- Clinical Neurochemistry Laboratory, Sahlgrenska University Hospital, Mölndal, Sweden
- Department of Psychiatry and Neurochemistry, Institute of Neuroscience and Physiology, Sahlgrenska Academy at the University of Gothenburg, Gothenburg, Sweden
| | - Michael A. Sugarman
- Boston University Alzheimer’s Disease Center and CTE Center, Boston University School of Medicine, Boston, Massachusetts, USA
- U.S. Department of Veteran Affairs, VA Bedford Healthcare System, Bedford, Massachusetts, USA
| | - Nicholas J. Ashton
- Clinical Neurochemistry Laboratory, Sahlgrenska University Hospital, Mölndal, Sweden
- Department of Psychiatry and Neurochemistry, Institute of Neuroscience and Physiology, Sahlgrenska Academy at the University of Gothenburg, Gothenburg, Sweden
| | - Thomas K. Karikari
- Clinical Neurochemistry Laboratory, Sahlgrenska University Hospital, Mölndal, Sweden
- Department of Psychiatry and Neurochemistry, Institute of Neuroscience and Physiology, Sahlgrenska Academy at the University of Gothenburg, Gothenburg, Sweden
| | - Yorghos Tripodis
- Boston University Alzheimer’s Disease Center and CTE Center, Boston University School of Medicine, Boston, Massachusetts, USA
- Department of Biostatistics, Boston University School of Public Health, Boston, Massachusetts, USA
| | - Brett Martin
- Boston University Alzheimer’s Disease Center and CTE Center, Boston University School of Medicine, Boston, Massachusetts, USA
- Biostatistics and Epidemiology Data Analytics Center, Boston University School of Public Health, Boston, Massachusetts, USA
| | - Joseph N. Palmisano
- Boston University Alzheimer’s Disease Center and CTE Center, Boston University School of Medicine, Boston, Massachusetts, USA
- Biostatistics and Epidemiology Data Analytics Center, Boston University School of Public Health, Boston, Massachusetts, USA
| | - Eric G. Steinberg
- Boston University Alzheimer’s Disease Center and CTE Center, Boston University School of Medicine, Boston, Massachusetts, USA
| | - Irene Simkina
- Department of Medicine, Boston University School of Medicine, Boston, Massachusetts, USA
| | - Katherine W. Turk
- Boston University Alzheimer’s Disease Center and CTE Center, Boston University School of Medicine, Boston, Massachusetts, USA
- Department of Neurology, Boston University School of Medicine, Boston, Massachusetts, USA
- U.S. Department of Veteran Affairs, VA Boston Healthcare System, Jamaica Plain, Massachusetts, USA
| | - Andrew E. Budson
- Boston University Alzheimer’s Disease Center and CTE Center, Boston University School of Medicine, Boston, Massachusetts, USA
- Department of Neurology, Boston University School of Medicine, Boston, Massachusetts, USA
- U.S. Department of Veteran Affairs, VA Boston Healthcare System, Jamaica Plain, Massachusetts, USA
| | - Maureen K. O’Connor
- Boston University Alzheimer’s Disease Center and CTE Center, Boston University School of Medicine, Boston, Massachusetts, USA
- U.S. Department of Veteran Affairs, VA Bedford Healthcare System, Bedford, Massachusetts, USA
| | - Rhoda Au
- Boston University Alzheimer’s Disease Center and CTE Center, Boston University School of Medicine, Boston, Massachusetts, USA
- Department of Neurology, Boston University School of Medicine, Boston, Massachusetts, USA
- Department of Anatomy & Neurobiology, Boston University School of Medicine, Boston, Massachusetts, USA
- Framingham Heart Study, Boston University School of Medicine, Boston, Massachusetts, USA
- Department of Epidemiology, Boston University School of Public Health, Boston, Massachusetts, USA
| | - Lee E. Goldstein
- Boston University Alzheimer’s Disease Center and CTE Center, Boston University School of Medicine, Boston, Massachusetts, USA
- Department of Pathology and Laboratory Medicine, Boston University School of Medicine, Boston, Massachusetts, USA
- Departments of Psychiatry and Ophthalmology, Boston University School of Medicine, Boston, Massachusetts, USA
- Departments of Biomedical, Electrical & Computer Engineering, Boston University College of Engineering, Boston, Massachusetts, USA
| | - Gyungah R. Jun
- Department of Medicine, Boston University School of Medicine, Boston, Massachusetts, USA
| | - Neil W. Kowall
- Boston University Alzheimer’s Disease Center and CTE Center, Boston University School of Medicine, Boston, Massachusetts, USA
- Department of Neurology, Boston University School of Medicine, Boston, Massachusetts, USA
- Department of Pathology and Laboratory Medicine, Boston University School of Medicine, Boston, Massachusetts, USA
- U.S. Department of Veteran Affairs, VA Boston Healthcare System, Jamaica Plain, Massachusetts, USA
| | - Thor D. Stein
- Boston University Alzheimer’s Disease Center and CTE Center, Boston University School of Medicine, Boston, Massachusetts, USA
- U.S. Department of Veteran Affairs, VA Bedford Healthcare System, Bedford, Massachusetts, USA
- Department of Pathology and Laboratory Medicine, Boston University School of Medicine, Boston, Massachusetts, USA
- U.S. Department of Veteran Affairs, VA Boston Healthcare System, Jamaica Plain, Massachusetts, USA
| | - Ann C. McKee
- Boston University Alzheimer’s Disease Center and CTE Center, Boston University School of Medicine, Boston, Massachusetts, USA
- U.S. Department of Veteran Affairs, VA Bedford Healthcare System, Bedford, Massachusetts, USA
- Department of Neurology, Boston University School of Medicine, Boston, Massachusetts, USA
- Department of Pathology and Laboratory Medicine, Boston University School of Medicine, Boston, Massachusetts, USA
- U.S. Department of Veteran Affairs, VA Boston Healthcare System, Jamaica Plain, Massachusetts, USA
| | - Ronald Killiany
- Boston University Alzheimer’s Disease Center and CTE Center, Boston University School of Medicine, Boston, Massachusetts, USA
- Department of Neurology, Boston University School of Medicine, Boston, Massachusetts, USA
- Department of Anatomy & Neurobiology, Boston University School of Medicine, Boston, Massachusetts, USA
- Center for Biomedical Imaging, Boston University School of Medicine, Boston, Massachusetts, USA
| | - Wei Qiao Qiu
- Boston University Alzheimer’s Disease Center and CTE Center, Boston University School of Medicine, Boston, Massachusetts, USA
- Department of Psychiatry, Boston University School of Medicine, Boston, Massachusetts, USA
- Department of Pharmacology & Experimental Therapeutics, Boston University School of Medicine, Boston, Massachusetts, USA
| | - Robert A. Stern
- Boston University Alzheimer’s Disease Center and CTE Center, Boston University School of Medicine, Boston, Massachusetts, USA
- Department of Neurology, Boston University School of Medicine, Boston, Massachusetts, USA
- Department of Anatomy & Neurobiology, Boston University School of Medicine, Boston, Massachusetts, USA
- Department of Neurosurgery, Boston University School of Medicine, Boston, Massachusetts, USA
| | - Jesse Mez
- Boston University Alzheimer’s Disease Center and CTE Center, Boston University School of Medicine, Boston, Massachusetts, USA
- Department of Neurology, Boston University School of Medicine, Boston, Massachusetts, USA
- Framingham Heart Study, Boston University School of Medicine, Boston, Massachusetts, USA
| | - Michael L. Alosco
- Boston University Alzheimer’s Disease Center and CTE Center, Boston University School of Medicine, Boston, Massachusetts, USA
- Department of Neurology, Boston University School of Medicine, Boston, Massachusetts, USA
| |
|
8
|
Sparse precision matrix estimation with missing observations. Comput Stat 2022. [DOI: 10.1007/s00180-022-01265-w] [Indexed: 11/27/2022]
|
9
|
Aluru M, Shrivastava H, Chockalingam SP, Shivakumar S, Aluru S. EnGRaiN: a supervised ensemble learning method for recovery of large-scale gene regulatory networks. Bioinformatics 2022; 38:1312-1319. [PMID: 34888624 DOI: 10.1093/bioinformatics/btab829] [Received: 07/15/2021] [Revised: 10/29/2021] [Accepted: 12/03/2021] [Indexed: 01/05/2023]
Abstract
MOTIVATION Reconstruction of genome-scale networks from gene expression data is an actively studied problem. A wide range of methods that differ between the types of interactions they uncover with varying trade-offs between sensitivity and specificity have been proposed. To leverage benefits of multiple such methods, ensemble network methods that combine predictions from resulting networks have been developed, promising results better than or as good as the individual networks. Perhaps owing to the difficulty in obtaining accurate training examples, these ensemble methods hitherto are unsupervised. RESULTS In this article, we introduce EnGRaiN, the first supervised ensemble learning method to construct gene networks. The supervision for training is provided by small training datasets of true edge connections (positives) and edges known to be absent (negatives) among gene pairs. We demonstrate the effectiveness of EnGRaiN using simulated datasets as well as a curated collection of Arabidopsis thaliana datasets we created from microarray datasets available from public repositories. EnGRaiN shows better results not only in terms of receiver operating characteristic and PR characteristics for both real and simulated datasets compared with unsupervised methods for ensemble network construction, but also generates networks that can be mined for elucidating complex biological interactions. AVAILABILITY AND IMPLEMENTATION EnGRaiN software and the datasets used in the study are publicly available at the github repository: https://github.com/AluruLab/EnGRaiN. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Maneesha Aluru
- Department of Biology, Georgia Institute of Technology, Atlanta, GA 30308, USA
- Sriram P Chockalingam
- Institute for Data Engineering and Science, Georgia Institute of Technology, Atlanta, GA 30308, USA
- Shruti Shivakumar
- Department of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, GA 30308, USA
- Srinivas Aluru
- Institute for Data Engineering and Science, Georgia Institute of Technology, Atlanta, GA 30308, USA
- Department of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, GA 30308, USA
|
10
|
Kuismin M, Dodangeh F, Sillanpää MJ. Gap-com: general model selection criterion for sparse undirected gene networks with nontrivial community structure. G3 (Bethesda) 2022; 12:jkab437. [PMID: 35100338 PMCID: PMC9210289 DOI: 10.1093/g3journal/jkab437] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/22/2021] [Accepted: 12/06/2021] [Indexed: 06/14/2023]
Abstract
We introduce a new model selection criterion for sparse complex gene network modeling where gene co-expression relationships are estimated from data. This is a novel formulation of the gap statistic and it can be used for the optimal choice of a regularization parameter in graphical models. Our criterion favors gene network structure which differs from a trivial gene interaction structure obtained totally at random. We call the criterion the gap-com statistic (gap community statistic). The idea of the gap-com statistic is to examine the difference between the observed and the expected counts of communities (clusters) where the expected counts are evaluated using either data permutations or reference graph (the Erdős-Rényi graph) resampling. The latter represents a trivial gene network structure determined by chance. We put emphasis on complex network inference because the structure of gene networks is usually nontrivial. For example, some of the genes can be clustered together or some genes can be hub genes. We evaluate the performance of the gap-com statistic in graphical model selection and compare its performance to some existing methods using simulated and real biological data examples.
Affiliation(s)
- Markku Kuismin
- Research Unit of Mathematical Sciences, University of Oulu, Oulu FI-90014, Finland
- Biocenter Oulu, University of Oulu, Oulu FI-90014, Finland
- School of Computing, University of Eastern Finland, Joensuu FI-80101, Finland
- Fatemeh Dodangeh
- Research Unit of Mathematical Sciences, University of Oulu, Oulu FI-90014, Finland
- Mikko J Sillanpää
- Research Unit of Mathematical Sciences, University of Oulu, Oulu FI-90014, Finland
- Biocenter Oulu, University of Oulu, Oulu FI-90014, Finland
- Infotech Oulu, University of Oulu, Oulu FI-90014, Finland
|
11
|
Gill NP, Balasubramanian R, Bain JR, Muehlbauer MJ, Lowe WL, Scholtens DM. Path-level interpretation of Gaussian graphical models using the pair-path subscore. BMC Bioinformatics 2022; 23:12. [PMID: 34986802 PMCID: PMC8729005 DOI: 10.1186/s12859-021-04542-5] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 06/03/2021] [Accepted: 12/10/2021] [Indexed: 12/14/2022]
Abstract
BACKGROUND: Construction of networks from cross-sectional biological data is increasingly common. Many recent methods have been based on Gaussian graphical modeling, and prioritize estimation of conditional pairwise dependencies among nodes in the network. However, challenges remain on how specific paths through the resultant network contribute to overall 'network-level' correlations. For biological applications, understanding these relationships is particularly relevant for parsing structural information contained in complex subnetworks. RESULTS: We propose the pair-path subscore (PPS), a method for interpreting Gaussian graphical models at the level of individual network paths. The scoring is based on the relative importance of such paths in determining the Pearson correlation between their terminal nodes. PPS is validated using human metabolomics data from the Hyperglycemia and Adverse Pregnancy Outcome (HAPO) study, with observations confirming well-documented biological relationships among the metabolites. We also highlight how the PPS can be used in an exploratory fashion to generate new biological hypotheses. Our method is implemented in the R package pps, available at https://github.com/nathan-gill/pps. CONCLUSIONS: The PPS can be used to probe network structure on a finer scale by investigating which paths in a potentially intricate topology contribute most substantially to marginal behavior. Adding PPS to the network analysis toolkit may enable researchers to ask new questions about the relationships among nodes in network data.
Affiliation(s)
- Nathan P Gill
- Feinberg School of Medicine, Northwestern University, Chicago, IL, USA
- Raji Balasubramanian
- Department of Biostatistics and Epidemiology, University of Massachusetts - Amherst, Amherst, MA, USA
- James R Bain
- Sarah W. Stedman Nutrition and Metabolism Center, Duke University Medical Center, Durham, NC, USA
- Duke Molecular Physiology Institute, Durham, NC, USA
- Duke University School of Medicine, Durham, NC, USA
- Michael J Muehlbauer
- Sarah W. Stedman Nutrition and Metabolism Center, Duke University Medical Center, Durham, NC, USA
- Duke Molecular Physiology Institute, Durham, NC, USA
- Duke University School of Medicine, Durham, NC, USA
- William L Lowe
- Feinberg School of Medicine, Northwestern University, Chicago, IL, USA
|
12
|
Jablonski KP, Pirkl M, Ćevid D, Bühlmann P, Beerenwinkel N. Identifying cancer pathway dysregulations using differential causal effects. Bioinformatics 2021; 38:1550-1559. [PMID: 34927666 PMCID: PMC8896597 DOI: 10.1093/bioinformatics/btab847] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/03/2021] [Revised: 11/05/2021] [Accepted: 12/14/2021] [Indexed: 02/03/2023]
Abstract
MOTIVATION Signaling pathways control cellular behavior. Dysregulated pathways, for example, due to mutations that cause genes and proteins to be expressed abnormally, can lead to diseases, such as cancer. RESULTS We introduce a novel computational approach, called Differential Causal Effects (dce), which compares normal to cancerous cells using the statistical framework of causality. The method allows to detect individual edges in a signaling pathway that are dysregulated in cancer cells, while accounting for confounding. Hence, technical artifacts have less influence on the results and dce is more likely to detect the true biological signals. We extend the approach to handle unobserved dense confounding, where each latent variable, such as, for example, batch effects or cell cycle states, affects many covariates. We show that dce outperforms competing methods on synthetic datasets and on CRISPR knockout screens. We validate its latent confounding adjustment properties on a GTEx (Genotype-Tissue Expression) dataset. Finally, in an exploratory analysis on breast cancer data from TCGA (The Cancer Genome Atlas), we recover known and discover new genes involved in breast cancer progression. AVAILABILITY AND IMPLEMENTATION The method dce is freely available as an R package on Bioconductor (https://bioconductor.org/packages/release/bioc/html/dce.html) as well as on https://github.com/cbg-ethz/dce. The GitHub repository also contains the Snakemake workflows needed to reproduce all results presented here. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Domagoj Ćevid
- Seminar for Statistics, ETH Zürich, 8092 Zürich, Switzerland
- Peter Bühlmann
- Seminar for Statistics, ETH Zürich, 8092 Zürich, Switzerland
|
13
|
Yi H, Zhang Q, Sun Y, Ma S. Assisted estimation of gene expression graphical models. Genet Epidemiol 2021; 45:372-385. [PMID: 33527531 PMCID: PMC8137544 DOI: 10.1002/gepi.22377] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/20/2020] [Revised: 12/16/2020] [Accepted: 12/31/2020] [Indexed: 02/02/2023]
Abstract
In the study of gene expression data, network analysis has played a uniquely important role. To accommodate the high dimensionality and low sample size and generate interpretable results, regularized estimation is usually conducted in the construction of gene expression Gaussian Graphical Models (GGM). Here we use GeO-GGM to represent gene-expression-only GGM. Gene expressions are regulated by regulators, and gene-expression-regulator GGMs (GeR-GGMs), which accommodate gene expressions as well as their regulators, have been constructed accordingly. In practical data analysis, with a "lack of information" caused by the large number of model parameters, limited sample size, and weak signals, the construction of both GeO-GGMs and GeR-GGMs is often unsatisfactory. In this article, we recognize that with the regulation between gene expressions and regulators, the sparsity structures of a GeO-GGM and its GeR-GGM counterpart can satisfy a hierarchy. Accordingly, we propose a joint estimation which reinforces the hierarchical structure and uses the construction of a GeO-GGM to assist that of its GeR-GGM counterpart and vice versa. Consistency properties are rigorously established, and an effective computational algorithm is developed. In simulation, the assisted construction outperforms the separate construction of GeO-GGM and GeR-GGM. Two data sets from The Cancer Genome Atlas are analyzed, leading to findings different from the direct competitors.
Affiliation(s)
- Huangdi Yi
- Department of Biostatistics, Yale University
- Qingzhao Zhang
- Department of Statistics, School of Economics; Key Laboratory of Econometrics, Ministry of Education; The Wang Yanan Institute for Studies in Economics, Xiamen University
- Yifan Sun
- Center of Applied Statistics, School of Statistics, Renmin University of China
- Shuangge Ma
- Department of Biostatistics, Yale University
- Department of Statistics, School of Economics; Key Laboratory of Econometrics, Ministry of Education; The Wang Yanan Institute for Studies in Economics, Xiamen University
|
14
|
Arbet J, Zhuang Y, Litkowski E, Saba L, Kechris K. Comparing Statistical Tests for Differential Network Analysis of Gene Modules. Front Genet 2021; 12:630215. [PMID: 34093641 PMCID: PMC8170128 DOI: 10.3389/fgene.2021.630215] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/17/2020] [Accepted: 04/19/2021] [Indexed: 11/13/2022]
Abstract
Genes often work together to perform complex biological processes, and "networks" provide a versatile framework for representing the interactions between multiple genes. Differential network analysis (DiNA) quantifies how this network structure differs between two or more groups/phenotypes (e.g., disease subjects and healthy controls), with the goal of determining whether differences in network structure can help explain differences between phenotypes. In this paper, we focus on gene co-expression networks, although in principle, the methods studied can be used for DiNA for other types of features (e.g., metabolome, epigenome, microbiome, proteome, etc.). Three common applications of DiNA involve (1) testing whether the connections to a single gene differ between groups, (2) testing whether the connection between a pair of genes differs between groups, or (3) testing whether the connections within a "module" (a subset of 3 or more genes) differs between groups. This article focuses on the latter, as there is a lack of studies comparing statistical methods for identifying differentially co-expressed modules (DCMs). Through extensive simulations, we compare several previously proposed test statistics and a new p-norm difference test (PND). We demonstrate that the true positive rate of the proposed PND test is competitive with and often higher than the other methods, while controlling the false positive rate. The R package discoMod (differentially co-expressed modules) implements the proposed method and provides a full pipeline for identifying DCMs: clustering tools to derive gene modules, tests to identify DCMs, and methods for visualizing the results.
Affiliation(s)
- Jaron Arbet
- Department of Biostatistics and Informatics, Colorado School of Public Health, University of Colorado Anschutz Medical Campus, Aurora, CO, United States
- Yaxu Zhuang
- Department of Biostatistics and Informatics, Colorado School of Public Health, University of Colorado Anschutz Medical Campus, Aurora, CO, United States
- Elizabeth Litkowski
- Department of Epidemiology, Colorado School of Public Health, University of Colorado Anschutz Medical Campus, Aurora, CO, United States
- Laura Saba
- Department of Pharmaceutical Sciences, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of Colorado Anschutz Medical Campus, Aurora, CO, United States
- Katerina Kechris
- Department of Biostatistics and Informatics, Colorado School of Public Health, University of Colorado Anschutz Medical Campus, Aurora, CO, United States
|
15
|
Kontio JAJ, Pyhäjärvi T, Sillanpää MJ. Model guided trait-specific co-expression network estimation as a new perspective for identifying molecular interactions and pathways. PLoS Comput Biol 2021; 17:e1008960. [PMID: 33939702 PMCID: PMC8118548 DOI: 10.1371/journal.pcbi.1008960] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/20/2020] [Revised: 05/13/2021] [Accepted: 04/13/2021] [Indexed: 11/19/2022]
Abstract
A wide variety of 1) parametric regression models and 2) co-expression networks have been developed for finding gene-by-gene interactions underlying complex traits from expression data. While both methodological schemes have their own well-known benefits, little is known about their synergistic potential. Our study introduces their methodological fusion that cross-exploits the strengths of individual approaches via a built-in information-sharing mechanism. This fusion is theoretically based on certain trait-conditioned dependency patterns between two genes depending on their role in the underlying parametric model. Resulting trait-specific co-expression network estimation method 1) serves to enhance the interpretation of biological networks in a parametric sense, and 2) exploits the underlying parametric model itself in the estimation process. To also account for the substantial amount of intrinsic noise and collinearities, often entailed by expression data, a tailored co-expression measure is introduced along with this framework to alleviate related computational problems. A remarkable advance over the reference methods in simulated scenarios substantiate the method's high-efficiency. As proof-of-concept, this synergistic approach is successfully applied in survival analysis, with acute myeloid leukemia data, further highlighting the framework's versatility and broad practical relevance.
Affiliation(s)
- Juho A. J. Kontio
- Research Unit of Mathematical Sciences, University of Oulu, Oulu, Finland
- Tanja Pyhäjärvi
- Department of Ecology and Genetics, University of Oulu, Oulu, Finland
- Department of Forest Sciences, University of Helsinki, Helsinki, Finland
- Mikko J. Sillanpää
- Research Unit of Mathematical Sciences, University of Oulu, Oulu, Finland
|
16
|
Hinoveanu LC, Leisen F, Villa C. A loss-based prior for Gaussian graphical models. Aust N Z J Stat 2021. [DOI: 10.1111/anzs.12307] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/28/2022]
Affiliation(s)
- Laurenţiu Cătălin Hinoveanu
- School of Mathematics, Statistics and Actuarial Science, University of Kent, Sibson Building, Canterbury CT2 7FS, UK
- Fabrizio Leisen
- School of Mathematical Sciences, University of Nottingham, University Park, Nottingham NG7 2RD, UK
- Cristiano Villa
- School of Mathematics, Statistics and Physics, Newcastle University, Herschel Building, Newcastle NE1 7RU, UK
|
17
|
Kim J, Zhu H, Wang X, Do K. Scalable network estimation with L0 penalty. Stat Anal Data Min 2021; 14:18-30. [DOI: 10.1002/sam.11483] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/07/2022]
Affiliation(s)
- Junghi Kim
- Center for Drug Evaluation and Research, U.S. Food and Drug Administration, Silver Spring, Maryland, USA
- Hongtu Zhu
- Department of Biostatistics, University of North Carolina, Chapel Hill, North Carolina, USA
- Xiao Wang
- Department of Statistics, Purdue University, West Lafayette, Indiana, USA
- Kim-Anh Do
- Department of Biostatistics, University of Texas MD Anderson Cancer Center, Houston, Texas, USA
|
18
|
Murad NF, Brandão MM. Probabilistic Graphical Models Applied to Biological Networks. Adv Exp Med Biol 2021; 1346:119-130. [DOI: 10.1007/978-3-030-80352-0_7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/25/2022]
|
19
|
|
20
|
Shahdoust M, Mahjub H, Pezeshk H, Sadeghi M. A Network-Based Comparison Between Molecular Apocrine Breast Cancer Tumor and Basal and Luminal Tumors by Joint Graphical Lasso. IEEE/ACM Trans Comput Biol Bioinform 2020; 17:1555-1562. [PMID: 30990436 DOI: 10.1109/tcbb.2019.2911074] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/09/2023]
Abstract
The joint graphical lasso (JGL) approach is a Gaussian graphical modeling method that estimates multiple graphical models corresponding to distinct but related groups. Molecular apocrine (MA) breast cancer tumor has similar characteristics to luminal and basal subtypes. Due to the relationship between MA tumor and the two other subtypes, this paper investigates the similarities and differences between the MA gene association network and the ones corresponding to other tumors by taking advantage of JGL properties. Two distinct JGL graphical models are applied to two sub-datasets including the gene expression information of the MA and the luminal tumors and also the MA and the basal tumors. Then, topological comparisons between the networks, such as finding the shared edges, are applied. In addition, several support vector machine (SVM) classification models are fitted to assess the power of some critical nodes in the networks, like hub nodes, to discriminate the tumor samples. Applying the JGL approach provides an appropriate tool to observe the networks of the MA tumor and other subtypes in one map. The results obtained by comparing the networks could be helpful to generate new insight about MA tumor for future studies.
|
21
|
Kontio JAJ, Rinta-Aho MJ, Sillanpää MJ. Estimating Linear and Nonlinear Gene Coexpression Networks by Semiparametric Neighborhood Selection. Genetics 2020; 215:597-607. [PMID: 32414870 PMCID: PMC7337083 DOI: 10.1534/genetics.120.303186] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 03/13/2020] [Accepted: 05/11/2020] [Indexed: 11/18/2022]
Abstract
Whereas nonlinear relationships between genes are acknowledged, there exist only a few methods for estimating nonlinear gene coexpression networks or gene regulatory networks (GCNs/GRNs) with common deficiencies. These methods often consider only pairwise associations between genes, and are, therefore, poorly capable of identifying higher-order regulatory patterns when multiple genes should be considered simultaneously. Another critical issue in current nonlinear GCN/GRN estimation approaches is that they consider linear and nonlinear dependencies at the same time in confounded form nonparametrically. This severely undermines the possibilities for nonlinear associations to be found, since the power of detecting nonlinear dependencies is lower compared to linear dependencies, and the sparsity-inducing procedures might favor linear relationships over nonlinear ones only due to small sample sizes. In this paper, we propose a method to estimate undirected nonlinear GCNs independently from the linear associations between genes based on a novel semiparametric neighborhood selection procedure capable of identifying complex nonlinear associations between genes. Simulation studies using the common DREAM3 and DREAM9 datasets show that the proposed method compares superiorly to the current nonlinear GCN/GRN estimation methods.
Affiliation(s)
- Juho A J Kontio
- Research Unit of Mathematical Sciences, Biocenter Oulu, University of Oulu, 90014, Finland
- Marko J Rinta-Aho
- Research Unit of Mathematical Sciences, Biocenter Oulu, University of Oulu, 90014, Finland
- Mikko J Sillanpää
- Research Unit of Mathematical Sciences, Biocenter Oulu, University of Oulu, 90014, Finland
- Infotech Oulu, University of Oulu, 90014, Finland
|
22
|
Altenbuchinger M, Weihs A, Quackenbush J, Grabe HJ, Zacharias HU. Gaussian and Mixed Graphical Models as (multi-)omics data analysis tools. Biochim Biophys Acta Gene Regul Mech 2020; 1863:194418. [PMID: 31639475 PMCID: PMC7166149 DOI: 10.1016/j.bbagrm.2019.194418] [Citation(s) in RCA: 27] [Impact Index Per Article: 6.8] [Received: 05/31/2019] [Revised: 08/21/2019] [Accepted: 08/21/2019] [Indexed: 11/30/2022]
Abstract
Gaussian Graphical Models (GGMs) are tools to infer dependencies between biological variables. Popular applications are the reconstruction of gene, protein, and metabolite association networks. GGMs are an exploratory research tool that can be useful to discover interesting relations between genes (functional clusters) or to identify therapeutically interesting genes, but do not necessarily infer a network in the mechanistic sense. Although GGMs are well investigated from a theoretical and applied perspective, important extensions are not well known within the biological community. GGMs assume, for instance, multivariate normal distributed data. If this assumption is violated Mixed Graphical Models (MGMs) can be the better choice. In this review, we provide the theoretical foundations of GGMs, present extensions such as MGMs or multi-class GGMs, and illustrate how those methods can provide insight in biological mechanisms. We summarize several applications and present user-friendly estimation software. This article is part of a Special Issue entitled: Transcriptional Profiles and Regulatory Gene Networks edited by Dr. Dr. Federico Manuel Giorgi and Dr. Shaun Mahony.
Affiliation(s)
- Michael Altenbuchinger
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA
- Antoine Weihs
- Department of Psychiatry and Psychotherapy, University Medicine Greifswald, 17475 Greifswald, Germany
- John Quackenbush
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA
- Channing Division of Network Medicine, Brigham and Women's Hospital, Boston, MA 02115, USA
- Department of Medicine, Harvard Medical School, Boston, MA 02115, USA
- Hans Jörgen Grabe
- Department of Psychiatry and Psychotherapy, University Medicine Greifswald, 17475 Greifswald, Germany
- German Center for Neurodegenerative Diseases DZNE, Site Rostock/Greifswald, 17475 Greifswald, Germany
- Helena U Zacharias
- Department of Psychiatry and Psychotherapy, University Medicine Greifswald, 17475 Greifswald, Germany
|
23
|
Williams DR, Rast P. Back to the basics: Rethinking partial correlation network methodology. Br J Math Stat Psychol 2020; 73:187-212. [PMID: 31206621 PMCID: PMC8572131 DOI: 10.1111/bmsp.12173] [Citation(s) in RCA: 81] [Impact Index Per Article: 20.3] [Received: 03/23/2018] [Revised: 03/02/2019] [Indexed: 05/08/2023]
Abstract
The Gaussian graphical model (GGM) is an increasingly popular technique used in psychology to characterize relationships among observed variables. These relationships are represented as elements in the precision matrix. Standardizing the precision matrix and reversing the sign yields corresponding partial correlations that imply pairwise dependencies in which the effects of all other variables have been controlled for. The graphical lasso (glasso) has emerged as the default estimation method, which uses ℓ1 -based regularization. The glasso was developed and optimized for high-dimensional settings where the number of variables (p) exceeds the number of observations (n), which is uncommon in psychological applications. Here we propose to go 'back to the basics', wherein the precision matrix is first estimated with non-regularized maximum likelihood and then Fisher Z transformed confidence intervals are used to determine non-zero relationships. We first show the exact correspondence between the confidence level and specificity, which is due to 1 minus specificity denoting the false positive rate (i.e., α). With simulations in low-dimensional settings (p ≪ n), we then demonstrate superior performance compared to the glasso for detecting the non-zero effects. Further, our results indicate that the glasso is inconsistent for the purpose of model selection and does not control the false discovery rate, whereas the proposed method converges on the true model and directly controls error rates. We end by discussing implications for estimating GGMs in psychology.
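The two steps named in this abstract (standardizing the precision matrix with a sign flip, then thresholding with Fisher Z intervals) can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function names `partial_correlations` and `fisher_z_edges` are hypothetical, and the standard error 1/sqrt(n - 3 - (p - 2)) is the usual textbook choice for a partial correlation conditioned on p - 2 variables, assumed here rather than taken from the paper.

```python
import numpy as np
from statistics import NormalDist


def partial_correlations(precision):
    """Standardize a precision matrix and reverse the sign of its off-diagonals."""
    precision = np.asarray(precision, dtype=float)
    d = np.sqrt(np.diag(precision))
    pcor = -precision / np.outer(d, d)
    np.fill_diagonal(pcor, 1.0)
    return pcor


def fisher_z_edges(data, alpha=0.05):
    """Hypothetical helper: non-regularized ML precision, then Fisher Z intervals.

    data is an (n samples) x (p variables) array with n > p + 1.
    Returns the partial correlation matrix and a boolean adjacency matrix
    marking pairs whose (1 - alpha) interval excludes zero.
    """
    n, p = data.shape
    cov = np.cov(data, rowvar=False, bias=True)  # maximum likelihood covariance
    prec = np.linalg.inv(cov)
    prec = (prec + prec.T) / 2.0  # remove tiny numerical asymmetry
    pcor = partial_correlations(prec)

    off_diag = ~np.eye(p, dtype=bool)
    z = np.zeros_like(pcor)
    z[off_diag] = np.arctanh(pcor[off_diag])  # Fisher Z transform

    # Assumed standard error for a partial correlation given p - 2 other variables.
    se = 1.0 / np.sqrt(n - 3 - (p - 2))
    crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)

    # The interval z +/- crit * se excludes zero exactly when |z| > crit * se.
    adjacency = (np.abs(z) > crit * se) & off_diag
    return pcor, adjacency
```

Calling `fisher_z_edges(X)` on a samples-by-variables array returns the estimated partial correlations together with the selected edges; because no regularization is involved, the sketch only applies in the low-dimensional setting (p ≪ n) that the abstract discusses.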
|
24
|
Law SR, Kellgren TG, Björk R, Ryden P, Keech O. Centralization Within Sub-Experiments Enhances the Biological Relevance of Gene Co-expression Networks: A Plant Mitochondrial Case Study. Front Plant Sci 2020; 11:524. [PMID: 32582224 PMCID: PMC7287149 DOI: 10.3389/fpls.2020.00524] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/12/2019] [Accepted: 04/07/2020] [Indexed: 05/07/2023]
Abstract
Gene co-expression networks (GCNs) can be prepared using a variety of mathematical approaches based on data sampled across diverse developmental processes, tissue types, pathologies, mutant backgrounds, and stress conditions. These networks are used to identify genes with similar expression dynamics but are prone to introducing false-positive and false-negative relationships, especially in the instance of large and heterogenous datasets. With the aim of optimizing the relevance of edges in GCNs and enhancing global biological insight, we propose a novel approach that involves a data-centering step performed simultaneously per gene and per sub-experiment, called centralization within sub-experiments (CSE). Using a gene set encoding the plant mitochondrial proteome as a case study, our results show that all CSE-based GCNs assessed had significantly more edges within the majority of the considered functional sub-networks, such as the mitochondrial electron transport chain and its complexes, than GCNs not using CSE; thus demonstrating that CSE-based GCNs are efficient at predicting canonical functions and associated pathways, here referred to as the core gene network. Furthermore, we show that correlation analyses using CSE-processed data can be used to fine-tune prediction of the function of uncharacterized genes; while its use in combination with analyses based on non-CSE data can augment conventional stress analyses with the innate connections underpinning the dynamic system being examined. Therefore, CSE is an effective alternative method to conventional batch correction approaches, particularly when dealing with large and heterogenous datasets. The method is easy to implement into a pre-existing GCN analysis pipeline and can provide enhanced biological relevance to conventional GCNs by allowing users to delineate a core gene network.
AUTHOR SUMMARY Gene co-expression networks (GCNs) are the product of a variety of mathematical approaches that identify causal relationships in gene expression dynamics but are prone to the misdiagnoses of false-positives and false-negatives, especially in the instance of large and heterogenous datasets. In light of the burgeoning output of next-generation sequencing projects performed on a variety of species, and developmental or clinical conditions; the statistical power and complexity of these networks will undoubtedly increase, while their biological relevance will be fiercely challenged. Here, we propose a novel approach to generate a "core" GCN with enhanced biological relevance. Our method involves a data-centering step that effectively removes all primary treatment/tissue effects, which is simple to employ and can be easily implemented into pre-existing GCN analysis pipelines. The gain in biological relevance resulting from the adoption of this approach was assessed using a plant mitochondrial case study.
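The data-centering step described above can be illustrated with a short sketch. It assumes, as one plausible reading of "per gene and per sub-experiment", that each gene's mean is subtracted separately within each sub-experiment before correlations are computed; the function name `center_within_subexperiments` and the array layout are hypothetical and not taken from the paper.

```python
import numpy as np


def center_within_subexperiments(expr, labels):
    """Subtract each gene's mean within each sub-experiment (assumed reading of CSE).

    expr   : (samples x genes) expression array
    labels : one sub-experiment identifier per sample
    """
    expr = np.asarray(expr, dtype=float)
    labels = np.asarray(labels)
    centered = np.empty_like(expr)
    for lab in np.unique(labels):
        rows = labels == lab
        # Column means are per gene, restricted to this sub-experiment's samples.
        centered[rows] = expr[rows] - expr[rows].mean(axis=0)
    return centered


# Toy data: two genes whose levels both shift between two sub-experiments.
expr = np.array([[1.0, 10.0], [3.0, 12.0], [101.0, 50.0], [103.0, 48.0]])
labels = np.array([0, 0, 1, 1])
centered = center_within_subexperiments(expr, labels)
raw_corr = np.corrcoef(expr, rowvar=False)[0, 1]
cse_corr = np.corrcoef(centered, rowvar=False)[0, 1]
```

In this toy case the raw correlation between the two genes is driven almost entirely by the shift between sub-experiments, while the correlation of the centered data reflects only the within-sub-experiment co-variation, which is the kind of primary treatment/tissue effect the abstract says the centering removes.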
Affiliation(s)
- Simon R. Law
- Department of Plant Physiology, Umeå Plant Science Centre, Umeå Universitet, Umeå, Sweden
| | - Therese G. Kellgren
- Department of Mathematics and Mathematical Statistics, Umeå Universitet, Umeå, Sweden
| | - Rafael Björk
- Department of Mathematics and Mathematical Statistics, Umeå Universitet, Umeå, Sweden
| | - Patrik Ryden
- Department of Mathematics and Mathematical Statistics, Umeå Universitet, Umeå, Sweden
- *Correspondence: Patrik Ryden,
| | - Olivier Keech
- Department of Plant Physiology, Umeå Plant Science Centre, Umeå Universitet, Umeå, Sweden
- Olivier Keech,
| |
|
25
|
Imkamp K, Bernal V, Grzegorzcyk M, Horvatovich P, Vermeulen CJ, Heijink IH, Guryev V, Kerstjens HAM, van den Berge M, Faiz A. Gene network approach reveals co-expression patterns in nasal and bronchial epithelium. Sci Rep 2019; 9:15835. [PMID: 31676779 PMCID: PMC6825243 DOI: 10.1038/s41598-019-50963-x] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 04/10/2019] [Accepted: 09/13/2019] [Indexed: 12/20/2022] Open
Abstract
Nasal gene expression profiling is a new approach to investigate the airway epithelium as a biomarker to study the activity and treatment responses of obstructive pulmonary diseases. We investigated to what extent gene expression profiling of nasal brushings is similar to that of bronchial brushings. We performed genome wide gene expression profiling on matched nasal and bronchial epithelial brushes from 77 respiratory healthy individuals. To investigate differences and similarities among regulatory modules, network analysis was performed on correlated, differentially expressed and smoking-related genes using Gaussian Graphical Models. Between nasal and bronchial brushes, 619 genes were correlated and 1692 genes were differentially expressed (false discovery rate <0.05, |Fold-change|>2). Network analysis of correlated genes showed pro-inflammatory pathways to be similar between the two locations. Focusing on smoking-related genes, cytochrome-P450 pathway related genes were found to be similar, supporting the concept of a detoxifying response to tobacco exposure throughout the airways. In contrast, cilia-related pathways were decreased in nasal compared to bronchial brushes when focusing on differentially expressed genes. Collectively, while there are substantial differences in gene expression between nasal and bronchial brushes, we also found similarities, especially in the response to the external factors such as smoking.
Affiliation(s)
- Kai Imkamp
- University of Groningen, University Medical Center Groningen, Department of Pulmonology, Groningen, The Netherlands.,University of Groningen, University Medical Center Groningen, GRIAC (Groningen Research Institute for Asthma and COPD), Groningen, The Netherlands
| | - Victor Bernal
- University of Groningen, Bernoulli Institute (JBI), Groningen, The Netherlands.,University of Groningen, Department of Pharmacy, Analytical Biochemistry, Groningen, The Netherlands
| | - Marco Grzegorzcyk
- University of Groningen, Bernoulli Institute (JBI), Groningen, The Netherlands
| | - Peter Horvatovich
- University of Groningen, Department of Pharmacy, Analytical Biochemistry, Groningen, The Netherlands
| | - Cornelis J Vermeulen
- University of Groningen, University Medical Center Groningen, Department of Pulmonology, Groningen, The Netherlands.,University of Groningen, University Medical Center Groningen, GRIAC (Groningen Research Institute for Asthma and COPD), Groningen, The Netherlands
| | - Irene H Heijink
- University of Groningen, University Medical Center Groningen, Department of Pulmonology, Groningen, The Netherlands.,University of Groningen, University Medical Center Groningen, GRIAC (Groningen Research Institute for Asthma and COPD), Groningen, The Netherlands.,University of Groningen, University Medical Center Groningen, Department of Pathology & Medical Biology, section Medical Biology, Groningen, The Netherlands
| | - Victor Guryev
- University of Groningen, University Medical Center Groningen, GRIAC (Groningen Research Institute for Asthma and COPD), Groningen, The Netherlands.,European Research Institute for the Biology of Ageing, University of Groningen, University Medical Center Groningen, Groningen, The Netherlands
| | - Huib A M Kerstjens
- University of Groningen, University Medical Center Groningen, Department of Pulmonology, Groningen, The Netherlands.,University of Groningen, University Medical Center Groningen, GRIAC (Groningen Research Institute for Asthma and COPD), Groningen, The Netherlands
| | - Maarten van den Berge
- University of Groningen, University Medical Center Groningen, Department of Pulmonology, Groningen, The Netherlands.,University of Groningen, University Medical Center Groningen, GRIAC (Groningen Research Institute for Asthma and COPD), Groningen, The Netherlands
| | - Alen Faiz
- University of Groningen, University Medical Center Groningen, Department of Pulmonology, Groningen, The Netherlands.,University of Groningen, University Medical Center Groningen, GRIAC (Groningen Research Institute for Asthma and COPD), Groningen, The Netherlands.,University of Groningen, University Medical Center Groningen, Department of Pathology & Medical Biology, section Medical Biology, Groningen, The Netherlands.,University of Technology Sydney, Respiratory Bioinformatics and Molecular Biology (RBMB), School of life sciences, Sydney, Australia.,Woolcock Emphysema Centre, Woolcock Institute of Medical Research, University of Sydney, Sydney, NSW, Australia
| |
|
26
|
Williams DR, Rhemtulla M, Wysocki AC, Rast P. On Nonregularized Estimation of Psychological Networks. MULTIVARIATE BEHAVIORAL RESEARCH 2019; 54:719-750. [PMID: 30957629 PMCID: PMC6736701 DOI: 10.1080/00273171.2019.1575716] [Citation(s) in RCA: 66] [Impact Index Per Article: 13.2] [Indexed: 05/08/2023]
Abstract
An important goal for psychological science is developing methods to characterize relationships between variables. Customary approaches use structural equation models to connect latent factors to a number of observed measurements, or test causal hypotheses between observed variables. More recently, regularized partial correlation networks have been proposed as an alternative approach for characterizing relationships among variables through off-diagonal elements in the precision matrix. While the graphical Lasso (glasso) has emerged as the default network estimation method, it was optimized in fields outside of psychology with very different needs, such as high dimensional data where the number of variables (p) exceeds the number of observations (n). In this article, we describe the glasso method in the context of the fields where it was developed, and then we demonstrate that the advantages of regularization diminish in settings where psychological networks are often fitted (p ≪ n). We first show that improved properties of the precision matrix, such as eigenvalue estimation, and predictive accuracy with cross-validation are not always appreciable. We then introduce nonregularized methods based on multiple regression and a nonparametric bootstrap strategy, after which we characterize performance with extensive simulations. Our results demonstrate that the nonregularized methods can be used to reduce the false-positive rate, compared to glasso, and they appear to provide consistent performance across sparsity levels, sample composition (p/n), and partial correlation size. We end by reviewing recent findings in the statistics literature that suggest alternative methods often outperform glasso, as well as suggesting areas for future research in psychology. The nonregularized methods have been implemented in the R package GGMnonreg.
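The link between partial correlations and the off-diagonal elements of the precision matrix mentioned in this abstract can be shown with a small nonregularized sketch. It uses the standard identity that the partial correlation between variables i and j equals -P_ij / sqrt(P_ii P_jj), where P is the inverse covariance matrix, and it assumes p ≪ n so that the sample covariance is invertible. This is plain NumPy for illustration only; it is not the GGMnonreg API, and the function name `partial_correlations` is hypothetical.

```python
import numpy as np

def partial_correlations(data):
    """Partial correlation matrix from the inverse sample covariance.

    data : (n_samples, n_variables) array with n_samples >> n_variables,
           so that no regularization is needed to invert the covariance.
    """
    data = np.asarray(data, dtype=float)
    precision = np.linalg.inv(np.cov(data, rowvar=False))
    d = np.sqrt(np.diag(precision))
    # Standardize the precision matrix and flip the sign of off-diagonals
    pcor = -precision / np.outer(d, d)
    np.fill_diagonal(pcor, 1.0)
    return pcor

# Toy chain x -> y -> z: x and z are marginally correlated,
# but conditionally independent given y.
rng = np.random.default_rng(1)
n = 2000
x = rng.normal(size=n)
y = x + rng.normal(size=n)
z = y + rng.normal(size=n)
data = np.column_stack([x, y, z])

marginal = np.corrcoef(data, rowvar=False)
partial = partial_correlations(data)
```

Here `marginal[0, 2]` is clearly positive because x influences z through y, while `partial[0, 2]` is near zero, so a network built from partial correlations would omit the indirect x-z edge.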
Affiliation(s)
- Donald R Williams
- Department of Psychology, University of California, Davis, CA, USA
| | - Mijke Rhemtulla
- Department of Psychology, University of California, Davis, CA, USA
| | - Anna C Wysocki
- Department of Psychology, University of California, Davis, CA, USA
| | - Philippe Rast
- Department of Psychology, University of California, Davis, CA, USA
| |
|
27
|
A Statistical Test for Differential Network Analysis Based on Inference of Gaussian Graphical Model. Sci Rep 2019; 9:10863. [PMID: 31350445 PMCID: PMC6659630 DOI: 10.1038/s41598-019-47362-7] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 05/20/2018] [Accepted: 07/15/2019] [Indexed: 11/09/2022] Open
Abstract
Differential network analysis investigates how the network of connected genes changes from one condition to another and has become a prevalent tool to provide a deeper and more comprehensive understanding of the molecular etiology of complex diseases. Based on the asymptotically normal estimation of a large Gaussian graphical model (GGM) in the high-dimensional setting, we developed a computationally efficient test for differential network analysis through testing the equality of two precision matrices, which summarize the conditional dependence network structures of the genes. Additionally, we applied a multiple testing procedure to infer the differential network structure with false discovery rate (FDR) control. Through extensive simulation studies with different combinations of parameters including sample size, number of vertices, level of heterogeneity and graph structure, we demonstrated that our method performed much better than currently available methods in terms of accuracy and computational time. In real data analysis on lung adenocarcinoma, we revealed a differential network with 3503 nodes and 2550 edges, which consisted of 50 clusters with an FDR threshold at 0.05. Many of the top gene pairs in the differential network have been reported as relevant to human cancers. Our method represents a powerful tool for network analysis of high-dimensional biological data.
|
28
|
Jiang Y, Gruzieva O, Wang T, Forno E, Boutaoui N, Sun T, Merid SK, Acosta-Pérez E, Kull I, Canino G, Antó JM, Bousquet J, Melén E, Chen W, Celedón JC. Transcriptomics of atopy and atopic asthma in white blood cells from children and adolescents. Eur Respir J 2019; 53:13993003.00102-2019. [PMID: 30923181 DOI: 10.1183/13993003.00102-2019] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Received: 01/15/2019] [Accepted: 03/02/2019] [Indexed: 02/07/2023]
Abstract
Early allergic sensitisation (atopy) is the first step in the development of allergic diseases such as atopic asthma later in life. Genes and pathways associated with atopy and atopic asthma in children and adolescents have not been well characterised. A transcriptome-wide association study (TWAS) of atopy and atopic asthma in white blood cells (WBCs) or whole blood was conducted in a cohort of 460 Puerto Ricans aged 9-20 years (EVA-PR study) and in a cohort of 250 Swedish adolescents (BAMSE study). Pathway enrichment and network analyses were conducted to further assess top findings, and classification models of atopy and atopic asthma were built using expression levels for the top differentially expressed genes (DEGs). In a meta-analysis of the study cohorts, both previously implicated genes (e.g. IL5RA and IL1RL1) and genes not previously reported in TWASs (novel) were significantly associated with atopy and/or atopic asthma. Top novel genes for atopy included SIGLEC8 (p=8.07×10⁻¹³), SLC29A1 (p=7.07×10⁻¹²) and SMPD3 (p=1.48×10⁻¹¹). Expression quantitative trait locus analyses identified multiple asthma-relevant genotype-expression pairs, such as rs2255888/ALOX15. Pathway enrichment analysis uncovered 16 significantly enriched pathways at adjusted p<0.01, including those relevant to T-helper cell type 1 (Th1) and Th2 immune responses. Classification models built using the top DEGs and a few demographic/parental history variables accurately differentiated subjects with atopic asthma from nonatopic control subjects (area under the curve 0.84). We have identified genes and pathways for atopy and atopic asthma in children and adolescents, using transcriptome-wide data from WBCs and whole blood samples.
Affiliation(s)
- Yale Jiang
- Division of Pulmonary Medicine, Dept of Pediatrics, UPMC Children's Hospital of Pittsburgh, University of Pittsburgh, Pittsburgh, PA, USA.,School of Medicine, Tsinghua University, Beijing, China.,These two authors contributed equally to this work
| | - Olena Gruzieva
- Institute of Environmental Medicine, Karolinska Institutet, Stockholm, Sweden.,These two authors contributed equally to this work
| | - Ting Wang
- Division of Pulmonary Medicine, Dept of Pediatrics, UPMC Children's Hospital of Pittsburgh, University of Pittsburgh, Pittsburgh, PA, USA
| | - Erick Forno
- Division of Pulmonary Medicine, Dept of Pediatrics, UPMC Children's Hospital of Pittsburgh, University of Pittsburgh, Pittsburgh, PA, USA
| | - Nadia Boutaoui
- Division of Pulmonary Medicine, Dept of Pediatrics, UPMC Children's Hospital of Pittsburgh, University of Pittsburgh, Pittsburgh, PA, USA
| | - Tao Sun
- Dept of Biostatistics, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, PA, USA
| | - Simon K Merid
- Institute of Environmental Medicine, Karolinska Institutet, Stockholm, Sweden
| | - Edna Acosta-Pérez
- Behavioral Sciences Research Institute, University of Puerto Rico, San Juan, Puerto Rico
| | - Inger Kull
- Institute of Environmental Medicine, Karolinska Institutet, Stockholm, Sweden
| | - Glorisa Canino
- Behavioral Sciences Research Institute, University of Puerto Rico, San Juan, Puerto Rico
| | - Josep M Antó
- ISGlobal, Barcelona Institute for Global Health, Barcelona, Spain
| | - Jean Bousquet
- CESP, Inserm U1018, Villejuif, France.,University Hospital, Montpellier, France
| | - Erik Melén
- Institute of Environmental Medicine, Karolinska Institutet, Stockholm, Sweden.,These three authors are joint senior authors
| | - Wei Chen
- Division of Pulmonary Medicine, Dept of Pediatrics, UPMC Children's Hospital of Pittsburgh, University of Pittsburgh, Pittsburgh, PA, USA.,These three authors are joint senior authors
| | - Juan C Celedón
- Division of Pulmonary Medicine, Dept of Pediatrics, UPMC Children's Hospital of Pittsburgh, University of Pittsburgh, Pittsburgh, PA, USA.,These three authors are joint senior authors
| |
|
29
|
Maffeo C, Chou HY, Aksimentiev A. Molecular Mechanisms of DNA Replication and Repair Machinery: Insights from Microscopic Simulations. ADVANCED THEORY AND SIMULATIONS 2019; 2:1800191. [PMID: 31728433 PMCID: PMC6855400 DOI: 10.1002/adts.201800191] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 11/30/2018] [Indexed: 12/15/2022]
Abstract
Reproduction, the hallmark of biological activity, requires making an accurate copy of the genetic material to allow the progeny to inherit parental traits. In all living cells, the process of DNA replication is carried out by a concerted action of multiple protein species forming a loose protein-nucleic acid complex, the replisome. Proofreading and error correction generally accompany replication but also occur independently, safeguarding genetic information through all phases of the cell cycle. Advances in biochemical characterization of intracellular processes, proteomics and the advent of single-molecule biophysics have brought about a treasure trove of information waiting to be assembled into an accurate mechanistic model of the DNA replication process. In this review, we describe recent efforts to model elements of DNA replication and repair processes using computer simulations, an approach that has gained immense popularity in many areas of molecular biophysics but has yet to become mainstream in the DNA metabolism community. We highlight the use of diverse computational methods to address specific problems in these fields and discuss unexplored possibilities that lie ahead for computational approaches in these areas.
Affiliation(s)
- Christopher Maffeo
- Department of Physics, Center for the Physics of Living Cells, University of Illinois at Urbana-Champaign, 1110 W Green St, Urbana, IL 61801, USA
| | - Han-Yi Chou
- Department of Physics, Center for the Physics of Living Cells, University of Illinois at Urbana-Champaign, 1110 W Green St, Urbana, IL 61801, USA
| | - Aleksei Aksimentiev
- Department of Physics, Center for the Physics of Living Cells, University of Illinois at Urbana-Champaign, 1110 W Green St, Urbana, IL 61801, USA
| |
|
30
|
Shi M, Shen W, Chong Y, Wang HQ. Improving GRN re-construction by mining hidden regulatory signals. IET Syst Biol 2019; 11:174-181. [PMID: 29125126 PMCID: PMC8687237 DOI: 10.1049/iet-syb.2017.0013] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 01/09/2023] Open
Abstract
Inferring gene regulatory networks (GRNs) from gene expression data is an important but challenging issue in systems biology. Here, the authors propose a dictionary learning-based approach that aims to infer GRNs by globally mining regulatory signals, known or latent. Gene expression is often regulated by various regulatory factors, some of which are observed and some of which are latent. The authors assume that all regulators are unknown for a target gene and the expression of the target gene can be mapped into a regulatory space spanned by all the regulators. Specifically, the authors modify the dictionary learning model, k-SVD, according to the sparse property of GRNs for mining the regulatory signals. The recovered regulatory signals are then used as a pool of regulatory factors to calculate a confidence score for a given transcription factor regulating a target gene. The capability of recovering hidden regulatory signals was verified on simulated data. Comparative experiments for GRN inference between the proposed algorithm (OURM) and some state-of-the-art algorithms, e.g. GENIE3 and ARACNE, on real-world data sets show the superior performance of OURM in inferring GRNs: higher area under the receiver operating characteristic curves and area under the precision-recall curves.
Affiliation(s)
- Ming Shi
- State Key Laboratory for Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University, 129 Luoyu Road, Wuhan 430079, People's Republic of China
| | - Weiming Shen
- State Key Laboratory for Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University, 129 Luoyu Road, Wuhan 430079, People's Republic of China
| | - Yanwen Chong
- State Key Laboratory for Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University, 129 Luoyu Road, Wuhan 430079, People's Republic of China
| | - Hong-Qiang Wang
- Machine Intelligence and Computational Biology Laboratory, Institute of Intelligent Machines, Chinese Academy of Science, PO Box 1130, Hefei 230031, People's Republic of China.
| |
|
31
|
Izadi F. Differential Connectivity in Colorectal Cancer Gene Expression Network. IRANIAN BIOMEDICAL JOURNAL 2019; 23. [PMID: 29843204 PMCID: PMC6305824 DOI: 10.29252/.23.1.34] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/25/2022]
Abstract
BACKGROUND Colorectal cancer (CRC) is one of the challenging types of cancer; thus, exploring effective biomarkers related to CRC could lead to significant progress toward the treatment of this disease. METHODS In the present study, CRC gene expression datasets were reanalyzed. Mutual differentially expressed genes across 294 normal mucosa and adjacent tumoral samples were then used to build two independent transcriptional regulatory networks. By analyzing the networks topologically, genes with differential global connectivity related to cancer state were determined, and their potential transcriptional regulators, including transcription factors, were identified. RESULTS The majority of differentially connected genes (DCGs) were up-regulated in colorectal transcriptome experiments. Moreover, a number of these genes have been experimentally validated as cancer- or CRC-associated genes. The DCGs, including GART, TGFB1, ITGA2, SLC16A5, SOX9, and MMP7, were investigated across 12 cancer types. Functional enrichment analysis followed by detailed data mining showed that these candidate genes could be related to CRC by mediating the metastatic cascade, in addition to sharing pathways with 12 cancer types by triggering inflammatory events. DISCUSSION Our study uncovered correlated alterations in gene expression related to CRC susceptibility and progression, and the candidate biomarkers identified could provide a link to the disease.
Affiliation(s)
- Fereshteh Izadi
- Sari Agricultural Sciences and Natural Resources University (SANRU), Farah Abad Road, Mazandaran 4818168984, Iran. Corresponding Author: Fereshteh Izadi; Mobile: (+98-918) 6291302; E-mail:
| |
|
32
|
Khatibipour MJ, Kurtoğlu F, Çakır T. JacLy: a Jacobian-based method for the inference of metabolic interactions from the covariance of steady-state metabolome data. PeerJ 2018; 6:e6034. [PMID: 30564518 PMCID: PMC6286809 DOI: 10.7717/peerj.6034] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 03/23/2018] [Accepted: 10/30/2018] [Indexed: 11/20/2022] Open
Abstract
Reverse engineering metabolome data to infer metabolic interactions is a challenging research topic. Here we introduce JacLy, a Jacobian-based method to infer metabolic interactions of small networks (<20 metabolites) from the covariance of steady-state metabolome data. The approach was applied to two different in silico small-scale metabolome datasets. The power of JacLy lies in the use of steady-state metabolome data to predict the Jacobian matrix of the system, which is a source of information on the structure and dynamic characteristics of the system. Besides its advantage of inferring directed interactions, its superiority over correlation-based network inference was especially clear in terms of the required number of replicates and the effect of using a priori knowledge in the inference. Additionally, we showed that the standard deviation of the replicate data is a suitable approximation for the magnitudes of metabolite fluctuations inherent in the system.
Affiliation(s)
- Mohammad Jafar Khatibipour
- Computational Systems Biology Group, Department of Bioengineering, Gebze Technical University, Gebze, Kocaeli, Turkey.,Department of Chemical Engineering, Gebze Technical University, Gebze, Kocaeli, Turkey
| | - Furkan Kurtoğlu
- Computational Systems Biology Group, Department of Bioengineering, Gebze Technical University, Gebze, Kocaeli, Turkey
| | - Tunahan Çakır
- Computational Systems Biology Group, Department of Bioengineering, Gebze Technical University, Gebze, Kocaeli, Turkey
| |
|
33
|
Zhang R, Ren Z, Chen W. SILGGM: An extensive R package for efficient statistical inference in large-scale gene networks. PLoS Comput Biol 2018; 14:e1006369. [PMID: 30102702 PMCID: PMC6107288 DOI: 10.1371/journal.pcbi.1006369] [Citation(s) in RCA: 26] [Impact Index Per Article: 4.3] [Received: 01/11/2018] [Revised: 08/23/2018] [Accepted: 07/17/2018] [Indexed: 11/18/2022] Open
Abstract
Gene co-expression network analysis is extremely useful in interpreting a complex biological process. Recent droplet-based single-cell technology can routinely generate much larger gene expression datasets, with thousands of samples and tens of thousands of genes. To analyze such large-scale gene-gene networks, remarkable progress has been made in rigorous statistical inference for high-dimensional Gaussian graphical models (GGMs). These approaches provide a formal confidence interval or a p-value rather than only a single point estimator for the conditional dependence of a gene pair and are more desirable for identifying reliable gene networks. To promote their widespread use, we herein introduce an extensive and efficient R package named SILGGM (Statistical Inference of Large-scale Gaussian Graphical Model) that includes four main approaches to statistical inference for high-dimensional GGMs. Unlike existing tools, SILGGM provides statistically efficient inference on both individual gene pairs and whole-scale gene pairs. It has a novel and consistent false discovery rate (FDR) procedure in all four methodologies. Based on a user-friendly design, it provides outputs compatible with multiple platforms for interactive network visualization. Furthermore, comparisons in simulation illustrate that SILGGM can accelerate the existing MATLAB implementation by several orders of magnitude and further improve on the speed of the already very efficient R package FastGGM. Testing results from the simulated data confirm the validity of all the approaches in SILGGM even in a very large-scale setting with the number of variables or genes on the order of ten thousand. We have also applied our package to a novel single-cell RNA-seq data set with pan T cells. The results show that the approaches in SILGGM significantly outperform the conventional ones in a biological sense. The package is freely available via CRAN at https://cran.r-project.org/package=SILGGM.
Affiliation(s)
- Rong Zhang
- Department of Statistics, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
| | - Zhao Ren
- Department of Statistics, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
| | - Wei Chen
- Division of Pulmonary Medicine; Department of Pediatrics, Children’s Hospital of Pittsburgh of UPMC, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
- Department of Biostatistics, University of Pittsburgh Graduate School of Public Health, Pittsburgh, Pennsylvania, United States of America
| |
|
34
|
Thorne T. Approximate inference of gene regulatory network models from RNA-Seq time series data. BMC Bioinformatics 2018; 19:127. [PMID: 29642837 PMCID: PMC5896118 DOI: 10.1186/s12859-018-2125-2] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 07/04/2017] [Accepted: 03/22/2018] [Indexed: 01/08/2023] Open
Abstract
Background: Inference of gene regulatory network structures from RNA-Seq data is challenging due to the nature of the data, as measurements take the form of counts of reads mapped to a given gene. Here we present a model for RNA-Seq time series data that applies a negative binomial distribution for the observations, and uses sparse regression with a horseshoe prior to learn a dynamic Bayesian network of interactions between genes. We use a variational inference scheme to learn approximate posterior distributions for the model parameters. Results: The methodology is benchmarked on synthetic data designed to replicate the distribution of real world RNA-Seq data. We compare our method to other sparse regression approaches and find improved performance in learning directed networks. We demonstrate an application of our method to a publicly available human neuronal stem cell differentiation RNA-Seq time series data set to infer the underlying network structure. Conclusions: Our method is able to improve performance on synthetic data by explicitly modelling the statistical distribution of the data when learning networks from RNA-Seq time series. Applying approximate inference techniques we can learn network structures quickly with only moderate computing resources.
Affiliation(s)
- Thomas Thorne
- Department of Computer Science, University of Reading, Reading, UK.
| |
|
35
|
Fang J, Xu C, Zille P, Lin D, Deng HW, Calhoun VD, Wang YP. Fast and Accurate Detection of Complex Imaging Genetics Associations Based on Greedy Projected Distance Correlation. IEEE TRANSACTIONS ON MEDICAL IMAGING 2018; 37:860-870. [PMID: 29990017 PMCID: PMC6043419 DOI: 10.1109/tmi.2017.2783244] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Indexed: 05/20/2023]
Abstract
Recent advances in imaging genetics produce large amounts of data including functional MRI images, single nucleotide polymorphisms (SNPs), and cognitive assessments. Understanding the complex interactions among these heterogeneous and complementary data has the potential to help with diagnosis and prevention of mental disorders. However, limited efforts have been made due to the high dimensionality, group structure, and mixed type of these data. In this paper we present a novel method to detect conditional associations between imaging genetics data. We use projected distance correlation to build a conditional dependency graph among high-dimensional mixed data, then use multiple testing to detect significant group level associations (e.g., ROI-gene). In addition, we introduce a scalable algorithm based on the orthogonal greedy algorithm, yielding the greedy projected distance correlation (G-PDC). This can reduce the computational cost, which is critical for analyzing large volumes of imaging genomics data. The results from our simulations demonstrate a higher degree of accuracy with G-PDC than with distance correlation, Pearson's correlation and partial correlation, especially when the correlation is nonlinear. Finally, we apply our method to the Philadelphia Neurodevelopmental data cohort with 866 samples including fMRI images and SNP profiles. The results uncover several statistically significant and biologically interesting interactions, which are further validated with many existing studies. The Matlab code is available at https://sites.google.com/site/jianfang86/gPDC.
|
36
|
Fang J, Zhang JG, Deng HW, Wang YP. Joint Detection of Associations between DNA Methylation and Gene Expression from Multiple Cancers. IEEE J Biomed Health Inform 2017; 22:1960-1969. [PMID: 29990049 DOI: 10.1109/jbhi.2017.2784621] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 01/01/2023]
Abstract
DNA methylation plays an important role in the development of various cancers, mainly through the regulation of gene expression. Hence, the study of the relation between DNA methylation and gene expression is of particular interest for understanding cancers. Recently, an increasing number of datasets have become available from multiple cancers, which makes it possible to study both the similarities and differences of genomic alterations across multiple tumor types. However, most of the existing pan-cancer analysis methods perform simple aggregations, which may overlook the heterogeneity of the interactions. In this paper, we propose a novel method to jointly detect complex associations between DNA methylation and gene expression levels from multiple cancers. The main idea is to apply joint sparse canonical correlation analysis to detect a small set of methylated sites, which are associated with another set of genes either shared across cancers or specific to a particular group (group-specific) of cancers. These methylated sites and genes form a complex module with strong multivariate correlations. We further introduce a joint sparse precision matrix estimation method to identify driver methylation-gene pairs in the module. These pairs are characterized by significant partial correlations, which may imply high functional impacts and contribute complementary information to the main step. We apply our method to The Cancer Genome Atlas (TCGA) datasets with 1166 samples from four cancers. The results reveal significant shared and group-specific interactions between DNA methylation and gene expression levels. To promote reproducible research, the Matlab code is available at https://sites.google.com/site/jianfang86/jointTCGA.
|
37
|
A Systemic Analysis of Transcriptomic and Epigenomic Data To Reveal Regulation Patterns for Complex Disease. G3-GENES GENOMES GENETICS 2017; 7:2271-2279. [PMID: 28500050 PMCID: PMC5499134 DOI: 10.1534/g3.117.042408] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Indexed: 12/26/2022]
Abstract
Integrating diverse genomics data can provide a global view of the complex biological processes related to human complex diseases. Although substantial efforts have been made to integrate different omics data, there are at least three challenges for multi-omics integration methods: (i) how to simultaneously consider the effects of various genomic factors, since these factors jointly influence the phenotypes; (ii) how to effectively incorporate the information from publicly accessible databases and omics datasets to fully capture the interactions among (epi)genomic factors from diverse omics data; and (iii) how to combine more than two omics datasets, which has so far been poorly explored. Current integration approaches are not sufficient to address all of these challenges together. We propose a novel integrative analysis framework that incorporates a sparse model, multivariate analysis, a Gaussian graphical model, and network analysis to address these three challenges simultaneously. Based on this strategy, we performed a systemic analysis for glioblastoma multiforme (GBM), integrating genome-wide gene expression, DNA methylation, and miRNA expression data. We identified three regulatory modules of genomic factors associated with GBM survival time and revealed a global regulatory pattern for GBM by combining the three modules with respect to their common regulatory factors. Our method can not only identify disease-associated dysregulated genomic factors from different omics but, more importantly, can incorporate the information from publicly accessible databases and omics datasets to infer a comprehensive interaction map of all these dysregulated genomic factors. Our work represents an innovative approach to enhance our understanding of the molecular genomic mechanisms underlying human complex diseases.
|
38
|
Thorne T. NetDiff - Bayesian model selection for differential gene regulatory network inference. Sci Rep 2016; 6:39224. [PMID: 27982083 PMCID: PMC5159802 DOI: 10.1038/srep39224] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 08/19/2016] [Accepted: 11/18/2016] [Indexed: 11/09/2022]
Abstract
Differential networks allow us to better understand the changes in cellular processes that are exhibited in conditions of interest, identifying variations in gene regulation or protein interaction between, for example, cases and controls, or in response to external stimuli. Here we present a novel methodology for the inference of differential gene regulatory networks from gene expression microarray data. Specifically, we apply a Bayesian model selection approach to compare models of conserved and varying network structure, and use Gaussian graphical models to represent the network structures. We apply a variational inference approach to the learning of Gaussian graphical models of gene regulatory networks, which enables us to perform Bayesian model selection that is significantly more computationally efficient than Markov Chain Monte Carlo approaches. Our method is demonstrated to be more robust than independent analysis of data from multiple conditions when applied to synthetic network data, generating fewer false positive predictions of differential edges. We demonstrate the utility of our approach on real-world gene expression microarray data by applying it to existing data from amyotrophic lateral sclerosis cases with and without mutations in C9orf72, and controls, where we are able to identify differential network interactions for further investigation.
Affiliation(s)
- Thomas Thorne
- Division of Brain Sciences, Imperial College London, UK
|
39
|
Differential Regulatory Analysis Based on Coexpression Network in Cancer Research. BIOMED RESEARCH INTERNATIONAL 2016; 2016:4241293. [PMID: 27597964 PMCID: PMC4997028 DOI: 10.1155/2016/4241293] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Received: 04/14/2016] [Revised: 06/09/2016] [Accepted: 06/12/2016] [Indexed: 12/15/2022]
Abstract
With the rapid development of high-throughput techniques and the accumulation of large transcriptomic datasets, many computational methods and algorithms, such as differential analysis and network analysis, have been proposed to explore genome-wide gene expression characteristics. These efforts aim to transform underlying genomic information into valuable knowledge for biological and medical research. Recently, numerous integrative research methods have been dedicated to interpreting the development and progression of neoplastic diseases, and differential regulatory analysis (DRA) based on gene coexpression networks (GCN) increasingly serves as a robust complement to regular differential expression analysis in revealing regulatory functions of cancer-related genes, such as evading growth suppressors and resisting cell death. Differential regulatory analysis based on GCN is promising and plays an essential role in uncovering the system-level properties of carcinogenesis. Here we briefly review the paradigm of differential regulatory analysis based on GCN. We also focus on the applications of differential regulatory analysis based on GCN in cancer research and point out that DRA is necessary and particularly useful for revealing the underlying molecular mechanisms in large-scale carcinogenesis studies.
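A minimal sketch of one common way to compare coexpression between two conditions may help make the idea concrete. It is not necessarily the procedure the review describes. It assumes Pearson correlation as the coexpression measure and a Fisher z-test for the difference; the function name `differential_coexpression` and the simulated data are hypothetical.

```python
import numpy as np

def differential_coexpression(expr_a, expr_b):
    """Fisher z-scores for the change in pairwise correlation between conditions.

    expr_a, expr_b: arrays of shape (samples, genes) for conditions A and B.
    Returns a genes x genes matrix; large |z| suggests a rewired gene pair.
    """
    n_a, n_b = expr_a.shape[0], expr_b.shape[0]
    r_a = np.corrcoef(expr_a, rowvar=False)
    r_b = np.corrcoef(expr_b, rowvar=False)
    # Clip so arctanh stays finite on the diagonal (r = 1).
    z_a = np.arctanh(np.clip(r_a, -0.999999, 0.999999))
    z_b = np.arctanh(np.clip(r_b, -0.999999, 0.999999))
    se = np.sqrt(1.0 / (n_a - 3) + 1.0 / (n_b - 3))
    z = (z_a - z_b) / se
    np.fill_diagonal(z, 0.0)
    return z

# Simulated example: genes 0 and 1 are coexpressed in condition A only.
rng = np.random.default_rng(1)
n = 200
g0_a = rng.normal(size=n)
expr_a = np.column_stack([g0_a,
                          g0_a + 0.5 * rng.normal(size=n),
                          rng.normal(size=n)])
expr_b = rng.normal(size=(n, 3))

z = differential_coexpression(expr_a, expr_b)
print(np.round(z, 2))
```

Gene pairs whose correlation changes between conditions receive a large absolute z-score even when neither gene is differentially expressed on its own. This is the sense in which such an analysis complements regular differential expression analysis.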
|