1
Zitnik M, Li MM, Wells A, Glass K, Morselli Gysi D, Krishnan A, Murali TM, Radivojac P, Roy S, Baudot A, Bozdag S, Chen DZ, Cowen L, Devkota K, Gitter A, Gosline SJC, Gu P, Guzzi PH, Huang H, Jiang M, Kesimoglu ZN, Koyuturk M, Ma J, Pico AR, Pržulj N, Przytycka TM, Raphael BJ, Ritz A, Sharan R, Shen Y, Singh M, Slonim DK, Tong H, Yang XH, Yoon BJ, Yu H, Milenković T. Current and future directions in network biology. BIOINFORMATICS ADVANCES 2024; 4:vbae099. [PMID: 39143982 PMCID: PMC11321866 DOI: 10.1093/bioadv/vbae099] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/27/2023] [Revised: 05/31/2024] [Accepted: 07/08/2024] [Indexed: 08/16/2024]
Abstract
Summary
Network biology is an interdisciplinary field bridging computational and biological sciences that has proved pivotal in advancing the understanding of cellular functions and diseases across biological systems and scales. Although the field has been around for two decades, it remains nascent. It has witnessed rapid evolution, accompanied by emerging challenges. These stem from various factors, notably the growing complexity and volume of data together with the increased diversity of data types describing different tiers of biological organization. We discuss prevailing research directions in network biology, focusing on molecular/cellular networks but also on other biological network types such as biomedical knowledge graphs, patient similarity networks, brain networks, and social/contact networks relevant to disease spread. In more detail, we highlight areas of inference and comparison of biological networks, multimodal data integration and heterogeneous networks, higher-order network analysis, machine learning on networks, and network-based personalized medicine. Following the overview of recent breakthroughs across these five areas, we offer a perspective on future directions of network biology. Additionally, we discuss scientific communities, educational initiatives, and the importance of fostering diversity within the field. This article establishes a roadmap for an immediate and long-term vision for network biology.
Availability and implementation
Not applicable.
Affiliation(s)
- Marinka Zitnik
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, United States
| | - Michelle M Li
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, United States
| | - Aydin Wells
- Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN 46556, United States
- Lucy Family Institute for Data and Society, University of Notre Dame, Notre Dame, IN 46556, United States
- Eck Institute for Global Health, University of Notre Dame, Notre Dame, IN 46556, United States
| | - Kimberly Glass
- Channing Division of Network Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA 02115, United States
| | - Deisy Morselli Gysi
- Channing Division of Network Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA 02115, United States
- Department of Statistics, Federal University of Paraná, Curitiba, Paraná 81530-015, Brazil
- Department of Physics, Northeastern University, Boston, MA 02115, United States
| | - Arjun Krishnan
- Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, United States
| | - T M Murali
- Department of Computer Science, Virginia Tech, Blacksburg, VA 24061, United States
| | - Predrag Radivojac
- Khoury College of Computer Sciences, Northeastern University, Boston, MA 02115, United States
| | - Sushmita Roy
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI 53715, United States
- Wisconsin Institute for Discovery, Madison, WI 53715, United States
| | - Anaïs Baudot
- Aix Marseille Université, INSERM, MMG, Marseille, France
| | - Serdar Bozdag
- Department of Computer Science and Engineering, University of North Texas, Denton, TX 76203, United States
- Department of Mathematics, University of North Texas, Denton, TX 76203, United States
| | - Danny Z Chen
- Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN 46556, United States
| | - Lenore Cowen
- Department of Computer Science, Tufts University, Medford, MA 02155, United States
| | - Kapil Devkota
- Department of Computer Science, Tufts University, Medford, MA 02155, United States
| | - Anthony Gitter
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI 53715, United States
- Morgridge Institute for Research, Madison, WI 53715, United States
| | - Sara J C Gosline
- Biological Sciences Division, Pacific Northwest National Laboratory, Seattle, WA 98109, United States
| | - Pengfei Gu
- Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN 46556, United States
| | - Pietro H Guzzi
- Department of Medical and Surgical Sciences, University Magna Graecia of Catanzaro, Catanzaro, 88100, Italy
| | - Heng Huang
- Department of Computer Science, University of Maryland College Park, College Park, MD 20742, United States
| | - Meng Jiang
- Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN 46556, United States
| | - Ziynet Nesibe Kesimoglu
- Department of Computer Science and Engineering, University of North Texas, Denton, TX 76203, United States
- National Center of Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20814, United States
| | - Mehmet Koyuturk
- Department of Computer and Data Sciences, Case Western Reserve University, Cleveland, OH 44106, United States
| | - Jian Ma
- Ray and Stephanie Lane Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, United States
| | - Alexander R Pico
- Institute of Data Science and Biotechnology, Gladstone Institutes, San Francisco, CA 94158, United States
| | - Nataša Pržulj
- Department of Computer Science, University College London, London, WC1E 6BT, England
- ICREA, Catalan Institution for Research and Advanced Studies, Barcelona, 08010, Spain
- Barcelona Supercomputing Center (BSC), Barcelona, 08034, Spain
| | - Teresa M Przytycka
- National Center of Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20814, United States
| | - Benjamin J Raphael
- Department of Computer Science, Princeton University, Princeton, NJ 08544, United States
| | - Anna Ritz
- Department of Biology, Reed College, Portland, OR 97202, United States
| | - Roded Sharan
- School of Computer Science, Tel Aviv University, Tel Aviv, 69978, Israel
| | - Yang Shen
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843, United States
| | - Mona Singh
- Department of Computer Science, Princeton University, Princeton, NJ 08544, United States
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544, United States
| | - Donna K Slonim
- Department of Computer Science, Tufts University, Medford, MA 02155, United States
| | - Hanghang Tong
- Department of Computer Science, University of Illinois Urbana-Champaign, Urbana, IL 61801, United States
| | - Xinan Holly Yang
- Department of Pediatrics, University of Chicago, Chicago, IL 60637, United States
| | - Byung-Jun Yoon
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843, United States
- Computational Science Initiative, Brookhaven National Laboratory, Upton, NY 11973, United States
| | - Haiyuan Yu
- Department of Computational Biology, Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY 14853, United States
| | - Tijana Milenković
- Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN 46556, United States
- Lucy Family Institute for Data and Society, University of Notre Dame, Notre Dame, IN 46556, United States
- Eck Institute for Global Health, University of Notre Dame, Notre Dame, IN 46556, United States
2
Huang X, Zhang H. Detecting responsible nodes in differential Bayesian networks. Stat Med 2024; 43:3294-3312. [PMID: 38831542 DOI: 10.1002/sim.10125] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/06/2023] [Revised: 03/25/2024] [Accepted: 05/18/2024] [Indexed: 06/05/2024]
Abstract
To study the roles that different nodes play in differentiating Bayesian networks under two states, such as control versus disease, we formulate two node-specific scores to facilitate such assessment. The first score is motivated by the prediction invariance property of a causal model. The second score results from modifying an existing score constructed for differential analysis of undirected networks. We develop strategies based on these scores to identify nodes responsible for topological differences between two Bayesian networks. Synthetic data and real-life data from designed experiments are used to demonstrate the efficacy of the proposed methods in detecting responsible nodes.
Affiliation(s)
- Xianzheng Huang
- Department of Statistics, University of South Carolina, Columbia, South Carolina, USA
| | - Hongmei Zhang
- Division of Epidemiology, Biostatistics, and Environmental Health, School of Public Health, University of Memphis, Memphis, Tennessee
3
Erdem C, Gross SM, Heiser LM, Birtwistle MR. MOBILE pipeline enables identification of context-specific networks and regulatory mechanisms. Nat Commun 2023; 14:3991. [PMID: 37414767 PMCID: PMC10326020 DOI: 10.1038/s41467-023-39729-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/27/2022] [Accepted: 06/27/2023] [Indexed: 07/08/2023] Open
Abstract
Robust identification of context-specific network features that control cellular phenotypes remains a challenge. We here introduce MOBILE (Multi-Omics Binary Integration via Lasso Ensembles) to nominate molecular features associated with cellular phenotypes and pathways. First, we use MOBILE to nominate mechanisms of interferon-γ (IFNγ) regulated PD-L1 expression. Our analyses suggest that IFNγ-controlled PD-L1 expression involves BST2, CLIC2, FAM83D, ACSL5, and HIST2H2AA3 genes, which were supported by prior literature. We also compare networks activated by related family members transforming growth factor-beta 1 (TGFβ1) and bone morphogenetic protein 2 (BMP2) and find that differences in ligand-induced changes in cell size and clustering properties are related to differences in laminin/collagen pathway activity. Finally, we demonstrate the broad applicability and adaptability of MOBILE by analyzing publicly available molecular datasets to investigate breast cancer subtype specific networks. Given the ever-growing availability of multi-omics datasets, we envision that MOBILE will be broadly useful for identification of context-specific molecular features and pathways.
Affiliation(s)
- Cemal Erdem
- Department of Chemical and Biomolecular Engineering, Clemson University, Clemson, SC, USA
| | - Sean M Gross
- Department of Biomedical Engineering, Oregon Health & Science University, Portland, OR, USA
| | - Laura M Heiser
- Department of Biomedical Engineering, Oregon Health & Science University, Portland, OR, USA.
| | - Marc R Birtwistle
- Department of Chemical and Biomolecular Engineering, Clemson University, Clemson, SC, USA.
- Department of Bioengineering, Clemson University, Clemson, SC, USA.
4
Racedo S, Portnoy I, Vélez JI, San-Juan-Vergara H, Sanjuan M, Zurek E. A new pipeline for structural characterization and classification of RNA-Seq microbiome data. BioData Min 2021; 14:31. [PMID: 34243809 PMCID: PMC8268467 DOI: 10.1186/s13040-021-00266-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/04/2020] [Accepted: 06/16/2021] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND
High-throughput sequencing enables the analysis of the composition of numerous biological systems, such as microbial communities. The identification of dependencies within these systems requires the analysis and assimilation of the underlying interaction patterns between all the variables that make up that system. However, this task poses a challenge when considering the compositional nature of the data coming from DNA-sequencing experiments because traditional interaction metrics (e.g., correlation) produce unreliable results when analyzing relative fractions instead of absolute abundances. The compositionality-associated challenges extend to the classification task, as it usually involves the characterization of the interactions between the principal descriptive variables of the datasets. The classification of new samples/patients into binary categories corresponding to dissimilar biological settings or phenotypes (e.g., control and cases) could help researchers in the development of treatments/drugs.
RESULTS
Here, we develop and exemplify a new approach, applicable to compositional data, for the classification of new samples into two groups with different biological settings. We propose a new metric to characterize and quantify the overall correlation structure deviation between these groups and a technique for dimensionality reduction to facilitate graphical representation. We conduct simulation experiments with synthetic data to assess the proposed method's classification accuracy. Moreover, we illustrate the performance of the proposed approach using Operational Taxonomic Unit (OTU) count tables obtained through 16S rRNA gene sequencing data from two microbiota experiments. We also compare our method's performance with that of two state-of-the-art methods.
CONCLUSIONS
Simulation experiments show that our method achieves a classification accuracy equal to or greater than 98% when using synthetic data. Finally, our method outperforms the other classification methods with real datasets from gene sequencing experiments.
Affiliation(s)
| | - Ivan Portnoy
- Universidad del Norte, Barranquilla, Colombia.
- Productivity and Innovation Department, Universidad de la Costa, Calle 58 # 55-56, Barranquilla, Colombia.
5
Arbet J, Zhuang Y, Litkowski E, Saba L, Kechris K. Comparing Statistical Tests for Differential Network Analysis of Gene Modules. Front Genet 2021; 12:630215. [PMID: 34093641 PMCID: PMC8170128 DOI: 10.3389/fgene.2021.630215] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/17/2020] [Accepted: 04/19/2021] [Indexed: 11/13/2022] Open
Abstract
Genes often work together to perform complex biological processes, and "networks" provide a versatile framework for representing the interactions between multiple genes. Differential network analysis (DiNA) quantifies how this network structure differs between two or more groups/phenotypes (e.g., disease subjects and healthy controls), with the goal of determining whether differences in network structure can help explain differences between phenotypes. In this paper, we focus on gene co-expression networks, although in principle, the methods studied can be used for DiNA for other types of features (e.g., metabolome, epigenome, microbiome, proteome, etc.). Three common applications of DiNA involve (1) testing whether the connections to a single gene differ between groups, (2) testing whether the connection between a pair of genes differs between groups, or (3) testing whether the connections within a "module" (a subset of 3 or more genes) differs between groups. This article focuses on the latter, as there is a lack of studies comparing statistical methods for identifying differentially co-expressed modules (DCMs). Through extensive simulations, we compare several previously proposed test statistics and a new p-norm difference test (PND). We demonstrate that the true positive rate of the proposed PND test is competitive with and often higher than the other methods, while controlling the false positive rate. The R package discoMod (differentially co-expressed modules) implements the proposed method and provides a full pipeline for identifying DCMs: clustering tools to derive gene modules, tests to identify DCMs, and methods for visualizing the results.
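The abstract above does not spell out the PND statistic, so the following is only a minimal sketch of the general idea, under stated assumptions: the statistic is taken to be the p-norm of the pairwise differences between the two groups' Pearson correlations within one module, and significance is assessed by permuting group labels. The function names `pearson`, `pnorm_difference`, and `permutation_pvalue` are hypothetical and are not part of the discoMod package.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation of two equal-length lists of numbers."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)


def pnorm_difference(group1, group2, p=2.0):
    """Assumed statistic: p-norm of correlation differences over gene pairs.

    Each group is a list of samples; each sample is a list of expression
    values for the genes of one module (same gene order in both groups).
    """
    n_genes = len(group1[0])
    total = 0.0
    for i in range(n_genes):
        for j in range(i + 1, n_genes):
            r1 = pearson([s[i] for s in group1], [s[j] for s in group1])
            r2 = pearson([s[i] for s in group2], [s[j] for s in group2])
            total += abs(r1 - r2) ** p
    return total ** (1.0 / p)


def permutation_pvalue(group1, group2, p=2.0, n_perm=200, seed=0):
    """Permutation p-value obtained by shuffling samples between the groups."""
    rng = random.Random(seed)
    observed = pnorm_difference(group1, group2, p)
    pooled = list(group1) + list(group2)
    n1 = len(group1)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if pnorm_difference(pooled[:n1], pooled[n1:], p) >= observed:
            exceed += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return observed, (exceed + 1) / (n_perm + 1)
```

Calling `permutation_pvalue(cases, controls)` on two lists of module expression profiles returns the observed statistic and its permutation p-value; a small p-value would suggest that the module's co-expression structure differs between the groups.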
Affiliation(s)
- Jaron Arbet
- Department of Biostatistics and Informatics, Colorado School of Public Health, University of Colorado Anschutz Medical Campus, Aurora, CO, United States
| | - Yaxu Zhuang
- Department of Biostatistics and Informatics, Colorado School of Public Health, University of Colorado Anschutz Medical Campus, Aurora, CO, United States
| | - Elizabeth Litkowski
- Department of Epidemiology, Colorado School of Public Health, University of Colorado Anschutz Medical Campus, Aurora, CO, United States
| | - Laura Saba
- Department of Pharmaceutical Sciences, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of Colorado Anschutz Medical Campus, Aurora CO, United States
| | - Katerina Kechris
- Department of Biostatistics and Informatics, Colorado School of Public Health, University of Colorado Anschutz Medical Campus, Aurora, CO, United States
6
Lin W, Ji J, Zhu Y, Li M, Zhao J, Xue F, Yuan Z. PMINR: Pointwise Mutual Information-Based Network Regression - With Application to Studies of Lung Cancer and Alzheimer's Disease. Front Genet 2020; 11:556259. [PMID: 33193633 PMCID: PMC7594515 DOI: 10.3389/fgene.2020.556259] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 04/27/2020] [Accepted: 08/12/2020] [Indexed: 11/13/2022] Open
Abstract
Complex diseases are believed to be the consequence of intracellular network(s) involving a range of factors. An improved understanding of a disease-predisposing biological network could lead to better identification of genes and pathways that confer disease risk and therefore inform drug development. The group difference in biological networks, as is often characterized by graphs of nodes and edges, is attributable to effects of these nodes and edges. Here we introduced pointwise mutual information (PMI) as a measure of the connection between a pair of nodes with either a linear relationship or nonlinear dependence. We then proposed a PMI-based network regression (PMINR) model to differentiate patterns of network changes (in node or edge) linking a disease outcome. Through simulation studies with various sample sizes and inter-node correlation structures, we showed that PMINR can accurately identify these changes with higher power than current methods and be robust to the network topology. Finally, we illustrated, with publicly available data on lung cancer and gene methylation data on aging and Alzheimer’s disease, an evaluation of the practical performance of PMINR. We concluded that PMI is able to capture the generic inter-node correlation pattern in biological networks, and PMINR is a powerful and efficient approach for biological network analysis.
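For readers unfamiliar with the measure, pointwise mutual information for a pair of outcomes is defined as PMI(x, y) = log[p(x, y) / (p(x) p(y))]. The sketch below illustrates that definition on a discrete contingency table only. It is an assumption-laden illustration, not the PMINR model: the abstract does not describe how PMI is estimated from continuous expression or methylation data, and the function name `pointwise_mutual_information` is hypothetical.

```python
import math


def pointwise_mutual_information(joint_counts):
    """Return the PMI matrix for a table of joint counts.

    joint_counts[i][j] is the number of samples in which the first variable
    takes its i-th value and the second variable takes its j-th value.
    Cells with zero count get -inf, the limit of log(p) as p approaches 0.
    """
    total = sum(sum(row) for row in joint_counts)
    row_p = [sum(row) / total for row in joint_counts]        # marginal p(x)
    col_p = [sum(col) / total for col in zip(*joint_counts)]  # marginal p(y)
    pmi = []
    for i, row in enumerate(joint_counts):
        pmi_row = []
        for j, count in enumerate(row):
            if count == 0:
                pmi_row.append(float("-inf"))
            else:
                joint_p = count / total
                pmi_row.append(math.log(joint_p / (row_p[i] * col_p[j])))
        pmi.append(pmi_row)
    return pmi
```

Positive entries mark value pairs that co-occur more often than independence would predict, negative entries mark pairs that co-occur less often, and a table of independent variables yields zeros throughout.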
Affiliation(s)
- Weiqiang Lin
- Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan, China
| | - Jiadong Ji
- Department of Data Science, School of Statistics, Shandong University of Finance and Economics, Jinan, China
| | - Yuchen Zhu
- Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan, China
| | - Mingzhuo Li
- Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan, China
| | - Jinghua Zhao
- Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, Cambridge, United Kingdom
| | - Fuzhong Xue
- Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan, China
| | - Zhongshang Yuan
- Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan, China
7
Al-Harazi O, El Allali A, Colak D. Biomolecular Databases and Subnetwork Identification Approaches of Interest to Big Data Community: An Expert Review. OMICS-A JOURNAL OF INTEGRATIVE BIOLOGY 2020; 23:138-151. [PMID: 30883301 DOI: 10.1089/omi.2018.0205] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Indexed: 12/15/2022]
Abstract
Next-generation sequencing approaches and genome-wide studies have become essential for characterizing the mechanisms of human diseases. Consequently, many researchers have applied these approaches to discover the genetic/genomic causes of common complex and rare human diseases, generating multiomics big data that span the continuum of genomics, proteomics, metabolomics, and many other system science fields. Therefore, there is a significant and unmet need for biological databases and tools that enable and empower researchers to analyze, integrate, and make sense of big data. There are currently a large number of databases that offer different types of biological information. In particular, the integration of gene expression profiles and protein-protein interaction networks provides a deeper understanding of the complex multilayered molecular architecture of human diseases. Therefore, there has been a growing interest in developing methodologies that integrate and contextualize big data from molecular interaction networks to identify biomarkers of human diseases at a subnetwork resolution as well. In this expert review, we provide a comprehensive summary of the most popular biomolecular databases for molecular interactions (e.g., Biological General Repository for Interaction Datasets, Kyoto Encyclopedia of Genes and Genomes and Search Tool for The Retrieval of Interacting Genes/Proteins), gene-disease associations (e.g., Online Mendelian Inheritance in Man, Disease-Gene Network, MalaCards), and population-specific databases (e.g., Human Genetic Variation Database), and describe some examples of their usage and potential applications. We also present the most recent subnetwork identification approaches and discuss their main advantages and limitations. As the field of data science continues to emerge, the present analysis offers a deeper and contextualized understanding of the available databases in molecular biomedicine.
Affiliation(s)
- Olfat Al-Harazi
- 1 Department of Biostatistics, Epidemiology, and Scientific Computing, King Faisal Specialist Hospital and Research Centre, Riyadh, Saudi Arabia.,2 Computer Science Department, College of Computer and Information Sciences, King Saud University, Riyadh, Saudi Arabia
| | - Achraf El Allali
- 2 Computer Science Department, College of Computer and Information Sciences, King Saud University, Riyadh, Saudi Arabia
| | - Dilek Colak
- 1 Department of Biostatistics, Epidemiology, and Scientific Computing, King Faisal Specialist Hospital and Research Centre, Riyadh, Saudi Arabia
8
Basha O, Argov CM, Artzy R, Zoabi Y, Hekselman I, Alfandari L, Chalifa-Caspi V, Yeger-Lotem E. Differential network analysis of multiple human tissue interactomes highlights tissue-selective processes and genetic disorder genes. Bioinformatics 2020; 36:2821-2828. [DOI: 10.1093/bioinformatics/btaa034] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Received: 04/18/2019] [Revised: 01/07/2020] [Accepted: 01/16/2020] [Indexed: 01/19/2023] Open
Abstract
Motivation
Differential network analysis, designed to highlight network changes between conditions, is an important paradigm in network biology. However, differential network analysis methods have been typically designed to compare between two conditions and were rarely applied to multiple protein interaction networks (interactomes). Importantly, large-scale benchmarks for their evaluation have been lacking.
Results
Here, we present a framework for assessing the ability of differential network analysis of multiple human tissue interactomes to highlight tissue-selective processes and disorders. For this, we created a benchmark of 6499 curated tissue-specific Gene Ontology biological processes. We applied five methods, including four differential network analysis methods, to construct weighted interactomes for 34 tissues. Rigorous assessment of this benchmark revealed that differential analysis methods perform well in revealing tissue-selective processes (AUCs of 0.82–0.9). Next, we applied differential network analysis to illuminate the genes underlying tissue-selective hereditary disorders. For this, we curated a dataset of 1305 tissue-specific hereditary disorders and their manifesting tissues. Focusing on subnetworks containing the top 1% differential interactions in disease-relevant tissue interactomes revealed significant enrichment for disorder-causing genes in 18.6% of the cases, with a significantly high success rate for blood, nerve, muscle and heart diseases.
Summary
Altogether, we offer a framework that includes expansive manually curated datasets of tissue-selective processes and disorders to be used as benchmarks or to illuminate tissue-selective processes and genes. Our results demonstrate that differential analysis of multiple human tissue interactomes is a powerful tool for highlighting processes and genes with tissue-selective functionality and clinical impact.
Availability and implementation
Datasets are available as part of the Supplementary data.
Supplementary information
Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Omer Basha
- Department of Clinical Biochemistry and Pharmacology, Faculty of Health Sciences
| | - Chanan M Argov
- Department of Clinical Biochemistry and Pharmacology, Faculty of Health Sciences
| | - Raviv Artzy
- Department of Clinical Biochemistry and Pharmacology, Faculty of Health Sciences
| | - Yazeed Zoabi
- Department of Clinical Biochemistry and Pharmacology, Faculty of Health Sciences
| | - Idan Hekselman
- Department of Clinical Biochemistry and Pharmacology, Faculty of Health Sciences
| | - Liad Alfandari
- Department of Clinical Biochemistry and Pharmacology, Faculty of Health Sciences
| | - Vered Chalifa-Caspi
- National Institute for Biotechnology in the Negev, Ben-Gurion University of the Negev, Beer-Sheva 84105, Israel
| | - Esti Yeger-Lotem
- Department of Clinical Biochemistry and Pharmacology, Faculty of Health Sciences
- National Institute for Biotechnology in the Negev, Ben-Gurion University of the Negev, Beer-Sheva 84105, Israel
9
Colborne SF, Hondorp DW, Holbrook CM, Lowe MR, Boase JC, Chiotti JA, Wills TC, Roseman EF, Krueger CC. Sequence analysis and acoustic tracking of individual lake sturgeon identify multiple patterns of river–lake habitat use. Ecosphere 2019. [DOI: 10.1002/ecs2.2983] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Indexed: 11/10/2022] Open
Affiliation(s)
- Scott F. Colborne
- Department of Fisheries and Wildlife Center for Systems Integration and Sustainability Michigan State University 480 Wilson Road East Lansing Michigan 48824 USA
| | - Darryl W. Hondorp
- Great Lakes Science Center U.S. Geological Survey 1451 Green Road Ann Arbor Michigan 48105 USA
| | - Christopher M. Holbrook
- Great Lakes Science Center U.S. Geological Survey, Hammond Bay Biological Station 11188 Ray Drive Millersburg Michigan 49759 USA
| | - Michael R. Lowe
- Great Lakes Science Center U.S. Geological Survey, Hammond Bay Biological Station 11188 Ray Drive Millersburg Michigan 49759 USA
| | - James C. Boase
- Alpena Fish and Wildlife Conservation Office U.S. Fish and Wildlife Service 480 W. Fletcher Street Alpena Michigan 49707 USA
| | - Justin A. Chiotti
- Alpena Fish and Wildlife Conservation Office U.S. Fish and Wildlife Service 480 W. Fletcher Street Alpena Michigan 49707 USA
| | - Todd C. Wills
- Michigan Department of Natural Resources Lake St. Clair Fisheries Research Station 33135 South River Road Harrison Township Michigan 48045 USA
| | - Edward F. Roseman
- Great Lakes Science Center U.S. Geological Survey 1451 Green Road Ann Arbor Michigan 48105 USA
| | - Charles C. Krueger
- Department of Fisheries and Wildlife Center for Systems Integration and Sustainability Michigan State University 480 Wilson Road East Lansing Michigan 48824 USA
10
Chen H, He Y, Ji J, Shi Y. A Machine Learning Method for Identifying Critical Interactions Between Gene Pairs in Alzheimer's Disease Prediction. Front Neurol 2019; 10:1162. [PMID: 31736866 PMCID: PMC6834789 DOI: 10.3389/fneur.2019.01162] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 08/29/2019] [Accepted: 10/15/2019] [Indexed: 12/26/2022] Open
Abstract
Background: Alzheimer's disease (AD) is the most common type of dementia. Scientists have discovered that the causes of AD may include a combination of genetic, lifestyle, and environmental factors, but the exact cause has not yet been elucidated. Effective strategies to prevent and treat AD therefore remain elusive. The identified genetic causes of AD mainly focus on individual genes, but growing evidence has shown that complex diseases are usually affected by the interaction of genes in a network. Few studies have focused on the interactions and correlations between genes and how they are gradually destroyed or disappear during AD progression. A differential network analysis has been recognized as an essential tool for identifying the underlying pathogenic mechanisms and significant genes for prediction analysis. We therefore aim to conduct a differential network analysis to reveal potential networks involved in the neuropathogenesis of AD and identify genes for AD prediction. Methods: In this paper, we selected 365 samples from the Religious Orders Study and the Rush Memory and Aging Project, including 193 clinically and neuropathologically confirmed AD subjects and 172 no cognitive impairment (NCI) controls. Then, we selected 158 genes belonging to the AD pathway (hsa05010) of the Kyoto Encyclopedia of Genes and Genomes. We employed a machine learning method, namely, joint density-based non-parametric differential interaction network analysis and classification (JDINAC), in the analysis of gene expression data (RNA-seq data). We searched for the differential networks in the RNA-seq data with a pathological diagnosis of AD. Finally, an optimal prediction model was built through cross-validation, which showed good discrimination and calibration for AD prediction. Results: We used JDINAC to derive a gene co-expression network and to explore the relationship between the interaction of gene pairs and AD, and the top 10 differential gene pairs were identified. 
We then compared the prediction performance of JDINAC with that of prediction methods based on individual genes. JDINAC classified more accurately than other methods, such as random forest and penalized logistic regression. Conclusions: The interaction between gene pairs is related to AD and can provide more insight than individual genes in AD prediction.
Affiliation(s)
- Hao Chen
  - School of Statistics, Shandong University of Finance and Economics, Jinan, China
- Yong He
  - School of Statistics, Shandong University of Finance and Economics, Jinan, China
- Jiadong Ji
  - School of Statistics, Shandong University of Finance and Economics, Jinan, China
- Yufeng Shi
  - School of Statistics, Shandong University of Finance and Economics, Jinan, China
  - Institute for Financial Studies and School of Mathematics, Shandong University, Jinan, China
|
11
|
Lim E, Xu H, Wu P, Posner D, Wu J, Peloso GM, Pitsillides AN, DeStefano AL, Cupples LA, Liu CT. Network analysis of drug effect on triglyceride-associated DNA methylation. BMC Proc 2018; 12:27. [PMID: 30275881 PMCID: PMC6157190 DOI: 10.1186/s12919-018-0130-0] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Indexed: 12/05/2022]
Abstract
BACKGROUND DNA methylation, an epigenetic modification, can be affected by environmental factors and thus regulate gene expression levels that can lead to alterations of certain phenotypes. Network analysis has been used successfully to discover gene sets that are expressed differently across multiple disease states and suggest possible pathways of disease progression. We applied this framework to compare DNA methylation levels before and after lipid-lowering medication and to identify modules that differ topologically between the 2 time points, revealing the association between lipid medication and these triglyceride-related methylation sites. METHODS We performed quality control using beta-mixture quantile normalization on 463,995 cytosine-phosphate-guanine (CpG) sites and deleted problematic sites, resulting in 423,004 probes. We identified 14,850 probes that were nominally associated with triglycerides prior to treatment and performed weighted gene correlation network analysis (WGCNA) to construct pre- and posttreatment methylation networks of these probes. We then applied both WGCNA module preservation and generalized Hamming distance (GHD) to identify modules with topological differences between the pre- and posttreatment networks. For modules with structural changes between the 2 time points, we performed pathway-enrichment analysis to gain further insight into the biological function of the genes from these modules. RESULTS Six triglyceride-associated modules were identified using pretreatment methylation probes. Both the module-preservation and the GHD methods flagged the same 3 modules as not preserved in the posttreatment data. Top-enriched pathways for the 3 differentially methylated modules are sphingolipid signaling pathway, proteoglycans in cancer, and metabolic pathways (p values < 0.005). One module in particular included an enrichment of lipid-related pathways among the top results.
CONCLUSIONS The same 3 modules, which were differentially methylated between pre- and posttreatment, were identified using both WGCNA module-preservation and GHD methods. Pathway analysis revealed that triglyceride-associated modules contain groups of genes that are involved in lipid signaling and metabolism. These 3 modules may provide insight into the effect of fenofibrate on changes in triglyceride levels and these methylation sites.
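For orientation, the generalized Hamming distance named in this abstract is usually written as below for two networks on the same N nodes with edge weights a_ij and b_ij. This is a sketch of the general definition only; the abstract does not say which edge weights the study used (for example raw correlations or topological-overlap values), so the symbols here are assumptions.

```latex
\mathrm{GHD}(A, B) = \frac{1}{N(N-1)} \sum_{i \neq j} \left( a'_{ij} - b'_{ij} \right)^2,
\qquad
a'_{ij} = a_{ij} - \frac{1}{N(N-1)} \sum_{k \neq l} a_{kl},
\quad
b'_{ij} = b_{ij} - \frac{1}{N(N-1)} \sum_{k \neq l} b_{kl}
```

A value near zero means the two centered weight patterns agree edge by edge; larger values indicate topological differences.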
Affiliation(s)
- Elise Lim
  - Department of Biostatistics, Boston University, 801 Massachusetts Avenue 3rd Floor, Boston, MA 02118 USA
- Hanfei Xu
  - Department of Biostatistics, Boston University, 801 Massachusetts Avenue 3rd Floor, Boston, MA 02118 USA
- Peitao Wu
  - Department of Biostatistics, Boston University, 801 Massachusetts Avenue 3rd Floor, Boston, MA 02118 USA
- Daniel Posner
  - Department of Biostatistics, Boston University, 801 Massachusetts Avenue 3rd Floor, Boston, MA 02118 USA
- Jiayi Wu
  - Department of Genetics and Genomics, Boston University, 72 East Concord Street, Boston, MA 02118 USA
- Gina M. Peloso
  - Department of Biostatistics, Boston University, 801 Massachusetts Avenue 3rd Floor, Boston, MA 02118 USA
- Achilleas N. Pitsillides
  - Department of Biostatistics, Boston University, 801 Massachusetts Avenue 3rd Floor, Boston, MA 02118 USA
- Anita L. DeStefano
  - Department of Biostatistics, Boston University, 801 Massachusetts Avenue 3rd Floor, Boston, MA 02118 USA
- L. Adrienne Cupples
  - Department of Biostatistics, Boston University, 801 Massachusetts Avenue 3rd Floor, Boston, MA 02118 USA
- Ching-Ti Liu
  - Department of Biostatistics, Boston University, 801 Massachusetts Avenue 3rd Floor, Boston, MA 02118 USA
|
12
|
Abstract
BACKGROUND Longitudinal data and repeated measurements in epigenome-wide association studies (EWAS) provide a rich resource for understanding epigenetics. We summarize 7 analytical approaches to the GAW20 data sets that addressed challenges and potential applications of phenotypic and epigenetic data. All contributions used the GAW20 real data set and employed either linear mixed effect (LME) models or marginal models through generalized estimating equations (GEE). These contributions were subdivided into 3 categories: (a) quality control (QC) methods for DNA methylation data; (b) heritability estimates pretreatment and posttreatment with fenofibrate; and (c) impact of drug response pretreatment and posttreatment with fenofibrate on DNA methylation and blood lipids. RESULTS Two contributions addressed QC and identified large statistical differences with pretreatment and posttreatment DNA methylation, possibly a result of batch effects. Two contributions compared epigenome-wide heritability estimates pretreatment and posttreatment, with one employing a Bayesian LME and the other using a variance-component LME. Density curves comparing these studies indicated these heritability estimates were similar. Another contribution used a variance-component LME to depict the proportion of heritability resulting from a genetic and shared environment. By including environmental exposures as random effects, the authors found heritability estimates became more stable but not significantly different. Two contributions investigated treatment response. One estimated drug-associated methylation effects on triglyceride levels as the response, and identified 11 significant cytosine-phosphate-guanine (CpG) sites with or without adjusting for high-density lipoprotein. The second contribution performed weighted gene coexpression network analysis and identified 6 significant modules of at least 30 CpG sites, including 3 modules with topological differences pretreatment and posttreatment. 
CONCLUSIONS Four conclusions from this GAW20 working group are: (a) QC measures are an important consideration for EWAS studies that are investigating multiple time points or repeated measurements; (b) application of heritability estimates between time points for individual CpG sites is a useful QC measure for DNA methylation studies; (c) drug intervention demonstrated strong epigenome-wide DNA methylation patterns across the 2 time points; and (d) new statistical methods are required to account for the environmental contributions of DNA methylation across time. These contributions demonstrate numerous opportunities exist for the analysis of longitudinal data in future epigenetic studies.
Affiliation(s)
- Haakon E. Nustad
  - Department of Medical Genetics, Oslo University Hospital, Kirkeveien 166, 0450 Oslo, Norway
  - Faculty of Medicine, University of Oslo, Klaus Torgårds vei 3, 0372 Oslo, Norway
  - PharmaTox Strategic Research Initiative, University of Oslo, Sem Sælands vei 3, 0371 Oslo, Norway
- Marcio Almeida
  - South Texas Diabetes and Obesity Institute, University of Texas Rio Grande Valley School of Medicine, One West University Blvd., STDOI Modular Building #100, Brownsville, TX 78520 USA
- Angelo J. Canty
  - Department of Mathematics and Statistics, McMaster University, 1280 Main St. W, Hamilton, ON L8S 4K1 Canada
- Marissa LeBlanc
  - Oslo Centre for Biostatistics and Epidemiology, Oslo University Hospital, Klaus Torgårds vei 3, 0372 Oslo, Norway
- Christian M. Page
  - Oslo Centre for Biostatistics and Epidemiology, Oslo University Hospital, Klaus Torgårds vei 3, 0372 Oslo, Norway
  - Department of Non-communicable disease, Norwegian Institute of Public Health, Marcus Thranes Gate 6, 0473 Oslo, Norway
- Phillip E. Melton
  - Curtin/UWA Centre for Genetic Origins of Health and Disease, School of Pharmacy and Biomedical Sciences, Curtin University and the University of Western Australia, 35 Stirling Hwy. (M409), Crawley, WA 6009 Australia
|
13
|
Ji J, He D, Feng Y, He Y, Xue F, Xie L. JDINAC: joint density-based non-parametric differential interaction network analysis and classification using high-dimensional sparse omics data. Bioinformatics 2018; 33:3080-3087. [PMID: 28582486 DOI: 10.1093/bioinformatics/btx360] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.7] [Received: 09/14/2016] [Accepted: 06/01/2017] [Indexed: 12/26/2022]
Abstract
Motivation A complex disease is usually driven by a number of genes interwoven into networks, rather than a single gene product. Network comparison or differential network analysis has become an important means of revealing the underlying mechanism of pathogenesis and identifying clinical biomarkers for disease classification. Most studies, however, are limited to network correlations that mainly capture the linear relationship among genes, or rely on the assumption of a parametric probability distribution of gene measurements. Both limitations are restrictive in real applications. Results We propose a new Joint density based non-parametric Differential Interaction Network Analysis and Classification (JDINAC) method to identify differential interaction patterns of network activation between two groups. At the same time, JDINAC uses the network biomarkers to build a classification model. The novelty of JDINAC lies in its potential to capture non-linear relations between molecular interactions using high-dimensional sparse data as well as to adjust for confounding factors, without assuming a parametric probability distribution of gene measurements. Simulation studies demonstrate that JDINAC provides more accurate differential network estimation and lower classification error than other state-of-the-art methods. We apply JDINAC to a Breast Invasive Carcinoma dataset, which includes 114 patients who have both tumor and matched normal samples. The hub genes and differential interaction patterns identified were consistent with existing experimental studies. Furthermore, JDINAC discriminated between tumor and normal samples with high accuracy by virtue of the identified biomarkers. JDINAC provides a general framework for feature selection and classification using high-dimensional sparse omics data. Availability and implementation R scripts available at https://github.com/jijiadong/JDINAC. Contact lxie@iscb.org.
Supplementary information Supplementary data are available at Bioinformatics online.
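One way to picture the joint-density idea described in this abstract is to turn each gene pair into a feature equal to the log ratio of two kernel density estimates, one fitted per group. The sketch below is an illustration under stated assumptions, not the authors' implementation: the function names `kde2d` and `log_density_ratio`, the Gaussian product kernel, the fixed bandwidth, and the toy data are all hypothetical, and the penalized classification step is left out. The R scripts linked in the abstract remain the reference for the actual method.

```python
import math
import random


def kde2d(points, bandwidth):
    """Build a 2-D Gaussian product-kernel density estimate from (x, y) points."""
    n = len(points)
    norm = 1.0 / (n * 2.0 * math.pi * bandwidth ** 2)

    def density(x, y):
        total = 0.0
        for px, py in points:
            dx = (x - px) / bandwidth
            dy = (y - py) / bandwidth
            total += math.exp(-0.5 * (dx * dx + dy * dy))
        return norm * total

    return density


def log_density_ratio(group1_points, group0_points, bandwidth=0.5, eps=1e-12):
    """Return a function giving log(f1 / f0) for one gene pair (assumed feature form)."""
    f1 = kde2d(group1_points, bandwidth)
    f0 = kde2d(group0_points, bandwidth)

    def feature(x, y):
        # eps guards against log(0) far from all training points
        return math.log((f1(x, y) + eps) / (f0(x, y) + eps))

    return feature


# Toy data (hypothetical): the two genes are positively coupled in group 1
# and negatively coupled in group 0, with identical marginal distributions.
rng = random.Random(0)
group1 = []
group0 = []
for _ in range(200):
    u = rng.gauss(0, 1)
    group1.append((u, u + rng.gauss(0, 0.3)))
    v = rng.gauss(0, 1)
    group0.append((v, -v + rng.gauss(0, 0.3)))

feature = log_density_ratio(group1, group0)
pos = feature(1.5, 1.5)    # lies on the group-1 pattern
neg = feature(1.5, -1.5)   # lies on the group-0 pattern
print(pos > 0, neg < 0)
```

Because the marginals match in both groups, a single-gene comparison would see nothing here; only the pairwise feature separates the groups, which is the kind of signal a differential interaction analysis targets.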
Affiliation(s)
- Jiadong Ji
  - Department of Mathematical Statistics, School of Statistics, Shandong University of Finance and Economics, Jinan 250014, China
- Di He
  - Ph.D. Program in Computer Science, The Graduate Center, The City University of New York, New York, NY 10016, USA
- Yang Feng
  - Department of Statistics, Columbia University, New York, NY 10027, USA
- Yong He
  - Department of Mathematical Statistics, School of Statistics, Shandong University of Finance and Economics, Jinan 250014, China
- Fuzhong Xue
  - Department of Biostatistics, School of Public Health, Shandong University, Jinan 250012, China
- Lei Xie
  - Ph.D. Program in Computer Science, The Graduate Center, The City University of New York, New York, NY 10016, USA
  - Department of Computer Science, Hunter College, The City University of New York, NY 10065, USA
|
14
|
Ebbels TMD, Rodriguez-Martinez A, Dumas ME, Keun HC. Advances in Computational Analysis of Metabolomic NMR Data. NMR-BASED METABOLOMICS 2018. [DOI: 10.1039/9781782627937-00310] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 01/30/2023]
Abstract
In this chapter we discuss some of the more recent developments in preprocessing and statistical analysis of NMR spectra in metabolomics. Bayesian methods for analyzing NMR spectra are summarized and we describe one particular approach, BATMAN, in more detail. We consider techniques based on statistical associations, such as correlation spectroscopy (e.g. STOCSY and recent variants), as well as approaches that model the associations as a network and how these change under different biological conditions. The link between metabolism and genotype is explored by looking at metabolic GWAS and related techniques. Finally, we describe the relevance and current status of data standards for NMR metabolomics.
Affiliation(s)
- Timothy M. D. Ebbels
  - Division of Computational and Systems Medicine, Department of Surgery and Cancer, Faculty of Medicine, Imperial College London, London SW7 2AZ, UK
- Andrea Rodriguez-Martinez
  - Division of Computational and Systems Medicine, Department of Surgery and Cancer, Faculty of Medicine, Imperial College London, London SW7 2AZ, UK
- Marc-Emmanuel Dumas
  - Division of Computational and Systems Medicine, Department of Surgery and Cancer, Faculty of Medicine, Imperial College London, London SW7 2AZ, UK
- Hector C. Keun
  - Division of Cancer, Department of Surgery and Cancer, Faculty of Medicine, Imperial College London, London W12 0NN, UK
|
15
|
Liu L, Ruan J. Utilizing networks for differential analysis of chromatin interactions. J Bioinform Comput Biol 2017; 15:1740008. [PMID: 29113562 DOI: 10.1142/s021972001740008x] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 11/18/2022]
Abstract
Chromatin conformation capture with high-throughput sequencing (Hi-C) is a powerful technique to detect genome-wide chromatin interactions. In this paper, we introduce two novel approaches to detect differentially interacting genomic regions between two Hi-C experiments using a network model. To make input data from multiple experiments comparable, we propose a normalization strategy guided by network topological properties. We then devise two measurements, using local and global connectivity information from the chromatin interaction networks, respectively, to assess the interaction differences between two experiments. When multiple replicates are present in experiments, our approaches provide the flexibility for users to either pool all replicates together to therefore increase the network coverage, or to use the replicates in parallel to increase the signal to noise ratio. We show that while the local method works better in detecting changes from simulated networks, the global method performs better on real Hi-C data. The local and global methods, regardless of pooling, are always superior to two existing methods. Furthermore, our methods work well on both unweighted and weighted networks and our normalization strategy significantly improves the performance compared with raw networks without normalization. Therefore, we believe our methods will be useful for identifying differentially interacting genomic regions.
Affiliation(s)
- Lu Liu
  - College of Information Technology and Engineering, Marshall University, One John Marshall Drive, Huntington, WV 25755, USA
- Jianhua Ruan
  - Department of Computer Science, The University of Texas at San Antonio, One UTSA Circle, San Antonio, Texas 78249, USA
|
16
|
Wu T, Wang Y, Jiang R, Lu X, Tian J. A pathways-based prediction model for classifying breast cancer subtypes. Oncotarget 2017; 8:58809-58822. [PMID: 28938599 PMCID: PMC5601695 DOI: 10.18632/oncotarget.18544] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Received: 08/17/2016] [Accepted: 05/01/2017] [Indexed: 11/25/2022]
Abstract
Breast cancer is highly heterogeneous and is classified into four subtypes characterized by specific biological traits, treatment responses, and clinical prognoses. We performed a systemic analysis of 698 breast cancer patient samples from The Cancer Genome Atlas project database. We identified 136 breast cancer genes differentially expressed among the four subtypes. Based on unsupervised clustering analysis, these 136 core genes efficiently categorized breast cancer patients into the appropriate subtypes. Functional enrichment based on Kyoto Encyclopedia of Genes and Genomes analysis identified six functional pathways regulated by these genes: JAK-STAT signaling, basal cell carcinoma, inflammatory mediator regulation of TRP channels, non-small cell lung cancer, glutamatergic synapse, and amyotrophic lateral sclerosis. Three support vector machine (SVM) classification models based on the identified pathways effectively classified different breast cancer subtypes, suggesting that breast cancer subtype-specific risk assessment based on disease pathways could be a potentially valuable approach. Our analysis not only provides insight into breast cancer subtype-specific mechanisms, but also may improve the accuracy of SVM classification models.
Affiliation(s)
- Tong Wu
  - Department of Ultrasound, The Second Affiliated Hospital of Harbin Medical University, Heilongjiang Province, China
- Yunfeng Wang
  - College of Bioinformatics Science and Technology, Harbin Medical University, Heilongjiang Province, China
- Ronghui Jiang
  - Department of Surgery, Yanbian No.2 People's Hospital, Jilin Province, China
- Xinliang Lu
  - Institute of Immunology, Zhejiang University School of Medicine, Zhejiang Province, China
- Jiawei Tian
  - Department of Ultrasound, The Second Affiliated Hospital of Harbin Medical University, Heilongjiang Province, China
|
17
|
Will T, Helms V. Rewiring of the inferred protein interactome during blood development studied with the tool PPICompare. BMC SYSTEMS BIOLOGY 2017; 11:44. [PMID: 28376810 PMCID: PMC5379774 DOI: 10.1186/s12918-017-0400-x] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 10/05/2016] [Accepted: 01/26/2017] [Indexed: 12/24/2022]
Abstract
BACKGROUND Differential analysis of cellular conditions is a key approach towards understanding the consequences and driving causes behind biological processes such as developmental transitions or diseases. Progress in whole-genome expression profiling has made it convenient to capture the state of a cell's transcriptome and to detect the characteristic features that distinguish cells in specific conditions. In contrast, mapping the physical protein interactome for many samples is experimentally infeasible at the moment. For the understanding of the whole system, however, it is equally important how the interactions of proteins are rewired between cellular states. To overcome this deficiency, we recently showed how condition-specific protein interaction networks that even consider alternative splicing can be inferred from transcript expression data. Here, we present the differential network analysis tool PPICompare that was specifically designed for isoform-sensitive protein interaction networks. RESULTS Besides detecting significant rewiring events between the interactomes of grouped samples, PPICompare infers which alterations to the transcriptome caused each rewiring event and the minimal set of alterations necessary to explain all between-group changes. When applied to the development of blood cells, we verified that a reasonable number of rewiring events was reported by the tool and found that differential gene expression was the major determinant of cellular adjustments to the interactome. Alternative splicing events were consistently necessary in each developmental step to explain all significant alterations and were especially important for rewiring in the context of transcriptional control. CONCLUSIONS Applying PPICompare enabled us to investigate the dynamics of the human protein interactome during developmental transitions.
A platform-independent implementation of the tool PPICompare is available at https://sourceforge.net/projects/ppicompare/ .
Affiliation(s)
- Thorsten Will
  - Center for Bioinformatics, Saarland University, Campus E2.1, Saarbrücken, 66123 Germany
  - Graduate School of Computer Science, Saarland University, Campus E1.3, Saarbrücken, 66123 Germany
- Volkhard Helms
  - Center for Bioinformatics, Saarland University, Campus E2.1, Saarbrücken, 66123 Germany
|
18
|
Mall R, Cerulo L, Bensmail H, Iavarone A, Ceccarelli M. Detection of statistically significant network changes in complex biological networks. BMC SYSTEMS BIOLOGY 2017; 11:32. [PMID: 28259158 PMCID: PMC5336651 DOI: 10.1186/s12918-017-0412-6] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.3] [Received: 08/23/2016] [Accepted: 02/22/2017] [Indexed: 01/10/2023]
Abstract
Background Biological networks help to unveil the complex structure of molecular interactions and to discover driver genes, especially in the context of cancer. Gene mutations, for example during cancer progression, can cause the gene expression network to undergo some amount of localized re-wiring. The ability to detect statistically relevant changes in the interaction patterns induced by the progression of the disease can lead to the discovery of novel relevant signatures. Several procedures have been recently proposed to detect sub-network differences in pairwise labeled weighted networks. Methods In this paper, we propose an improvement over the state-of-the-art based on the Generalized Hamming Distance adopted for evaluating the topological difference between two networks and estimating its statistical significance. The proposed procedure exploits a more effective model selection criterion to generate p-values for statistical significance and is more efficient in terms of computational time and prediction accuracy than previously published methods. Moreover, the structure of the proposed algorithm allows for a faster parallelized implementation. Results In the case of dense random geometric networks the proposed approach is 10-15x faster and achieves 5-10% higher AUC, Precision/Recall, and Kappa value than the state-of-the-art. We also report the application of the method to dissect the difference between the regulatory networks of IDH-mutant versus IDH-wild-type glioma. In such a case our method is able to identify some recently reported master regulators as well as novel important candidates. Conclusions We show that our network differencing procedure can effectively and efficiently detect statistically significant network re-wirings in different conditions.
When applied to detect the main differences between the networks of IDH-mutant and IDH-wild-type glioma tumors, it correctly selects sub-networks centered on important key regulators of these two different subtypes. In addition, its application highlights several novel candidates that cannot be detected by standard single network-based approaches. Electronic supplementary material The online version of this article (doi:10.1186/s12918-017-0412-6) contains supplementary material, which is available to authorized users.
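The idea of scoring a topological difference with the Generalized Hamming Distance and then attaching a p-value can be illustrated with a plain node-label permutation test. This is a baseline sketch under stated assumptions, not the procedure proposed in the paper: the function names are hypothetical, the networks are small dense weight matrices, and the one-sided test treats an unusually small distance (networks more alike than random relabeling would produce) as the significant outcome.

```python
import random


def ghd(a, b):
    """Generalized Hamming distance between two weight matrices on the same nodes."""
    n = len(a)
    pairs = [(i, j) for i in range(n) for j in range(n) if i != j]
    mean_a = sum(a[i][j] for i, j in pairs) / len(pairs)
    mean_b = sum(b[i][j] for i, j in pairs) / len(pairs)
    return sum(
        ((a[i][j] - mean_a) - (b[i][j] - mean_b)) ** 2 for i, j in pairs
    ) / len(pairs)


def permute_nodes(m, order):
    """Relabel the nodes of matrix m according to the permutation in order."""
    n = len(m)
    return [[m[order[i]][order[j]] for j in range(n)] for i in range(n)]


def ghd_permutation_pvalue(a, b, n_perm=500, seed=0):
    """Estimate how often a random relabeling of b is at least as close to a."""
    rng = random.Random(seed)
    observed = ghd(a, b)
    n = len(a)
    hits = 0
    for _ in range(n_perm):
        order = list(range(n))
        rng.shuffle(order)
        if ghd(a, permute_nodes(b, order)) <= observed:
            hits += 1
    # add-one correction keeps the estimate away from an exact zero
    return observed, (hits + 1) / (n_perm + 1)


# Toy example (hypothetical data): b is an exact copy of a random symmetric network a.
rng = random.Random(1)
n = 8
a = [[0.0] * n for _ in range(n)]
for i in range(n):
    for j in range(i + 1, n):
        a[i][j] = a[j][i] = rng.random()
b = [row[:] for row in a]

observed, p_value = ghd_permutation_pvalue(a, b)
print(observed, p_value)
```

Each permutation costs a full pass over all node pairs, which is the expense that faster significance estimates aim to avoid.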
Affiliation(s)
- Raghvendra Mall
  - QCRI - Qatar Computing Research Institute, HBKU, Doha, Qatar
- Luigi Cerulo
  - Department of Science and Technology, University of Sannio, Benevento, Italy
  - BioGeM, Institute of Genetic Research "Gaetano Salvatore", Ariano Irpino (AV), Italy
- Halima Bensmail
  - QCRI - Qatar Computing Research Institute, HBKU, Doha, Qatar
- Antonio Iavarone
  - Department of Neurology, Department of Pathology, Institute for Cancer Genetics, Columbia University Medical Center, New York, USA
- Michele Ceccarelli
  - QCRI - Qatar Computing Research Institute, HBKU, Doha, Qatar
  - Department of Science and Technology, University of Sannio, Benevento, Italy
|
19
|
Martin AJM, Dominguez C, Contreras-Riquelme S, Holmes DS, Perez-Acle T. Graphlet Based Metrics for the Comparison of Gene Regulatory Networks. PLoS One 2016; 11:e0163497. [PMID: 27695050 PMCID: PMC5047442 DOI: 10.1371/journal.pone.0163497] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Received: 03/29/2016] [Accepted: 09/10/2016] [Indexed: 11/18/2022]
Abstract
Understanding the control of gene expression remains one of the main challenges in the post-genomic era. Accordingly, a plethora of methods exists to identify variations in gene expression levels. These variations underlie almost all relevant biological phenomena, including disease and adaptation to environmental conditions. However, computational tools to identify how regulation changes are scarce. Regulation of gene expression is usually depicted in the form of a gene regulatory network (GRN). Structural changes in a GRN over time and conditions represent variations in the regulation of gene expression. Like other biological networks, GRNs are composed of basic building blocks called graphlets. As a consequence, two new metrics based on graphlets are proposed in this work: REConstruction Rate (REC) and REC Graphlet Degree (RGD). REC determines the rate of graphlet similarity between different states of a network and RGD identifies the subset of nodes with the highest topological variation. In other words, RGD discerns how the GRN was rewired. REC and RGD were used to compare the local structure of nodes in condition-specific GRNs obtained from gene expression data of Escherichia coli forming biofilms and cultured in suspension. According to our results, most of the network local structure remains unaltered in the two compared conditions. Nevertheless, changes reported by RGD necessarily imply that a different cohort of regulators (i.e. transcription factors (TFs)) appear on the scene, shedding light on how the regulation of gene expression occurs when E. coli transits from suspension to biofilm. Consequently, we propose that both metrics REC and RGD should be adopted as a quantitative approach to conduct differential analyses of GRNs. A tool that implements both metrics is available as an on-line web server (http://dlab.cl/loto).
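To make the graphlet idea concrete, the sketch below counts, for every node, how often it sits in each position of the 3-node graphlets (end of a path, centre of a path, member of a triangle) and then ranks nodes by how much those counts change between two network states. It is not an implementation of REC or RGD, whose definitions are not given in this abstract; it also treats the network as undirected, whereas GRNs are directed. The function name, node labels, and edge lists are hypothetical.

```python
from itertools import combinations


def orbit_counts(nodes, edges):
    """Per-node counts of [path end, path centre, triangle] in an undirected graph."""
    adj = {v: set() for v in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    counts = {v: [0, 0, 0] for v in nodes}
    for v in nodes:
        for x, y in combinations(sorted(adj[v]), 2):
            if y in adj[x]:
                counts[v][2] += 1   # v, x and y close a triangle
            else:
                counts[v][1] += 1   # v is the centre of the induced path x - v - y
                counts[x][0] += 1   # x and y are the ends of that path
                counts[y][0] += 1
    return counts


nodes = ["A", "B", "C", "D"]
state_1 = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")]  # triangle plus a tail
state_2 = [("A", "B"), ("B", "C"), ("C", "D")]              # edge A-C lost

before = orbit_counts(nodes, state_1)
after = orbit_counts(nodes, state_2)

# L1 distance between the two count vectors of each node
change = {
    v: sum(abs(p - q) for p, q in zip(before[v], after[v])) for v in nodes
}
most_rewired = max(change, key=change.get)
print(change, most_rewired)
```

In this toy case node C changes most even though it keeps two of its three edges, because losing the A-C edge alters every small subgraph C takes part in.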
Affiliation(s)
- Alberto J. M. Martin
  - Computational Biology Lab, Fundación Ciencia & Vida, Santiago, Chile
  - Centro Interdisciplinario de Neurociencia de Valparaíso, Universidad de Valparaíso, Chile
  - * E-mail: (AJMM); (TPA)
- Calixto Dominguez
  - Computational Biology Lab, Fundación Ciencia & Vida, Santiago, Chile
  - Center for Bioinformatics and Genome Biology, Fundación Ciencia & Vida and Facultad de Ciencias Biologicas, Universidad Andres Bello, Santiago, Chile
- David S. Holmes
  - Center for Bioinformatics and Genome Biology, Fundación Ciencia & Vida and Facultad de Ciencias Biologicas, Universidad Andres Bello, Santiago, Chile
- Tomas Perez-Acle
  - Computational Biology Lab, Fundación Ciencia & Vida, Santiago, Chile
  - Centro Interdisciplinario de Neurociencia de Valparaíso, Universidad de Valparaíso, Chile
  - * E-mail: (AJMM); (TPA)
|
20
|
Kusonmano K. Gene Expression Analysis Through Network Biology: Bioinformatics Approaches. ADVANCES IN BIOCHEMICAL ENGINEERING/BIOTECHNOLOGY 2016; 160:15-32. [PMID: 27830311 DOI: 10.1007/10_2016_44] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 01/09/2023]
Abstract
Following the availability of high-throughput technologies, vast amounts of biological data have been generated. Gene expression is one example of popular data that has been used to study cellular systems at the transcriptional level. Several bioinformatics approaches have been developed to analyze such data. A typical expression analysis identifies a ranked list of individual genes that are significantly differentially expressed between two conditions of interest. However, it is accepted that biomolecules in a living organism work together and interact with each other. Network analysis can complement typical expression analysis and provide more context for understanding biological systems. Conversely, expression data can provide clues to functional links between biomolecules in biological networks. In this chapter, bioinformatics approaches to analyzing expression data at the network level are described, including basic concepts of network biology. Different concepts for integrating expression data with interactome data are explained, together with example studies.
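The remark that expression data can hint at functional links is often illustrated with a co-expression network, where two genes are joined when their expression profiles correlate strongly across samples. The sketch below is one minimal way to do that and is not taken from the chapter; the gene names, expression values, and the 0.8 threshold are all hypothetical.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equally long expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def coexpression_edges(expr, threshold=0.8):
    """Return {(gene1, gene2): r} for gene pairs with |r| at or above threshold."""
    edges = {}
    for g1, g2 in combinations(sorted(expr), 2):
        r = pearson(expr[g1], expr[g2])
        if abs(r) >= threshold:
            edges[(g1, g2)] = r
    return edges


# Hypothetical expression values for three genes across five samples
expr = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneB": [2.1, 3.9, 6.2, 8.0, 9.9],
    "geneC": [5.0, 1.0, 4.0, 2.0, 3.0],
}
edges = coexpression_edges(expr)
print(edges)
```

Running the same function on samples from two conditions and comparing the resulting edge sets gives a simple starting point for the differential network analyses described in the other entries of this list.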
Affiliation(s)
- Kanthida Kusonmano
  - Bioinformatics and Systems Biology Program, School of Bioresources and Technology, King Mongkut's University of Technology Thonburi, Bangkhuntien, Bangkok, Thailand
|