1
|
Zitnik M, Li MM, Wells A, Glass K, Morselli Gysi D, Krishnan A, Murali TM, Radivojac P, Roy S, Baudot A, Bozdag S, Chen DZ, Cowen L, Devkota K, Gitter A, Gosline SJC, Gu P, Guzzi PH, Huang H, Jiang M, Kesimoglu ZN, Koyuturk M, Ma J, Pico AR, Pržulj N, Przytycka TM, Raphael BJ, Ritz A, Sharan R, Shen Y, Singh M, Slonim DK, Tong H, Yang XH, Yoon BJ, Yu H, Milenković T. Current and future directions in network biology. BIOINFORMATICS ADVANCES 2024; 4:vbae099. [PMID: 39143982 PMCID: PMC11321866 DOI: 10.1093/bioadv/vbae099] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 09/27/2023] [Revised: 05/31/2024] [Accepted: 07/08/2024] [Indexed: 08/16/2024]
Abstract
Summary Network biology is an interdisciplinary field bridging computational and biological sciences that has proved pivotal in advancing the understanding of cellular functions and diseases across biological systems and scales. Although the field has been around for two decades, it remains nascent. It has witnessed rapid evolution, accompanied by emerging challenges. These stem from various factors, notably the growing complexity and volume of data together with the increased diversity of data types describing different tiers of biological organization. We discuss prevailing research directions in network biology, focusing on molecular/cellular networks but also on other biological network types such as biomedical knowledge graphs, patient similarity networks, brain networks, and social/contact networks relevant to disease spread. In more detail, we highlight areas of inference and comparison of biological networks, multimodal data integration and heterogeneous networks, higher-order network analysis, machine learning on networks, and network-based personalized medicine. Following the overview of recent breakthroughs across these five areas, we offer a perspective on future directions of network biology. Additionally, we discuss scientific communities, educational initiatives, and the importance of fostering diversity within the field. This article establishes a roadmap for an immediate and long-term vision for network biology. Availability and implementation Not applicable.
Collapse
Affiliation(s)
- Marinka Zitnik
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, United States
| | - Michelle M Li
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, United States
| | - Aydin Wells
- Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN 46556, United States
- Lucy Family Institute for Data and Society, University of Notre Dame, Notre Dame, IN 46556, United States
- Eck Institute for Global Health, University of Notre Dame, Notre Dame, IN 46556, United States
| | - Kimberly Glass
- Channing Division of Network Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA 02115, United States
| | - Deisy Morselli Gysi
- Channing Division of Network Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA 02115, United States
- Department of Statistics, Federal University of Paraná, Curitiba, Paraná 81530-015, Brazil
- Department of Physics, Northeastern University, Boston, MA 02115, United States
| | - Arjun Krishnan
- Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, United States
| | - T M Murali
- Department of Computer Science, Virginia Tech, Blacksburg, VA 24061, United States
| | - Predrag Radivojac
- Khoury College of Computer Sciences, Northeastern University, Boston, MA 02115, United States
| | - Sushmita Roy
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI 53715, United States
- Wisconsin Institute for Discovery, Madison, WI 53715, United States
| | - Anaïs Baudot
- Aix Marseille Université, INSERM, MMG, Marseille, France
| | - Serdar Bozdag
- Department of Computer Science and Engineering, University of North Texas, Denton, TX 76203, United States
- Department of Mathematics, University of North Texas, Denton, TX 76203, United States
| | - Danny Z Chen
- Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN 46556, United States
| | - Lenore Cowen
- Department of Computer Science, Tufts University, Medford, MA 02155, United States
| | - Kapil Devkota
- Department of Computer Science, Tufts University, Medford, MA 02155, United States
| | - Anthony Gitter
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI 53715, United States
- Morgridge Institute for Research, Madison, WI 53715, United States
| | - Sara J C Gosline
- Biological Sciences Division, Pacific Northwest National Laboratory, Seattle, WA 98109, United States
| | - Pengfei Gu
- Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN 46556, United States
| | - Pietro H Guzzi
- Department of Medical and Surgical Sciences, University Magna Graecia of Catanzaro, Catanzaro, 88100, Italy
| | - Heng Huang
- Department of Computer Science, University of Maryland College Park, College Park, MD 20742, United States
| | - Meng Jiang
- Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN 46556, United States
| | - Ziynet Nesibe Kesimoglu
- Department of Computer Science and Engineering, University of North Texas, Denton, TX 76203, United States
- National Center of Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20814, United States
| | - Mehmet Koyuturk
- Department of Computer and Data Sciences, Case Western Reserve University, Cleveland, OH 44106, United States
| | - Jian Ma
- Ray and Stephanie Lane Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, United States
| | - Alexander R Pico
- Institute of Data Science and Biotechnology, Gladstone Institutes, San Francisco, CA 94158, United States
| | - Nataša Pržulj
- Department of Computer Science, University College London, London, WC1E 6BT, England
- ICREA, Catalan Institution for Research and Advanced Studies, Barcelona, 08010, Spain
- Barcelona Supercomputing Center (BSC), Barcelona, 08034, Spain
| | - Teresa M Przytycka
- National Center of Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20814, United States
| | - Benjamin J Raphael
- Department of Computer Science, Princeton University, Princeton, NJ 08544, United States
| | - Anna Ritz
- Department of Biology, Reed College, Portland, OR 97202, United States
| | - Roded Sharan
- School of Computer Science, Tel Aviv University, Tel Aviv, 69978, Israel
| | - Yang Shen
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843, United States
| | - Mona Singh
- Department of Computer Science, Princeton University, Princeton, NJ 08544, United States
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544, United States
| | - Donna K Slonim
- Department of Computer Science, Tufts University, Medford, MA 02155, United States
| | - Hanghang Tong
- Department of Computer Science, University of Illinois Urbana-Champaign, Urbana, IL 61801, United States
| | - Xinan Holly Yang
- Department of Pediatrics, University of Chicago, Chicago, IL 60637, United States
| | - Byung-Jun Yoon
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843, United States
- Computational Science Initiative, Brookhaven National Laboratory, Upton, NY 11973, United States
| | - Haiyuan Yu
- Department of Computational Biology, Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY 14853, United States
| | - Tijana Milenković
- Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN 46556, United States
- Lucy Family Institute for Data and Society, University of Notre Dame, Notre Dame, IN 46556, United States
- Eck Institute for Global Health, University of Notre Dame, Notre Dame, IN 46556, United States
| |
Collapse
|
2
|
Li Q, Newaz K, Milenković T. Towards future directions in data-integrative supervised prediction of human aging-related genes. BIOINFORMATICS ADVANCES 2022; 2:vbac081. [PMID: 36699345 PMCID: PMC9710570 DOI: 10.1093/bioadv/vbac081] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 05/25/2022] [Revised: 09/23/2022] [Accepted: 10/31/2022] [Indexed: 11/13/2022]
Abstract
Motivation Identification of human genes involved in the aging process is critical due to the incidence of many diseases with age. A state-of-the-art approach for this purpose infers a weighted dynamic aging-specific subnetwork by mapping gene expression (GE) levels at different ages onto the protein-protein interaction network (PPIN). Then, it analyzes this subnetwork in a supervised manner by training a predictive model to learn how network topologies of known aging- versus non-aging-related genes change across ages. Finally, it uses the trained model to predict novel aging-related gene candidates. However, the best current subnetwork resulting from this approach still yields suboptimal prediction accuracy. This could be because it was inferred using outdated GE and PPIN data. Here, we evaluate whether analyzing a weighted dynamic aging-specific subnetwork inferred from newer GE and PPIN data improves prediction accuracy upon analyzing the best current subnetwork inferred from outdated data. Results Unexpectedly, we find that not to be the case. To understand this, we perform aging-related pathway and Gene Ontology term enrichment analyses. We find that the suboptimal prediction accuracy, regardless of which GE or PPIN data is used, may be caused by the current knowledge about which genes are aging-related being incomplete, or by the current methods for inferring or analyzing an aging-specific subnetwork being unable to capture all of the aging-related knowledge. These findings can potentially guide future directions towards improving supervised prediction of aging-related genes via -omics data integration. Availability and implementation All data and code are available at zenodo, DOI: 10.5281/zenodo.6995045. Supplementary information Supplementary data are available at Bioinformatics Advances online.
Collapse
Affiliation(s)
- Qi Li
- Department of Computer Science and Engineering, Lucy Family Institute for Data & Society, and Eck Institute for Global Health (EIGH), University of Notre Dame, Notre Dame, IN 46556, USA
| | - Khalique Newaz
- Department of Computer Science and Engineering, Lucy Family Institute for Data & Society, and Eck Institute for Global Health (EIGH), University of Notre Dame, Notre Dame, IN 46556, USA,Center for Data and Computing in Natural Sciences (CDCS), Institute for Computational Systems Biology, Universität Hamburg, Hamburg 20146, Germany
| | | |
Collapse
|
3
|
Corominas GR, Blesa MJ, Blum C. AntNetAlign: Ant colony optimization for Network Alignment. Appl Soft Comput 2022. [DOI: 10.1016/j.asoc.2022.109832] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/17/2022]
|
4
|
Carrasco-Santano I, Vega-Rodríguez MA. MOMEA: Multi-Objective Mutation-based Evolutionary Algorithm for the alignment of protein networks. Appl Soft Comput 2022. [DOI: 10.1016/j.asoc.2022.109366] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/02/2022]
|
5
|
Wang S, Atkinson GRS, Hayes WB. SANA: cross-species prediction of Gene Ontology GO annotations via topological network alignment. NPJ Syst Biol Appl 2022; 8:25. [PMID: 35859153 PMCID: PMC9300714 DOI: 10.1038/s41540-022-00232-x] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/13/2019] [Accepted: 05/20/2022] [Indexed: 12/31/2022] Open
Abstract
Topological network alignment aims to align two networks node-wise in order to maximize the observed common connection (edge) topology between them. The topological alignment of two protein-protein interaction (PPI) networks should thus expose protein pairs with similar interaction partners allowing, for example, the prediction of common Gene Ontology (GO) terms. Unfortunately, no network alignment algorithm based on topology alone has been able to achieve this aim, though those that include sequence similarity have seen some success. We argue that this failure of topology alone is due to the sparsity and incompleteness of the PPI network data of almost all species, which provides the network topology with a small signal-to-noise ratio that is effectively swamped when sequence information is added to the mix. Here we show that the weak signal can be detected using multiple stochastic samples of "good" topological network alignments, which allows us to observe regions of the two networks that are robustly aligned across multiple samples. The resulting network alignment frequency (NAF) strongly correlates with GO-based Resnik semantic similarity and enables the first successful cross-species predictions of GO terms based on topology-only network alignments. Our best predictions have an AUPR of about 0.4, which is competitive with state-of-the-art algorithms, even when there is no observable sequence similarity and no known homology relationship. While our results provide only a "proof of concept" on existing network data, we hypothesize that predicting GO terms from topology-only network alignments will become increasingly practical as the volume and quality of PPI network data increase.
Collapse
Affiliation(s)
- Siyue Wang
- Department of Computer Science, University of California, Irvine, CA, 92697-3435, USA
| | - Giles R S Atkinson
- Department of Computer Science, University of California, Irvine, CA, 92697-3435, USA
| | - Wayne B Hayes
- Department of Computer Science, University of California, Irvine, CA, 92697-3435, USA.
| |
Collapse
|
6
|
Li Q, Milenkovic T. Supervised Prediction of Aging-Related Genes From a Context-Specific Protein Interaction Subnetwork. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2022; 19:2484-2498. [PMID: 33929964 DOI: 10.1109/tcbb.2021.3076961] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/12/2023]
Abstract
Human aging is linked to many prevalent diseases. The aging process is highly influenced by genetic factors. Hence, it is important to identify human aging-related genes. We focus on supervised prediction of such genes. Gene expression-based methods for this purpose study genes in isolation from each other. While protein-protein interaction (PPI) network-based methods for this purpose account for interactions between genes' protein products, current PPI network data are context-unspecific, spanning different biological conditions. Instead, here, we focus on an aging-specific subnetwork of the entire PPI network, obtained by integrating aging-specific gene expression data and PPI network data. The potential of such data integration has been recognized but mostly in the context of cancer. So, we are the first to propose a supervised learning framework for predicting aging-related genes from an aging-specific PPI subnetwork. In a systematic and comprehensive evaluation, we find that in many of the evaluation tests: (i) using an aging-specific subnetwork indeed yields more accurate aging-related gene predictions than using the entire network, and (ii) predictive methods from our framework that have not previously been used for supervised prediction of aging-related genes outperform existing prominent methods for the same purpose. These results justify the need for our framework.
Collapse
|
7
|
Wang S, Chen X, Frederisy BJ, Mbakogu BA, Kanne AD, Khosravi P, Hayes WB. On the current failure-but bright future-of topology-driven biological network alignment. ADVANCES IN PROTEIN CHEMISTRY AND STRUCTURAL BIOLOGY 2022; 131:1-44. [PMID: 35871888 DOI: 10.1016/bs.apcsb.2022.05.005] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 01/26/2023]
Abstract
Since the function of a protein is defined by its interaction partners, and since we expect similar interaction patterns across species, the alignment of protein-protein interaction (PPI) networks between species, based on network topology alone, should uncover functionally related proteins across species. Surprisingly, despite the publication of more than fifty algorithms aimed at performing PPI network alignment, few have demonstrated a statistically significant link between network topology and functional similarity, and none have demonstrated that orthologs can be recovered using network topology alone. We find that the major contributing factors to this surprising failure are: (i) edge densities in most currently available experimental PPI networks are demonstrably too low to expect topological network alignment to succeed; (ii) in the few cases where the edge densities are high enough, some measures of topological similarity easily uncover functionally similar proteins while others do not; and (iii) most network alignment algorithms to date perform poorly at optimizing even their own topological objective functions, hampering their ability to use topology effectively. We demonstrate that SANA-the Simulated Annealing Network Aligner-significantly outperforms existing aligners at optimizing their own objective functions, even achieving near-optimal solutions when the optimal solution is known. We offer the first demonstration of global network alignments based on topology alone that align functionally similar proteins with p-values in some cases below 10-300. We predict that topological network alignment has a bright future as edge densities increase toward the value where good alignments become possible. We demonstrate that when enough common topology is present at high enough edge densities-for example in the recent, partly synthetic networks of the Integrated Interaction Database-topological network alignment easily recovers most orthologs, paving the way toward high-throughput functional prediction based on topology-driven network alignment.
Collapse
Affiliation(s)
- Siyue Wang
- Department of Computer Science, University of California, Irvine, CA, United States
| | - Xiaoyin Chen
- Department of Computer Science, University of California, Irvine, CA, United States
| | - Brent J Frederisy
- Department of Computer Science, University of California, Irvine, CA, United States
| | - Benedict A Mbakogu
- Department of Computer Science, University of California, Irvine, CA, United States
| | - Amy D Kanne
- Department of Computer Science, University of California, Irvine, CA, United States
| | - Pasha Khosravi
- Department of Computer Science, University of California, Irvine, CA, United States
| | - Wayne B Hayes
- Department of Computer Science, University of California, Irvine, CA, United States.
| |
Collapse
|
8
|
Identifying multiple social network accounts belonging to the same users. SOCIAL NETWORK ANALYSIS AND MINING 2021. [DOI: 10.1007/s13278-021-00736-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/21/2022]
|
9
|
Li Q, Newaz K, Milenković T. Improved supervised prediction of aging-related genes via weighted dynamic network analysis. BMC Bioinformatics 2021; 22:520. [PMID: 34696741 PMCID: PMC8543111 DOI: 10.1186/s12859-021-04439-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/14/2021] [Accepted: 10/12/2021] [Indexed: 12/22/2022] Open
Abstract
BACKGROUND This study focuses on the task of supervised prediction of aging-related genes from -omics data. Unlike gene expression methods for this task that capture aging-specific information but ignore interactions between genes (i.e., their protein products), or protein-protein interaction (PPI) network methods for this task that account for PPIs but the PPIs are context-unspecific, we recently integrated the two data types into an aging-specific PPI subnetwork, which yielded more accurate aging-related gene predictions. However, a dynamic aging-specific subnetwork did not improve prediction performance compared to a static aging-specific subnetwork, despite the aging process being dynamic. This could be because the dynamic subnetwork was inferred using a naive Induced subgraph approach. Instead, we recently inferred a dynamic aging-specific subnetwork using a methodologically more advanced notion of network propagation (NP), which improved upon Induced dynamic aging-specific subnetwork in a different task, that of unsupervised analyses of the aging process. RESULTS Here, we evaluate whether our existing NP-based dynamic subnetwork will improve upon the dynamic as well as static subnetwork constructed by the Induced approach in the considered task of supervised prediction of aging-related genes. The existing NP-based subnetwork is unweighted, i.e., it gives equal importance to each of the aging-specific PPIs. Because accounting for aging-specific edge weights might be important, we additionally propose a weighted NP-based dynamic aging-specific subnetwork. We demonstrate that a predictive machine learning model trained and tested on the weighted subnetwork yields higher accuracy when predicting aging-related genes than predictive models run on the existing unweighted dynamic or static subnetworks, regardless of whether the existing subnetworks were inferred using NP or the Induced approach. CONCLUSIONS Our proposed weighted dynamic aging-specific subnetwork and its corresponding predictive model could guide with higher confidence than the existing data and models the discovery of novel aging-related gene candidates for future wet lab validation.
Collapse
Affiliation(s)
- Qi Li
- Department of Computer Science and Engineering, Center for Network and Data Science (CNDS), and Eck Institute for Global Health, University of Notre Dame, Notre Dame, IN, 46556, USA
| | - Khalique Newaz
- Department of Computer Science and Engineering, Center for Network and Data Science (CNDS), and Eck Institute for Global Health, University of Notre Dame, Notre Dame, IN, 46556, USA
| | - Tijana Milenković
- Department of Computer Science and Engineering, Center for Network and Data Science (CNDS), and Eck Institute for Global Health, University of Notre Dame, Notre Dame, IN, 46556, USA.
| |
Collapse
|
10
|
PM2RA: A Framework for Detecting and Quantifying Relationship Alterations in Microbial Community. GENOMICS PROTEOMICS & BIOINFORMATICS 2021; 19:154-167. [PMID: 33581337 PMCID: PMC8498968 DOI: 10.1016/j.gpb.2020.07.005] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 03/21/2020] [Revised: 06/28/2020] [Accepted: 08/09/2020] [Indexed: 11/21/2022]
Abstract
The dysbiosis of gut microbiota is associated with the pathogenesis of human diseases. However, observing shifts in the microbe abundance cannot fully reveal underlying perturbations. Examining the relationship alterations (RAs) in the microbiome between health and disease statuses provides additional hints about the pathogenesis of human diseases, but no methods were designed to detect and quantify the RAs between different conditions directly. Here, we present profile monitoring for microbial relationship alteration (PM2RA), an analysis framework to identify and quantify the microbial RAs. The performance of PM2RA was evaluated with synthetic data, and it showed higher specificity and sensitivity than the co-occurrence-based methods. Analyses of real microbial datasets showed that PM2RA was robust for quantifying microbial RAs across different datasets in several diseases. By applying PM2RA, we identified several novel or previously reported microbes implicated in multiple diseases. PM2RA is now implemented as a web-based application available at http://www.pm2ra-xingyinliulab.cn/.
Collapse
|
11
|
Abstract
In this study, we deal with the problem of biological network alignment (NA), which aims to find a node mapping between species' molecular networks that uncovers similar network regions, thus allowing for the transfer of functional knowledge between the aligned nodes. We provide evidence that current NA methods, which assume that topologically similar nodes (i.e., nodes whose network neighborhoods are isomorphic-like) have high functional relatedness, do not actually end up aligning functionally related nodes. That is, we show that the current topological similarity assumption does not hold well. Consequently, we argue that a paradigm shift is needed with how the NA problem is approached. So, we redefine NA as a data-driven framework, called TARA (data-driven NA), which attempts to learn the relationship between topological relatedness and functional relatedness without assuming that topological relatedness corresponds to topological similarity. TARA makes no assumptions about what nodes should be aligned, distinguishing it from existing NA methods. Specifically, TARA trains a classifier to predict whether two nodes from different networks are functionally related based on their network topological patterns (features). We find that TARA is able to make accurate predictions. TARA then takes each pair of nodes that are predicted as related to be part of an alignment. Like traditional NA methods, TARA uses this alignment for the across-species transfer of functional knowledge. TARA as currently implemented uses topological but not protein sequence information for functional knowledge transfer. In this context, we find that TARA outperforms existing state-of-the-art NA methods that also use topological information, WAVE and SANA, and even outperforms or complements a state-of-the-art NA method that uses both topological and sequence information, PrimAlign. Hence, adding sequence information to TARA, which is our future work, is likely to further improve its performance. The software and data are available at http://www.nd.edu/~cone/TARA/.
Collapse
Affiliation(s)
- Shawn Gu
- Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN, United States of America
- Eck Institute for Global Health, University of Notre Dame, Notre Dame, IN, United States of America
- Center for Network and Data Science, University of Notre Dame, Notre Dame, IN, United States of America
| | - Tijana Milenković
- Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN, United States of America
- Eck Institute for Global Health, University of Notre Dame, Notre Dame, IN, United States of America
- Center for Network and Data Science, University of Notre Dame, Notre Dame, IN, United States of America
| |
Collapse
|
12
|
Zhu L, Zhang J, Zhang Y, Lang J, Xiang J, Bai X, Yan N, Tian G, Zhang H, Yang J. NAIGO: An Improved Method to Align PPI Networks Based on Gene Ontology and Graphlets. Front Bioeng Biotechnol 2020; 8:547. [PMID: 32637398 PMCID: PMC7318716 DOI: 10.3389/fbioe.2020.00547] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/06/2020] [Accepted: 05/06/2020] [Indexed: 11/24/2022] Open
Abstract
With the development of high throughput technologies, there are more and more protein–protein interaction (PPI) networks available, which provide a need for efficient computational tools for network alignment. Network alignment is widely used to predict functions of certain proteins, identify conserved network modules, and study the evolutionary relationship across species or biological entities. However, network alignment is an NP-complete problem, and previous algorithms are usually slow or less accurate in aligning big networks like human vs. yeast. In this study, we proposed a fast yet accurate algorithm called Network Alignment by Integrating Biological Process (NAIGO). Specifically, we first divided the networks into subnets taking the advantage of known prior knowledge, such as gene ontology. For each subnet pair, we then developed a novel method to align them by considering both protein orthologous information and their local structural information. After that, we expanded the obtained local network alignments in a greedy manner. Taking the aligned pairs as seeds, we formulated the global network alignment problem as an assignment problem based on similarity matrix, which was solved by the Hungarian method. We applied NAIGO to align human and Saccharomyces cerevisiae S288c PPI network and compared the results with other popular methods like IsoRank, GRAAL, SANA, and NABEECO. As a result, our method outperformed the competitors by aligning more orthologous proteins or matched interactions. In addition, we found a few potential functional orthologous proteins such as RRM2B in human and DNA2 in S. cerevisiae S288c, which are related to DNA repair. We also identified a conserved subnet with six orthologous proteins EXO1, MSH3, MSH2, MLH1, MLH3, and MSH6, and six aligned interactions. All these proteins are associated with mismatch repair. Finally, we predicted a few proteins of S. cerevisiae S288c potentially involving in certain biological processes like autophagosome assembly.
Collapse
Affiliation(s)
- Lijuan Zhu
- College of Mathematics and Computer Science, Zhejiang Normal University, Jinhua, China
| | - Ju Zhang
- Institute of Infectious Diseases, Beijing Ditan Hospital, Capital Medical University, and Beijing Key Laboratory of Emerging Infectious Diseases, Beijing, China
| | - Yi Zhang
- Department of Mathematics, Hebei University of Science & Technology, Shijiazhuang, China
| | | | - Ju Xiang
- Neuroscience Research Center & Department of Basic Medical Sciences, Changsha Medical University, Changsha, China.,School of Computer Science and Engineering, Central South University, Changsha, China
| | - Xiaogang Bai
- Department of Mathematics, Hebei University of Science & Technology, Shijiazhuang, China
| | - Na Yan
- Geneis Beijing Co., Ltd., Beijing, China
| | - Geng Tian
- Geneis Beijing Co., Ltd., Beijing, China
| | - Huajun Zhang
- College of Mathematics and Computer Science, Zhejiang Normal University, Jinhua, China
| | | |
Collapse
|
13
|
Vijayan V, Gu S, Krebs ET, Meng L, MilenkoviĆ T. Pairwise Versus Multiple Global Network Alignment. IEEE ACCESS : PRACTICAL INNOVATIONS, OPEN SOLUTIONS 2020; 8:41961-41974. [PMID: 33747670 PMCID: PMC7971151 DOI: 10.1109/access.2020.2976487] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/12/2023]
Abstract
Biological network alignment (NA) aims to identify similar regions between molecular networks of different species. NA can be local or global. Just as the recent trend in the NA field, we also focus on global NA, which can be pairwise (PNA) and multiple (MNA). PNA produces aligned node pairs between two networks. MNA produces aligned node clusters between more than two networks. Recently, the focus has shifted from PNA to MNA, because MNA captures conserved regions between more networks than PNA (and MNA is thus hypothesized to yield higher-quality alignments), though at higher computational complexity. The issue is that, due to the different outputs of PNA and MNA, a PNA method is only compared to other PNA methods, and an MNA method is only compared to other MNA methods. Comparison of PNA against MNA must be done to evaluate whether MNA indeed yields higher-quality alignments, as only this would justify MNA's higher computational complexity. We introduce a framework that allows for this. We evaluate eight prominent PNA and MNA methods, on synthetic and real-world biological networks, using topological and functional alignment quality measures. We compare PNA against MNA in both a pairwise (native to PNA) and multiple (native to MNA) manner. PNA is expected to perform better under the pairwise evaluation framework. Indeed this is what we find. MNA is expected to perform better under the multiple evaluation framework. Shockingly, we find this not always to hold; PNA is often better than MNA in this framework, depending on the choice of evaluation test.
Collapse
Affiliation(s)
- Vipin Vijayan
- Center for Network and Data Science, Department of Computer Science and Engineering, Eck Institute for Global Health, University of Notre Dame, Notre Dame, IN 46556, USA
| | - Shawn Gu
- Center for Network and Data Science, Department of Computer Science and Engineering, Eck Institute for Global Health, University of Notre Dame, Notre Dame, IN 46556, USA
| | - Eric T Krebs
- Center for Network and Data Science, Department of Computer Science and Engineering, Eck Institute for Global Health, University of Notre Dame, Notre Dame, IN 46556, USA
| | - Lei Meng
- Center for Network and Data Science, Department of Computer Science and Engineering, Eck Institute for Global Health, University of Notre Dame, Notre Dame, IN 46556, USA
| | - Tijana MilenkoviĆ
- Center for Network and Data Science, Department of Computer Science and Engineering, Eck Institute for Global Health, University of Notre Dame, Notre Dame, IN 46556, USA
| |
Collapse
|
14
|
Guzzi PH, Milenkovic T. Survey of local and global biological network alignment: the need to reconcile the two sides of the same coin. Brief Bioinform 2019; 19:472-481. [PMID: 28062413 DOI: 10.1093/bib/bbw132] [Citation(s) in RCA: 30] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/31/2016] [Indexed: 12/23/2022] Open
Abstract
Analogous to genomic sequence alignment that allows for across-species transfer of biological knowledge between conserved sequence regions, biological network alignment can be used to guide the knowledge transfer between conserved regions of molecular networks of different species. Hence, biological network alignment can be used to redefine the traditional notion of a sequence-based homology to a new notion of network-based homology. Analogous to genomic sequence alignment, there exist local and global biological network alignments. Here, we survey prominent and recent computational approaches of each network alignment type and discuss their (dis)advantages. Then, as it was recently shown that the two approach types are complementary, in the sense that they capture different slices of cellular functioning, we discuss the need to reconcile the two network alignment types and present a recent first step in this direction. We conclude with some open research problems on this topic and comment on the usefulness of network alignment in other domains besides computational biology.
Collapse
Affiliation(s)
- Pietro Hiram Guzzi
- Department of Surgical and Medical Sciences, University Magna Graecia, Catanzaro, 88100 Italy
| | - Tijana Milenkovic
- Department of Computer Science and Engineering, Interdisciplinary Center for Network Science and Applications (iCeNSA), ECK Institute for Global Health, University of Notre Dame, Notre Dame, IN 46556, USA
| |
Collapse
|
15
|
Freund A. Untangling Aging Using Dynamic, Organism-Level Phenotypic Networks. Cell Syst 2019; 8:172-181. [PMID: 30878357 DOI: 10.1016/j.cels.2019.02.005] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/06/2018] [Revised: 12/14/2018] [Accepted: 02/13/2019] [Indexed: 12/15/2022]
Abstract
Research on aging requires the ability to measure aging, and therein lies a challenge: it is impossible to measure every molecular, cellular, and physiological change that develops over time, but it is difficult to prioritize phenotypes for measurement because it is unclear which biological changes should be considered aspects of aging and, further, which species and environments exhibit "real aging." Here, I propose a strategy to address this challenge: rather than classify phenotypes as "real aging" or not, conceptualize aging as the set of all age-dependent phenotypes and appreciate that this set and its underlying mechanisms may vary by population. Use automated phenotyping technologies to measure as many age-dependent phenotypes as possible within individuals over time, prioritizing organism-level (i.e., physiological) phenotypes in order to enrich for health relevance. Use those high-dimensional phenotypic data to construct dynamic networks that allow aging to be studied with unprecedented sophistication and rigor.
Collapse
Affiliation(s)
- Adam Freund
- Calico Life Sciences, LLC, South San Francisco, CA 94080, USA.
| |
Collapse
|
16
|
Vijayan V, Milenkovic T. Multiple Network Alignment via MultiMAGNA+. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2018; 15:1669-1682. [PMID: 28829315 DOI: 10.1109/tcbb.2017.2740381] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/07/2023]
Abstract
Network alignment (NA) aims to find a node mapping that identifies topologically or functionally similar network regions between molecular networks of different species. Analogous to genomic sequence alignment, NA can be used to transfer biological knowledge from well- to poorly-studied species between aligned network regions. Pairwise NA (PNA) finds similar regions between two networks while multiple NA (MNA) can align more than two networks. We focus on MNA. Existing MNA methods aim to maximize total similarity over all aligned nodes (node conservation). Then, they evaluate alignment quality by measuring the amount of conserved edges, but only after the alignment is constructed. Directly optimizing edge conservation during alignment construction in addition to node conservation may result in superior alignments. Thus, we present a novel MNA method called multiMAGNA++ that can achieve this. Indeed, multiMAGNA++ outperforms or is on par with existing MNA methods, while often completing faster than existing methods. That is, multiMAGNA++ scales well to larger network data and can be parallelized effectively. During method evaluation, we also introduce new MNA quality measures to allow for more fair MNA method comparison compared to the existing alignment quality measures. The multiMAGNA++ code is available on the method's web page at http://nd.edu/~cone/multiMAGNA++/.
Collapse
|
17
|
Gu S, Johnson J, Faisal FE, Milenković T. From homogeneous to heterogeneous network alignment via colored graphlets. Sci Rep 2018; 8:12524. [PMID: 30131590 PMCID: PMC6104050 DOI: 10.1038/s41598-018-30831-w] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/05/2018] [Accepted: 08/07/2018] [Indexed: 11/19/2022] Open
Abstract
Network alignment (NA) compares networks with the goal of finding a node mapping that uncovers highly similar (conserved) network regions. Existing NA methods are homogeneous, i.e., they can deal only with networks containing nodes and edges of one type. Due to increasing amounts of heterogeneous network data with nodes or edges of different types, we extend three recent state-of-the-art homogeneous NA methods, WAVE, MAGNA++, and SANA, to allow for heterogeneous NA for the first time. We introduce several algorithmic novelties. Namely, these existing methods compute homogeneous graphlet-based node similarities and then find high-scoring alignments with respect to these similarities, while simultaneously maximizing the amount of conserved edges. Instead, we extend homogeneous graphlets to their heterogeneous counterparts, which we then use to develop a new measure of heterogeneous node similarity. Also, we extend S3, a state-of-the-art measure of edge conservation for homogeneous NA, to its heterogeneous counterpart. Then, we find high-scoring alignments with respect to our heterogeneous node similarity and edge conservation measures. In evaluations on synthetic and real-world biological networks, our proposed heterogeneous NA methods lead to higher-quality alignments and better robustness to noise in the data than their homogeneous counterparts. The software and data from this work is available at https://nd.edu/~cone/colored_graphlets/.
Collapse
Affiliation(s)
- Shawn Gu
- Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN, 46556, USA
| | - John Johnson
- Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN, 46556, USA
| | - Fazle E Faisal
- Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN, 46556, USA
- Eck Institute for Global Health and Interdisciplinary Center for Network Science and Applications (iCeNSA), University of Notre Dame, Notre Dame, IN, 46556, USA
| | - Tijana Milenković
- Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN, 46556, USA.
- Eck Institute for Global Health and Interdisciplinary Center for Network Science and Applications (iCeNSA), University of Notre Dame, Notre Dame, IN, 46556, USA.
| |
Collapse
|
18
|
Huang J, Gong M, Ma L. A Global Network Alignment Method Using Discrete Particle Swarm Optimization. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2018; 15:705-718. [PMID: 27775534 DOI: 10.1109/tcbb.2016.2618380] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/06/2023]
Abstract
Molecular interactions data increase exponentially with the advance of biotechnology. This makes it possible and necessary to comparatively analyze the different data at a network level. Global network alignment is an important network comparison approach to identify conserved subnetworks and get insight into evolutionary relationship across species. Network alignment which is analogous to subgraph isomorphism is known to be an NP-hard problem. In this paper, we introduce a novel heuristic Particle-Swarm-Optimization based Network Aligner (PSONA), which optimizes a weighted global alignment model considering both protein sequence similarity and interaction conservations. The particle statuses and status updating rules are redefined in a discrete form by using permutation. A seed-and-extend strategy is employed to guide the searching for the superior alignment. The proposed initialization method "seeds" matches with high sequence similarity into the alignment, which guarantees the functional coherence of the mapping nodes. A greedy local search method is designed as the "extension" procedure to iteratively optimize the edge conservations. PSONA is compared with several state-of-art methods on ten network pairs combined by five species. The experimental results demonstrate that the proposed aligner can map the proteins with high functional coherence and can be used as a booster to effectively refine the well-studied aligners.
Collapse
|
19
|
Milano M, Guzzi PH, Cannataro M. GLAlign: A Novel Algorithm for Local Network Alignment. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2018; 16:1958-1969. [PMID: 29993696 DOI: 10.1109/tcbb.2018.2830323] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/08/2023]
Abstract
Networks are successfully used as a modelling framework in many application domains. For instance, Protein-Protein Interaction Networks (PPINs) model the set of interactions among proteins in a cell. A critical application of network analysis is the comparison among PPINs of different organisms to reveal similarities among the underlying biological processes. Algorithms for comparing networks (also referred to as network aligners) fall into two main classes: global aligners, which aim to compare two networks on a global scale, and local aligners that evidence single sub-regions of similarity among networks. The possibility to improve the performance of the aligners by mixing the two approaches is a growing research area. In our previous work, we started to explore the possibility to use global alignment to improve the local one. We here examine further this possibility by using topological information extracted from global alignment to guide the steps of the local alignment. Therefore, we present GLAlign (Global Local Aligner), a methodology that improves the performances of local network aligners by exploiting a preliminary global alignment. Furthermore, we provide an implementation of GLAlign. As a proof-of-principle, we evaluated the performance of the GLAlign prototype. Results show that GLAlign methodology outperforms the state-of-the-art local alignment algorithms. GLAlign is publicly available for academic use and can be downloaded here: https://sites.google.com/site/globallocalalignment/.
Collapse
|
20
|
Yoo B, Faisal FE, Chen H, Milenkovic T. Improving Identification of Key Players in Aging via Network De-Noising and Core Inference. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2017; 14:1056-1069. [PMID: 26529776 DOI: 10.1109/tcbb.2015.2495170] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/05/2023]
Abstract
Current "ground truth" knowledge about human aging has been obtained by transferring aging-related knowledge from well-studied model species via sequence homology or by studying human gene expression data. Since proteins function by interacting with each other, analyzing protein-protein interaction (PPI) networks in the context of aging is promising. Unlike existing static network research of aging, since cellular functioning is dynamic, we recently integrated the static human PPI network with aging-related gene expression data to form dynamic, age-specific networks. Then, we predicted as key players in aging those proteins whose network topologies significantly changed with age. Since current networks are noisy , here, we use link prediction to de-noise the human network and predict improved key players in aging from the de-noised data. Indeed, de-noising gives more significant overlap between the predicted data and the "ground truth" aging-related data. Yet, we obtain novel predictions, which we validate in the literature. Also, we improve the predictions by an alternative strategy: removing "redundant" edges from the age-specific networks and using the resulting age-specific network "cores" to study aging. We produce new knowledge from dynamic networks encompassing multiple data types, via network de-noising or core inference, complementing the existing knowledge obtained from sequence or expression data.
Collapse
|
21
|
Abstract
MOTIVATION Network alignment (NA) aims to find a node mapping that conserves similar regions between compared networks. NA is applicable to many fields, including computational biology, where NA can guide the transfer of biological knowledge from well- to poorly-studied species across aligned network regions. Existing NA methods can only align static networks. However, most complex real-world systems evolve over time and should thus be modeled as dynamic networks. We hypothesize that aligning dynamic network representations of evolving systems will produce superior alignments compared to aligning the systems' static network representations, as is currently done. RESULTS For this purpose, we introduce the first ever dynamic NA method, DynaMAGNA ++. This proof-of-concept dynamic NA method is an extension of a state-of-the-art static NA method, MAGNA++. Even though both MAGNA++ and DynaMAGNA++ optimize edge as well as node conservation across the aligned networks, MAGNA++ conserves static edges and similarity between static node neighborhoods, while DynaMAGNA++ conserves dynamic edges (events) and similarity between evolving node neighborhoods. For this purpose, we introduce the first ever measure of dynamic edge conservation and rely on our recent measure of dynamic node conservation. Importantly, the two dynamic conservation measures can be optimized with any state-of-the-art NA method and not just MAGNA++. We confirm our hypothesis that dynamic NA is superior to static NA, on synthetic and real-world networks, in computational biology and social domains. DynaMAGNA++ is parallelized and has a user-friendly graphical interface. AVAILABILITY AND IMPLEMENTATION http://nd.edu/∼cone/DynaMAGNA++/ . CONTACT tmilenko@nd.edu. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
- V Vijayan
- Department of Computer Science and Engineering, ECK Institute for Global Health, and Interdisciplinary Center for Network Science and Applications (iCeNSA), University of Notre Dame, Notre Dame, IN, USA
| | - D Critchlow
- Department of Computer Science and Engineering, ECK Institute for Global Health, and Interdisciplinary Center for Network Science and Applications (iCeNSA), University of Notre Dame, Notre Dame, IN, USA
- Department of Physics and Astronomy, Austin Peay State University, Clarksville, Tennessee, TN, USA
| | - T Milenković
- Department of Computer Science and Engineering, ECK Institute for Global Health, and Interdisciplinary Center for Network Science and Applications (iCeNSA), University of Notre Dame, Notre Dame, IN, USA
| |
Collapse
|
22
|
Meng L, Striegel A, Milenković T. Local versus global biological network alignment. Bioinformatics 2016; 32:3155-3164. [PMID: 27357169 PMCID: PMC5048063 DOI: 10.1093/bioinformatics/btw348] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/23/2015] [Revised: 02/18/2016] [Accepted: 05/23/2016] [Indexed: 11/13/2022] Open
Abstract
MOTIVATION Network alignment (NA) aims to find regions of similarities between species' molecular networks. There exist two NA categories: local (LNA) and global (GNA). LNA finds small highly conserved network regions and produces a many-to-many node mapping. GNA finds large conserved regions and produces a one-to-one node mapping. Given the different outputs of LNA and GNA, when a new NA method is proposed, it is compared against existing methods from the same category. However, both NA categories have the same goal: to allow for transferring functional knowledge from well- to poorly-studied species between conserved network regions. So, which one to choose, LNA or GNA? To answer this, we introduce the first systematic evaluation of the two NA categories. RESULTS We introduce new measures of alignment quality that allow for fair comparison of the different LNA and GNA outputs, as such measures do not exist. We provide user-friendly software for efficient alignment evaluation that implements the new and existing measures. We evaluate prominent LNA and GNA methods on synthetic and real-world biological networks. We study the effect on alignment quality of using different interaction types and confidence levels. We find that the superiority of one NA category over the other is context-dependent. Further, when we contrast LNA and GNA in the application of learning novel protein functional knowledge, the two produce very different predictions, indicating their complementarity. Our results and software provide guidelines for future NA method development and evaluation. AVAILABILITY AND IMPLEMENTATION Software: http://www.nd.edu/~cone/LNA_GNA CONTACT: : tmilenko@nd.eduSupplementary information: Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
- Lei Meng
- Department of Computer Science and Engineering ECK Institute of Global Health and Interdisciplinary Center for Network Science and Applications
| | - Aaron Striegel
- Department of Computer Science and Engineering Wireless Institute, University of Notre Dame, Notre Dame, IN 46556, USA
| | - Tijana Milenković
- Department of Computer Science and Engineering ECK Institute of Global Health and Interdisciplinary Center for Network Science and Applications
| |
Collapse
|
23
|
Faisal FE, Meng L, Crawford J, Milenković T. The post-genomic era of biological network alignment. EURASIP JOURNAL ON BIOINFORMATICS & SYSTEMS BIOLOGY 2015; 2015:3. [PMID: 28194172 PMCID: PMC5270500 DOI: 10.1186/s13637-015-0022-9] [Citation(s) in RCA: 48] [Impact Index Per Article: 5.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 01/21/2015] [Accepted: 05/18/2015] [Indexed: 11/10/2022]
Abstract
Biological network alignment aims to find regions of topological and functional (dis)similarities between molecular networks of different species. Then, network alignment can guide the transfer of biological knowledge from well-studied model species to less well-studied species between conserved (aligned) network regions, thus complementing valuable insights that have already been provided by genomic sequence alignment. Here, we review computational challenges behind the network alignment problem, existing approaches for solving the problem, ways of evaluating their alignment quality, and the approaches' biomedical applications. We discuss recent innovative efforts of improving the existing view of network alignment. We conclude with open research questions in comparative biological network research that could further our understanding of principles of life, evolution, disease, and therapeutics.
Collapse
Affiliation(s)
- Fazle E Faisal
- Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN, 46556 USA
- Interdisciplinary Center for Network Science and Applications, University of Notre Dame, Notre Dame, IN, 46556 USA
- ECK Institute for Global Health, University of Notre Dame, Notre Dame, IN, 46556 USA
| | - Lei Meng
- Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN, 46556 USA
| | - Joseph Crawford
- Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN, 46556 USA
- Interdisciplinary Center for Network Science and Applications, University of Notre Dame, Notre Dame, IN, 46556 USA
- ECK Institute for Global Health, University of Notre Dame, Notre Dame, IN, 46556 USA
| | - Tijana Milenković
- Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN, 46556 USA
- Interdisciplinary Center for Network Science and Applications, University of Notre Dame, Notre Dame, IN, 46556 USA
- ECK Institute for Global Health, University of Notre Dame, Notre Dame, IN, 46556 USA
| |
Collapse
|
24
|
Hulovatyy Y, Chen H, Milenković T. Exploring the structure and function of temporal networks with dynamic graphlets. Bioinformatics 2015; 31:i171-80. [PMID: 26072480 PMCID: PMC4765862 DOI: 10.1093/bioinformatics/btv227] [Citation(s) in RCA: 62] [Impact Index Per Article: 6.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/18/2023] Open
Abstract
MOTIVATION With increasing availability of temporal real-world networks, how to efficiently study these data? One can model a temporal network as a single aggregate static network, or as a series of time-specific snapshots, each being an aggregate static network over the corresponding time window. Then, one can use established methods for static analysis on the resulting aggregate network(s), but losing in the process valuable temporal information either completely, or at the interface between different snapshots, respectively. Here, we develop a novel approach for studying a temporal network more explicitly, by capturing inter-snapshot relationships. RESULTS We base our methodology on well-established graphlets (subgraphs), which have been proven in numerous contexts in static network research. We develop new theory to allow for graphlet-based analyses of temporal networks. Our new notion of dynamic graphlets is different from existing dynamic network approaches that are based on temporal motifs (statistically significant subgraphs). The latter have limitations: their results depend on the choice of a null network model that is required to evaluate the significance of a subgraph, and choosing a good null model is non-trivial. Our dynamic graphlets overcome the limitations of the temporal motifs. Also, when we aim to characterize the structure and function of an entire temporal network or of individual nodes, our dynamic graphlets outperform the static graphlets. Clearly, accounting for temporal information helps. We apply dynamic graphlets to temporal age-specific molecular network data to deepen our limited knowledge about human aging. AVAILABILITY AND IMPLEMENTATION http://www.nd.edu/∼cone/DG.
Collapse
Affiliation(s)
- Y Hulovatyy
- Department of Computer Science and Engineering, Interdisciplinary Center for Network Science and Applications, and ECK Institute for Global Health, University of Notre Dame, Notre Dame, IN 46556, USA
| | - H Chen
- Department of Computer Science and Engineering, Interdisciplinary Center for Network Science and Applications, and ECK Institute for Global Health, University of Notre Dame, Notre Dame, IN 46556, USA
| | - T Milenković
- Department of Computer Science and Engineering, Interdisciplinary Center for Network Science and Applications, and ECK Institute for Global Health, University of Notre Dame, Notre Dame, IN 46556, USA
| |
Collapse
|
25
|
Crawford J, Sun Y, Milenković T. Fair evaluation of global network aligners. Algorithms Mol Biol 2015; 10:19. [PMID: 26060505 PMCID: PMC4460690 DOI: 10.1186/s13015-015-0050-8] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/30/2014] [Accepted: 05/10/2015] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Analogous to genomic sequence alignment, biological network alignment identifies conserved regions between networks of different species. Then, function can be transferred from well- to poorly-annotated species between aligned network regions. Network alignment typically encompasses two algorithmic components: node cost function (NCF), which measures similarities between nodes in different networks, and alignment strategy (AS), which uses these similarities to rapidly identify high-scoring alignments. Different methods use both different NCFs and different ASs. Thus, it is unclear whether the superiority of a method comes from its NCF, its AS, or both. We already showed on state-of-the-art methods, MI-GRAAL and IsoRankN, that combining NCF of one method and AS of another method can give a new superior method. Here, we evaluate MI-GRAAL against a newer approach, GHOST, by mixing-and-matching the methods' NCFs and ASs to potentially further improve alignment quality. While doing so, we approach important questions that have not been asked systematically thus far. First, we ask how much of the NCF information should come from protein sequence data compared to network topology data. Existing methods determine this parameter more-less arbitrarily, which could affect alignment quality. Second, when topological information is used in NCF, we ask how large the size of the neighborhoods of the compared nodes should be. Existing methods assume that the larger the neighborhood size, the better. RESULTS Our findings are as follows. MI-GRAAL's NCF is superior to GHOST's NCF, while the performance of the methods' ASs is data-dependent. Thus, for data on which GHOST's AS is superior to MI-GRAAL's AS, the combination of MI-GRAAL's NCF and GHOST's AS represents a new superior method. Also, which amount of sequence information is used within NCF does not affect alignment quality, while the inclusion of topological information is crucial for producing good alignments. Finally, larger neighborhood sizes are preferred, but often, it is the second largest size that is superior. Using this size instead of the largest one would decrease computational complexity. CONCLUSION Taken together, our results represent general recommendations for a fair evaluation of network alignment methods and in particular of two-stage NCF-AS approaches.
Collapse
|
26
|
Yaveroğlu ÖN, Milenković T, Pržulj N. Proper evaluation of alignment-free network comparison methods. Bioinformatics 2015; 31:2697-704. [PMID: 25810431 PMCID: PMC4528624 DOI: 10.1093/bioinformatics/btv170] [Citation(s) in RCA: 29] [Impact Index Per Article: 3.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/03/2015] [Accepted: 03/18/2015] [Indexed: 11/25/2022] Open
Abstract
Motivation: Network comparison is a computationally intractable problem with important applications in systems biology and other domains. A key challenge is to properly quantify similarity between wiring patterns of two networks in an alignment-free fashion. Also, alignment-based methods exist that aim to identify an actual node mapping between networks and as such serve a different purpose. Various alignment-free methods that use different global network properties (e.g. degree distribution) have been proposed. Methods based on small local subgraphs called graphlets perform the best in the alignment-free network comparison task, due to high level of topological detail that graphlets can capture. Among different graphlet-based methods, Graphlet Correlation Distance (GCD) was shown to be the most accurate for comparing networks. Recently, a new graphlet-based method called NetDis was proposed, which was claimed to be superior. We argue against this, as the performance of NetDis was not properly evaluated to position it correctly among the other alignment-free methods. Results: We evaluate the performance of available alignment-free network comparison methods, including GCD and NetDis. We do this by measuring accuracy of each method (in a systematic precision-recall framework) in terms of how well the method can group (cluster) topologically similar networks. By testing this on both synthetic and real-world networks from different domains, we show that GCD remains the most accurate, noise-tolerant and computationally efficient alignment-free method. That is, we show that NetDis does not outperform the other methods, as originally claimed, while it is also computationally more expensive. Furthermore, since NetDis is dependent on the choice of a network null model (unlike the other graphlet-based methods), we show that its performance is highly sensitive to the choice of this parameter. Finally, we find that its performance is not independent on network sizes and densities, as originally claimed. Contact: natasha@imperial.ac.uk Supplementary information:Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
- Ömer Nebil Yaveroğlu
- California Institute for Telecommunications and Information Technology (Calit2), University of California, Irvine, CA 92697, USA
| | - Tijana Milenković
- Department of Computer Science and Engineering, University of Notre Dame, IN 46556, USA and
| | - Nataša Pržulj
- Department of Computing, Imperial College London, London SW7 2AZ, UK
| |
Collapse
|
27
|
Vijayan V, Saraph V, Milenković T. MAGNA++: Maximizing Accuracy in Global Network Alignment via both node and edge conservation. Bioinformatics 2015; 31:2409-11. [PMID: 25792552 DOI: 10.1093/bioinformatics/btv161] [Citation(s) in RCA: 72] [Impact Index Per Article: 8.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/20/2014] [Accepted: 03/14/2015] [Indexed: 11/13/2022] Open
Abstract
MOTIVATION Network alignment aims to find conserved regions between different networks. Existing methods aim to maximize total similarity over all aligned nodes (i.e. node conservation). Then, they evaluate alignment quality by measuring the amount of conserved edges, but only after the alignment is constructed. Thus, we recently introduced MAGNA (Maximizing Accuracy in Global Network Alignment) to directly maximize edge conservation while producing alignments and showed its superiority over the existing methods. Here, we extend the original MAGNA with several important algorithmic advances into a new MAGNA++ framework. RESULTS MAGNA++ introduces several novelties: (i) it simultaneously maximizes any one of three different measures of edge conservation (including our recent superior [Formula: see text] measure) and any desired node conservation measure, which further improves alignment quality compared with maximizing only node conservation or only edge conservation; (ii) it speeds up the original MAGNA algorithm by parallelizing it to automatically use all available resources, as well as by reimplementing the edge conservation measures more efficiently; (iii) it provides a friendly graphical user interface for easy use by domain (e.g. biological) scientists; and (iv) at the same time, MAGNA++ offers source code for easy extensibility by computational scientists. AVAILABILITY AND IMPLEMENTATION http://www.nd.edu/∼cone/MAGNA++/
Collapse
Affiliation(s)
- V Vijayan
- Department of Computer Science and Engineering, ECK Institute for Global Health, Interdisciplinary Center for Network Science and Application, University of Notre Dame, IN 46556, USA and
| | - V Saraph
- Department of Computer Science, Brown University, Providence, RI 02912, USA
| | - T Milenković
- Department of Computer Science and Engineering, ECK Institute for Global Health, Interdisciplinary Center for Network Science and Application, University of Notre Dame, IN 46556, USA and
| |
Collapse
|
28
|
Sun Y, Crawford J, Tang J, Milenković T. Simultaneous Optimization of both Node and Edge Conservation in Network Alignment via WAVE. LECTURE NOTES IN COMPUTER SCIENCE 2015. [DOI: 10.1007/978-3-662-48221-6_2] [Citation(s) in RCA: 32] [Impact Index Per Article: 3.6] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/12/2022]
|