1
Raajaraam L, Raman K. Modeling Microbial Communities: Perspective and Challenges. ACS Synth Biol 2024; 13:2260-2270. [PMID: 39148432 DOI: 10.1021/acssynbio.4c00116] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/17/2024]
Abstract
Microbial communities are immensely important due to their widespread presence and profound impact on various facets of life. Understanding these complex systems necessitates mathematical modeling, a powerful tool for simulating and predicting microbial community behavior. This review offers a critical analysis of metabolic modeling and highlights key areas that would greatly benefit from broader discussion and collaboration. Moreover, we explore the challenges and opportunities linked to the intricate nature of these communities, spanning data generation, modeling, and validation. We are confident that ongoing advancements in modeling techniques, such as machine learning, coupled with interdisciplinary collaborations, will unlock the full potential of microbial communities across diverse applications.
Affiliation(s)
- Lavanya Raajaraam
- Bhupat and Jyoti Mehta School of Biosciences, Department of Biotechnology, Indian Institute of Technology (IIT) Madras, Chennai 600 036, India
- Centre for Integrative Biology and Systems mEdicine, IIT Madras, Chennai 600 036, India
- Robert Bosch Centre for Data Science and Artificial Intelligence (RBCDSAI), IIT Madras, Chennai 600 036, India
- Karthik Raman
- Bhupat and Jyoti Mehta School of Biosciences, Department of Biotechnology, Indian Institute of Technology (IIT) Madras, Chennai 600 036, India
- Centre for Integrative Biology and Systems mEdicine, IIT Madras, Chennai 600 036, India
- Robert Bosch Centre for Data Science and Artificial Intelligence (RBCDSAI), IIT Madras, Chennai 600 036, India
- Department of Data Science and AI, Wadhwani School of Data Science and Artificial Intelligence, IIT Madras, Chennai 600 036, India
2
Menor-Flores M, Vega-Rodríguez MA. A protein-protein interaction network aligner study in the multi-objective domain. Computer Methods and Programs in Biomedicine 2024; 250:108188. [PMID: 38657382 DOI: 10.1016/j.cmpb.2024.108188] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/10/2023] [Revised: 04/14/2024] [Accepted: 04/17/2024] [Indexed: 04/26/2024]
Abstract
BACKGROUND AND OBJECTIVE Protein-protein interaction (PPI) network alignment has proven to be an efficient technique in the diagnosis and prevention of certain diseases. However, the difficulty of simultaneously maximizing the two qualities that measure the goodness of alignments (topological and biological quality) has led aligners to produce very different alignments, which makes a comparative study among alignments of such different qualities a big challenge. Multi-objective optimization is a computational method that is very powerful in this kind of context because both conflicting qualities are considered together. Analysing the alignments of each PPI network aligner with multi-objective methodologies gives a bigger picture of the alignments and their qualities and leads to very interesting conclusions. This paper proposes a comprehensive PPI network aligner study in the multi-objective domain. METHODS Alignments from each aligner and all aligners together were studied and compared to each other via Pareto dominance methodologies. The best alignments produced by each aligner and all aligners together for five different alignment scenarios were displayed in Pareto front graphs. Later, the aligners were ranked according to the topological, biological, and combined quality of their alignments. Finally, the aligners were also ranked based on their average runtimes. RESULTS Regarding aligners constructing the best overall alignments, we found that SAlign, BEAMS, SANA, and HubAlign are the best options. Additionally, the alignments of best topological quality are produced by the SANA, SAlign, and HubAlign aligners. In contrast, the aligners returning the alignments of best biological quality are BEAMS, TAME, and WAVE. However, if there are time constraints, it is recommended to select SAlign to obtain high topological quality alignments and PISwap or SAlign for high biological quality alignments.
CONCLUSIONS The use of the SANA aligner is recommended for obtaining the best alignments of topological quality, BEAMS for alignments of the best biological quality, and SAlign for alignments of the best combined topological and biological quality. Simultaneously, SANA and BEAMS have above-average runtimes. Therefore, it is suggested, if necessary due to time restrictions, to choose other, faster aligners like SAlign or PISwap whose alignments are also of high quality.
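The Pareto dominance comparison mentioned in this abstract can be illustrated with a minimal sketch. It assumes each alignment is summarized by two scores to be maximized (topological and biological quality). The labels and numbers below are made up for illustration and are not results from the paper, and the function names `dominates` and `pareto_front` are hypothetical.

```python
from typing import List, Tuple

# (label, topological score, biological score); all values are illustrative only.
Alignment = Tuple[str, float, float]


def dominates(a: Alignment, b: Alignment) -> bool:
    """True if a is at least as good as b on both scores and strictly better on one."""
    return a[1] >= b[1] and a[2] >= b[2] and (a[1] > b[1] or a[2] > b[2])


def pareto_front(alignments: List[Alignment]) -> List[Alignment]:
    """Keep only the alignments that no other alignment dominates."""
    return [a for a in alignments if not any(dominates(b, a) for b in alignments)]


alignments = [
    ("A", 0.90, 0.20),
    ("B", 0.60, 0.70),
    ("C", 0.50, 0.60),  # dominated by B on both scores
    ("D", 0.30, 0.90),
]
front = pareto_front(alignments)
print([a[0] for a in front])  # ['A', 'B', 'D']
```

Alignments on the front represent different trade-offs between the two qualities; none of them can be improved on one score without losing on the other.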
Affiliation(s)
- Manuel Menor-Flores
- Escuela Politécnica, Universidad de Extremadura, Campus Universitario s/n, 10003 Cáceres, Spain.
- Miguel A Vega-Rodríguez
- Escuela Politécnica, Universidad de Extremadura, Campus Universitario s/n, 10003 Cáceres, Spain.
3
Silva-Andrade C, Rodriguez-Fernández M, Garrido D, Martin AJM. Using metabolic networks to predict cross-feeding and competition interactions between microorganisms. Microbiol Spectr 2024; 12:e0228723. [PMID: 38506512 PMCID: PMC11064492 DOI: 10.1128/spectrum.02287-23] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/07/2023] [Accepted: 02/06/2024] [Indexed: 03/21/2024]
Abstract
Understanding the interactions between microorganisms and their impact on bacterial behavior at the community level is a key research topic in microbiology. Different methods, relying on experimental or mathematical approaches based on the diverse properties of bacteria, are currently employed to study these interactions. Recently, the use of metabolic networks to understand the interactions between bacterial pairs has increased, highlighting the relevance of this approach in characterizing bacteria. In this study, we leverage the representation of bacteria through their metabolic networks to build a predictive model aimed at reducing the number of experimental assays required for designing bacterial consortia with specific behaviors. Our novel method for predicting cross-feeding or competition interactions between pairs of microorganisms utilizes metabolic network features. Machine learning classifiers are employed to determine the type of interaction from automatically reconstructed metabolic networks. Several algorithms were assessed and selected based on comprehensive testing and careful separation of manually compiled data sets obtained from literature sources. We used different classification algorithms, including K Nearest Neighbors, XGBoost, Support Vector Machine, and Random Forest, tested different parameter values, and implemented several data curation approaches to reduce the biological bias associated with our data set, ultimately achieving an accuracy of over 0.9. Our method holds substantial potential to advance the understanding of community behavior and contribute to the development of more effective approaches for consortia design. IMPORTANCE Understanding bacterial interactions at the community level is critical for microbiology, and leveraging metabolic networks presents an efficient and effective approach.
The introduction of this novel method for predicting interactions through machine learning classifiers has the potential to advance the field by reducing the number of experimental assays required and contributing to the development of more effective bacterial consortia.
Affiliation(s)
- Claudia Silva-Andrade
- Programa de Doctorado en Genómica Integrativa, Vicerrectoría de Investigación, Universidad Mayor, Santiago, Chile
- Laboratorio de Redes Biológicas, Centro Científico y Tecnológico de Excelencia Ciencia & Vida, Fundación Ciencia & Vida, Santiago, Chile
- María Rodriguez-Fernández
- Institute for Biological and Medical Engineering, Schools of Engineering, Medicine and Biological Sciences, Pontificia Universidad Católica de Chile, Santiago, Chile
- Daniel Garrido
- Department of Chemical and Bioprocess Engineering, School of Engineering, Pontificia Universidad Católica de Chile, Santiago, Chile
- Alberto J. M. Martin
- Laboratorio de Redes Biológicas, Centro Científico y Tecnológico de Excelencia Ciencia & Vida, Fundación Ciencia & Vida, Santiago, Chile
- Escuela de Ingeniería, Facultad de Ingeniería, Arquitectura y Diseño, Universidad San Sebastián, Santiago, Chile
4
Network-Based Structural Alignment of RNA Sequences Using TOPAS. Methods Mol Biol 2023; 2586:147-162. [PMID: 36705903 DOI: 10.1007/978-1-0716-2768-6_9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/28/2023]
Abstract
TOPAS (TOPological network-based Alignment of Structural RNAs) is a network-based alignment algorithm that predicts structurally sound pairwise alignment of RNAs. In order to take advantage of recent advances in comparative network analysis for efficient structurally sound RNA alignment, TOPAS constructs topological network representations for RNAs, which consist of sequential edges connecting nucleotide bases as well as structural edges reflecting the underlying folding structure. Structural edges are weighted by the estimated base-pairing probabilities. Next, the constructed networks are aligned using probabilistic network alignment techniques, which yield a structurally sound RNA alignment that considers both the sequence similarity and the structural similarity between the given RNAs. Compared to traditional Sankoff-style algorithms, this network-based alignment scheme leads to a significant reduction in the overall computational cost while yielding favorable alignment results. Another important benefit is its capability to handle arbitrary folding structures, which can potentially lead to more accurate alignment for RNAs with pseudoknots.
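The network representation described in this abstract (sequential edges between neighboring bases plus structural edges weighted by base-pairing probability) might be sketched as follows. This is not the TOPAS implementation: the function name `build_rna_network`, the `min_prob` cutoff, and the example probabilities are assumptions made for illustration. In practice the base-pairing probabilities would come from an RNA folding tool.

```python
from typing import Dict, List, Tuple

Edge = Tuple[int, int]


def build_rna_network(
    sequence: str,
    pair_probs: Dict[Edge, float],
    min_prob: float = 0.0,
) -> Dict[str, object]:
    """Build a simple two-layer edge description of an RNA.

    Nodes are 0-based nucleotide positions. `min_prob` is a hypothetical
    cutoff used here only to drop very unlikely base pairs.
    """
    # Sequential edges connect each base to its successor along the backbone.
    sequential: List[Edge] = [(i, i + 1) for i in range(len(sequence) - 1)]
    # Structural edges carry the estimated base-pairing probability as weight.
    structural: Dict[Edge, float] = {
        (min(i, j), max(i, j)): p
        for (i, j), p in pair_probs.items()
        if p > min_prob
    }
    return {"sequential": sequential, "structural": structural}


# Illustrative hairpin-like example; the probabilities are invented.
network = build_rna_network(
    "GGGAAACCC",
    {(0, 8): 0.9, (1, 7): 0.8, (2, 6): 0.05},
    min_prob=0.1,
)
print(len(network["sequential"]), network["structural"])
```

Two such networks could then be handed to a network alignment method; that step is outside the scope of this sketch.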
5
Milano M, Agapito G, Cannataro M. Challenges and Limitations of Biological Network Analysis. BioTech 2022; 11:24. [PMID: 35892929 PMCID: PMC9326688 DOI: 10.3390/biotech11030024] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 06/17/2022] [Revised: 07/04/2022] [Accepted: 07/06/2022] [Indexed: 11/17/2022]
Abstract
High-Throughput technologies are producing an increasing volume of data that needs large amounts of data storage, effective data models and efficient, possibly parallel analysis algorithms. Pathway and interactomics data are represented as graphs and add a new dimension of analysis, allowing, among other features, graph-based comparison of organisms' properties. For instance, in biological pathway representation, the nodes can represent proteins, RNA and fat molecules, while the edges represent the interaction between molecules. Otherwise, biological networks such as Protein-Protein Interaction (PPI) Networks, represent the biochemical interactions among proteins by using nodes that model the proteins from a given organism, and edges that model the protein-protein interactions, whereas pathway networks enable the representation of biochemical-reaction cascades that happen within the cells or tissues. In this paper, we discuss the main models for standard representation of pathways and PPI networks, the data models for the representation and exchange of pathway and protein interaction data, the main databases in which they are stored and the alignment algorithms for the comparison of pathways and PPI networks of different organisms. Finally, we discuss the challenges and the limitations of pathways and PPI network representation and analysis. We have identified that network alignment presents a lot of open problems worthy of further investigation, especially concerning pathway alignment.
Affiliation(s)
- Marianna Milano
- Department of Medical and Clinical Surgery, University Magna Græcia, 88100 Catanzaro, Italy
- Data Analytics Research Center, University Magna Græcia, 88100 Catanzaro, Italy
- Giuseppe Agapito
- Data Analytics Research Center, University Magna Græcia, 88100 Catanzaro, Italy
- Department of Law, Economics and Social Sciences, University Magna Græcia, 88100 Catanzaro, Italy
- Mario Cannataro
- Department of Medical and Clinical Surgery, University Magna Græcia, 88100 Catanzaro, Italy
- Data Analytics Research Center, University Magna Græcia, 88100 Catanzaro, Italy
6
Ma L, Shao Z, Li L, Huang J, Wang S, Lin Q, Li J, Gong M, Nandi AK. Heuristics and metaheuristics for biological network alignment: A review. Neurocomputing 2022. [DOI: 10.1016/j.neucom.2021.08.156] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/26/2022]
7
Milano M, Zucco C, Settino M, Cannataro M. An Extensive Assessment of Network Embedding in PPI Network Alignment. Entropy (Basel, Switzerland) 2022; 24:730. [PMID: 35626613 PMCID: PMC9141406 DOI: 10.3390/e24050730] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 04/29/2022] [Revised: 05/18/2022] [Accepted: 05/19/2022] [Indexed: 12/07/2022]
Abstract
Network alignment is a fundamental task in network analysis. In the biological field, where the protein-protein interaction (PPI) is represented as a graph, network alignment allowed the discovery of underlying biological knowledge such as conserved evolutionary pathways and functionally conserved proteins throughout different species. A recent trend in network science concerns network embedding, i.e., the modelling of nodes in a network as a low-dimensional feature vector. In this survey, we present an overview of current PPI network embedding alignment methods, a comparison among them, and a comparison to classical PPI network alignment algorithms. The results of this comparison highlight that: (i) only five network embeddings for network alignment algorithms have been applied in the biological context, whereas the literature presents several classical network alignment algorithms; (ii) there is a need for developing an evaluation framework that may enable a unified comparison between different algorithms; (iii) the majority of the proposed algorithms perform network embedding through matrix factorization-based techniques; (iv) three out of five algorithms leverage external biological resources, while the remaining two are designed for domain agnostic network alignment and tested on PPI networks; (v) two algorithms out of three are stated to perform multi-network alignment, while the remaining perform pairwise network alignment.
8
Guo X, He H, Yu J, Shi S. PKSPS: a novel method for predicting kinase of specific phosphorylation sites based on maximum weighted bipartite matching algorithm and phosphorylation sequence enrichment analysis. Brief Bioinform 2021; 23:6398688. [PMID: 34661630 DOI: 10.1093/bib/bbab436] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/29/2021] [Revised: 09/10/2021] [Accepted: 09/21/2021] [Indexed: 11/14/2022]
Abstract
With the development of biotechnology, a large number of phosphorylation sites have been experimentally confirmed and collected, but only a few of them have kinase annotations. Since experimental methods to detect kinases at specific phosphorylation sites are expensive and accidental, some computational methods have been proposed to predict the kinase of these sites, but most methods only consider single sequence information or single functional network information. In this study, a new method, Predicting Kinase of Specific Phosphorylation Sites (PKSPS), is developed to predict kinases of specific phosphorylation sites in human proteins by combining PKSPS-Net with PKSPS-Seq, which considers protein-protein interaction (PPI) network information and sequence information. For PKSPS-Net, kinase-kinase and substrate-substrate similarity are quantified based on the topological similarity of proteins in the PPI network, and a maximum weighted bipartite matching algorithm is proposed to predict kinase-substrate relationships. In PKSPS-Seq, phosphorylation sequence enrichment analysis is used to analyze the similarity of local sequences around phosphorylation sites and predict the kinase of specific phosphorylation sites (KSP). PKSPS has proven more effective than either PKSPS-Net or PKSPS-Seq alone on different sets of kinases. Further comparison results show that the PKSPS method performs better than existing methods. Finally, the case study demonstrates the effectiveness of PKSPS in predicting kinases of specific phosphorylation sites. The open source code and data of PKSPS can be obtained from https://github.com/guoxinyunncu/PKSPS.
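To make the notion of maximum weighted bipartite matching concrete, here is a minimal brute-force sketch. It is not the PKSPS code: the function name `max_weight_matching` and the weight matrix are hypothetical, with rows standing in for one side of the bipartite graph and columns for the other, and the exhaustive search is only practical for very small inputs.

```python
from itertools import permutations
from typing import List, Tuple


def max_weight_matching(weights: List[List[float]]) -> Tuple[float, List[Tuple[int, int]]]:
    """Return the best total weight and the (row, column) pairs that achieve it.

    Every row is matched to a distinct column. All assignments are enumerated,
    so this is a teaching sketch rather than a scalable solver.
    """
    n_rows, n_cols = len(weights), len(weights[0])
    if n_rows > n_cols:
        raise ValueError("expected at most as many rows as columns")
    best_total = float("-inf")
    best_cols: Tuple[int, ...] = ()
    for cols in permutations(range(n_cols), n_rows):
        total = sum(weights[row][col] for row, col in enumerate(cols))
        if total > best_total:
            best_total, best_cols = total, cols
    return best_total, list(enumerate(best_cols))


# Invented similarity scores between three left nodes and three right nodes.
weights = [
    [0.9, 0.1, 0.3],
    [0.8, 0.7, 0.2],
    [0.1, 0.4, 0.6],
]
total, pairs = max_weight_matching(weights)
print(round(total, 2), pairs)  # 2.2 [(0, 0), (1, 1), (2, 2)]
```

For realistic problem sizes a polynomial-time method such as the Hungarian algorithm would replace the enumeration.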
Affiliation(s)
- Xinyun Guo
- Department of Mathematics and Numerical Simulation and High-Performance Computing Laboratory, School of Sciences, Nanchang University, Nanchang 330031, China
- Huan He
- Department of Mathematics and Numerical Simulation and High-Performance Computing Laboratory, School of Sciences, Nanchang University, Nanchang 330031, China
- Jialin Yu
- Department of Mathematics and Numerical Simulation and High-Performance Computing Laboratory, School of Sciences, Nanchang University, Nanchang 330031, China
- Shaoping Shi
- Department of Mathematics and Numerical Simulation and High-Performance Computing Laboratory, School of Sciences, Nanchang University, Nanchang 330031, China
9
Arsenescu V, Devkota K, Erden M, Shpilker P, Werenski M, Cowen LJ. MUNDO: protein function prediction embedded in a multispecies world. Bioinformatics Advances 2021; 2:vbab025. [PMID: 36699351 PMCID: PMC9710620 DOI: 10.1093/bioadv/vbab025] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/15/2021] [Revised: 09/11/2021] [Accepted: 09/23/2021] [Indexed: 01/28/2023]
Abstract
Motivation Leveraging cross-species information in protein function prediction can add significant power to network-based protein function prediction methods, because so much functional information is conserved across at least close scales of evolution. We introduce MUNDO, a new cross-species co-embedding method that combines a single-network embedding method with a co-embedding method to predict functional annotations in a target species, leveraging also functional annotations in a model species network. Results Across a wide range of parameter choices, MUNDO performs best at predicting annotations in the mouse network, when trained on mouse and human protein-protein interaction (PPI) networks, in the human network, when trained on human and mouse PPIs, and in Baker's yeast, when trained on Fission and Baker's yeast, as compared to competitor methods. MUNDO also outperforms all the cross-species methods when predicting in Fission yeast when trained on Fission and Baker's yeast; however, in this single case, discarding the information from the other species and using annotations from the Fission yeast network alone usually performs best. Availability and implementation All code is available and can be accessed here: github.com/v0rtex20k/MUNDO. Supplementary information Supplementary data are available at Bioinformatics Advances online. Additional experimental results are on our github site.
Affiliation(s)
- Victor Arsenescu
- Department of Computer Science, Tufts University, Medford, MA 02155, USA
- Kapil Devkota
- Department of Computer Science, Tufts University, Medford, MA 02155, USA
- Mert Erden
- Department of Computer Science, Tufts University, Medford, MA 02155, USA
- Polina Shpilker
- Department of Computer Science, Tufts University, Medford, MA 02155, USA
- Matthew Werenski
- Department of Computer Science, Tufts University, Medford, MA 02155, USA
- Lenore J Cowen
- Department of Computer Science, Tufts University, Medford, MA 02155, USA
10
Woo HM, Yoon BJ. MONACO: accurate biological network alignment through optimal neighborhood matching between focal nodes. Bioinformatics 2021; 37:1401-1410. [PMID: 33165517 DOI: 10.1093/bioinformatics/btaa962] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 11/02/2019] [Revised: 10/19/2020] [Accepted: 11/02/2020] [Indexed: 11/12/2022]
Abstract
MOTIVATION Alignment of protein-protein interaction networks can be used for the unsupervised prediction of functional modules, such as protein complexes and signaling pathways, that are conserved across different species. To date, various algorithms have been proposed for biological network alignment, many of which attempt to incorporate topological similarity between the networks into the alignment process with the goal of constructing accurate and biologically meaningful alignments. Especially, random walk models have been shown to be effective for quantifying the global topological relatedness between nodes that belong to different networks by diffusing node-level similarity along the interaction edges. However, these schemes are not ideal for capturing the local topological similarity between nodes. RESULTS In this article, we propose MONACO, a novel and versatile network alignment algorithm that finds highly accurate pairwise and multiple network alignments through the iterative optimal matching of 'local' neighborhoods around focal nodes. Extensive performance assessment based on real networks as well as synthetic networks, for which the ground truth is known, demonstrates that MONACO clearly and consistently outperforms all other state-of-the-art network alignment algorithms that we have tested, in terms of accuracy, coherence and topological quality of the aligned network regions. Furthermore, despite the sharply enhanced alignment accuracy, MONACO remains computationally efficient and it scales well with increasing size and number of networks. AVAILABILITY AND IMPLEMENTATION Matlab implementation is freely available at https://github.com/bjyoontamu/MONACO. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Hyun-Myung Woo
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843, USA
- Byung-Jun Yoon
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843, USA
- TEES-AgriLife Center for Bioinformatics and Genomic Systems Engineering, Texas A&M University, College Station, TX 77845, USA
- Computational Science Initiative, Brookhaven National Laboratory, Upton, NY 11973, USA
11
Gao J, Tian L, Lv T, Wang J, Song B, Hu X. Protein2Vec: Aligning Multiple PPI Networks with Representation Learning. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2021; 18:240-249. [PMID: 31478867 DOI: 10.1109/tcbb.2019.2937771] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 06/10/2023]
Abstract
Research on protein-protein interaction (PPI) network alignment plays an important role in understanding crucial underlying biological knowledge such as functionally homologous proteins and conserved evolutionary pathways across different species. Existing methods of PPI network alignment often try to improve the coverage ratio of the alignment result by aligning all proteins from different species. However, there is a fundamental biological premise that needs to be considered carefully: not every protein in a species can, nor should, find its homologous proteins in other species. In this work, we propose a novel alignment method to map only those proteins with the most similarity throughout the PPI networks of multiple species. For the similarity features of the proteins in the networks, we integrate topological features with biological characteristics to provide enhanced support for the alignment procedure. For topological features, we apply a representation learning method on the networks and generate a low dimensional vector embedding with its surrounding structural features for each protein. The topological similarity of proteins from different PPI networks can thus be expressed as the similarity of their corresponding vector representations, which provides a new way to comprehensively quantify the topological similarities between proteins. We also propose a new measure for the topological evaluation of the alignment results, which better uncovers the structural quality of the alignment across multiple networks. Both biological and topological evaluations on the alignment results of real datasets demonstrate that our approach is promising and preferable to previous multiple alignment methods.
12
Wang Y, Jeong H, Yoon BJ, Qian X. ClusterM: a scalable algorithm for computational prediction of conserved protein complexes across multiple protein interaction networks. BMC Genomics 2020; 21:615. [PMID: 33208103 PMCID: PMC7677834 DOI: 10.1186/s12864-020-07010-1] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 01/01/2023]
Abstract
Background The current computational methods on identifying conserved protein complexes across multiple Protein-Protein Interaction (PPI) networks suffer from the lack of explicit modeling of the desired topological properties within conserved protein complexes as well as their scalability. Results To overcome those issues, we propose a scalable algorithm—ClusterM—for identifying conserved protein complexes across multiple PPI networks through the integration of network topology and protein sequence similarity information. ClusterM overcomes the computational barrier that existed in previous methods, where the complexity escalates exponentially when handling an increasing number of PPI networks; and it is able to detect conserved protein complexes with both topological separability and cohesive protein sequence conservation. On two independent compendiums of PPI networks from Saccharomyces cerevisiae (Sce, yeast), Drosophila melanogaster (Dme, fruit fly), Caenorhabditis elegans (Cel, worm), and Homo sapiens (Hsa, human), we demonstrate that ClusterM outperforms other state-of-the-art algorithms by a significant margin and is able to identify de novo conserved protein complexes across four species that are missed by existing algorithms. Conclusions ClusterM can better capture the desired topological property of a typical conserved protein complex, which is densely connected within the complex while being well-separated from the rest of the networks. Furthermore, our experiments have shown that ClusterM is highly scalable and efficient when analyzing multiple PPI networks.
13
Kazemi E, Grossglauser M. MPGM: Scalable and Accurate Multiple Network Alignment. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2020; 17:2040-2052. [PMID: 31056510 DOI: 10.1109/tcbb.2019.2914050] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 06/09/2023]
Abstract
Protein-protein interaction (PPI) network alignment is a canonical operation to transfer biological knowledge among species. The alignment of PPI-networks has many applications, such as the prediction of protein function, detection of conserved network motifs, and the reconstruction of species' phylogenetic relationships. A good multiple-network alignment (MNA), by considering the data related to several species, provides a deep understanding of biological networks and system-level cellular processes. With the massive amounts of available PPI data and the increasing number of known PPI networks, the problem of MNA is gaining more attention in the systems-biology studies. In this paper, we introduce a new scalable and accurate algorithm, called MPGM, for aligning multiple networks. The MPGM algorithm has two main steps: (i) SeedGeneration and (ii) MultiplePercolation. In the first step, to generate an initial set of seed tuples, the SeedGeneration algorithm uses only protein sequence similarities. In the second step, to align remaining unmatched nodes, the MultiplePercolation algorithm uses network structures and the seed tuples generated from the first step. We show that, with respect to different evaluation criteria, MPGM outperforms the other state-of-the-art algorithms. In addition, we guarantee the performance of MPGM under certain classes of network models. We introduce a sampling-based stochastic model for generating k correlated networks. We prove that for this model if a sufficient number of seed tuples are available, the MultiplePercolation algorithm correctly aligns almost all the nodes. Our theoretical results are supported by experimental evaluations over synthetic networks.
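The seed-then-percolate idea in this abstract can be illustrated with a simplified sketch restricted to two networks. It is not the MPGM code, which works on tuples across multiple networks. Here the seed pairs are given directly instead of being derived from sequence similarity, and the function name `percolate` and the threshold `r` are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, List


def percolate(
    g1: Dict[str, List[str]],
    g2: Dict[str, List[str]],
    seeds: Dict[str, str],
    r: int = 2,
) -> Dict[str, str]:
    """Grow a node mapping from g1 to g2 starting from seed pairs.

    Each matched pair (u, v) gives one mark to every pair made of a neighbor
    of u and a neighbor of v. A still-unmatched pair is accepted once it has
    collected at least r marks (r is a hypothetical threshold).
    """
    matched = dict(seeds)
    used_in_g2 = set(matched.values())
    marks: Dict[tuple, int] = defaultdict(int)
    queue = list(matched.items())
    while queue:
        u, v = queue.pop()
        for nu in g1[u]:
            for nv in g2[v]:
                if nu in matched or nv in used_in_g2:
                    continue
                marks[(nu, nv)] += 1
                if marks[(nu, nv)] >= r:
                    matched[nu] = nv
                    used_in_g2.add(nv)
                    queue.append((nu, nv))
    return matched


# Two toy networks with the same structure (adjacency lists, undirected).
g1 = {"a": ["b", "c"], "b": ["a", "c", "d"], "c": ["a", "b", "d"], "d": ["b", "c"]}
g2 = {"A": ["B", "C"], "B": ["A", "C", "D"], "C": ["A", "B", "D"], "D": ["B", "C"]}
alignment = percolate(g1, g2, seeds={"a": "A", "b": "B"}, r=2)
print(alignment)
```

In this toy case the two seed pairs are enough for the remaining nodes `c` and `d` to be matched to `C` and `D`.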
14
Ma CY, Liao CS. A review of protein-protein interaction network alignment: From pathway comparison to global alignment. Comput Struct Biotechnol J 2020; 18:2647-2656. [PMID: 33033584 PMCID: PMC7533294 DOI: 10.1016/j.csbj.2020.09.011] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 05/21/2020] [Revised: 09/01/2020] [Accepted: 09/05/2020] [Indexed: 12/13/2022]
Abstract
Network alignment provides a comprehensive way to discover the similar parts between molecular systems of different species based on topological and biological similarity. With such a strong basis, one can do comparative studies at a systems level in the field of computational biology. In this survey paper, we focus on protein-protein interaction networks and review some representative algorithms for network alignment in the past two decades as well as the state-of-the-art aligners. We also introduce the most popular evaluation measures in the literature to benchmark the performance of these approaches. Finally, we address several future challenges and the possible ways to conquer the existing problems of biological network alignment.
Affiliation(s)
- Cheng-Yu Ma
- Chang Gung Memorial Hospital, No. 5, Fu-Hsing St., Kuei Shan Dist., Taoyuan City 33305, Taiwan, ROC
- Chung-Shou Liao
- National Tsing Hua University, No. 101, Section 2, Kuang-Fu Rd., Hsinchu City 30013, Taiwan, ROC
15
Zhong Y, Li J, He J, Gao Y, Liu J, Wang J, Shang X, Hu J. Twadn: an efficient alignment algorithm based on time warping for pairwise dynamic networks. BMC Bioinformatics 2020; 21:385. [PMID: 32938373 PMCID: PMC7495832 DOI: 10.1186/s12859-020-03672-6] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/10/2022]
Abstract
BACKGROUND: Network alignment is an efficient computational framework for the prediction of protein function and phylogenetic relationships in systems biology. However, most existing alignment methods focus on aligning PPI networks based on a static network model, although these networks are actually dynamic in real-world systems. The dynamic characteristic of PPI networks is essential for understanding the evolution and regulation mechanisms at the molecular level, and there is still much room to improve the alignment quality in dynamic networks. RESULTS: In this paper, we propose a novel alignment algorithm, Twadn, to align dynamic PPI networks based on a time warping strategy. We compare Twadn with the existing dynamic network alignment algorithms DynaMAGNA++ and DynaWAVE, using the area under the receiver operating characteristic curve and the area under the precision-recall curve as evaluation indicators. The experimental results show that Twadn is superior to DynaMAGNA++ and DynaWAVE. In addition, we use the protein interaction network of Drosophila to compare Twadn with the static network alignment algorithm NetCoffee2; the experimental results show that Twadn, unlike NetCoffee2, is able to capture timing information. CONCLUSIONS: Twadn is a versatile and efficient alignment tool that can be applied to dynamic networks. Hopefully, its application can benefit the research community in the fields of molecular function and evolution.
Affiliation(s)
- Yuanke Zhong
  - School of Computer Science, Northwestern Polytechnical University, West Youyi Road 127, Xi’an, 710072 China
- Jing Li
  - Xi’an Mingde Institute of Technology, Fenghe Campus, Xi’an, 710124 China
- Junhao He
  - School of Computer Science, Northwestern Polytechnical University, West Youyi Road 127, Xi’an, 710072 China
- Yiqun Gao
  - School of Computer Science, Northwestern Polytechnical University, West Youyi Road 127, Xi’an, 710072 China
- Jie Liu
  - School of Computer Science, Northwestern Polytechnical University, West Youyi Road 127, Xi’an, 710072 China
- Jingru Wang
  - School of Computer Science, Northwestern Polytechnical University, West Youyi Road 127, Xi’an, 710072 China
- Xuequn Shang
  - School of Computer Science, Northwestern Polytechnical University, West Youyi Road 127, Xi’an, 710072 China
- Jialu Hu
  - School of Computer Science, Northwestern Polytechnical University, West Youyi Road 127, Xi’an, 710072 China
  - Centre of Multidisciplinary Convergence Computing, School of Computer Science, Northwestern Polytechnical University, 1 Dong Xiang Road, Xi’an, 710129 China
16
Abstract
In this study, we deal with the problem of biological network alignment (NA), which aims to find a node mapping between species' molecular networks that uncovers similar network regions, thus allowing for the transfer of functional knowledge between the aligned nodes. We provide evidence that current NA methods, which assume that topologically similar nodes (i.e., nodes whose network neighborhoods are isomorphic-like) have high functional relatedness, do not actually end up aligning functionally related nodes. That is, we show that the current topological similarity assumption does not hold well. Consequently, we argue that a paradigm shift is needed with how the NA problem is approached. So, we redefine NA as a data-driven framework, called TARA (data-driven NA), which attempts to learn the relationship between topological relatedness and functional relatedness without assuming that topological relatedness corresponds to topological similarity. TARA makes no assumptions about what nodes should be aligned, distinguishing it from existing NA methods. Specifically, TARA trains a classifier to predict whether two nodes from different networks are functionally related based on their network topological patterns (features). We find that TARA is able to make accurate predictions. TARA then takes each pair of nodes that are predicted as related to be part of an alignment. Like traditional NA methods, TARA uses this alignment for the across-species transfer of functional knowledge. TARA as currently implemented uses topological but not protein sequence information for functional knowledge transfer. In this context, we find that TARA outperforms existing state-of-the-art NA methods that also use topological information, WAVE and SANA, and even outperforms or complements a state-of-the-art NA method that uses both topological and sequence information, PrimAlign. 
Hence, adding sequence information to TARA, which is our future work, is likely to further improve its performance. The software and data are available at http://www.nd.edu/~cone/TARA/.
Affiliation(s)
- Shawn Gu
  - Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN, United States of America
  - Eck Institute for Global Health, University of Notre Dame, Notre Dame, IN, United States of America
  - Center for Network and Data Science, University of Notre Dame, Notre Dame, IN, United States of America
- Tijana Milenković
  - Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN, United States of America
  - Eck Institute for Global Health, University of Notre Dame, Notre Dame, IN, United States of America
  - Center for Network and Data Science, University of Notre Dame, Notre Dame, IN, United States of America
17
Chen CC, Jeong H, Qian X, Yoon BJ. TOPAS: network-based structural alignment of RNA sequences. Bioinformatics 2020; 35:2941-2948. [PMID: 30629122 DOI: 10.1093/bioinformatics/btz001] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 10/10/2017] [Revised: 12/07/2018] [Accepted: 01/04/2019] [Indexed: 11/14/2022] Open
Abstract
MOTIVATION: For many RNA families, the secondary structure is known to be better conserved among the member RNAs compared to the primary sequence. For this reason, it is important to consider the underlying folding structures when aligning RNA sequences, especially for those with relatively low sequence identity. Given a set of RNAs with unknown structures, simultaneous RNA alignment and folding algorithms aim to accurately align the RNAs by jointly predicting their consensus secondary structure and the optimal sequence alignment. Despite the improved accuracy of the resulting alignment, the computational complexity of simultaneous alignment and folding for a pair of RNAs is O(N^6), which is too costly to be used for large-scale analysis. RESULTS: In order to address this shortcoming, in this work, we propose a novel network-based scheme for pairwise structural alignment of RNAs. The proposed algorithm, TOPAS, builds on the concept of topological networks that provide structural maps of the RNAs to be aligned. For each RNA sequence, TOPAS first constructs a topological network based on the predicted folding structure, which consists of sequential edges and structural edges weighted by the base-pairing probabilities. The obtained networks can then be efficiently aligned by using probabilistic network alignment techniques, thereby yielding the structural alignment of the RNAs. The computational complexity of our proposed method is significantly lower than that of the Sankoff-style dynamic programming approach, while yielding favorable alignment results. Furthermore, another important advantage of the proposed algorithm is its capability of handling RNAs with pseudoknots while predicting the RNA structural alignment. We demonstrate that TOPAS generally outperforms previous RNA structural alignment methods on RNA benchmarks in terms of both speed and accuracy.
AVAILABILITY AND IMPLEMENTATION: Source code of TOPAS and the benchmark data used in this paper are available at https://github.com/bjyoontamu/TOPAS.
Affiliation(s)
- Chun-Chi Chen
  - Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX, USA
  - TEES-AgriLife Center for Bioinformatics & Genomic Systems Engineering, Texas A&M University, College Station, TX, USA
- Hyundoo Jeong
  - Department of Electronic Engineering, Chosun University, Gwangju, Republic of Korea
- Xiaoning Qian
  - Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX, USA
  - TEES-AgriLife Center for Bioinformatics & Genomic Systems Engineering, Texas A&M University, College Station, TX, USA
- Byung-Jun Yoon
  - Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX, USA
  - TEES-AgriLife Center for Bioinformatics & Genomic Systems Engineering, Texas A&M University, College Station, TX, USA
18
Vijayan V, Gu S, Krebs ET, Meng L, Milenković T. Pairwise Versus Multiple Global Network Alignment. IEEE Access: Practical Innovations, Open Solutions 2020; 8:41961-41974. [PMID: 33747670 PMCID: PMC7971151 DOI: 10.1109/access.2020.2976487] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 06/12/2023]
Abstract
Biological network alignment (NA) aims to identify similar regions between molecular networks of different species. NA can be local or global. Following the recent trend in the NA field, we focus on global NA, which can be pairwise (PNA) or multiple (MNA). PNA produces aligned node pairs between two networks. MNA produces aligned node clusters between more than two networks. Recently, the focus has shifted from PNA to MNA, because MNA captures conserved regions between more networks than PNA (and MNA is thus hypothesized to yield higher-quality alignments), though at higher computational complexity. The issue is that, due to the different outputs of PNA and MNA, a PNA method is only compared to other PNA methods, and an MNA method is only compared to other MNA methods. Comparison of PNA against MNA must be done to evaluate whether MNA indeed yields higher-quality alignments, as only this would justify MNA's higher computational complexity. We introduce a framework that allows for this. We evaluate eight prominent PNA and MNA methods, on synthetic and real-world biological networks, using topological and functional alignment quality measures. We compare PNA against MNA in both a pairwise (native to PNA) and multiple (native to MNA) manner. PNA is expected to perform better under the pairwise evaluation framework. Indeed, this is what we find. MNA is expected to perform better under the multiple evaluation framework. Shockingly, we find that this does not always hold; PNA is often better than MNA in this framework, depending on the choice of evaluation test.
Affiliation(s)
- Vipin Vijayan
  - Center for Network and Data Science, Department of Computer Science and Engineering, Eck Institute for Global Health, University of Notre Dame, Notre Dame, IN 46556, USA
- Shawn Gu
  - Center for Network and Data Science, Department of Computer Science and Engineering, Eck Institute for Global Health, University of Notre Dame, Notre Dame, IN 46556, USA
- Eric T Krebs
  - Center for Network and Data Science, Department of Computer Science and Engineering, Eck Institute for Global Health, University of Notre Dame, Notre Dame, IN 46556, USA
- Lei Meng
  - Center for Network and Data Science, Department of Computer Science and Engineering, Eck Institute for Global Health, University of Notre Dame, Notre Dame, IN 46556, USA
- Tijana Milenković
  - Center for Network and Data Science, Department of Computer Science and Engineering, Eck Institute for Global Health, University of Notre Dame, Notre Dame, IN 46556, USA
19
Woo HM, Jeong H, Yoon BJ. NAPAbench 2: A network synthesis algorithm for generating realistic protein-protein interaction (PPI) network families. PLoS One 2020; 15:e0227598. [PMID: 31986158 PMCID: PMC6984706 DOI: 10.1371/journal.pone.0227598] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 09/10/2019] [Accepted: 12/23/2019] [Indexed: 11/18/2022] Open
Abstract
Comparative network analysis provides effective computational means for gaining novel insights into the structural and functional compositions of biological networks. In recent years, various methods have been developed for biological network alignment, whose main goal is to identify important similarities and critical differences between networks in terms of their topology and composition. A major impediment to advancing network alignment techniques has been the lack of gold-standard benchmarks that can be used for accurate and comprehensive performance assessment of such algorithms. The original NAPAbench (network alignment performance assessment benchmark) was developed to address this problem, and it has been widely utilized by many researchers for the development, evaluation, and comparison of novel network alignment techniques. In this work, we introduce NAPAbench 2, a major update of the original NAPAbench that was introduced in 2012. NAPAbench 2 includes a completely redesigned network synthesis algorithm that can generate protein-protein interaction (PPI) network families whose characteristics closely match those of the latest real PPI networks. Furthermore, the network synthesis algorithm comes with an intuitive GUI that allows users to easily generate PPI network families with an arbitrary number of networks of any size, according to a flexible user-defined phylogeny. In addition, NAPAbench 2 provides updated benchmark datasets (created using the redesigned network synthesis algorithm), which can be used for comprehensive performance assessment of network alignment algorithms and their scalability.
Affiliation(s)
- Hyun-Myung Woo
  - Department of Electrical and Computer Engineering, Texas A&M University, College Station, Texas, United States of America
- Hyundoo Jeong
  - Department of Mechatronics Engineering, Incheon National University, Incheon, Republic of Korea
- Byung-Jun Yoon
  - Department of Electrical and Computer Engineering, Texas A&M University, College Station, Texas, United States of America
  - TEES-AgriLife Center for Bioinformatics and Genomic Systems Engineering, Texas A&M University, College Station, TX, United States of America
  - Computational Science Initiative, Brookhaven National Laboratory, Upton, NY, United States of America
20
Hu J, He J, Li J, Gao Y, Zheng Y, Shang X. A novel algorithm for alignment of multiple PPI networks based on simulated annealing. BMC Genomics 2019; 20:932. [PMID: 31881842 PMCID: PMC6933650 DOI: 10.1186/s12864-019-6302-0] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 02/07/2023] Open
Abstract
Proteins play essential roles in almost all life processes. The prediction of protein function is of significance for the understanding of molecular function and evolution. Network alignment provides a fast and effective framework to automatically identify functionally conserved proteins in a systematic way. However, due to the fast growing genomic data, interactions and annotation data, there is an increasing demand for more accurate and efficient tools to deal with multiple PPI networks. Here, we present a novel global alignment algorithm NetCoffee2 based on graph feature vectors to discover functionally conserved proteins and predict function for unknown proteins. To test the algorithm performance, NetCoffee2 and three other notable algorithms were applied on eight real biological datasets. Functional analyses were performed to evaluate the biological quality of these alignments. Results show that NetCoffee2 is superior to existing algorithms IsoRankN, NetCoffee and multiMAGNA++ in terms of both coverage and consistency. The binary and source code are freely available under the GNU GPL v3 license at https://github.com/screamer/NetCoffee2.
Affiliation(s)
- Jialu Hu
  - School of Computer Science, Northwestern Polytechnical University, West Youyi Road 127, Xi’an, 710072 China
  - Centre of Multidisciplinary Convergence Computing, School of Computer Science, Northwestern Polytechnical University, 1 Dong Xiang Road, Xi’an, 710129 China
- Junhao He
  - School of Computer Science, Northwestern Polytechnical University, West Youyi Road 127, Xi’an, 710072 China
- Jing Li
  - Ming De College, Northwestern Polytechnical University, Feng He Campus, Xi’an, 710124 China
- Yiqun Gao
  - School of Computer Science, Northwestern Polytechnical University, West Youyi Road 127, Xi’an, 710072 China
- Yan Zheng
  - School of Computer Science, Northwestern Polytechnical University, West Youyi Road 127, Xi’an, 710072 China
- Xuequn Shang
  - School of Computer Science, Northwestern Polytechnical University, West Youyi Road 127, Xi’an, 710072 China
21
Maskey S, Cho YR. LePrimAlign: local entropy-based alignment of PPI networks to predict conserved modules. BMC Genomics 2019; 20:964. [PMID: 31874635 PMCID: PMC6929407 DOI: 10.1186/s12864-019-6271-3] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 11/30/2022] Open
Abstract
Background: Cross-species analysis of protein-protein interaction (PPI) networks provides an effective means of detecting conserved interaction patterns. Identifying such conserved substructures between PPI networks of different species increases our understanding of the principles deriving evolution of cellular organizations and their functions at a system level. In recent years, network alignment techniques have been applied to genome-scale PPI networks to predict evolutionarily conserved modules. Although a wide variety of network alignment algorithms have been introduced, developing a scalable local network alignment algorithm with high accuracy is still challenging. Results: We present a novel pairwise local network alignment algorithm, called LePrimAlign, to predict conserved modules between PPI networks of three different species. The proposed algorithm exploits the results of a pairwise global alignment algorithm with many-to-many node mapping. It also applies the concept of graph entropy to detect initial cluster pairs from two networks. Finally, the initial clusters are expanded to increase the local alignment score, which is formulated as a combination of intra-network and inter-network scores. The performance comparison with state-of-the-art approaches demonstrates that the proposed algorithm outperforms them in terms of accuracy of identified protein complexes and quality of alignments. Conclusion: The proposed method produces local network alignments of higher accuracy in predicting conserved modules, even with large biological networks, at a reduced computational cost.
Affiliation(s)
- Sawal Maskey
  - Department of Computer Science, Baylor University, One Bear Place #97141, Waco, 76798, TX, USA
- Young-Rae Cho
  - Department of Computer Science, Baylor University, One Bear Place #97141, Waco, 76798, TX, USA
  - Bioinformatics Program, Baylor University, One Bear Place #97141, Waco, 76798, TX, USA
22
Wen Bin Goh W, Thalappilly S, Thibault G. Moving beyond the current limits of data analysis in longevity and healthy lifespan studies. Drug Discov Today 2019; 24:2273-2285. [PMID: 31499187 DOI: 10.1016/j.drudis.2019.08.008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/28/2019] [Revised: 08/03/2019] [Accepted: 08/28/2019] [Indexed: 11/19/2022]
Abstract
Living longer with sustainable quality of life is becoming increasingly important in aging populations. Understanding associative biological mechanisms has proven daunting because of multigenicity and population heterogeneity. Although Big Data and Artificial Intelligence (AI) could help, naïve adoption is ill advised. We hold the view that model organisms are better suited for big-data analytics but might lack relevance because they do not immediately reflect the human condition. Resolving this hurdle and bridging the human-model organism gap will require some finesse. This includes improving signal:noise ratios by appropriate contextualization of high-throughput data, establishing consistency across multiple high-throughput platforms, and adopting supporting technologies that provide useful in silico and in vivo validation strategies.
Affiliation(s)
- Wilson Wen Bin Goh
  - Bio-Data Science and Education Research Group, School of Biological Sciences, Nanyang Technological University, 637551, Singapore
- Subhash Thalappilly
  - Lipid Regulation and Cell Stress Research Group, School of Biological Sciences, Nanyang Technological University, 637551, Singapore
- Guillaume Thibault
  - Lipid Regulation and Cell Stress Research Group, School of Biological Sciences, Nanyang Technological University, 637551, Singapore
  - Institute of Molecular and Cell Biology, A*STAR, 138673, Singapore
23
Kalecky K, Cho YR. PrimAlign: PageRank-inspired Markovian alignment for large biological networks. Bioinformatics 2019; 34:i537-i546. [PMID: 29949962 PMCID: PMC6022567 DOI: 10.1093/bioinformatics/bty288] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Indexed: 01/11/2023] Open
Abstract
Motivation: Cross-species analysis of large-scale protein–protein interaction (PPI) networks has played a significant role in understanding the principles deriving evolution of cellular organizations and functions. Recently, network alignment algorithms have been proposed to predict conserved interactions and functions of proteins. These approaches are based on the notion that orthologous proteins across species are sequentially similar and that topology of PPIs between orthologs is often conserved. However, high accuracy and scalability of network alignment are still a challenge. Results: We propose a novel pairwise global network alignment algorithm, called PrimAlign, which is modeled as a Markov chain and iteratively transited until convergence. The proposed algorithm also incorporates the principles of PageRank. This approach is evaluated on tasks with human, yeast and fruit fly PPI networks. The experimental results demonstrate that PrimAlign outperforms several prevalent methods with statistically significant differences in multiple evaluation measures. PrimAlign, which is multi-platform, achieves superior performance in runtime with its linear asymptotic time complexity. Further evaluation is done with synthetic networks and results suggest that popular topological measures do not reflect real precision of alignments. Availability and implementation: The source code is available at http://web.ecs.baylor.edu/faculty/cho/PrimAlign. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Karel Kalecky
  - Institute of Biomedical Studies, Baylor University, Waco, TX, USA
- Young-Rae Cho
  - Department of Computer Science, Baylor University, Waco, TX, USA
24
Abstract
BACKGROUND: Biological networks describe the mechanisms which govern cellular functions. Temporal networks show how these networks evolve over time. Studying the temporal progression of network topologies is of utmost importance since it uncovers how a network evolves and how it resists external stimuli and internal variations. Two temporal networks have co-evolving subnetworks if the evolving topologies of these subnetworks remain similar to each other as the network topology evolves over a period of time. In this paper, we consider the problem of identifying co-evolving subnetworks given a pair of temporal networks, which aim to capture the evolution of molecules and their interactions over time. Although this problem shares some characteristics of the well-known network alignment problems, it differs from existing network alignment formulations as it seeks a mapping of the two network topologies that is invariant to temporal evolution of the given networks. This is a computationally challenging problem as it requires capturing not only similar topologies between two networks but also their similar evolution patterns. RESULTS: We present an efficient algorithm, Tempo, for identifying co-evolving subnetworks in two given temporal networks. We formally prove the correctness of our method. We experimentally demonstrate that Tempo scales efficiently with the size of the network as well as the number of time points, and generates statistically significant alignments, even when evolution rates of the given networks are high. Our results on a human aging dataset demonstrate that Tempo identifies novel genes contributing to the progression of Alzheimer's, Huntington's and Type II diabetes, while existing methods fail to do so. CONCLUSIONS: Studying temporal networks in general and human aging specifically using Tempo enables us to successfully distinguish age-related genes from non-age-related genes.
More importantly, Tempo takes the network alignment problem one huge step forward by moving beyond the classical static network models.
Affiliation(s)
- Rasha Elhesha
  - University of Florida, CISE Department, Gainesville, Florida, 32611, US
- Aisharjya Sarkar
  - University of Florida, CISE Department, Gainesville, Florida, 32611, US
- Christina Boucher
  - University of Florida, CISE Department, Gainesville, Florida, 32611, US
- Tamer Kahveci
  - University of Florida, CISE Department, Gainesville, Florida, 32611, US
25
Guzzi PH, Milenkovic T. Survey of local and global biological network alignment: the need to reconcile the two sides of the same coin. Brief Bioinform 2019; 19:472-481. [PMID: 28062413 DOI: 10.1093/bib/bbw132] [Citation(s) in RCA: 30] [Impact Index Per Article: 6.0] [Received: 07/31/2016] [Indexed: 12/23/2022] Open
Abstract
Analogous to genomic sequence alignment that allows for across-species transfer of biological knowledge between conserved sequence regions, biological network alignment can be used to guide the knowledge transfer between conserved regions of molecular networks of different species. Hence, biological network alignment can be used to redefine the traditional notion of a sequence-based homology to a new notion of network-based homology. Analogous to genomic sequence alignment, there exist local and global biological network alignments. Here, we survey prominent and recent computational approaches of each network alignment type and discuss their (dis)advantages. Then, as it was recently shown that the two approach types are complementary, in the sense that they capture different slices of cellular functioning, we discuss the need to reconcile the two network alignment types and present a recent first step in this direction. We conclude with some open research problems on this topic and comment on the usefulness of network alignment in other domains besides computational biology.
Affiliation(s)
- Pietro Hiram Guzzi
  - Department of Surgical and Medical Sciences, University Magna Graecia, Catanzaro, 88100 Italy
- Tijana Milenkovic
  - Department of Computer Science and Engineering, Interdisciplinary Center for Network Science and Applications (iCeNSA), ECK Institute for Global Health, University of Notre Dame, Notre Dame, IN 46556, USA
26
Jing F, Zhang SW, Zhang S. Brief Survey of Biological Network Alignment and a Variant with Incorporation of Functional Annotations. Curr Bioinform 2018. [DOI: 10.2174/1574893612666171020103747] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 11/22/2022]
Abstract
Background: Biological network alignment has been widely studied in the context of protein-protein interaction (PPI) networks, metabolic networks and others in bioinformatics. The topological structure of networks and genomic sequence are generally used by existing methods for achieving this task. Objective and Method: Here we briefly survey the methods generally used for this task and introduce a variant with incorporation of functional annotations based on similarity in Gene Ontology (GO). Making full use of GO information is beneficial to provide insights into precise biological network alignment. Results and Conclusion: We analyze the effect of incorporation of GO information to network alignment. Finally, we make a brief summary and discuss future directions about this topic.
Affiliation(s)
- Fang Jing
  - Key Laboratory of Information Fusion Technology of Ministry of Education, College of Automation, Northwestern Polytechnical University, Xi'an 710072, China
- Shao-Wu Zhang
  - Key Laboratory of Information Fusion Technology of Ministry of Education, College of Automation, Northwestern Polytechnical University, Xi'an 710072, China
- Shihua Zhang
  - NCMIS, CEMS, RCSDS, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China
27
Hu J, Gao Y, He J, Zheng Y, Shang X. WebNetCoffee: a web-based application to identify functionally conserved proteins from Multiple PPI networks. BMC Bioinformatics 2018; 19:422. [PMID: 30419809 PMCID: PMC6233501 DOI: 10.1186/s12859-018-2443-4] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 04/04/2018] [Accepted: 10/22/2018] [Indexed: 12/29/2022] Open
Abstract
BACKGROUND: The discovery of functionally conserved proteins is a tough and important task in systems biology. Global network alignment provides a systematic framework to search for these proteins from multiple protein-protein interaction (PPI) networks. Although there exist many web servers for network alignment, none allows users to perform global multiple network alignment tasks on their own test datasets. RESULTS: Here, we developed a web server, WebNetCoffee, based on the NetCoffee algorithm, to search for a global network alignment from multiple networks. To build a series of online test datasets, we manually collected 218,339 proteins, 4,009,541 interactions and many other associated protein annotations from several public databases. All these datasets and alignment results are available for download, which supports users in performing algorithm comparison and downstream analyses. CONCLUSION: WebNetCoffee provides a versatile, interactive and user-friendly interface for easily running alignment tasks on both online datasets and users' test datasets, managing submitted jobs and visualizing the alignment results through a web browser. Additionally, our web server also facilitates graphical visualization of induced subnetworks for a given protein and its neighborhood. To the best of our knowledge, it is the first web server that facilitates global alignment of multiple PPI networks. AVAILABILITY: http://www.nwpu-bioinformatics.com/WebNetCoffee.
Affiliation(s)
- Jialu Hu
- School of Computer Science, Northwestern Polytechnical University, Xi’an, 710072 China
- Centre for Multidisciplinary Convergence Computing, School of Computer Science, Northwestern Polytechnical University, Xi’an, 710129 China
| | - Yiqun Gao
- School of Computer Science, Northwestern Polytechnical University, Xi’an, 710072 China
| | - Junhao He
- School of Computer Science, Northwestern Polytechnical University, Xi’an, 710072 China
| | - Yan Zheng
- School of Computer Science, Northwestern Polytechnical University, Xi’an, 710072 China
| | - Xuequn Shang
- School of Computer Science, Northwestern Polytechnical University, Xi’an, 710072 China
| |
|
28
|
Djeddi WE, Yahia SB, Nguifo EM. A Novel Computational Approach for Global Alignment for Multiple Biological Networks. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2018; 15:2060-2066. [PMID: 29994444 DOI: 10.1109/tcbb.2018.2808529] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Indexed: 06/08/2023]
Abstract
With the rapid progress in using networks to model biological systems, biomolecular networks are yielding ever more protein-protein interaction (PPI) data. Analyzing PPI networks aims to find regions of topological and functional (dis)similarity between molecular networks of different species. The study of PPI networks has the potential to teach us much about life processes and diseases at the molecular level. However, few methods have been developed for multiple PPI network alignment, so new network alignment methods are needed. In this paper, we propose a novel algorithm for global alignment of multiple protein-protein interaction networks, called MAPPIN. It relies on information available for the proteins in the networks, such as sequence, function, and network topology. Our algorithm is designed to exploit current multi-core CPU architectures and has been extensively tested on real data (eight species). Our experimental results show that MAPPIN significantly outperforms NetCoffee in terms of coverage. Nevertheless, MAPPIN is handicapped by the time required to load the gene annotation file. An extensive comparison with the pioneering PPI methods also shows that MAPPIN is often efficient in terms of coverage, mean entropy, or mean normalized.
|
29
|
|
30
|
Jeong H, Qian X, Yoon BJ. CUFID-query: accurate network querying through random walk based network flow estimation. BMC Bioinformatics 2017; 18:500. [PMID: 29297279 PMCID: PMC5751815 DOI: 10.1186/s12859-017-1899-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 01/06/2023] Open
Abstract
BACKGROUND Functional modules in biological networks consist of numerous biomolecules and their complicated interactions. Recent studies have shown that biomolecules in a functional module tend to have similar interaction patterns and that such modules are often conserved across biological networks of different species. As a result, such conserved functional modules can be identified through comparative analysis of biological networks. RESULTS In this work, we propose a novel network querying algorithm based on the CUFID (Comparative network analysis Using the steady-state network Flow to IDentify orthologous proteins) framework combined with an efficient seed-and-extension approach. The proposed algorithm, CUFID-query, can accurately detect conserved functional modules as small subnetworks in the target network that are expected to perform similar functions to the given query functional module. The CUFID framework was recently developed for probabilistic pairwise global comparison of biological networks, and it has been applied to pairwise global network alignment, where the framework was shown to yield accurate network alignment results. In the proposed CUFID-query algorithm, we adopt the CUFID framework and extend it for local network alignment, specifically to solve network querying problems. First, in the seed selection phase, the proposed method utilizes the CUFID framework to compare the query and the target networks and to predict the probabilistic node-to-node correspondence between the networks. Next, the algorithm selects and greedily extends the seed in the target network by iteratively adding nodes that have frequent interactions with other nodes in the seed network, in a way that the conductance of the extended network is maximally reduced. Finally, CUFID-query removes irrelevant nodes from the querying results based on the personalized PageRank vector for the induced network that includes the fully extended network and its neighboring nodes. 
CONCLUSIONS Through extensive performance evaluation based on biological networks with known functional modules, we show that CUFID-query outperforms the existing state-of-the-art algorithms in terms of prediction accuracy and biological significance of the predictions.
Affiliation(s)
- Hyundoo Jeong
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, 77843, TX, USA
- Department of Neurology, Baylor College of Medicine, Houston, TX, USA
- Jan and Dan Duncan Neurological Research Institute, Texas Children's Hospital, Houston, TX, USA
| | - Xiaoning Qian
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, 77843, TX, USA
- TEES-AgriLife Center for Bioinformatics and Genomic Systems Engineering (CBGSE), College Station, TX, USA
| | - Byung-Jun Yoon
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, 77843, TX, USA
- TEES-AgriLife Center for Bioinformatics and Genomic Systems Engineering (CBGSE), College Station, TX, USA
| |
|
31
|
Alkan F, Erten C. RedNemo: topology-based PPI network reconstruction via repeated diffusion with neighborhood modifications. Bioinformatics 2017; 33:537-544. [PMID: 27797764 DOI: 10.1093/bioinformatics/btw655] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 04/19/2016] [Accepted: 10/12/2016] [Indexed: 01/28/2023] Open
Abstract
Motivation Analysis of protein-protein interaction (PPI) networks provides invaluable insight into several systems biology problems. High-throughput experimental techniques together with computational methods provide large-scale PPI networks. However, a major issue with these networks is their erroneous nature; they contain false-positive interactions and usually many more false-negatives. Recently, several computational methods have been proposed for network reconstruction based on topology, where given an input PPI network the goal is to reconstruct the network by identifying false-positives/-negatives as correctly as possible. Results We observe that the existing topology-based network reconstruction algorithms suffer from several shortcomings. An important issue is the scalability of their computational requirements, especially execution time, with network size. They have only been tested on small-scale networks thus far, and when applied to large-scale networks from popular PPI databases, the executions require unreasonable amounts of time, or may even crash without producing any output for some instances after several months of execution. We provide an algorithm, RedNemo, for the topology-based network reconstruction problem. It provides more accurate networks than the alternatives in terms of biological quality, as measured by most metrics based on gene ontology annotations. The recovery of a high-confidence network modified via random edge removals and rewirings is also better with RedNemo than with the alternatives under most of the tested removal/rewiring ratios. Furthermore, through extensive tests on databases of varying sizes, we show that RedNemo achieves these results with much better running times. Availability and Implementation Supplementary material including source code, useful scripts, experimental data and the results are available at http://webprs.khas.edu.tr/~cesim/RedNemo.tar.gz.
Contact cesim@khas.edu.tr. Supplementary information Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Ferhat Alkan
- Center for Non-coding RNA in Technology and Health
- Department of Veterinary Clinical and Animal Sciences, University of Copenhagen, Grønnegardsvej 3, Frederiksberg, DK1870, Denmark
| | - Cesim Erten
- Department of Computer Engineering, Kadir Has University, Cibali, 34083 Istanbul, Turkey
| |
|
32
|
Mir A, Naghibzadeh M, Saadati N. INDEX: Incremental depth extension approach for protein-protein interaction networks alignment. Biosystems 2017; 162:24-34. [PMID: 28860070 DOI: 10.1016/j.biosystems.2017.08.005] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.3] [Received: 07/27/2016] [Revised: 05/29/2017] [Accepted: 08/17/2017] [Indexed: 12/11/2022]
Abstract
High-throughput methods have provided us with a large amount of data pertaining to protein-protein interaction networks. The alignment of these networks enables us to better understand biological systems. Because the alignment of networks is computationally intractable, it is important to introduce a more efficient and accurate algorithm that finds the largest possible similar regions among networks. This paper proposes a new algorithm named INDEX for the global alignment of protein-protein interaction networks. INDEX has multiple phases. First, it computes topological and biological scores of proteins and creates the initial alignment based on the proposed matching score strategy. Using network topologies and aligned proteins, it then selects a set of high-scoring proteins in each phase and extends new alignments around them until the final alignment is obtained. The new alignment strategy, the detailed consideration of matching scores, and the growth of the alignment core have led INDEX to obtain a larger common connected subgraph with a much greater number of edges compared with previous methods. On other measures such as edge correctness, symmetric substructure score, and runtime, the proposed algorithm performed considerably better than existing popular methods. Our results show that INDEX can be a promising method for identifying functionally conserved interactions. AVAILABILITY The INDEX executable file is available at https://github.com/a-mir/index/.
Affiliation(s)
- Abolfazl Mir
- Department of Computer Software Engineering, Mashhad Branch, Islamic Azad University, Mashhad, Iran.
| | - Mahmoud Naghibzadeh
- Department of Computer Engineering, Ferdowsi University of Mashhad, Mashhad, Iran
| | - Nayyereh Saadati
- Department of Internal Medicine, Ghaem Hospital, Mashhad University of Medical Sciences, Mashhad, Iran
| |
|
33
|
Gao J, Song B, Ke W, Hu X. BalanceAli: Multiple PPI Network Alignment With Balanced High Coverage and Consistency. IEEE Trans Nanobioscience 2017; 16:333-340. [PMID: 28541215 DOI: 10.1109/tnb.2017.2705521] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Indexed: 11/10/2022]
Abstract
Coverage and consistency are the two most commonly considered metrics for evaluating the effectiveness of network alignment. However, they are a pair of conflicting evaluation metrics in protein-protein interaction (PPI) network alignment. It is difficult, if not impossible, to achieve high coverage and consistency simultaneously. Furthermore, existing methods of multiple PPI network alignment mostly ignore k-coverage or k-consistency, where k indicates the number of aligned species. In this paper, we propose BalanceAli, a novel approach for global alignment of multiple PPI networks that achieves high k-coverage and k-consistency simultaneously. With six data sets consisting of various numbers of PPI networks from five species, we evaluate the experimental results using different k values. The performance evaluations of our approach against three other state-of-the-art methods demonstrate its better overall strength.
|
34
|
|
35
|
Jeong H, Yoon BJ. SEQUOIA: significance enhanced network querying through context-sensitive random walk and minimization of network conductance. BMC SYSTEMS BIOLOGY 2017; 11:20. [PMID: 28361708 PMCID: PMC5374659 DOI: 10.1186/s12918-017-0404-6] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Indexed: 11/28/2022]
Abstract
BACKGROUND Network querying algorithms provide computational means to identify conserved network modules in large-scale biological networks that are similar to known functional modules, such as pathways or molecular complexes. Two main challenges for network querying algorithms are the high computational complexity of detecting potential isomorphism between the query and the target graphs and ensuring the biological significance of the query results. RESULTS In this paper, we propose SEQUOIA, a novel network querying algorithm that effectively addresses these issues by utilizing a context-sensitive random walk (CSRW) model for network comparison and minimizing the network conductance of potential matches in the target network. The CSRW model, inspired by the pair hidden Markov model (pair-HMM) that has been widely used for sequence comparison and alignment, can accurately assess the node-to-node correspondence between different graphs by accounting for node insertions and deletions. The proposed algorithm identifies high-scoring network regions based on the CSRW scores, which are subsequently extended by maximally reducing the network conductance of the identified subnetworks. CONCLUSIONS Performance assessment based on real PPI networks and known molecular complexes shows that SEQUOIA outperforms existing methods and clearly enhances the biological significance of the query results. The source code and datasets can be downloaded from http://www.ece.tamu.edu/~bjyoon/SEQUOIA.
Affiliation(s)
- Hyundoo Jeong
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX, USA
| | - Byung-Jun Yoon
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX, USA.
| |
|
36
|
Hashemifar S, Huang Q, Xu J. Joint Alignment of Multiple Protein–Protein Interaction Networks via Convex Optimization. J Comput Biol 2016; 23:903-911. [DOI: 10.1089/cmb.2016.0025] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Indexed: 11/13/2022] Open
Affiliation(s)
| | - Qixing Huang
- Toyota Technological Institute at Chicago, Chicago, Illinois
| | - Jinbo Xu
- Toyota Technological Institute at Chicago, Chicago, Illinois
| |
|
37
|
Jeong H, Qian X, Yoon BJ. Effective comparative analysis of protein-protein interaction networks by measuring the steady-state network flow using a Markov model. BMC Bioinformatics 2016; 17:395. [PMID: 27766938 PMCID: PMC5073945 DOI: 10.1186/s12859-016-1215-2] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Indexed: 12/16/2022] Open
Abstract
Background Comparative analysis of protein-protein interaction (PPI) networks provides an effective means of detecting conserved functional network modules across different species. Such modules typically consist of orthologous proteins with conserved interactions, which can be exploited to computationally predict the modules through network comparison. Results In this work, we propose a novel probabilistic framework for comparing PPI networks and effectively predicting the correspondence between proteins, represented as network nodes, that belong to conserved functional modules across the given PPI networks. The basic idea is to estimate the steady-state network flow between nodes that belong to different PPI networks based on a Markov random walk model. The random walker is designed to make random moves to adjacent nodes within a PPI network as well as cross-network moves between potential orthologous nodes with high sequence similarity. Based on this Markov random walk model, we estimate the steady-state network flow – or the long-term relative frequency of the transitions that the random walker makes – between nodes in different PPI networks, which can be used as a probabilistic score measuring their potential correspondence. Subsequently, the estimated scores can be used for detecting orthologous proteins in conserved functional modules through network alignment. Conclusions Through evaluations based on multiple real PPI networks, we demonstrate that the proposed scheme leads to improved alignment results that are biologically more meaningful at reduced computational cost, outperforming the current state-of-the-art algorithms. The source code and datasets can be downloaded from http://www.ece.tamu.edu/~bjyoon/CUFID.
Affiliation(s)
- Hyundoo Jeong
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, USA
| | - Xiaoning Qian
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, USA
| | - Byung-Jun Yoon
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, USA.
| |
|
38
|
Dohrmann J, Singh R. The SMAL web server: global multiple network alignment from pairwise alignments. Bioinformatics 2016; 32:3330-3332. [DOI: 10.1093/bioinformatics/btw402] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 02/02/2016] [Accepted: 06/20/2016] [Indexed: 12/21/2022] Open
|
39
|
Elmsallati A, Clark C, Kalita J. Global Alignment of Protein-Protein Interaction Networks: A Survey. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2016; 13:689-705. [PMID: 26336140 DOI: 10.1109/tcbb.2015.2474391] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.9] [Indexed: 06/05/2023]
Abstract
In this paper, we survey algorithms that perform global alignment of networks or graphs. Global network alignment aligns two or more given networks to find the best mapping from nodes in one network to nodes in other networks. Since graphs are a common method of data representation, graph alignment has become important with many significant applications. Protein-protein interactions can be modeled as networks and aligning these networks of protein interactions has many applications in biological research. In this survey, we review algorithms for global pairwise alignment highlighting various proposed approaches, and classify them based on their methodology. Evaluation metrics that are used to measure the quality of the resulting alignments are also surveyed. We discuss and present a comparison between selected aligners on the same datasets and evaluate using the same evaluation metrics. Finally, a quick overview of the most popular databases of protein interaction networks is presented focusing on datasets that have been used recently.
|
40
|
Gligorijević V, Malod-Dognin N, Pržulj N. Fuse: multiple network alignment via data fusion. Bioinformatics 2015; 32:1195-203. [DOI: 10.1093/bioinformatics/btv731] [Citation(s) in RCA: 36] [Impact Index Per Article: 4.0] [Received: 12/22/2014] [Accepted: 10/09/2015] [Indexed: 02/07/2023] Open
|
41
|
Faisal FE, Meng L, Crawford J, Milenković T. The post-genomic era of biological network alignment. EURASIP JOURNAL ON BIOINFORMATICS & SYSTEMS BIOLOGY 2015; 2015:3. [PMID: 28194172 PMCID: PMC5270500 DOI: 10.1186/s13637-015-0022-9] [Citation(s) in RCA: 48] [Impact Index Per Article: 5.3] [Received: 01/21/2015] [Accepted: 05/18/2015] [Indexed: 11/10/2022]
Abstract
Biological network alignment aims to find regions of topological and functional (dis)similarities between molecular networks of different species. Then, network alignment can guide the transfer of biological knowledge from well-studied model species to less well-studied species between conserved (aligned) network regions, thus complementing valuable insights that have already been provided by genomic sequence alignment. Here, we review computational challenges behind the network alignment problem, existing approaches for solving the problem, ways of evaluating their alignment quality, and the approaches' biomedical applications. We discuss recent innovative efforts of improving the existing view of network alignment. We conclude with open research questions in comparative biological network research that could further our understanding of principles of life, evolution, disease, and therapeutics.
Affiliation(s)
- Fazle E Faisal
- Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN, 46556 USA
- Interdisciplinary Center for Network Science and Applications, University of Notre Dame, Notre Dame, IN, 46556 USA
- ECK Institute for Global Health, University of Notre Dame, Notre Dame, IN, 46556 USA
| | - Lei Meng
- Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN, 46556 USA
| | - Joseph Crawford
- Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN, 46556 USA
- Interdisciplinary Center for Network Science and Applications, University of Notre Dame, Notre Dame, IN, 46556 USA
- ECK Institute for Global Health, University of Notre Dame, Notre Dame, IN, 46556 USA
| | - Tijana Milenković
- Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN, 46556 USA
- Interdisciplinary Center for Network Science and Applications, University of Notre Dame, Notre Dame, IN, 46556 USA
- ECK Institute for Global Health, University of Notre Dame, Notre Dame, IN, 46556 USA
| |
|
42
|
Dohrmann J, Puchin J, Singh R. Global multiple protein-protein interaction network alignment by combining pairwise network alignments. BMC Bioinformatics 2015; 16 Suppl 13:S11. [PMID: 26423128 PMCID: PMC4597059 DOI: 10.1186/1471-2105-16-s13-s11] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Indexed: 01/27/2023] Open
Abstract
BACKGROUND A wealth of protein interaction data has become available in recent years, creating an urgent need for powerful analysis techniques. In this context, the problem of finding biologically meaningful correspondences between different protein-protein interaction networks (PPIN) is of particular interest. The PPIN of a species can be compared with that of other species through the process of PPIN alignment. Such an alignment can provide insight into basic problems like species evolution and network component function determination, as well as translational problems such as target identification and elucidation of mechanisms of disease spread. Furthermore, multiple PPINs can be aligned simultaneously, expanding the analytical implications of the result. While there are several pairwise network alignment algorithms, few methods are capable of multiple network alignment (MNA). RESULTS We propose SMAL, an MNA algorithm based on the philosophy of scaffold-based alignment. SMAL is capable of converting results from any global pairwise alignment algorithm into an MNA in linear time. Using this method, we have built multiple network alignments based on combining pairwise alignments from a number of publicly available (pairwise) network aligners. We tested SMAL using PPINs of eight species derived from the IntAct repository and employed a number of measures to evaluate performance. Additionally, as part of our experimental investigations, we compared the effectiveness of SMAL while aligning up to eight input PPINs, and examined the effect of scaffold network choice on the alignments. CONCLUSIONS A key advantage of SMAL lies in its ability to create MNAs through the use of pairwise network aligners for which native MNA implementations do not exist. Experiments indicate that the performance of SMAL was comparable to that of the native MNA implementation of established methods such as IsoRankN and SMETANA. However, in terms of computational time, SMAL was significantly faster. SMAL was also able to retain many important characteristics of the native pairwise alignments, such as the number of aligned nodes and edges, as well as the functional and homologene similarity of aligned nodes. The speed, flexibility and the ability to retain prior correspondences as new networks are aligned make SMAL a compelling choice for alignment of multiple large networks.
|
43
|
Rinnone F, Micale G, Bonnici V, Bader GD, Shasha D, Ferro A, Pulvirenti A, Giugno R. NetMatchStar: an enhanced Cytoscape network querying app. F1000Res 2015; 4:479. [PMID: 26594341 DOI: 10.12688/f1000research.6656.1] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Accepted: 07/20/2015] [Indexed: 02/03/2023] Open
Abstract
We present NetMatchStar, a Cytoscape app to find all the occurrences of a query graph in a network and check for its significance as a motif with respect to seven different random models. The query can be uploaded or built from scratch using Cytoscape facilities. The app significantly enhances the previous NetMatch in style, performance and functionality. Notably NetMatchStar allows queries with wildcards.
Affiliation(s)
- Fabio Rinnone
- Department of Math and Computer Science, University of Catania, Catania, 95125, Italy
| | - Giovanni Micale
- Department of Math and Computer Science, University of Catania, Catania, 95125, Italy
| | - Vincenzo Bonnici
- Department of Computer Science, University of Verona, Verona, 37134, Italy
| | - Gary D Bader
- The Donnelly Centre, University of Toronto, Toronto, ON, M5S 3E1, Canada
| | - Dennis Shasha
- Department of Computer Science, Courant Institute of Mathematical Sciences, New York University, New York, NY, 10012, USA
| | - Alfredo Ferro
- Department of Clinical and Experimental Medicine, University of Catania, Catania, 95125, Italy
| | - Alfredo Pulvirenti
- Department of Clinical and Experimental Medicine, University of Catania, Catania, 95125, Italy
| | - Rosalba Giugno
- Department of Clinical and Experimental Medicine, University of Catania, Catania, 95125, Italy
| |
|
44
|
Rinnone F, Micale G, Bonnici V, Bader GD, Shasha D, Ferro A, Pulvirenti A, Giugno R. NetMatchStar: an enhanced Cytoscape network querying app. F1000Res 2015; 4:479. [PMID: 26594341 PMCID: PMC4642848 DOI: 10.12688/f1000research.6656.2] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Accepted: 09/25/2015] [Indexed: 02/03/2023] Open
Abstract
We present NetMatchStar, a Cytoscape app to find all the occurrences of a query graph in a network and check for its significance as a motif with respect to seven different random models. The query can be uploaded or built from scratch using Cytoscape facilities. The app significantly enhances the previous NetMatch in style, performance and functionality. Notably NetMatchStar allows queries with wildcards.
Affiliation(s)
- Fabio Rinnone
- Department of Math and Computer Science, University of Catania, Catania, 95125, Italy
| | - Giovanni Micale
- Department of Math and Computer Science, University of Catania, Catania, 95125, Italy
| | - Vincenzo Bonnici
- Department of Computer Science, University of Verona, Verona, 37134, Italy
| | - Gary D Bader
- The Donnelly Centre, University of Toronto, Toronto, ON, M5S 3E1, Canada
| | - Dennis Shasha
- Department of Computer Science, Courant Institute of Mathematical Sciences, New York University, New York, NY, 10012, USA
| | - Alfredo Ferro
- Department of Clinical and Experimental Medicine, University of Catania, Catania, 95125, Italy
| | - Alfredo Pulvirenti
- Department of Clinical and Experimental Medicine, University of Catania, Catania, 95125, Italy
| | - Rosalba Giugno
- Department of Clinical and Experimental Medicine, University of Catania, Catania, 95125, Italy
| |
|
45
|
Radu A, Charleston M. Node Handprinting: A Scalable and Accurate Algorithm for Aligning Multiple Biological Networks. J Comput Biol 2015; 22:687-97. [DOI: 10.1089/cmb.2014.0247] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 11/12/2022] Open
Affiliation(s)
- Alex Radu
- School of Information Technologies, The University of Sydney, Sydney, Australia
| | - Michael Charleston
- School of Information Technologies, The University of Sydney, Sydney, Australia
- Centre for Mathematical Biology, The University of Sydney, Sydney, Australia
- Sydney Emerging Infections and Biosecurity Institute, The University of Sydney, Sydney, Australia
- Marie Bashir Institute, The University of Sydney, Sydney, Australia
| |
|
46
|
Micale G, Ferro A, Pulvirenti A, Giugno R. SPECTRA: An Integrated Knowledge Base for Comparing Tissue and Tumor-Specific PPI Networks in Human. Front Bioeng Biotechnol 2015; 3:58. [PMID: 26005672 PMCID: PMC4424906 DOI: 10.3389/fbioe.2015.00058] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Received: 10/31/2014] [Accepted: 04/17/2015] [Indexed: 12/11/2022] Open
Abstract
Protein–protein interaction (PPI) networks available in public repositories usually represent relationships between proteins within the cell. They ignore the specific set of tissues or tumors where the interactions take place. Indeed, proteins can form tissue-selective complexes, while they remain inactive in other tissues. For these reasons, great attention has recently been paid to tissue-specific PPI networks, in which nodes are proteins of the global PPI network whose corresponding genes are preferentially expressed in specific tissues. In this paper, we present SPECTRA, a knowledge base to build and compare tissue or tumor-specific PPI networks. SPECTRA integrates gene expression and protein interaction data from the most authoritative online repositories. We also provide tools for visualizing and comparing such networks, in order to identify the expression and interaction changes of proteins across tissues, or between the normal and pathological states of the same tissue. SPECTRA is available as a web server at http://alpha.dmi.unict.it/spectra.
Affiliation(s)
- Giovanni Micale
- Department of Computer Science, University of Pisa, Pisa, Italy
| | - Alfredo Ferro
- Department of Clinical and Molecular Biomedicine, University of Catania, Catania, Italy
| | - Alfredo Pulvirenti
- Department of Clinical and Molecular Biomedicine, University of Catania, Catania, Italy
| | - Rosalba Giugno
- Department of Clinical and Molecular Biomedicine, University of Catania, Catania, Italy
| |
|
47
|
Alkan F, Erten C. SiPAN: simultaneous prediction and alignment of protein-protein interaction networks. Bioinformatics 2015; 31:2356-63. [PMID: 25788620 DOI: 10.1093/bioinformatics/btv160] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Received: 11/18/2014] [Accepted: 03/14/2015] [Indexed: 01/18/2023]
Abstract
MOTIVATION: Network prediction as applied to protein-protein interaction (PPI) networks has received considerable attention within the last decade. Because of the limitations of experimental techniques for interaction detection and network construction, several computational methods for PPI network reconstruction and growth have been suggested. Such methods usually limit the scope of study to a single network, employing data based on genomic context, structure, domain, sequence information or existing network topology. Incorporating multiple species network data for network reconstruction and growth entails the design of novel models encompassing both network reconstruction and network alignment, since the goal of network alignment is to provide functionally orthologous proteins from multiple networks and such orthology information can be used in guiding interolog transfers. However, such an approach raises the classical chicken-or-egg problem: alignment methods assume error-free networks, whereas network prediction via orthology works effectively only if the functionally orthologous proteins are determined with high precision. Thus, to resolve this intertwinement, we propose a framework to handle both problems simultaneously, that of SImultaneous Prediction and Alignment of Networks (SiPAN). RESULTS: We present an algorithm that solves the SiPAN problem in accordance with its simultaneous nature. Bearing the same name as the defined problem itself, the SiPAN algorithm employs state-of-the-art alignment and topology-based interaction confidence construction algorithms, which are used as benchmark methods for comparison purposes as well. To demonstrate the effectiveness of the proposed network reconstruction via SiPAN, we consider two scenarios: one that preserves the network sizes and the other where the network sizes are increased. Through extensive tests on real-world biological data, we show that the network qualities of SiPAN reconstructions are as good as those of original networks and in some cases SiPAN networks are even better, especially for the former scenario. An alternative state-of-the-art network reconstruction algorithm, random walk with resistance, produces networks considerably worse than the original networks and those reproduced via SiPAN in both cases. AVAILABILITY AND IMPLEMENTATION: Freely available at http://webprs.khas.edu.tr/∼cesim/SiPAN.tar.gz.
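The interolog transfer mentioned in this abstract can be sketched as follows. This is only an illustration of transferring interactions across a fixed alignment, not the SiPAN algorithm, which treats alignment and prediction simultaneously. The function name, the one-to-one mapping, and the protein identifiers are hypothetical.

```python
def interolog_transfer(edges_b, b_to_a, known_edges_a):
    """Predict interactions in network A from interactions in network B.

    edges_b:       iterable of (u, v) interactions in network B
    b_to_a:        dict mapping a protein in B to its aligned protein in A
                   (assumed to be given; obtaining it is the alignment problem)
    known_edges_a: iterable of (x, y) interactions already present in A
    """
    known = {frozenset(edge) for edge in known_edges_a}
    predicted = set()
    for u, v in edges_b:
        # Both endpoints must have an aligned counterpart in A.
        if u in b_to_a and v in b_to_a:
            a1, a2 = b_to_a[u], b_to_a[v]
            candidate = frozenset((a1, a2))
            # Skip self-interactions and interactions A already contains.
            if a1 != a2 and candidate not in known:
                predicted.add(candidate)
    return predicted


# Toy data: y* are proteins in network B, h* are proteins in network A.
edges_b = [("y1", "y2"), ("y2", "y3"), ("y3", "y4")]
alignment = {"y1": "h1", "y2": "h2", "y3": "h3"}  # y4 has no counterpart
known_a = [("h1", "h2")]
new_edges = interolog_transfer(edges_b, alignment, known_a)
```

The sketch makes the circular dependency in the abstract concrete: the quality of `new_edges` depends entirely on `alignment`, while alignment methods in turn rely on the networks being accurate.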
Affiliation(s)
- Ferhat Alkan
- Center for Non-Coding RNA in Technology and Health, Department of Veterinary Clinical and Animal Sciences, University of Copenhagen, Grønnegardsvej 3, DK-1870 Frederiksberg, Denmark and Department of Computer Engineering, Kadir Has University, Cibali, Istanbul 34083, Turkey
| | - Cesim Erten
- Department of Computer Engineering, Kadir Has University, Cibali, Istanbul 34083, Turkey
| |
|
48
|
Jeong H, Yoon BJ. Accurate multiple network alignment through context-sensitive random walk. BMC Systems Biology 2015; 9 Suppl 1:S7. [PMID: 25707987 PMCID: PMC4331682 DOI: 10.1186/1752-0509-9-s1-s7] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Indexed: 02/08/2023]
Abstract
Background: Comparative network analysis can provide an effective means of analyzing large-scale biological networks and gaining novel insights into their structure and organization. Global network alignment aims to predict the best overall mapping between a given set of biological networks, thereby identifying important similarities as well as differences among the networks. It has been shown that network alignment methods can be used to detect pathways or network modules that are conserved across different networks. Until now, a number of network alignment algorithms have been proposed based on different formulations and approaches, many of them focusing on pairwise alignment. Results: In this work, we propose a novel multiple network alignment algorithm based on a context-sensitive random walk model. The random walker employed in the proposed algorithm switches between two different modes, namely, an individual walk on a single network and a simultaneous walk on two networks. The switching decision is made in a context-sensitive manner by examining the current neighborhood, which is effective for quantitatively estimating the degree of correspondence between nodes that belong to different networks, in a manner that sensibly integrates node similarity and topological similarity. The resulting node correspondence scores are then used to predict the maximum expected accuracy (MEA) alignment of the given networks. Conclusions: Performance evaluation based on synthetic networks as well as real protein-protein interaction networks shows that the proposed algorithm can construct more accurate multiple network alignments compared to other leading methods.
|
49
|
Hu J, Reinert K. LocalAli: an evolutionary-based local alignment approach to identify functionally conserved modules in multiple networks. Bioinformatics 2014; 31:363-72. [PMID: 25282642 DOI: 10.1093/bioinformatics/btu652] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.9] [Indexed: 11/14/2022]
Abstract
MOTIVATION: Sequences and protein interaction data are important for understanding the underlying molecular mechanisms of organisms. Local network alignment is one of the key systematic ways of predicting protein functions, identifying functional modules and understanding the phylogeny from these data. Most existing tools, however, have limitations, mainly concerning scoring scheme, speed and scalability. Therefore, there are growing demands for sophisticated network evolution models and efficient local alignment algorithms. RESULTS: We developed a fast and scalable local network alignment tool called LocalAli for the identification of functionally conserved modules in multiple networks. In this algorithm, we first propose a new framework to reconstruct the evolution history of conserved modules based on a maximum-parsimony evolutionary model. By relying on this model, LocalAli facilitates interpretation of resulting local alignments in terms of conserved modules, which have evolved from a common ancestral module through a series of evolutionary events. A meta-heuristic method, simulated annealing, was used to search for the optimal or near-optimal inner nodes (i.e. ancestral modules) of the evolutionary tree. To evaluate the performance and the statistical significance, LocalAli was tested on 26 real datasets and 1040 randomly generated datasets. The results suggest that LocalAli outperforms all existing algorithms in terms of coverage, consistency and scalability, while retaining a high precision in the identification of functionally coherent subnetworks. AVAILABILITY: The source code and test datasets are freely available for download under the GNU GPL v3 license at https://code.google.com/p/localali/. CONTACT: jialu.hu@fu-berlin.de or knut.reinert@fu-berlin.de. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Jialu Hu
- Department of Mathematics and Computer Science, Freie Universität Berlin, Takustrasse 9, 14195 Berlin, Germany
| | - Knut Reinert
- Department of Mathematics and Computer Science, Freie Universität Berlin, Takustrasse 9, 14195 Berlin, Germany
| |
|
50
|
Micale G, Continella A, Ferro A, Giugno R, Pulvirenti A. GASOLINE: a Cytoscape app for multiple local alignment of PPI networks. F1000Res 2014; 3:140. [PMID: 25324964 PMCID: PMC4197741 DOI: 10.12688/f1000research.4537.2] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Accepted: 09/23/2014] [Indexed: 01/17/2023]
Abstract
Comparing protein interaction networks can reveal interesting patterns of interactions for a specific function or process in distantly related species. In this paper we present GASOLINE, a Cytoscape app for multiple local alignments of PPI (protein-protein interaction) networks. The app is based on the homonymous greedy and stochastic algorithm. GASOLINE starts with the identification of sets of similar nodes, called seeds of the alignment. Alignments are then extended in a greedy manner and finally refined. Both the identification of seeds and the extension of alignments are performed through an iterative Gibbs sampling strategy. GASOLINE computes and visualizes local alignments without requiring any post-processing operations. GO terms can be easily attached to the aligned proteins for further functional analysis of alignments. GASOLINE can perform the alignment task in a few minutes, even for a large number of input networks.
Affiliation(s)
- Giovanni Micale
- Department of Computer Science, University of Pisa, Pisa, 56127, Italy
| | - Andrea Continella
- Department of Electronics, Information and Bioengineering, Polytechnic University of Milan, Milan, 20133, Italy
| | - Alfredo Ferro
- Department of Clinical and Molecular Biomedicine, University of Catania, Catania, 95125, Italy
| | - Rosalba Giugno
- Department of Clinical and Molecular Biomedicine, University of Catania, Catania, 95125, Italy
| | - Alfredo Pulvirenti
- Department of Clinical and Molecular Biomedicine, University of Catania, Catania, 95125, Italy
| |
|