1. Di Rocco L, Ferraro Petrillo U, Rombo SE. DIAMIN: a software library for the distributed analysis of large-scale molecular interaction networks. BMC Bioinformatics 2022; 23:474. [PMID: 36368948] [PMCID: PMC9652854] [DOI: 10.1186/s12859-022-05026-w] [Received: 04/12/2022] [Accepted: 10/29/2022]
Abstract
BACKGROUND Huge amounts of molecular interaction data are continuously produced and stored in public databases. Although many bioinformatics tools have been proposed in the literature for their analysis, based on modeling these data as different types of biological networks, several problems remain unsolved when the analysis must be carried out on a large scale. RESULTS We propose DIAMIN, a high-level software library that facilitates the development of applications for the efficient analysis of large-scale molecular interaction networks. DIAMIN relies on distributed computing and is implemented in Java on top of the Apache Spark framework. It delivers a set of functionalities implementing different tasks on an abstract representation of very large graphs, providing built-in support for methods and algorithms commonly used to analyze these networks. DIAMIN has been tested on data retrieved from two of the most widely used molecular interaction databases and proved highly efficient and scalable. As shown by the examples provided, DIAMIN can be used by people without any distributed programming experience to perform various types of data analysis and to implement new algorithms based on its primitives. CONCLUSIONS Through its functionalities, DIAMIN has proved successful in allowing users to solve specific biological problems that can be modeled with biological networks. The software is freely available, which will hopefully allow its rapid diffusion through the scientific community, to solve both specific data analysis problems and more complex tasks.
Affiliation(s)
- Lorenzo Di Rocco
- Department of Statistics, University of Rome La Sapienza, Rome, Italy
- Simona E. Rombo
- Department of Mathematics and Computer Science, University of Palermo, Palermo, Italy
2. Bonomo M, Giancarlo R, Greco D, Rombo SE. Topological ranks reveal functional knowledge encoded in biological networks: a comparative analysis. Brief Bioinform 2022; 23:6563936. [PMID: 35381599] [DOI: 10.1093/bib/bbac101] [Received: 10/25/2021] [Revised: 01/31/2022] [Accepted: 02/28/2022]
Abstract
MOTIVATION Biological network topology yields important insights into biological function, disease occurrence and drug design. In the last few years, different types of topological measures have been introduced and applied to infer the biological relevance of network components/interactions according to their position within the network structure. Although comparisons of such measures have been proposed previously, the extent to which topology per se may lead to the extraction of novel biological knowledge has never been critically examined or formalized in the literature. RESULTS We present a comparative analysis of nine prominent topological measures, based on compact views obtained from the rank they induce on a given input biological network. The goal is to understand their ability to correctly position nodes/edges in the rank, according to the functional knowledge implicitly encoded in biological networks. To this end, both internal and external (gold standard) validation criteria are taken into account, and six networks involving three different organisms (yeast, worm and human) are included in the comparison. The results show that a distinct handful of best-performing measures can be identified for each of the considered organisms, independently of the reference gold standard. AVAILABILITY Input files and code for the computation of the considered topological measures and K-haus distance are available at https://gitlab.com/MaryBonomo/ranking. CONTACT simona.rombo@unipa.it. SUPPLEMENTARY INFORMATION Supplementary data are available at Briefings in Bioinformatics online.
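As a minimal illustration of how a topological measure induces a rank on the nodes of a network, the Python sketch below ranks nodes by degree. Degree is used here purely as an example and is not claimed to be one of the nine measures compared in the paper; the function name `degree_rank` and the toy edge list are hypothetical, and the K-haus comparison and gold-standard validation are not reproduced.

```python
from collections import defaultdict

def degree_rank(edges):
    """Rank the nodes of an undirected network by degree, highest first.

    Duplicate edges (in either direction) and self-loops are ignored;
    ties are broken alphabetically by node name so the rank is deterministic.
    """
    unique_edges = {frozenset((u, v)) for u, v in edges if u != v}
    degree = defaultdict(int)
    for edge in unique_edges:
        for node in edge:
            degree[node] += 1
    return sorted(degree, key=lambda node: (-degree[node], node))

if __name__ == "__main__":
    # Hypothetical toy interaction network; node names are placeholders.
    toy_edges = [("P1", "P4"), ("P2", "P4"), ("P3", "P4"), ("P1", "P2"), ("P3", "P5")]
    print(degree_rank(toy_edges))  # ['P4', 'P1', 'P2', 'P3', 'P5']
```

Any other node-level measure could replace degree in the sort key; the rank, rather than the raw scores, is what a rank-based comparison operates on.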
Affiliation(s)
- Mariella Bonomo
- Department of Engineering, University of Palermo, Palermo 90121, Italy
- Raffaele Giancarlo
- Department of Mathematics and Computer Science, University of Palermo, Palermo 90121, Italy
- Daniele Greco
- Department of Mathematics and Computer Science, University of Palermo, Palermo 90121, Italy
- Simona E. Rombo
- Department of Mathematics and Computer Science, University of Palermo, Palermo 90121, Italy
3. Non-coding RNA regulatory networks. Biochim Biophys Acta Gene Regul Mech 2019; 1863:194417. [PMID: 31493559] [DOI: 10.1016/j.bbagrm.2019.194417] [Received: 05/28/2019] [Revised: 08/13/2019] [Accepted: 08/13/2019]
Abstract
It is well established that the vast majority of human RNA transcripts do not encode proteins and that non-coding RNAs regulate cell physiology and shape cellular functions. A subset of them is involved in gene regulation at different levels, from epigenetic gene silencing to post-transcriptional regulation of mRNA stability. Notably, the aberrant expression of many non-coding RNAs has been associated with aggressive pathologies. Rapid advances in network biology indicate that the robustness of cellular processes results from specific properties of biological networks, such as a scale-free degree distribution and hierarchical modularity, suggesting that regulatory network analyses could provide new insights into gene regulation and dysfunction mechanisms. In this study, we present an overview of public repositories where non-coding RNA regulatory interactions are collected and annotated, discuss unresolved questions for data integration, and review existing resources for building and analysing networks.
4. Wang T, Peng J, Peng Q, Wang Y, Chen J. FSM: Fast and scalable network motif discovery for exploring higher-order network organizations. Methods 2019; 173:83-93. [PMID: 31306744] [DOI: 10.1016/j.ymeth.2019.07.008] [Received: 03/01/2019] [Revised: 06/30/2019] [Accepted: 07/09/2019]
Abstract
Networks exhibit rich and diverse higher-order organizational structures. Network motifs, which are recurring, significant patterns of interconnections, are recognized as fundamental units for studying the higher-order organization of networks. However, the principle of selecting representative network motifs for local motif-based clustering remains largely unexplored. We present a scalable algorithm called FSM for network motif discovery. FSM has two advantages. First, it accelerates motif discovery by effectively reducing the number of subgraph isomorphism labeling operations. Second, FSM adopts multiple heuristic optimizations for subgraph enumeration and classification to further improve its performance. Experimental results on biological networks show that, compared with the existing network motif discovery algorithm, FSM is more efficient in both computational cost and memory usage. Furthermore, the large, frequent and sparse network motifs discovered by FSM successfully revealed the higher-order organizational structures of biological networks, indicating that FSM is suitable for selecting representative network motifs to explore higher-order network organization.
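To make subgraph enumeration and classification concrete, the sketch below enumerates all connected three-node induced subgraphs of a small undirected, unweighted network and classifies each as an open path or a triangle. It is a plain brute-force enumeration, not FSM: it includes none of FSM's isomorphism-labeling reductions or heuristics, and it does not test statistical significance against randomized networks, which motif discovery normally requires. The function name `count_3node_motifs` is hypothetical.

```python
from itertools import combinations

def count_3node_motifs(edges):
    """Count connected 3-node induced subgraphs of an undirected graph by shape."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue  # self-loops play no role in 3-node shapes
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    counts = {"path": 0, "triangle": 0}
    seen = set()
    for center, neighbours in adj.items():
        # Any connected 3-node subgraph has a node adjacent to both others,
        # so pairing the neighbours of each node reaches every such subgraph.
        for a, b in combinations(neighbours, 2):
            triple = frozenset((center, a, b))
            if triple in seen:
                continue  # a triangle is reached once from each of its 3 nodes
            seen.add(triple)
            # center-a and center-b are edges; the shape depends on a-b.
            counts["triangle" if b in adj[a] else "path"] += 1
    return counts

if __name__ == "__main__":
    # Hypothetical toy network: triangle 1-2-3 plus a pendant node 4 attached to 3.
    print(count_3node_motifs([(1, 2), (2, 3), (3, 1), (3, 4)]))
    # {'path': 2, 'triangle': 1}
```

The `seen` set keeps memory proportional to the number of subgraphs found, which is acceptable for a toy example but is exactly the kind of cost that dedicated motif-discovery tools work to avoid on large networks.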
Affiliation(s)
- Tao Wang
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
- Jiajie Peng
- School of Computer Science, Northwestern Polytechnical University, Xi'an, China
- Qidi Peng
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
- Yadong Wang
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
- Jin Chen
- Institute for Biomedical Informatics, University of Kentucky, Lexington, KY, USA
5. Guzzi PH, Milenkovic T. Survey of local and global biological network alignment: the need to reconcile the two sides of the same coin. Brief Bioinform 2019; 19:472-481. [PMID: 28062413] [DOI: 10.1093/bib/bbw132] [Received: 07/31/2016]
Abstract
Analogous to genomic sequence alignment, which allows across-species transfer of biological knowledge between conserved sequence regions, biological network alignment can be used to guide knowledge transfer between conserved regions of the molecular networks of different species. Hence, biological network alignment can be used to redefine the traditional notion of sequence-based homology as a new notion of network-based homology. As with genomic sequence alignment, there exist local and global biological network alignments. Here, we survey prominent and recent computational approaches of each network alignment type and discuss their (dis)advantages. Then, since the two approach types were recently shown to be complementary, in the sense that they capture different slices of cellular functioning, we discuss the need to reconcile them and present a recent first step in this direction. We conclude with some open research problems on this topic and comment on the usefulness of network alignment in domains other than computational biology.
Affiliation(s)
- Pietro Hiram Guzzi
- Department of Surgical and Medical Sciences, University Magna Graecia, Catanzaro, 88100 Italy
- Tijana Milenkovic
- Department of Computer Science and Engineering, Interdisciplinary Center for Network Science and Applications (iCeNSA), ECK Institute for Global Health, University of Notre Dame, Notre Dame, IN 46556, USA
6. Zhang J, Kwong S, Jia Y, Wong KC. NSSRF: global network similarity search with subgraph signatures and its applications. Bioinformatics 2017; 33:1696-1702. [PMID: 28158419] [DOI: 10.1093/bioinformatics/btx051] [Received: 09/30/2016] [Accepted: 01/24/2017]
Abstract
Motivation The exponential growth of biological network databases has made global network similarity search (NSS) increasingly computationally intensive. Given a query network and a network database, NSS aims to find the networks in the database most similar to the query according to a topological similarity measure of interest. With the advent of big network data, existing search methods may become unsuitable, since some of them can make queries fail by returning empty answers or by imposing arbitrary query restrictions. Therefore, designing NSS algorithms remains challenging because of the trade-off between accuracy and efficiency. Results We propose a regression-based global NSS method, denoted NSSRF, which boosts search speed without any significant sacrifice in practical performance. Motivated by nature, the method relies heavily on subgraph signatures. NSSRF consists of two phases: an offline model-building phase and a similarity query phase. In the offline model-building phase, subgraph signatures and cosine similarity scores are used for efficient random forest regression (RFR) model training. In the similarity query phase, the trained regression model is queried to return similar networks. We have extensively validated NSSRF on biological pathways and molecular structures; NSSRF demonstrates competitive performance against state-of-the-art methods. Remarkably, NSSRF works especially well for large networks, which indicates that the approach is promising in the era of big data. Case studies demonstrate the efficiency and uniqueness of NSSRF, which finds results that existing state-of-the-art methods could miss. Availability and Implementation The source code of two versions of NSSRF is freely available for download at https://github.com/zhangjiaobxy/nssrfBinary and https://github.com/zhangjiaobxy/nssrfPackage. Contact kc.w@cityu.edu.hk. Supplementary information Supplementary data are available at Bioinformatics online.
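The abstract mentions cosine similarity scores computed on subgraph signatures. The sketch below illustrates only that comparison step, under simple assumptions: each network is already summarized by a hypothetical fixed-length signature vector (for example, counts of a few small subgraph types), and the database is ranked by direct cosine similarity to the query. NSSRF's random forest regression model, which the abstract credits for the speed-up, is not reproduced; `cosine_similarity`, `top_k_similar` and the toy signatures are illustrative names and data.

```python
import math

def cosine_similarity(x, y):
    """Cosine similarity between two equal-length numeric vectors (0.0 if either is all zeros)."""
    if len(x) != len(y):
        raise ValueError("signature vectors must have the same length")
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0.0 or norm_y == 0.0:
        return 0.0
    return dot / (norm_x * norm_y)

def top_k_similar(query_sig, database, k):
    """Return the k (name, score) pairs most similar to the query, highest score first."""
    scored = [(name, cosine_similarity(query_sig, sig)) for name, sig in database.items()]
    scored.sort(key=lambda item: (-item[1], item[0]))  # ties broken by name
    return scored[:k]

if __name__ == "__main__":
    # Hypothetical signatures: counts of [edges, open 3-paths, triangles].
    db = {"net1": [4, 2, 0], "net2": [0, 1, 3], "net3": [2, 1, 1]}
    print(top_k_similar([2, 1, 0], db, k=2))
```

This exhaustive scan compares the query with every database entry; a learned model that predicts similarity scores is one way to avoid paying that full cost at query time.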
Affiliation(s)
- Jiao Zhang
- Department of Computer Science, City University of Hong Kong, Kowloon Tong, Hong Kong
- Sam Kwong
- Department of Computer Science, City University of Hong Kong, Kowloon Tong, Hong Kong
- Yuheng Jia
- Department of Computer Science, City University of Hong Kong, Kowloon Tong, Hong Kong
- Ka-Chun Wong
- Department of Computer Science, City University of Hong Kong, Kowloon Tong, Hong Kong
7. Bonnici V, Busato F, Micale G, Bombieri N, Pulvirenti A, Giugno R. APPAGATO: an APproximate PArallel and stochastic GrAph querying TOol for biological networks. Bioinformatics 2016; 32:2159-66. [DOI: 10.1093/bioinformatics/btw223] [Received: 10/20/2015] [Accepted: 04/10/2016]
Affiliation(s)
- Vincenzo Bonnici
- Department of Computer Science, University of Verona, Strada Le Grazie, Verona
- Federico Busato
- Department of Computer Science, University of Verona, Strada Le Grazie, Verona
- Giovanni Micale
- Department of Mathematics and Computer Science, University of Catania, Viale A. Doria, Catania
- Nicola Bombieri
- Department of Computer Science, University of Verona, Strada Le Grazie, Verona
- Alfredo Pulvirenti
- Department of Clinical and Experimental Medicine, University of Catania, Via Palermo, Catania
- Rosalba Giugno
- Department of Computer Science, University of Verona, Strada Le Grazie, Verona
- Department of Clinical and Experimental Medicine, University of Catania, Via Palermo, Catania
8. Liang C, Li Y, Luo J, Zhang Z. A novel motif-discovery algorithm to identify co-regulatory motifs in large transcription factor and microRNA co-regulatory networks in human. Bioinformatics 2015; 31:2348-55. [DOI: 10.1093/bioinformatics/btv159] [Received: 11/04/2014] [Accepted: 03/13/2015]
9. Jeong H, Yoon BJ. Accurate multiple network alignment through context-sensitive random walk. BMC Syst Biol 2015; 9 Suppl 1:S7. [PMID: 25707987] [PMCID: PMC4331682] [DOI: 10.1186/1752-0509-9-s1-s7]
Abstract
Background Comparative network analysis can provide an effective means of analyzing large-scale biological networks and gaining novel insights into their structure and organization. Global network alignment aims to predict the best overall mapping between a given set of biological networks, thereby identifying important similarities as well as differences among the networks. It has been shown that network alignment methods can be used to detect pathways or network modules that are conserved across different networks. To date, a number of network alignment algorithms have been proposed based on different formulations and approaches, many of them focusing on pairwise alignment. Results In this work, we propose a novel multiple network alignment algorithm based on a context-sensitive random walk model. The random walker employed in the proposed algorithm switches between two different modes, namely, an individual walk on a single network and a simultaneous walk on two networks. The switching decision is made in a context-sensitive manner by examining the current neighborhood, which is effective for quantitatively estimating the degree of correspondence between nodes that belong to different networks, in a manner that sensibly integrates node similarity and topological similarity. The resulting node correspondence scores are then used to predict the maximum expected accuracy (MEA) alignment of the given networks. Conclusions Performance evaluation based on synthetic networks as well as real protein-protein interaction networks shows that the proposed algorithm can construct more accurate multiple network alignments than other leading methods.