1
|
Perlasca P, Frasca M, Ba CT, Gliozzo J, Notaro M, Pennacchioni M, Valentini G, Mesiti M. Multi-resolution visualization and analysis of biomolecular networks through hierarchical community detection and web-based graphical tools. PLoS One 2020; 15:e0244241. [PMID: 33351828 PMCID: PMC7755227 DOI: 10.1371/journal.pone.0244241] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 08/18/2020] [Accepted: 12/04/2020] [Indexed: 11/19/2022] Open
Abstract
The visual exploration and analysis of biomolecular networks is of paramount importance for identifying hidden and complex interaction patterns among proteins. Although many tools have been proposed for this task, they mainly focus on the query and visualization of a single protein with its neighborhood. The global exploration of the entire network and the interpretation of its underlying structure remain difficult, mainly due to the excessively large size of biomolecular networks. In this paper we propose a novel multi-resolution representation and exploration approach that exploits hierarchical community detection algorithms for the identification of communities occurring in biomolecular networks. The proposed graphical rendering combines two types of nodes (proteins and communities) and three types of edges (protein-protein, community-community, protein-community), and displays communities at different resolutions, allowing the user to interactively zoom in and out across different levels of the hierarchy. Links among communities are shown in terms of relationships and functional correlations among the biomolecules they contain. The user can also combine this form of navigation with a vertex-centric visualization to identify the communities holding a target biomolecule. Since communities gather limited-size groups of correlated proteins, the visualization and exploration of complex and large networks becomes feasible on off-the-shelf computers. The proposed graphical exploration strategies have been implemented and integrated in UNIPred-Web, a web application that we recently introduced for combining the UNIPred algorithm, which addresses both integration and protein function prediction in an imbalance-aware fashion, with an easy-to-use vertex-centric exploration of the integrated network. The tool has been substantially revised from different standpoints, including the core prediction algorithm. 
Several tests on networks of different size and connectivity have been conducted to demonstrate the potential of our methodology; moreover, enrichment analyses have been performed to assess the biological meaningfulness of the detected communities. Finally, a CoV-human network has been embedded in the system, and a corresponding case study is presented, including the visualization and the prediction of human host proteins that potentially interact with SARS-CoV2 proteins.
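The abstract describes a rendering with protein nodes, community nodes, and community-community edges. As a rough illustration only, the following Python sketch shows one plausible way to derive community-level edges from protein-protein edges at a single level of a hierarchy. The function name `collapse_to_communities`, the `membership` mapping, and the choice of summing edge weights are assumptions made for this example; they are not taken from the paper or from UNIPred-Web.

```python
from collections import defaultdict


def collapse_to_communities(protein_edges, membership):
    """Aggregate protein-protein edges into community-level weights.

    protein_edges: iterable of (protein_a, protein_b, weight) tuples.
    membership: dict mapping each protein to a community id. This is a
        hypothetical input standing in for one level of a hierarchical
        community detection result.
    Returns (community_edges, intra_weight), where community_edges maps an
    ordered pair of community ids to the summed weight of the edges crossing
    between them, and intra_weight maps a community id to the summed weight
    of the edges inside it.
    """
    community_edges = defaultdict(float)
    intra_weight = defaultdict(float)
    for a, b, w in protein_edges:
        ca, cb = membership[a], membership[b]
        if ca == cb:
            # Edge stays inside one community.
            intra_weight[ca] += w
        else:
            # Undirected edge: store the pair in a canonical order.
            key = (ca, cb) if ca <= cb else (cb, ca)
            community_edges[key] += w
    return dict(community_edges), dict(intra_weight)


# Toy network with four proteins split into two communities (made-up data).
edges = [("P1", "P2", 1.0), ("P2", "P3", 0.5), ("P3", "P4", 2.0), ("P1", "P4", 0.25)]
membership = {"P1": "C1", "P2": "C1", "P3": "C2", "P4": "C2"}
community_edges, intra_weight = collapse_to_communities(edges, membership)
print(community_edges, intra_weight)
```

Applying the same function with a coarser or finer `membership` mapping would give the community graph at another resolution, which is the kind of zooming the abstract refers to.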
Affiliation(s)
- Paolo Perlasca: AnacletoLab, Department of Computer Science, University of Milan, Milan, Italy
- Marco Frasca: AnacletoLab, Department of Computer Science, University of Milan, Milan, Italy
- Cheick Tidiane Ba: AnacletoLab, Department of Computer Science, University of Milan, Milan, Italy
- Jessica Gliozzo: Neuroradiology Unit, IRCCS San Raffaele Hospital, Milan, Italy
- Marco Notaro: AnacletoLab, Department of Computer Science, University of Milan, Milan, Italy
- Mario Pennacchioni: AnacletoLab, Department of Computer Science, University of Milan, Milan, Italy
- Giorgio Valentini: AnacletoLab, Department of Computer Science, University of Milan, Milan, Italy; CINI National Laboratory in Artificial Intelligence and Intelligent Systems (AIIS), Rome, Italy
- Marco Mesiti: AnacletoLab, Department of Computer Science, University of Milan, Milan, Italy
|
2
|
Frasca M, Grossi G, Gliozzo J, Mesiti M, Notaro M, Perlasca P, Petrini A, Valentini G. A GPU-based algorithm for fast node label learning in large and unbalanced biomolecular networks. BMC Bioinformatics 2018; 19:353. [PMID: 30367594 PMCID: PMC6191976 DOI: 10.1186/s12859-018-2301-4] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 11/29/2022] Open
Abstract
BACKGROUND Several problems in network biology and medicine can be cast into a framework where entities are represented through partially labeled networks, and the aim is to infer the labels (usually binary) of the unlabeled part. Connections represent functional or genetic similarity between entities, while the labellings are often highly unbalanced, that is, one class is largely under-represented: for instance, in automated protein function prediction (AFP) only a few proteins are annotated for most Gene Ontology terms, and in the disease-gene prioritization problem only a few genes are actually known to be involved in the etiology of a given disease. Imbalance-aware approaches to accurately predict node labels in biological networks are thereby required. Furthermore, such methods must be scalable, since input data can be large, as, for instance, in the context of multi-species protein networks. RESULTS We propose a novel semi-supervised parallel enhancement of COSNET, an imbalance-aware algorithm built on the Hopfield neural model and recently suggested to solve the AFP problem. By adopting an efficient representation of the graph and assuming a sparse network topology, we empirically show that it can be efficiently applied to networks with millions of nodes. The key strategy to speed up the computations is to partition nodes into independent sets so as to process each set in parallel by exploiting the power of GPU accelerators. This parallel technique ensures the convergence to asymptotically stable attractors, while preserving the asynchronous dynamics of the original model. Detailed experiments on real data and artificial big instances of the problem highlight the scalability and efficiency of the proposed method. CONCLUSIONS By parallelizing COSNET we achieved on average a speed-up of 180x in solving the AFP problem in the S. cerevisiae, Mus musculus and Homo sapiens organisms, while lowering memory requirements. 
In addition, to show the potential applicability of the method to huge biomolecular networks, we predicted node labels in artificially generated sparse networks involving hundreds of thousands to millions of nodes.
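The abstract states that nodes are partitioned into independent sets so that each set can be processed in parallel. The sketch below shows one generic way to obtain such a partition, using sequential greedy graph coloring in plain Python. It is an illustration under assumptions: the paper's GPU coloring scheme is not described in this listing, and the function name `independent_sets` and the largest-degree-first ordering are choices made only for this example.

```python
def independent_sets(adjacency):
    """Partition nodes into independent sets with greedy coloring.

    adjacency: dict mapping each node to the set of its neighbours
        (undirected graph, no self-loops).
    Returns a list of node lists. No two nodes in the same list are adjacent,
    so in a Hopfield-style network the nodes of one list could be updated
    simultaneously without reading each other's state.
    """
    color = {}
    # Visit high-degree nodes first (a common heuristic, assumed here).
    for node in sorted(adjacency, key=lambda n: -len(adjacency[n])):
        used = {color[nb] for nb in adjacency[node] if nb in color}
        c = 0
        while c in used:
            c += 1
        color[node] = c  # smallest color not used by a colored neighbour
    groups = {}
    for node, c in color.items():
        groups.setdefault(c, []).append(node)
    return [groups[c] for c in sorted(groups)]


# Toy path graph a - b - c - d (made-up data).
adjacency = {
    "a": {"b"},
    "b": {"a", "c"},
    "c": {"b", "d"},
    "d": {"c"},
}
sets = independent_sets(adjacency)
print(sets)
```

Because the sets are processed one after another while the nodes inside a set are updated together, every node still sees up-to-date neighbour states, which is consistent with the abstract's remark about preserving asynchronous dynamics.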
Affiliation(s)
- Marco Frasca: AnacletoLab - Department of Computer Science, Università degli Studi di Milano, Via Comelico 39, Milano, 20135 Italy
- Giuliano Grossi: AnacletoLab - Department of Computer Science, Università degli Studi di Milano, Via Comelico 39, Milano, 20135 Italy
- Jessica Gliozzo: Department of Dermatology, Fondazione IRCCS Ca’ Granda, Ospedale Maggiore Policlinico, Milan, 20122 Italy
- Marco Mesiti: AnacletoLab - Department of Computer Science, Università degli Studi di Milano, Via Comelico 39, Milano, 20135 Italy
- Marco Notaro: AnacletoLab - Department of Computer Science, Università degli Studi di Milano, Via Comelico 39, Milano, 20135 Italy
- Paolo Perlasca: AnacletoLab - Department of Computer Science, Università degli Studi di Milano, Via Comelico 39, Milano, 20135 Italy
- Alessandro Petrini: AnacletoLab - Department of Computer Science, Università degli Studi di Milano, Via Comelico 39, Milano, 20135 Italy
- Giorgio Valentini: AnacletoLab - Department of Computer Science, Università degli Studi di Milano, Via Comelico 39, Milano, 20135 Italy
|
3
|
Ehsani Ardakani MJ, Safaei A, Arefi Oskouie A, Haghparast H, Haghazali M, Mohaghegh Shalmani H, Peyvandi H, Naderi N, Zali MR. Evaluation of liver cirrhosis and hepatocellular carcinoma using Protein-Protein Interaction Networks. GASTROENTEROLOGY AND HEPATOLOGY FROM BED TO BENCH 2016; 9:S14-S22. [PMID: 28224023 PMCID: PMC5310795] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/21/2022]
Abstract
AIM In the current study, we analyzed only the articles that investigate the serum proteome profile of cirrhosis patients or HCC patients versus healthy controls. BACKGROUND Increased understanding of cancer biology has enabled the identification of molecular events that lead to the discovery of numerous potential biomarkers in diseases. Protein-protein interaction networks are one approach that could improve the understanding of the molecular events and protein connections that lead to the identification of genes and proteins associated with diseases. METHODS Gene expression data, including 63 gene or protein names for hepatocellular carcinoma and 29 gene or protein names for cirrhosis, were extracted from a number of previous investigations. The networks of related differentially expressed genes were explored using Cytoscape and PPI analysis methods such as MCODE and ClueGO. Centrality and cluster screening identified hub genes, including APOE, TTR, CLU, and APOA1, in cirrhosis. RESULTS CLU and APOE are involved in the positive regulation of neurofibrillary tangle assembly. HP and APOE are involved in cellular oxidant detoxification. C4B and C4BP belong to the complement activation, classical pathway and acute inflammation response pathway. TTR, TFRC, VWF, CLU, A2M, APOA1, CKAP5, ZNF648, CASP8, and HSP27 were also reported as hubs in HCC. In HCC, these include A2M, which corresponds to platelet degranulation, humoral immune response, and negative regulation of immune effector process. CLU belongs to reverse cholesterol transport, platelet degranulation and human immune response. APOA1 corresponds to reverse cholesterol transport, platelet degranulation and humoral immune response, as well as the negative regulation of immune effector process pathway. 
CONCLUSION In conclusion, this study suggests that there is a common molecular relationship between cirrhosis and hepatocellular cancer that may help with the identification of target molecules for early treatment, which is essential in cancer therapy.
Affiliation(s)
- Mohammad Javad Ehsani Ardakani: Gastroenterology and Liver Diseases Research Center, Research Institute for Gastroenterology and Liver Diseases, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Akram Safaei: Proteomics Research Center, Faculty of Paramedical Sciences, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Afsaneh Arefi Oskouie: Department of Basic Sciences, Faculty of Paramedical Sciences, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Hesam Haghparast: Basic and Molecular Epidemiology of Gastrointestinal Disorders Research Center, Research Institute for Gastroenterology and Liver Diseases, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Mehrdad Haghazali: Behbood Gastroenterology and Liver Diseases Research Center, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Hamid Mohaghegh Shalmani: Basic and Molecular Epidemiology of Gastrointestinal Disorders Research Center, Research Institute for Gastroenterology and Liver Diseases, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Hassan Peyvandi: Hearing Disorders Research Center, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Nosratollah Naderi: Basic and Molecular Epidemiology of Gastrointestinal Disorders Research Center, Research Institute for Gastroenterology and Liver Diseases, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Mohammad Reza Zali: Gastroenterology and Liver Diseases Research Center, Research Institute for Gastroenterology and Liver Diseases, Shahid Beheshti University of Medical Sciences, Tehran, Iran
|
4
|
Yu G, Fu G, Wang J, Zhu H. Predicting Protein Function via Semantic Integration of Multiple Networks. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2016; 13:220-232. [PMID: 26800544 DOI: 10.1109/tcbb.2015.2459713] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.1] [Indexed: 06/05/2023]
Abstract
Determining the biological functions of proteins is one of the key challenges in the post-genomic era. The rapidly accumulating volumes of proteomic and genomic data drive the development of computational models for automatically predicting protein function on a large scale. Recent approaches focus on integrating multiple heterogeneous data sources, and they often get better results than methods that use a single data source alone. In this paper, we investigate how to integrate multiple biological data sources with biological knowledge, i.e., the Gene Ontology (GO), for protein function prediction. We propose a method, called SimNet, to Semantically integrate multiple functional association Networks derived from heterogeneous data sources. SimNet first utilizes GO annotations of proteins to capture the semantic similarity between proteins and introduces a semantic kernel based on the similarity. Next, SimNet constructs a composite network, obtained as a weighted summation of individual networks, and aligns the network with the kernel to get the weights assigned to individual networks. Then, it applies a network-based classifier on the composite network to predict protein function. Experimental results on heterogeneous proteomic data sources of Yeast, Human, Mouse, and Fly show that SimNet not only achieves better (or comparable) results than other related competitive approaches, but also takes much less time. The Matlab codes of SimNet are available at https://sites.google.com/site/guoxian85/simnet.
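The abstract describes a composite network built as a weighted sum of individual networks, with weights obtained by aligning the composite with a semantic kernel. The Python sketch below illustrates the general idea with a deliberately simplified heuristic: each network is scored by its normalized Frobenius alignment with the kernel, and the scores are rescaled to sum to one. This is not the SimNet optimization itself, whose details are not given in this listing; the function names and the toy matrices are assumptions introduced for illustration.

```python
def frobenius_inner(a, b):
    """Frobenius inner product of two equally sized square matrices (lists of rows)."""
    return sum(x * y for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))


def alignment(a, b):
    """Normalized alignment <a, b> / (||a|| * ||b||); 0.0 if either matrix is all zeros."""
    denom = (frobenius_inner(a, a) * frobenius_inner(b, b)) ** 0.5
    return frobenius_inner(a, b) / denom if denom else 0.0


def composite_network(networks, kernel):
    """Weight each network by its alignment with the kernel and sum them.

    Simplified heuristic (an assumption, not the published method):
    negative alignments are clipped to zero and weights are normalized.
    """
    scores = [max(alignment(net, kernel), 0.0) for net in networks]
    total = sum(scores)
    n = len(networks)
    weights = [s / total for s in scores] if total else [1.0 / n] * n
    size = len(kernel)
    combined = [
        [sum(w * net[i][j] for w, net in zip(weights, networks)) for j in range(size)]
        for i in range(size)
    ]
    return weights, combined


# Toy data: the kernel says proteins 0 and 1 are semantically similar.
kernel = [[1.0, 0.9, 0.0], [0.9, 1.0, 0.0], [0.0, 0.0, 1.0]]
net_a = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]  # links proteins 0 and 1
net_b = [[0, 0, 1], [0, 0, 0], [1, 0, 0]]  # links proteins 0 and 2
weights, combined = composite_network([net_a, net_b], kernel)
print(weights)
```

In this toy case the network that agrees with the kernel receives all of the weight, which shows how alignment can down-weight a data source that is uninformative for the functional similarity being modeled.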
|
5
|
Frasca M, Bertoni A, Valentini G. UNIPred: Unbalance-Aware Network Integration and Prediction of Protein Functions. J Comput Biol 2015; 22:1057-74. [PMID: 26402488 DOI: 10.1089/cmb.2014.0110] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Indexed: 12/29/2022] Open
Abstract
The proper integration of multiple sources of data and the unbalance between annotated and unannotated proteins represent two of the main issues of the automated function prediction (AFP) problem. Most supervised and semisupervised learning algorithms for AFP proposed in the literature do not jointly consider these issues, with a negative impact on both sensitivity and precision, due to the unbalance between annotated and unannotated proteins that characterizes the majority of functional classes and to the specific and complementary information content embedded in each available source of data. We propose UNIPred (unbalance-aware network integration and prediction of protein functions), an algorithm that properly combines different biomolecular networks and predicts protein functions using parametric semisupervised neural models. The algorithm explicitly takes into account the unbalance between unannotated and annotated proteins both to construct the integrated network and to predict protein annotations for each functional class. Full-genome and ontology-wide experiments with three eukaryotic model organisms show that the proposed method compares favorably with state-of-the-art learning algorithms for AFP.
Affiliation(s)
- Marco Frasca: DI - Department of Computer Science, University of Milan, Milan, Italy
- Alberto Bertoni: DI - Department of Computer Science, University of Milan, Milan, Italy
- Giorgio Valentini: DI - Department of Computer Science, University of Milan, Milan, Italy
|
6
|
Frasca M, Bassis S, Valentini G. Learning node labels with multi-category Hopfield networks. Neural Comput Appl 2015. [DOI: 10.1007/s00521-015-1965-1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 01/02/2023]
|
7
|
Abstract
The challenging task of studying and modeling the complex dynamics of biological systems in order to describe various human diseases has gathered great interest in recent years. Major biological processes are mediated through protein interactions; hence, there is a need to understand the chaotic network that forms these processes in order to understand human diseases. The application of protein interaction networks to disease datasets allows the identification of genes and proteins associated with diseases, the study of network properties, the identification of subnetworks, and network-based disease gene classification. Although various protein interaction network analysis strategies have been employed, grand challenges remain. A global understanding of protein interaction networks via the integration of high-throughput functional genomics data from different levels will allow researchers to examine disease pathways and identify strategies to control them. As a result, it seems likely that more personalized, more accurate and more rapid disease gene diagnostic techniques will be devised in the future, as well as novel strategies that are more personalized. This mini-review summarizes the current practice of protein interaction networks in medical research as well as the challenges to be overcome.
Affiliation(s)
- Tuba Sevimoglu: Department of Bioengineering, Marmara University, Goztepe, 34722 Istanbul, Turkey
- Kazim Yalcin Arga: Department of Bioengineering, Marmara University, Goztepe, 34722 Istanbul, Turkey
|
8
|
Mesiti M, Re M, Valentini G. Think globally and solve locally: secondary memory-based network learning for automated multi-species function prediction. Gigascience 2014; 3:5. [PMID: 24843788 PMCID: PMC4006453 DOI: 10.1186/2047-217x-3-5] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Received: 12/04/2013] [Accepted: 04/01/2014] [Indexed: 01/08/2023] Open
Abstract
Background Network-based learning algorithms for automated function prediction (AFP) are negatively affected by the limited coverage of experimental data and the limited a priori known functional annotations. As a consequence, their application to model organisms is often restricted to well-characterized biological processes and pathways, and their effectiveness with poorly annotated species is relatively limited. A possible solution to this problem might consist in the construction of big networks including multiple species, but this in turn poses challenging computational problems, due to the scalability limitations of existing algorithms and the main memory requirements induced by the construction of big networks. Distributed computation or the usage of big computers could in principle respond to these issues, but they raise further algorithmic problems and require resources that simple off-the-shelf computers cannot provide. Results We propose a novel framework for scalable network-based learning of multi-species protein functions based on both a local implementation of existing algorithms and the adoption of innovative technologies: we solve the AFP problem “locally”, by designing “vertex-centric” implementations of network-based algorithms, but we do not give up thinking “globally” by exploiting the overall topology of the network. This is made possible by the adoption of secondary memory-based technologies that allow the efficient use of the large memory available on disks, thus overcoming the main memory limitations of modern off-the-shelf computers. This approach has been applied to the analysis of a large multi-species network including more than 300 species of bacteria and to a network with more than 200,000 proteins belonging to 13 Eukaryotic species. To our knowledge this is the first work where secondary memory-based network analysis has been applied to multi-species function prediction using biological networks with hundreds of thousands of proteins. 
Conclusions The combination of these algorithmic and technological approaches makes feasible the analysis of large multi-species networks using ordinary computers with limited speed and primary memory, and in perspective could enable the analysis of huge networks (e.g. the whole proteomes available in SwissProt), using well-equipped stand-alone machines.
Affiliation(s)
- Marco Mesiti: AnacletoLab - Department of Computer Science, University of Milano, Via Comelico 39/41, 20135 Milano, Italy
- Matteo Re: AnacletoLab - Department of Computer Science, University of Milano, Via Comelico 39/41, 20135 Milano, Italy
- Giorgio Valentini: AnacletoLab - Department of Computer Science, University of Milano, Via Comelico 39/41, 20135 Milano, Italy
|