1
|
Idhaya T, Suruliandi A, Raja SP. A Comprehensive Review on Machine Learning Techniques for Protein Family Prediction. Protein J 2024; 43:171-186. [PMID: 38427271 DOI: 10.1007/s10930-024-10181-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 01/19/2024] [Indexed: 03/02/2024]
Abstract
Proteomics is a field dedicated to the analysis of proteins in cells, tissues, and organisms, aiming to gain insights into their structures, functions, and interactions. A crucial aspect within proteomics is protein family prediction, which involves identifying evolutionary relationships between proteins by examining similarities in their sequences or structures. This approach holds great potential for applications such as drug discovery and functional annotation of genomes. However, current methods for protein family prediction have certain limitations, including limited accuracy, high false positive rates, and challenges in handling large datasets. Some methods also rely on homologous sequences or protein structures, which introduce biases and restrict their applicability to specific protein families or structures. To overcome these limitations, researchers have turned to machine learning (ML) approaches that can identify connections between protein features and simplify complex high-dimensional datasets. This paper presents a comprehensive survey of articles that employ various ML techniques for predicting protein families. The primary objective is to explore and improve ML techniques specifically for protein family prediction, thus advancing future research in the field. Through qualitative and quantitative analyses of ML techniques, it is evident that multiple methods utilizing a range of classifiers have been applied for protein family prediction. However, there has been limited focus on developing novel classifiers for protein family classification, highlighting the urgent need for improved approaches in this area. By addressing these challenges, this research aims to enhance the accuracy and effectiveness of protein family prediction, ultimately facilitating advancements in proteomics and its diverse applications.
Affiliation(s)
- T Idhaya
- Department of Computer Science and Engineering, Manonmaniam Sundaranar University, Tirunelveli, TamilNadu, India.
| | - A Suruliandi
- Department of Computer Science and Engineering, Manonmaniam Sundaranar University, Tirunelveli, TamilNadu, India
| | - S P Raja
- School of Computer Science and Engineering, Vellore Institute of Technology, Vellore, TamilNadu, India
|
2
|
Santos TG, Silva KS, Lima RM, Silva LC, Pereira M. State of the art in protein-protein interactions within the fungi kingdom. Future Microbiol 2023; 18:1119-1131. [PMID: 37540069 DOI: 10.2217/fmb-2022-0274] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/05/2023]
Abstract
Proteins rarely exert their function by themselves. Protein-protein interactions (PPIs) regulate virtually every biological process that takes place in a cell. Such interactions are targets for new therapeutic agents against all sorts of diseases, through the screening and design of a variety of inhibitors. Here we discuss several aspects of PPIs that contribute to prediction of protein function and drug discovery. As the high-throughput techniques continue to release biological data, targets for fungal therapeutics that rely on PPIs are being proposed worldwide. Computational approaches have reduced the time taken to develop new therapeutic approaches. The near future brings the possibility of developing new PPI and interaction network inhibitors and a revolution in the way we treat fungal diseases.
Affiliation(s)
- Thaynara G Santos
- Laboratório de Biologia Molecular, Instituto de Ciências Biológicas, Universidade Federal de Goiás, Goiânia, Goiás, 74 000, Brazil
| | - Kleber Sf Silva
- Laboratório de Biologia Molecular, Instituto de Ciências Biológicas, Universidade Federal de Goiás, Goiânia, Goiás, 74 000, Brazil
| | - Raisa M Lima
- Laboratório de Biologia Molecular, Instituto de Ciências Biológicas, Universidade Federal de Goiás, Goiânia, Goiás, 74 000, Brazil
| | - Lívia C Silva
- Laboratório de Biologia Molecular, Instituto de Ciências Biológicas, Universidade Federal de Goiás, Goiânia, Goiás, 74 000, Brazil
| | - Maristela Pereira
- Laboratório de Biologia Molecular, Instituto de Ciências Biológicas, Universidade Federal de Goiás, Goiânia, Goiás, 74 000, Brazil
|
3
|
Lu X, Chen G, Li J, Hu X, Sun F. MAGCN: A Multiple Attention Graph Convolution Networks for Predicting Synthetic Lethality. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2023; 20:2681-2689. [PMID: 36374879 DOI: 10.1109/tcbb.2022.3221736] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 06/16/2023]
Abstract
Synthetic lethality (SL) is a promising strategy for cancer therapy and drug discovery. Computational approaches to identifying synthetic lethal genes have become an effective complement to wet-lab experiments, which are time-consuming and costly. Graph convolutional networks (GCNs) have been applied to this prediction task because they are good at capturing neighborhood dependencies in a graph. However, a mechanism for aggregating complementary neighborhood information from multiple heterogeneous graphs is still lacking. Here, we propose Multiple Attention Graph Convolution Networks for predicting synthetic lethality (MAGCN). First, we obtain functional similarity features and topological structure features of genes from different data sources, such as Gene Ontology data and protein-protein interaction data. Then, a graph convolutional network is used to accumulate knowledge from neighboring nodes according to synthetic lethal associations. Meanwhile, we propose a multiple-graph attention model and construct a multiple-graph attention network that learns the contribution of each graph and generates an embedded representation by aggregating these graphs. Finally, the generated feature matrix is decoded to predict potential synthetic lethal interactions. Experimental results show that MAGCN is superior to the baseline methods. A case study demonstrates the ability of MAGCN to predict human SL gene pairs.
|
4
|
Vora DS, Kalakoti Y, Sundar D. Computational Methods and Deep Learning for Elucidating Protein Interaction Networks. Methods Mol Biol 2023; 2553:285-323. [PMID: 36227550 DOI: 10.1007/978-1-0716-2617-7_15] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/16/2023]
Abstract
Protein interactions play a critical role in all biological processes, but experimental identification of protein interactions is a time- and resource-intensive process. The advances in next-generation sequencing and multi-omics technologies have greatly benefited large-scale predictions of protein interactions using machine learning methods. A wide range of tools have been developed to predict protein-protein, protein-nucleic acid, and protein-drug interactions. Here, we discuss the applications, methods, and challenges faced when employing the various prediction methods. We also briefly describe ways to overcome the challenges and prospective future developments in the field of protein interaction biology.
Affiliation(s)
- Dhvani Sandip Vora
- Department of Biochemical Engineering and Biotechnology, Indian Institute of Technology Delhi, Hauz Khas, New Delhi, India
| | - Yogesh Kalakoti
- Department of Biochemical Engineering and Biotechnology, Indian Institute of Technology Delhi, Hauz Khas, New Delhi, India
| | - Durai Sundar
- Department of Biochemical Engineering and Biotechnology, Indian Institute of Technology Delhi, Hauz Khas, New Delhi, India.
- School of Artificial Intelligence, Indian Institute of Technology Delhi, Hauz Khas, New Delhi, India.
|
5
|
Sarker B, Khare N, Devignes MD, Aridhi S. Improving automatic GO annotation with semantic similarity. BMC Bioinformatics 2022; 23:433. [PMID: 36510133 PMCID: PMC9743508 DOI: 10.1186/s12859-022-04958-7] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 09/18/2022] [Accepted: 09/19/2022] [Indexed: 12/14/2022]
Abstract
BACKGROUND: Automatic functional annotation of proteins is an open research problem in bioinformatics. The growing number of protein entries in public databases, for example in UniProtKB, poses challenges in manual functional annotation. Manual annotation requires expert human curators to search and read related research articles, interpret the results, and assign the annotations to the proteins. Thus, it is a time-consuming and expensive process. Therefore, designing computational tools to perform automatic annotation leveraging the high quality manual annotations that already exist in UniProtKB/SwissProt is an important research problem.
RESULTS: In this paper, we extend and adapt the GrAPFI (graph-based automatic protein function inference) method (Sarker et al. in BMC Bioinform 21, 2020; Sarker et al., in: Proceedings of 7th international conference on complex networks and their applications, Cambridge, 2018) for automatic annotation of proteins with gene ontology (GO) terms, renaming it GrAPFI-GO. The original GrAPFI method uses label propagation in a similarity graph where proteins are linked through the domains, families, and superfamilies that they share. Here, we also explore various types of similarity measures based on common neighbors in the graph. Moreover, GO terms are arranged in a hierarchical manner according to semantic parent-child relations. Therefore, we propose an efficient pruning and post-processing technique that integrates both semantic similarity and hierarchical relations between the GO terms. We produce experimental results comparing the GrAPFI-GO method with and without considering common neighbors similarity. We also test the performance of GrAPFI-GO and other annotation tools for GO annotation on a benchmark of proteins with and without the proposed pruning and post-processing procedure.
CONCLUSION: Our results show that the proposed semantic hierarchical post-processing potentially improves the performance of GrAPFI-GO and of other annotation tools as well. Thus, GrAPFI-GO offers an original, efficient, and reusable procedure to exploit the semantic relations among the GO terms in order to improve the automatic annotation of protein functions.
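As a rough illustration of what hierarchy-aware post-processing of GO predictions can look like, the sketch below propagates scores upward through parent links and then keeps only the most specific predicted terms. It is not the GrAPFI-GO procedure: the toy hierarchy, the term IDs, the 0.5 threshold, and the function names are all assumptions made for the example, and no semantic similarity measure is included.

```python
# Toy GO-like hierarchy: child -> set of parents (hypothetical term IDs).
PARENTS = {
    "GO:C": {"GO:B"},
    "GO:B": {"GO:A"},
    "GO:D": {"GO:A"},
    "GO:A": set(),
}

def ancestors(term, parents=PARENTS):
    """Return every ancestor of `term` by walking parent links."""
    seen = set()
    stack = [term]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def propagate_scores(scores, parents=PARENTS):
    """Make scores hierarchy-consistent: an ancestor scores at least as high as any descendant."""
    out = dict(scores)
    for term, score in scores.items():
        for anc in ancestors(term, parents):
            out[anc] = max(out.get(anc, 0.0), score)
    return out

def most_specific(terms, parents=PARENTS):
    """Drop predicted terms that are ancestors of another predicted term."""
    terms = set(terms)
    implied = set()
    for term in terms:
        implied |= ancestors(term, parents)
    return terms - implied

raw = {"GO:C": 0.8, "GO:D": 0.3}           # hypothetical raw predictor scores
consistent = propagate_scores(raw)
kept = {t for t, s in consistent.items() if s >= 0.5}  # assumed threshold
print(sorted(kept))
print(sorted(most_specific(kept)))
```

Reporting only the most specific terms is one possible design choice; a tool could equally report the full ancestor-closed set, depending on how the benchmark scores predictions.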
Affiliation(s)
- Bishnu Sarker
- CNRS, Inria, LORIA, University of Lorraine, 54000 Nancy, France; Khulna University of Engineering and Technology, Khulna, Bangladesh; School of Applied Computational Sciences, Meharry Medical College, Nashville, TN, USA
| | - Navya Khare
- CNRS, Inria, LORIA, University of Lorraine, 54000 Nancy, France; International Institute of Information Technology, Hyderabad, India
| | | | - Sabeur Aridhi
- CNRS, Inria, LORIA, University of Lorraine, 54000 Nancy, France
|
6
|
Hu S, Luo Y, Zhang Z, Xiong H, Yan W, Jiang M, Zhao B. Protein function annotation based on heterogeneous biological networks. BMC Bioinformatics 2022; 23:493. [DOI: 10.1186/s12859-022-05057-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/03/2022] [Accepted: 11/15/2022] [Indexed: 11/19/2022]
Abstract
Background
Accurate annotation of protein function is the key to understanding life at the molecular level and has great implications for biomedicine and pharmaceuticals. The rapid developments of high-throughput technologies have generated huge amounts of protein–protein interaction (PPI) data, which prompts the emergence of computational methods to determine protein function. Plagued by errors and noises hidden in PPI data, these computational methods have undertaken to focus on the prediction of functions by integrating the topology of protein interaction networks and multi-source biological data. Despite effective improvement of these computational methods, it is still challenging to build a suitable network model for integrating multiplex biological data.
Results
In this paper, we constructed a heterogeneous biological network by initially integrating original protein interaction networks, protein-domain association data and protein complexes. To prove the effectiveness of the heterogeneous biological network, we applied the propagation algorithm on this network, and proposed a novel iterative model, named Propagate on Heterogeneous Biological Networks (PHN), to score and rank functions in descending order from all functional partners. Finally, we picked out the top L of these predicted functions as candidates to annotate the target protein. Our comprehensive experimental results demonstrated that PHN outperformed seven other competing approaches using cross-validation. Experimental results indicated that PHN performs significantly better than competing methods and improves the Area Under the Receiver-Operating Curve (AUROC) in Biological Process (BP), Molecular Function (MF) and Cellular Components (CC) by no less than 33%, 15% and 28%, respectively.
Conclusions
We demonstrated that integrating multi-source data into a heterogeneous biological network can preserve the complex relationship among multiplex biological data and improve the prediction accuracy of protein function by getting rid of the constraints of errors in PPI networks effectively. PHN, our proposed method, is effective for protein function prediction.
|
7
|
Sengupta K, Saha S, Halder AK, Chatterjee P, Nasipuri M, Basu S, Plewczynski D. PFP-GO: Integrating protein sequence, domain and protein-protein interaction information for protein function prediction using ranked GO terms. Front Genet 2022; 13:969915. [PMID: 36246645 PMCID: PMC9556876 DOI: 10.3389/fgene.2022.969915] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/15/2022] [Accepted: 08/31/2022] [Indexed: 11/13/2022]
Abstract
Protein function prediction is gradually emerging as an essential field in biological and computational studies. Though the latter has clinched a significant footprint, it has been observed that the application of computational information gathered from multiple sources has more significant influence than the one derived from a single source. Considering this fact, a methodology, PFP-GO, is proposed where heterogeneous sources like Protein Sequence, Protein Domain, and Protein-Protein Interaction Network have been processed separately for ranking each individual functional GO term. Based on this ranking, GO terms are propagated to the target proteins. While Protein sequence enriches the sequence-based information, Protein Domain and Protein-Protein Interaction Networks embed structural/functional and topological based information, respectively, during the phase of GO ranking. Performance analysis of PFP-GO is also based on Precision, Recall, and F-Score. The same was found to perform reasonably better when compared to the other existing state-of-art. PFP-GO has achieved an overall Precision, Recall, and F-Score of 0.67, 0.58, and 0.62, respectively. Furthermore, we check some of the top-ranked GO terms predicted by PFP-GO through multilayer network propagation that affect the 3D structure of the genome. The complete source code of PFP-GO is freely available at https://sites.google.com/view/pfp-go/.
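The three reported figures can be sanity-checked against each other. The snippet below assumes the F-Score in the abstract is the balanced F1 (the harmonic mean of precision and recall); the abstract does not state which variant was used, so this is a consistency check under that assumption rather than a reproduction of the authors' evaluation.

```python
def f_score(precision, recall):
    """Balanced F1: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Values quoted in the abstract above.
reported_precision, reported_recall = 0.67, 0.58
computed = f_score(reported_precision, reported_recall)
print(round(computed, 2))  # 0.62, matching the reported F-Score
```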
Affiliation(s)
- Kaustav Sengupta
- Laboratory of Functional and Structural Genomics, Center of New Technologies, University of Warsaw, Warsaw, Poland
- Department of Computer Science and Engineering, Jadavpur University, Kolkata, India
- Laboratory of Bioinformatics and Computational Genomics, Faculty of Mathematics and Information Science, Warsaw University of Technology, Warsaw, Poland
| | - Sovan Saha
- Department of Computer Science and Engineering, Institute of Engineering and Management, Kolkata, West Bengal, India
| | - Anup Kumar Halder
- Laboratory of Functional and Structural Genomics, Center of New Technologies, University of Warsaw, Warsaw, Poland
- Laboratory of Bioinformatics and Computational Genomics, Faculty of Mathematics and Information Science, Warsaw University of Technology, Warsaw, Poland
| | - Piyali Chatterjee
- Department of Computer Science and Engineering, Netaji Subhash Engineering College, Kolkata, India
| | - Mita Nasipuri
- Department of Computer Science and Engineering, Jadavpur University, Kolkata, India
| | - Subhadip Basu
- Department of Computer Science and Engineering, Jadavpur University, Kolkata, India
- *Correspondence: Subhadip Basu, Dariusz Plewczynski,
| | - Dariusz Plewczynski
- Laboratory of Functional and Structural Genomics, Center of New Technologies, University of Warsaw, Warsaw, Poland
- Laboratory of Bioinformatics and Computational Genomics, Faculty of Mathematics and Information Science, Warsaw University of Technology, Warsaw, Poland
- *Correspondence: Subhadip Basu, Dariusz Plewczynski,
|
8
|
Lazarsfeld J, Rodriguez J, Erden M, Liu Y, Cowen LJ. Majority Vote Cascading: A Semi-Supervised Framework for Improving Protein Function Prediction. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2022; 19:1933-1945. [PMID: 33591921 DOI: 10.1109/tcbb.2021.3059812] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 06/12/2023]
Abstract
A method to improve protein function prediction for sparsely annotated PPI networks is introduced. The method extends the DSD majority vote algorithm introduced by Cao et al. to give confidence scores on predicted labels and to use predictions of high confidence to predict the labels of other nodes in subsequent rounds. We call this a majority vote cascade. Several cascade variants are tested in a stringent cross-validation experiment on PPI networks from S. cerevisiae and D. melanogaster, and we show that for many different settings with several alternative confidence functions, cascading improves the accuracy of the predictions. A list of the most confident new label predictions in the two networks is also reported. Code and networks for the cross-validation experiments appear at http://bcb.cs.tufts.edu/cascade.
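The cascading idea (accept only confident predictions, then let them vote in later rounds) can be sketched on a toy network. This is a simplified illustration, not the published algorithm: it votes over direct neighbours with equal weight instead of using the DSD metric, and the confidence function (winning share of votes), the 0.6 threshold, and all names are assumptions.

```python
from collections import Counter

def majority_vote_cascade(graph, labels, min_confidence=0.6, max_rounds=10):
    """Label unlabeled nodes from labeled neighbours, reusing confident predictions in later rounds."""
    known = dict(labels)
    for _ in range(max_rounds):
        newly_labeled = {}
        for node, neighbours in graph.items():
            if node in known:
                continue
            votes = Counter(known[n] for n in neighbours if n in known)
            if not votes:
                continue  # no labeled neighbour yet; retry in a later round
            label, count = votes.most_common(1)[0]
            confidence = count / sum(votes.values())
            if confidence >= min_confidence:
                newly_labeled[node] = label
        if not newly_labeled:
            break  # nothing confident enough was added; stop cascading
        known.update(newly_labeled)
    return known

# Hypothetical PPI network as an adjacency list, with two annotated proteins.
graph = {
    "p1": ["p2", "p3"],
    "p2": ["p1", "p3"],
    "p3": ["p1", "p2", "p4"],
    "p4": ["p3", "p5"],
    "p5": ["p4"],
}
seed_labels = {"p1": "kinase", "p2": "kinase"}
result = majority_vote_cascade(graph, seed_labels)
print(result)
```

In this toy run, p3 is labeled in the first round, p4 in the second, and p5 in the third, which is the cascade effect: p5 has no annotated neighbour at the start and is reachable only through earlier predictions.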
|
9
|
Hu S, Zhang Z, Xiong H, Jiang M, Luo Y, Yan W, Zhao B. A tensor-based bi-random walks model for protein function prediction. BMC Bioinformatics 2022; 23:199. [PMID: 35637427 PMCID: PMC9150346 DOI: 10.1186/s12859-022-04747-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/25/2022] [Accepted: 05/24/2022] [Indexed: 11/26/2022]
Abstract
Background The accurate characterization of protein functions is critical to understanding life at the molecular level and has a huge impact on biomedicine and pharmaceuticals. Computationally predicting protein function has been studied in the past decades. Plagued by noise and errors in protein–protein interaction (PPI) networks, researchers have undertaken to focus on the fusion of multi-omics data in recent years. A data model that appropriately integrates network topologies with biological data and preserves their intrinsic characteristics is still a bottleneck and an aspirational goal for protein function prediction. Results In this paper, we propose the RWRT (Random Walks with Restart on Tensor) method to accomplish protein function prediction by applying bi-random walks on the tensor. RWRT first constructs a functional similarity tensor by combining protein interaction networks with multi-omics data derived from domain annotation and protein complex information. After this, RWRT extends the bi-random walks algorithm from a two-dimensional matrix to the tensor for scoring functional similarity between proteins. Finally, RWRT filters out possible pretenders based on the concept of cohesiveness coefficient and annotates target proteins with functions of the remaining functional partners. Experimental results indicate that RWRT performs significantly better than the state-of-the-art methods and improves the area under the receiver-operating curve (AUROC) by no less than 18%. Conclusions The functional similarity tensor offers us an alternative, in that it is a collection of networks sharing the same nodes; however, the edges belong to different categories or represent interactions of different nature. We demonstrate that the tensor-based random walk model can not only discover more partners with similar functions but is also effectively freed from the constraints of errors in protein interaction networks. We believe that the performance of function prediction depends greatly on whether we can extract and exploit proper functional similarity information on protein correlations.
Supplementary Information The online version contains supplementary material available at 10.1186/s12859-022-04747-2.
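For readers unfamiliar with the building block, the sketch below runs a plain random walk with restart on a single toy network. It is deliberately much simpler than RWRT: there is no tensor, no bi-random walk, and no cohesiveness filtering. The restart probability of 0.3, the path-shaped network, and the function name are assumptions for illustration.

```python
def random_walk_with_restart(adjacency, seed, restart=0.3, tol=1e-10, max_iter=1000):
    """Iterate p <- (1 - restart) * T p + restart * e_seed until the L1 change is below tol."""
    n = len(adjacency)
    # Column-normalise so each column of the transition matrix sums to 1.
    col_sums = [sum(adjacency[i][j] for i in range(n)) for j in range(n)]
    transition = [
        [adjacency[i][j] / col_sums[j] if col_sums[j] else 0.0 for j in range(n)]
        for i in range(n)
    ]
    start = [1.0 if i == seed else 0.0 for i in range(n)]
    p = start[:]
    for _ in range(max_iter):
        nxt = [
            (1 - restart) * sum(transition[i][j] * p[j] for j in range(n))
            + restart * start[i]
            for i in range(n)
        ]
        if sum(abs(a - b) for a, b in zip(nxt, p)) < tol:
            return nxt
        p = nxt
    return p

# Hypothetical 4-protein path network: 0 - 1 - 2 - 3.
adjacency = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
scores = random_walk_with_restart(adjacency, seed=0)
print([round(s, 3) for s in scores])
```

The resulting scores fall off with distance from the seed protein, so ranking nodes by score gives a simple proximity-based list of candidate functional partners.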
Affiliation(s)
- Sai Hu
- College of Computer Engineering and Applied Mathematics, Changsha University, Changsha, 410022, Hunan, China
| | - Zhihong Zhang
- College of Computer Engineering and Applied Mathematics, Changsha University, Changsha, 410022, Hunan, China; Hunan Provincial Key Laboratory of Industrial Internet Technology and Security, Changsha University, Changsha, 410022, Hunan, China
| | - Huijun Xiong
- College of Computer Engineering and Applied Mathematics, Changsha University, Changsha, 410022, Hunan, China
| | - Meiping Jiang
- Department of Ultrasound, Hunan Provincial Maternal and Child Health Care Hospital, Changsha, 410008, Hunan, China; NHC Key Laboratory of Birth Defect for Research and Prevention, Hunan Provincial Maternal and Child Health Care Hospital, Changsha, 410100, Hunan, China
| | - Yingchun Luo
- Department of Ultrasound, Hunan Provincial Maternal and Child Health Care Hospital, Changsha, 410008, Hunan, China; NHC Key Laboratory of Birth Defect for Research and Prevention, Hunan Provincial Maternal and Child Health Care Hospital, Changsha, 410100, Hunan, China
| | - Wei Yan
- College of Computer Engineering and Applied Mathematics, Changsha University, Changsha, 410022, Hunan, China
| | - Bihai Zhao
- College of Computer Engineering and Applied Mathematics, Changsha University, Changsha, 410022, Hunan, China; Hunan Provincial Key Laboratory of Industrial Internet Technology and Security, Changsha University, Changsha, 410022, Hunan, China
|
10
|
Abstract
Since the large-scale experimental characterization of protein–protein interactions (PPIs) is not possible for all species, several computational PPI prediction methods have been developed that harness existing data from other species. While PPI network prediction has been extensively used in eukaryotes, microbial network inference has lagged behind. However, bacterial interactomes can be built using the same principles and techniques; in fact, several methods are better suited to bacterial genomes. These predicted networks allow systems-level analyses in species that lack experimental interaction data. This review describes the current network inference and analysis techniques and summarizes the use of computationally-predicted microbial interactomes to date.
|
11
|
Kabir MN, Wong L. EnsembleFam: towards more accurate protein family prediction in the twilight zone. BMC Bioinformatics 2022; 23:90. [PMID: 35287576 PMCID: PMC8919565 DOI: 10.1186/s12859-022-04626-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/11/2021] [Accepted: 03/02/2022] [Indexed: 11/30/2022]
Abstract
Background Current protein family modeling methods like profile Hidden Markov Model (pHMM), k-mer based methods, and deep learning-based methods do not provide very accurate protein function prediction for proteins in the twilight zone, due to low sequence similarity to reference proteins with known functions. Results We present a novel method EnsembleFam, aiming at better function prediction for proteins in the twilight zone. EnsembleFam extracts the core characteristics of a protein family using similarity and dissimilarity features calculated from sequence homology relations. EnsembleFam trains three separate Support Vector Machine (SVM) classifiers for each family using these features, and an ensemble prediction is made to classify novel proteins into these families. Extensive experiments are conducted using the Clusters of Orthologous Groups (COG) dataset and G Protein-Coupled Receptor (GPCR) dataset. EnsembleFam not only outperforms state-of-the-art methods on the overall dataset but also provides a much more accurate prediction for twilight zone proteins. Conclusions EnsembleFam, a machine learning method to model protein families, can be used to better identify members with very low sequence homology. Using EnsembleFam protein functions can be predicted using just sequence information with better accuracy than state-of-the-art methods.
Affiliation(s)
- Mohammad Neamul Kabir
- Department of Computer Science, National University of Singapore, 13 Computing Drive, 117417, Singapore, Singapore.
| | - Limsoon Wong
- Department of Computer Science, National University of Singapore, 13 Computing Drive, 117417, Singapore, Singapore
|
12
|
Redhu N, Thakur Z. Network biology and applications. Bioinformatics 2022. [DOI: 10.1016/b978-0-323-89775-4.00024-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022]
|
13
|
Zhou J, Xiong W, Wang Y, Guan J. Protein Function Prediction Based on PPI Networks: Network Reconstruction vs Edge Enrichment. Front Genet 2022; 12:758131. [PMID: 34970299 PMCID: PMC8712557 DOI: 10.3389/fgene.2021.758131] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Received: 08/13/2021] [Accepted: 11/11/2021] [Indexed: 01/21/2023]
Abstract
Over the past decades, massive amounts of protein-protein interaction (PPI) data have accumulated thanks to advances in high-throughput technologies, but data quality issues (noise or incompleteness) in PPI data still affect the accuracy of protein function prediction based on PPI networks. Although numerous studies have reported that two main strategies, network reconstruction and edge enrichment, boost prediction performance, comparative studies of the performance differences between them are still lacking. Motivated by this question, this study first uses three protein similarity metrics (local, global and sequence) for network reconstruction and edge enrichment in PPI networks, and then evaluates the performance differences of network reconstruction, edge enrichment and the original networks on two real PPI datasets. The experimental results demonstrate that edge enrichment works better than both network reconstruction and the original networks. Moreover, for the edge enrichment of PPI networks, sequence similarity outperforms both local and global similarity. In summary, our study can help biologists select suitable pre-processing schemes and achieve better protein function prediction for PPI networks.
Affiliation(s)
- Jiaogen Zhou
- Jiangsu Provincial Engineering Research Center for Intelligent Monitoring and Ecological Management of Pond and Reservoir Water Environment, Huaiyin Normal University, Huian, China
| | - Wei Xiong
- Shanghai Key Lab of Intelligent Information Processing, and School of Computer Science, Fudan University, Shanghai, China
| | - Yang Wang
- Department of Computer Science and Technology, Tongji University, Shanghai, China
| | - Jihong Guan
- Department of Computer Science and Technology, Tongji University, Shanghai, China
|
14
|
Paul M, Anand A. A New Family of Similarity Measures for Scoring Confidence of Protein Interactions Using Gene Ontology. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2022; 19:19-30. [PMID: 34029194 DOI: 10.1109/tcbb.2021.3083150] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 06/12/2023]
Abstract
The large-scale protein-protein interaction (PPI) data has the potential to play a significant role in the endeavor of understanding cellular processes. However, the presence of a considerable fraction of false positives is a bottleneck in realizing this potential. There have been continuous efforts to utilize complementary resources for scoring confidence of PPIs in a manner that false positive interactions get a low confidence score. Gene Ontology (GO), a taxonomy of biological terms to represent the properties of gene products and their relations, has been widely used for this purpose. We utilize GO to introduce a new set of specificity measures: Relative Depth Specificity (RDS), Relative Node-based Specificity (RNS), and Relative Edge-based Specificity (RES), leading to a new family of similarity measures. We use these similarity measures to obtain a confidence score for each PPI. We evaluate the new measures using four different benchmarks. We show that all the three measures are quite effective. Notably, RNS and RES more effectively distinguish true PPIs from false positives than the existing alternatives. RES also shows a robust set-discriminating power and can be useful for protein functional clustering as well.
|
15
|
Cappellato M, Baruzzo G, Patuzzi I, Di Camillo B. Modeling Microbial Community Networks: Methods and Tools. Curr Genomics 2021; 22:267-290. [PMID: 35273458 PMCID: PMC8822226 DOI: 10.2174/1389202921999200905133146] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/29/2020] [Revised: 07/22/2020] [Accepted: 07/29/2020] [Indexed: 11/22/2022]
Abstract
In the current research landscape, microbiota composition studies are of extreme interest, since it has been widely shown that resident microorganisms affect and shape the ecological niche they inhabit. This complex micro-world is characterized by different types of interactions. Understanding these relationships provides a useful tool for decoding the causes and effects of communities' organizations. Next-Generation Sequencing technologies make it possible to reconstruct the internal composition of the whole microbial community present in a sample. Sequencing data can then be investigated through statistical and computational methods from network theory to infer the network of interactions among microbial species. Since there are several network inference approaches in the literature, in this paper we tried to shed light on their main characteristics and challenges, providing a useful tool not only to those interested in using the methods, but also to those who want to develop new ones. In addition, we focused on the frameworks used to produce synthetic data, starting from the simulation of network structures up to their integration with abundance models, with the aim of clarifying the key points of the entire generative process.
Affiliation(s)
- Barbara Di Camillo
- Address correspondence to this author at the Department of Information Engineering, University of Padova, Padova, Italy; E-mail:
|
16
|
Arsenescu V, Devkota K, Erden M, Shpilker P, Werenski M, Cowen LJ. MUNDO: protein function prediction embedded in a multispecies world. BIOINFORMATICS ADVANCES 2021; 2:vbab025. [PMID: 36699351 PMCID: PMC9710620 DOI: 10.1093/bioadv/vbab025] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/15/2021] [Revised: 09/11/2021] [Accepted: 09/23/2021] [Indexed: 01/28/2023]
Abstract
Motivation: Leveraging cross-species information in protein function prediction can add significant power to network-based protein function prediction methods, because so much functional information is conserved across at least close scales of evolution. We introduce MUNDO, a new cross-species co-embedding method that combines a single-network embedding method with a co-embedding method to predict functional annotations in a target species, leveraging also functional annotations in a model species network. Results: Across a wide range of parameter choices, MUNDO performs best at predicting annotations in the mouse network, when trained on mouse and human protein-protein interaction (PPI) networks, in the human network, when trained on human and mouse PPIs, and in Baker's yeast, when trained on Fission and Baker's yeast, as compared to competitor methods. MUNDO also outperforms all the cross-species methods when predicting in Fission yeast when trained on Fission and Baker's yeast; however, in this single case, discarding the information from the other species and using annotations from the Fission yeast network alone usually performs best. Availability and implementation: All code is available and can be accessed here: github.com/v0rtex20k/MUNDO. Supplementary information: Supplementary data are available at Bioinformatics Advances online. Additional experimental results are on our github site.
Affiliation(s)
- Victor Arsenescu
- Department of Computer Science, Tufts University, Medford, MA 02155, USA
- Kapil Devkota
- Department of Computer Science, Tufts University, Medford, MA 02155, USA
- Mert Erden
- Department of Computer Science, Tufts University, Medford, MA 02155, USA
- Polina Shpilker
- Department of Computer Science, Tufts University, Medford, MA 02155, USA
- Matthew Werenski
- Department of Computer Science, Tufts University, Medford, MA 02155, USA
- Lenore J Cowen
- Department of Computer Science, Tufts University, Medford, MA 02155, USA
|
17
|
Peng J, Kuang L, Zhang Z, Tan Y, Chen Z, Wang L. A Novel Model for Identifying Essential Proteins Based on Key Target Convergence Sets. Front Genet 2021; 12:721486. [PMID: 34394201 PMCID: PMC8358660 DOI: 10.3389/fgene.2021.721486] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/07/2021] [Accepted: 06/30/2021] [Indexed: 11/20/2022] Open
Abstract
In recent years, many computational models have been designed to detect essential proteins based on protein-protein interaction (PPI) networks. However, due to the incompleteness of PPI networks, the prediction accuracy of these models is still not satisfactory. In this manuscript, a novel key target convergence sets based prediction model (KTCSPM) is proposed to identify essential proteins. In KTCSPM, a weighted PPI network and a weighted domain-domain interaction network are first constructed based on known PPIs and PDIs downloaded from benchmark databases. Then, by integrating these two kinds of networks, a novel weighted PDI network is built. Next, by assigning a unique key target convergence set (KTCS) to each node in the weighted PDI network, an improved method based on the random walk with restart is designed to identify essential proteins. Finally, to evaluate the predictive performance of KTCSPM, it is compared with 12 competitive state-of-the-art models, and experimental results show that KTCSPM can achieve better prediction accuracy. The satisfactory predictive performance of KTCSPM indicates that it might be a good supplement to future research on the prediction of essential proteins.
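The random walk with restart mentioned in this abstract can be illustrated with a minimal sketch. It shows only the plain, textbook iteration on a weighted undirected network, not the improved variant or the KTCS assignment that KTCSPM describes; the function name, the toy network, its weights, the seed node, and the restart probability of 0.3 are all hypothetical.

```python
def random_walk_with_restart(weights, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Score nodes by a random walk that jumps back to the seeds.

    weights: dict mapping node -> {neighbor: edge weight}, symmetric.
    seeds:   set of nodes the walker restarts from (hypothetical input).
    """
    nodes = sorted(weights)
    # Restart distribution: uniform over the seed nodes.
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    # Total edge weight per node, used to normalise transition probabilities.
    out_sum = {u: sum(weights[u].values()) for u in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        # With probability `restart` the walker returns to a seed ...
        nxt = {n: restart * p0[n] for n in nodes}
        # ... otherwise it moves from u to v with probability w(u, v) / out_sum[u].
        for u in nodes:
            if out_sum[u] == 0:
                continue  # isolated node: nothing to propagate
            for v, w in weights[u].items():
                nxt[v] += (1.0 - restart) * p[u] * w / out_sum[u]
        change = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if change < tol:
            break
    return p


# Hypothetical four-protein chain A - B - C - D with made-up edge weights.
network = {
    "A": {"B": 1.0},
    "B": {"A": 1.0, "C": 0.5},
    "C": {"B": 0.5, "D": 1.0},
    "D": {"C": 1.0},
}
scores = random_walk_with_restart(network, seeds={"A"})
```

In this toy chain the scores decrease with distance from the seed, which is the behaviour that makes the stationary distribution usable as a ranking.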
Affiliation(s)
- Jiaxin Peng
- College of Computer, Xiangtan University, Xiangtan, China; College of Computer Engineering and Applied Mathematics, Changsha University, Changsha, China
- Linai Kuang
- College of Computer, Xiangtan University, Xiangtan, China
- Zhen Zhang
- College of Computer Engineering and Applied Mathematics, Changsha University, Changsha, China
- Yihong Tan
- College of Computer Engineering and Applied Mathematics, Changsha University, Changsha, China
- Zhiping Chen
- College of Computer Engineering and Applied Mathematics, Changsha University, Changsha, China
- Lei Wang
- College of Computer, Xiangtan University, Xiangtan, China; College of Computer Engineering and Applied Mathematics, Changsha University, Changsha, China
|
18
|
Matchado MS, Lauber M, Reitmeier S, Kacprowski T, Baumbach J, Haller D, List M. Network analysis methods for studying microbial communities: A mini review. Comput Struct Biotechnol J 2021; 19:2687-2698. [PMID: 34093985 PMCID: PMC8131268 DOI: 10.1016/j.csbj.2021.05.001] [Citation(s) in RCA: 109] [Impact Index Per Article: 36.3] [Received: 02/01/2021] [Revised: 05/01/2021] [Accepted: 05/01/2021] [Indexed: 12/20/2022] Open
Abstract
Microorganisms including bacteria, fungi, viruses, protists and archaea live as communities in complex and contiguous environments. They engage in numerous inter- and intra- kingdom interactions which can be inferred from microbiome profiling data. In particular, network-based approaches have proven helpful in deciphering complex microbial interaction patterns. Here we give an overview of state-of-the-art methods to infer intra-kingdom interactions ranging from simple correlation- to complex conditional dependence-based methods. We highlight common biases encountered in microbial profiles and discuss mitigation strategies employed by different tools and their trade-off with increased computational complexity. Finally, we discuss current limitations that motivate further method development to infer inter-kingdom interactions and to robustly and comprehensively characterize microbial environments in the future.
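The simplest end of the range this abstract describes, correlation-based inference, can be sketched as follows. The abstract does not name specific biases or mitigations, so the centred log-ratio (CLR) transform used here for compositional counts, the Pearson coefficient, the pseudocount, the 0.8 threshold, and the taxon names and counts are all assumptions chosen for illustration, not the procedure of any reviewed tool.

```python
import math
from itertools import combinations


def clr(counts, pseudocount=1.0):
    """Centred log-ratio transform of one sample's counts."""
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [x - mean_log for x in logs]


def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def correlation_network(samples, taxa, threshold=0.8):
    """Return {(taxon_a, taxon_b): r} for pairs with |r| >= threshold."""
    transformed = [clr(sample) for sample in samples]
    # One column of CLR values per taxon, across all samples.
    columns = {t: [row[i] for row in transformed] for i, t in enumerate(taxa)}
    edges = {}
    for a, b in combinations(taxa, 2):
        r = pearson(columns[a], columns[b])
        if abs(r) >= threshold:
            edges[(a, b)] = r
    return edges


# Hypothetical count table: five samples (rows) by four taxa (columns).
taxa = ["taxon_a", "taxon_b", "taxon_c", "taxon_d"]
samples = [
    [120, 30, 50, 10],
    [200, 20, 90, 15],
    [80, 60, 30, 12],
    [150, 25, 70, 9],
    [60, 75, 20, 14],
]
edges = correlation_network(samples, taxa, threshold=0.8)
```

A pairwise correlation like this cannot separate direct from indirect associations, which is the gap the conditional dependence-based methods in the review are meant to address.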
Affiliation(s)
- Monica Steffi Matchado
- Chair of Experimental Bioinformatics, Technical University of Munich, 85354 Freising, Germany
- Michael Lauber
- Chair of Experimental Bioinformatics, Technical University of Munich, 85354 Freising, Germany
- Sandra Reitmeier
- ZIEL - Institute for Food & Health, Technical University of Munich, 85354 Freising, Germany
- Chair of Nutrition and Immunology, Technical University of Munich, 85354 Freising, Germany
- Tim Kacprowski
- Division Data Science in Biomedicine, Peter L. Reichertz Institute for Medical Informatics, TU Braunschweig and Hannover Medical School, 38106 Brunswick, Germany
- Braunschweig Integrated Centre of Systems Biology (BRICS), 38106 Brunswick, Germany
- Jan Baumbach
- Institute of Mathematics and Computer Science, University of Southern Denmark, 5230 Odense, Denmark
- Chair of Computational Systems Biology, University of Hamburg, 22607 Hamburg, Germany
- Dirk Haller
- ZIEL - Institute for Food & Health, Technical University of Munich, 85354 Freising, Germany
- Chair of Nutrition and Immunology, Technical University of Munich, 85354 Freising, Germany
- Markus List
- Chair of Experimental Bioinformatics, Technical University of Munich, 85354 Freising, Germany
|
19
|
Chen Q, Li Y, Tan K, Qiao Y, Pan S, Jiang T, Chen YPP. Network-based methods for gene function prediction. Brief Funct Genomics 2021; 20:249-257. [PMID: 33686431 DOI: 10.1093/bfgp/elab006] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 11/02/2020] [Revised: 01/25/2021] [Accepted: 01/26/2021] [Indexed: 12/23/2022] Open
Abstract
The rapid development of high-throughput technology has generated a large number of biological networks. Network-based methods are able to provide rich information for inferring gene function. This involves analyzing the topological characteristics of genes in related networks, integrating biological information, and considering data from different data sources. To promote network biology and related biotechnology research, this article surveys the state of the art in network-based gene function prediction and discusses the potential challenges.
Affiliation(s)
- Qingfeng Chen
- University of Technology Sydney, China and Hundred-Talent Program
- Yongjie Li
- School of Computer and Electronic Information at Guangxi University
- Kai Tan
- School of Computer and Electronic Information at Guangxi University
- Yvlu Qiao
- School of Computer and Electronic Information at Guangxi University
- Shirui Pan
- Computer science from the University of Technology Sydney
- Taijiao Jiang
- Suzhou Institute of System Medicine, Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences & Peking Union Medical College
- Yi-Ping Phoebe Chen
- Department of Computer Science and Computer Engineering, La Trobe University, Melbourne, Australia
|
20
|
GTB-PPI: Predict Protein-protein Interactions Based on L1-regularized Logistic Regression and Gradient Tree Boosting. GENOMICS PROTEOMICS & BIOINFORMATICS 2021; 18:582-592. [PMID: 33515750 PMCID: PMC8377384 DOI: 10.1016/j.gpb.2021.01.001] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 09/05/2018] [Revised: 12/21/2019] [Accepted: 05/12/2020] [Indexed: 11/20/2022]
Abstract
Protein–protein interactions (PPIs) are of great importance to understand genetic mechanisms, delineate disease pathogenesis, and guide drug design. With the increase of PPI data and development of machine learning technologies, prediction and identification of PPIs have become a research hotspot in proteomics. In this study, we propose a new prediction pipeline for PPIs based on gradient tree boosting (GTB). First, the initial feature vector is extracted by fusing pseudo amino acid composition (PseAAC), pseudo position-specific scoring matrix (PsePSSM), reduced sequence and index-vectors (RSIV), and autocorrelation descriptor (AD). Second, to remove redundancy and noise, we employ L1-regularized logistic regression (L1-RLR) to select an optimal feature subset. Finally, GTB-PPI model is constructed. Five-fold cross-validation showed that GTB-PPI achieved the accuracies of 95.15% and 90.47% on Saccharomyces cerevisiae and Helicobacter pylori datasets, respectively. In addition, GTB-PPI could be applied to predict the independent test datasets for Caenorhabditis elegans, Escherichia coli, Homo sapiens, and Mus musculus, the one-core PPI network for CD9, and the crossover PPI network for the Wnt-related signaling pathways. The results show that GTB-PPI can significantly improve accuracy of PPI prediction. The code and datasets of GTB-PPI can be downloaded from https://github.com/QUST-AIBBDRC/GTB-PPI/.
|
21
|
Montaño KJ, Loukas A, Sotillo J. Proteomic approaches to drive advances in helminth extracellular vesicle research. Mol Immunol 2021; 131:1-5. [PMID: 33440289 DOI: 10.1016/j.molimm.2020.12.030] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 10/13/2020] [Revised: 12/14/2020] [Accepted: 12/21/2020] [Indexed: 12/17/2022]
Abstract
Helminths can interact with their hosts in many different ways, including through the secretion of soluble molecules (such as lipids, glycans and proteins) and extracellular vesicles (EVs). The field of helminth secreted EVs has significantly advanced in recent years, mainly due to the molecular characterisation of EV proteomes and research highlighting the potential of EVs and their constituent molecules in the diagnosis and control of parasitic infections. Despite these advancements, the lack of appropriate isolation and purification methods is impeding the discovery of suitable biomarkers for the differentiation of helminth EV populations. In the present review we offer our viewpoint on the different proteomic techniques and approaches that have been developed, as well as solutions to common pitfalls and challenges that could be applied to advance the study of helminth EVs.
Affiliation(s)
- Karen J Montaño
- Centro Nacional de Microbiología, Instituto de Salud Carlos III, Majadahonda, Madrid, Spain
- Alex Loukas
- Centre for Molecular Therapeutics, Australian Institute of Tropical Health and Medicine, James Cook University, Cairns, Australia
- Javier Sotillo
- Centro Nacional de Microbiología, Instituto de Salud Carlos III, Majadahonda, Madrid, Spain
|
22
|
Reyna MA, Chitra U, Elyanow R, Raphael BJ. NetMix: A Network-Structured Mixture Model for Reduced-Bias Estimation of Altered Subnetworks. J Comput Biol 2021; 28:469-484. [PMID: 33400606 DOI: 10.1089/cmb.2020.0435] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Indexed: 12/30/2022] Open
Abstract
A classic problem in computational biology is the identification of altered subnetworks: subnetworks of an interaction network that contain genes/proteins that are differentially expressed, highly mutated, or otherwise aberrant compared with other genes/proteins. Numerous methods have been developed to solve this problem under various assumptions, but the statistical properties of these methods are often unknown. For example, some widely used methods are reported to output very large subnetworks that are difficult to interpret biologically. In this work, we formulate the identification of altered subnetworks as the problem of estimating the parameters of a class of probability distributions that we call the Altered Subset Distribution (ASD). We derive a connection between a popular method, jActiveModules, and the maximum likelihood estimator (MLE) of the ASD. We show that the MLE is statistically biased, explaining the large subnetworks output by jActiveModules. Based on these insights, we introduce NetMix, an algorithm that uses Gaussian mixture models to obtain less biased estimates of the parameters of the ASD. We demonstrate that NetMix outperforms existing methods in identifying altered subnetworks on both simulated and real data, including the identification of differentially expressed genes from both microarray and RNA-seq experiments and the identification of cancer driver genes in somatic mutation data.
Affiliation(s)
- Matthew A Reyna
- Department of Biomedical Informatics, Emory University, Atlanta, Georgia, USA
- Uthsav Chitra
- Department of Computer Science, Princeton University, Princeton, New Jersey, USA
- Rebecca Elyanow
- Department of Computer Science, Princeton University, Princeton, New Jersey, USA
- Department of Computer Science, Brown University, Providence, Rhode Island, USA
- Benjamin J Raphael
- Department of Computer Science, Princeton University, Princeton, New Jersey, USA
|
23
|
Interactomes: Experimental and In Silico Approaches. ADVANCES IN EXPERIMENTAL MEDICINE AND BIOLOGY 2021; 1346:107-117. [DOI: 10.1007/978-3-030-80352-0_6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/19/2022]
|
24
|
Review on Learning and Extracting Graph Features for Link Prediction. MACHINE LEARNING AND KNOWLEDGE EXTRACTION 2020. [DOI: 10.3390/make2040036] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Indexed: 01/08/2023]
Abstract
Link prediction in complex networks has attracted considerable attention from interdisciplinary research communities, due to its ubiquitous applications in biological networks, social networks, transportation networks, telecommunication networks, and, recently, knowledge graphs. Numerous studies have used link prediction approaches to find missing links or predict the likelihood of future links, and have applied them to network reconstruction, recommender systems, privacy control, etc. This work presents an extensive review of state-of-the-art methods and algorithms proposed on this subject and groups them into four main categories: similarity-based methods, probabilistic methods, relational models, and learning-based methods. Additionally, this paper presents a collection of network data sets that can be used to study link prediction. We conclude this study with a discussion of recent developments and future research directions.
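Of the four categories listed in this abstract, similarity-based methods are the easiest to illustrate. The sketch below scores unconnected node pairs with two widely used local indices, common neighbours and the Jaccard coefficient. The abstract does not say which indices the review covers, so these two, the function names, and the toy graph are assumptions for illustration only.

```python
from itertools import combinations


def build_adjacency(edges):
    """Build an undirected adjacency map {node: set of neighbours}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def common_neighbors(adj, u, v):
    """Number of neighbours shared by u and v."""
    return len(adj[u] & adj[v])


def jaccard(adj, u, v):
    """Shared neighbours divided by the size of the combined neighbourhood."""
    union = adj[u] | adj[v]
    return len(adj[u] & adj[v]) / len(union) if union else 0.0


def rank_missing_links(adj, score):
    """Rank all currently unconnected pairs by a similarity score, best first."""
    candidates = [
        (u, v) for u, v in combinations(sorted(adj), 2) if v not in adj[u]
    ]
    return sorted(candidates, key=lambda pair: score(adj, *pair), reverse=True)


# Hypothetical toy graph with five nodes.
toy_edges = [("a", "b"), ("a", "c"), ("b", "c"), ("b", "d"), ("c", "d"), ("d", "e")]
adj = build_adjacency(toy_edges)
ranked = rank_missing_links(adj, common_neighbors)
```

In this toy graph the pair ("a", "d") shares two neighbours and ranks first under both indices, so it would be the first candidate for a missing link.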
|
25
|
Han Y, Cheng L, Sun W. Analysis of Protein-Protein Interaction Networks through Computational Approaches. Protein Pept Lett 2020; 27:265-278. [PMID: 31692419 DOI: 10.2174/0929866526666191105142034] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 04/29/2019] [Revised: 05/08/2019] [Accepted: 09/26/2019] [Indexed: 01/02/2023]
Abstract
The interactions among proteins and genes are extremely important for cellular functions. Molecular interactions at the protein or gene level can be used to construct interaction networks in which the interacting species are categorized based on direct interactions or functional similarities. In contrast to the limited experimental techniques, various computational tools make it possible to analyze, filter, and combine the interaction data to get comprehensive information about the biological pathways. By efficiently integrating experimental findings on PPIs with computational prediction techniques, researchers have been able to gain much valuable data on PPIs, including some advanced databases. Moreover, many useful tools and visualization programs enable researchers to establish, annotate, and analyze biological networks. Here we review and list the computational methods, databases, and tools for protein-protein interaction prediction.
Affiliation(s)
- Ying Han
- Cardiovascular Department, The Fourth Affiliated Hospital of Harbin Medical University, Harbin, China
- Liang Cheng
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Weiju Sun
- Cardiovascular Department, The First Affiliated Hospital of Harbin Medical University, Harbin, China
|
26
|
Quan Y, Zhang QY, Lv BM, Xu RF, Zhang HY. Genome-wide pathogenesis interpretation using a heat diffusion-based systems genetics method and implications for gene function annotation. Mol Genet Genomic Med 2020; 8:e1456. [PMID: 32869547 PMCID: PMC7549611 DOI: 10.1002/mgg3.1456] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/19/2020] [Revised: 07/08/2020] [Accepted: 07/27/2020] [Indexed: 12/27/2022] Open
Abstract
Background: Genetics is best dedicated to interpreting pathogenesis and revealing gene functions. The past decade has witnessed unprecedented progress in genetics, particularly in genome‐wide identification of disorder variants through Genome‐Wide Association Studies (GWAS) and Phenome‐Wide Association Studies (PheWAS). However, it is still a great challenge to use GWAS/PheWAS‐derived data to elucidate pathogenesis. Methods: In this study, we used HotNet2, a heat diffusion‐based systems genetics algorithm, to calculate the networks for disease genes obtained from GWAS and PheWAS, with an attempt to get deeper insights into disease pathogenesis at a molecular level. Results: Through HotNet2 calculation, significant networks for 202 (for GWAS) and 167 (for PheWAS) types of diseases were identified and evaluated, respectively. The GWAS‐derived disease networks exhibit a stronger biomedical relevance than PheWAS counterparts. Therefore, the GWAS‐derived networks were used for pathogenesis interpretation by integrating the accumulated biomedical information. As a result, the pathogenesis for 64 diseases was elucidated in terms of mutation‐caused abnormal transcriptional regulation, and 47 diseases were preliminarily interpreted in terms of mutation‐caused varied protein‐protein interactions. In addition, 3,802 genes (including 46 function‐unknown genes) were assigned with new functions by disease network information, some of which were validated through mice gene knockout experiments. Conclusions: Systems genetics algorithm HotNet2 can efficiently establish genotype‐phenotype links at the level of biological networks. Compared with original GWAS/PheWAS results, HotNet2‐calculated disease‐gene associations have stronger biomedical significance, hence provide better interpretations for the pathogenesis of genome‐wide variants, and offer new insights into gene functions as well. These results are also helpful in drug development.
Affiliation(s)
- Yuan Quan
- School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen Graduate School, Shenzhen, China; Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, China
- Qing-Ye Zhang
- Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, China
- Bo-Min Lv
- Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, China
- Rui-Feng Xu
- School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen Graduate School, Shenzhen, China
- Hong-Yu Zhang
- Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, China
|
27
|
Ranjan A, Fahad MS, Fernandez-Baca D, Deepak A, Tripathi S. Deep Robust Framework for Protein Function Prediction Using Variable-Length Protein Sequences. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2020; 17:1648-1659. [PMID: 30998479 DOI: 10.1109/tcbb.2019.2911609] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 06/09/2023]
Abstract
The order of amino acids in a protein sequence enables the protein to acquire a conformation suitable for performing functions, thereby motivating the need to analyze these sequences for predicting functions. Although machine learning based approaches are fast compared to methods using BLAST, FASTA, etc., they fail to perform well for long protein sequences (with more than 300 amino acids). In this paper, we introduce a novel method for constructing two separate feature sets for proteins using a bi-directional long short-term memory network, based on the analysis of fixed 1) single-sized segments and 2) multi-sized segments. The model trained on the proposed feature set based on multi-sized segments is combined with the model trained using state-of-the-art Multi-label Linear Discriminant Analysis (MLDA) features to further improve the accuracy. Extensive evaluations using separate datasets for biological processes and molecular functions demonstrate not only improved results for long sequences, but also a significant improvement in overall accuracy over the state-of-the-art method. The single-sized approach produces an improvement of +3.37 percent for biological processes and +5.48 percent for molecular functions over the MLDA based classifier. The corresponding numbers for the multi-sized approach are +5.38 and +8.00 percent. Combining the two models raises the improvement further, to +7.41 and +9.21 percent, respectively.
|
28
|
NPF: network propagation for protein function prediction. BMC Bioinformatics 2020; 21:355. [PMID: 32787776 PMCID: PMC7430911 DOI: 10.1186/s12859-020-03663-7] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 02/29/2020] [Accepted: 07/14/2020] [Indexed: 11/29/2022] Open
Abstract
Background: The accurate annotation of protein functions is of great significance in elucidating the phenomena of life, treating disease and developing new medicines. Various methods have been developed to facilitate the prediction of these functions by combining protein interaction networks (PINs) with multi-omics data. However, it is still challenging to make full use of multiple biological data to improve the performance of function annotation. Results: We presented NPF (Network Propagation for Functions prediction), an integrative protein function predicting framework assisted by network propagation and functional module detection, for discovering interacting partners with similar functions to target proteins. NPF leverages knowledge of the protein interaction network architecture and multi-omics data, such as domain annotation and protein complex information, to augment protein-protein functional similarity in a propagation manner. We have verified the great potential of NPF for accurately inferring protein functions. According to the comprehensive evaluation of NPF, it delivered a better performance than other competing methods in terms of leave-one-out cross-validation and ten-fold cross validation. Conclusions: We demonstrated that network propagation, together with multi-omics data, can discover more partners with similar function and is not constrained by the “small-world” feature of protein interaction networks. We conclude that the performance of function prediction depends greatly on whether we can extract and exploit proper functional information of similarity from protein correlations.
|
29
|
Paul M, Anand A. Impact of low-confidence interactions on computational identification of protein complexes. J Bioinform Comput Biol 2020; 18:2050025. [PMID: 32757809 DOI: 10.1142/s0219720020500250] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/18/2022]
Abstract
Protein complexes are the cornerstones of most of the biological processes. Identifying protein complexes is crucial in understanding the principles of cellular organization with several important applications, including in disease diagnosis. Several computational techniques have been developed to identify protein complexes from protein-protein interaction (PPI) data (equivalently, from PPI networks). These PPI data have a significant amount of false positives, which is a bottleneck in identifying protein complexes correctly. Gene ontology (GO)-based semantic similarity measures can be used to assign a confidence score to PPIs. Consequently, low-confidence PPIs are highly likely to be false positives. In this paper, we systematically study the impact of low-confidence PPIs on the performance of complex detection methods using GO-based semantic similarity measures. We consider five state-of-the-art complex detection algorithms and nine GO-based similarity measures in the evaluation. We find that each complex detection algorithm significantly improves its performance after the filtration of low-similarity scored PPIs. It is also observed that the percentage improvement and the filtration percentage (of low-confidence PPIs) are highly correlated.
Affiliation(s)
- Madhusudan Paul
- Department of Computer Science and Engineering, Indian Institute of Technology Guwahati, Guwahati 781039, Assam, India; Department of Computer and System Sciences, Visva-Bharati, Santiniketan 731235, West Bengal, India
- Ashish Anand
- Department of Computer Science and Engineering, Indian Institute of Technology Guwahati, Guwahati 781039, Assam, India
|
30
|
Saha S, Prasad A, Chatterjee P, Basu S, Nasipuri M. Protein function prediction from dynamic protein interaction network using gene expression data. J Bioinform Comput Biol 2020; 17:1950025. [PMID: 31617461 DOI: 10.1142/s0219720019500252] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Indexed: 11/18/2022]
Abstract
Computational prediction of functional annotation of proteins is an uphill task. There is an ever increasing gap between functional characterization of protein sequences and deluge of protein sequences generated by large-scale sequencing projects. The dynamic nature of protein interactions is frequently observed which is mostly influenced by any new change of state or change in stimuli. Functional characterization of proteins can be inferred from their interactions with each other, which is dynamic in nature. In this work, we have used a dynamic protein-protein interaction network (PPIN), time course gene expression data and protein sequence information for prediction of functional annotation of proteins. During progression of a particular function, it has also been observed that not all the proteins are active at all time points. For unannotated active proteins, our proposed methodology explores the dynamic PPIN consisting of level-1 and level-2 neighboring proteins at different time points, filtered by Damerau-Levenshtein edit distance to estimate the similarity between two protein sequences and coefficient variation methods to assess the strength of an edge in a network. Finally, from the filtered dynamic PPIN, at each time point, functional annotations of the level-2 proteins are assigned to the unknown and unannotated active proteins through the level-1 neighbor, following a bottom-up strategy. Our proposed methodology achieves an average precision, recall and F-Score of 0.59, 0.76 and 0.61 respectively, which is significantly higher than the reported state-of-the-art methods.
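The Damerau-Levenshtein edit distance used in this abstract to compare protein sequences can be sketched as below. This is the restricted form (optimal string alignment), in which each substring is edited at most once; whether the paper uses this form or the unrestricted one is not stated in the abstract. The function names, the normalisation into a similarity in [0, 1], and the example sequences are hypothetical.

```python
def osa_distance(a, b):
    """Restricted Damerau-Levenshtein (optimal string alignment) distance.

    Counts insertions, deletions, substitutions, and transpositions of two
    adjacent characters, each with cost 1.
    """
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # match or substitution
            )
            # Transposition of two adjacent residues.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


def normalized_similarity(a, b):
    """Map the distance to [0, 1]; 1.0 means identical sequences."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - osa_distance(a, b) / longest


# Hypothetical short peptide fragments differing by one swapped residue pair.
distance = osa_distance("MKTA", "MTKA")
similarity = normalized_similarity("MKTA", "MTKA")
```

A threshold on such a similarity is one way an edge between two proteins could be kept or dropped when filtering a network; the cut-off itself would have to come from the paper.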
Affiliation(s)
- Sovan Saha
- Department of Computer Science & Engineering, Dr. Sudhir Chandra Sur Degree Engineering College, 540, Dum Dum Road, Near Dum Dum Jn. Station, Surermath, Kolkata 700074, India
- Abhimanyu Prasad
- Department of Computer Science & Engineering, Dr. Sudhir Chandra Sur Degree Engineering College, 540, Dum Dum Road, Near Dum Dum Jn. Station, Surermath, Kolkata 700074, India
- Piyali Chatterjee
- Department of Computer Science & Engineering, Netaji Subhash Engineering College, Techno City, Panchpota, Garia, Kolkata 700152, India
- Subhadip Basu
- Department of Computer Science & Engineering, Jadavpur University, 188, Raja S.C. Mallick Road, Kolkata 700032, India
- Mita Nasipuri
- Department of Computer Science & Engineering, Netaji Subhash Engineering College, Techno City, Panchpota, Garia, Kolkata 700152, India
|
31
|
Niss K, Gomez-Casado C, Hjaltelin JX, Joeris T, Agace WW, Belling KG, Brunak S. Complete Topological Mapping of a Cellular Protein Interactome Reveals Bow-Tie Motifs as Ubiquitous Connectors of Protein Complexes. Cell Rep 2020; 31:107763. [PMID: 32553166 DOI: 10.1016/j.celrep.2020.107763] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 08/16/2019] [Revised: 02/03/2020] [Accepted: 05/21/2020] [Indexed: 11/18/2022] Open
Abstract
The network topology of a protein interactome is shaped by the function of each protein, making it a resource of functional knowledge in tissues and in single cells. Today, this resource is underused, as complete network topology characterization has proved difficult for large protein interactomes. We apply a matrix visualization and decoding approach to a physical protein interactome of a dendritic cell, thereby characterizing its topology with no prior assumptions of structure. We discover 294 proteins, each forming topological motifs called "bow-ties" that tie together the majority of observed protein complexes. The central proteins of these bow-ties have unique network properties, display multifunctional capabilities, are enriched for essential proteins, and are widely expressed in other cells and tissues. Collectively, the bow-tie motifs are a pervasive and previously unnoted topological trend in cellular interactomes. As such, these results provide fundamental knowledge on how intracellular protein connectivity is organized and operates.
Affiliation(s)
- Kristoffer Niss
- Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, 2200 Copenhagen, Denmark
- Cristina Gomez-Casado
- Immunology Section, Lund University, BMC D14, 221-84 Lund, Sweden; Institute of Applied Molecular Medicine, Faculty of Medicine, San Pablo CEU University, 28925 Madrid, Spain
- Jessica X Hjaltelin
- Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, 2200 Copenhagen, Denmark
- Thorsten Joeris
- Immunology Section, Lund University, BMC D14, 221-84 Lund, Sweden
- William W Agace
- Immunology Section, Lund University, BMC D14, 221-84 Lund, Sweden; Mucosal Immunology Group, Department of Health Technology, Technical University of Denmark, 2800 Kgs. Lyngby, Denmark
- Kirstine G Belling
- Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, 2200 Copenhagen, Denmark
- Søren Brunak
- Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, 2200 Copenhagen, Denmark
|
32
|
Liu Y, Wu M, Liu C, Li XL, Zheng J. SL2MF: Predicting Synthetic Lethality in Human Cancers via Logistic Matrix Factorization. IEEE/ACM Trans Comput Biol Bioinform 2020; 17:748-757. [PMID: 30969932 DOI: 10.1109/tcbb.2019.2909908] [Citation(s) in RCA: 24] [Impact Index Per Article: 6.0] [Indexed: 06/09/2023]
Abstract
Synthetic lethality (SL) is a promising concept for the discovery of novel anti-cancer drug targets. However, wet-lab experiments for detecting SLs face various challenges, such as high cost and low consistency across platforms or cell lines. Therefore, computational prediction methods are needed to address these issues. This paper proposes a novel SL prediction method, named SL2MF, which employs logistic matrix factorization to learn latent representations of genes from the observed SL data. The probability that two genes form an SL pair is modeled by the linear combination of gene latent vectors. As known SL pairs are more trustworthy than unknown pairs, we design importance weighting schemes to assign higher importance weights to known SL pairs and lower importance weights to unknown pairs in SL2MF. Moreover, we also incorporate biological knowledge about genes from protein-protein interaction (PPI) data and Gene Ontology (GO). In particular, we calculate the similarity between genes based on their GO annotations and topological properties in the PPI network. Extensive experiments on the SL interaction data from the SynLethDB database have been conducted to demonstrate the effectiveness of SL2MF.
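The modeling idea in this abstract, scoring a gene pair through a logistic function of a product of latent vectors while weighting known SL pairs more heavily, can be illustrated with a small NumPy sketch. It is not the authors' implementation: the function name `fit_logistic_mf`, the symmetric single-factor form, the weight `c`, the learning rate and the toy matrix are all assumptions made for illustration, and the PPI and GO similarity terms are left out.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def fit_logistic_mf(Y, rank=4, c=5.0, lam=0.1, lr=0.02, epochs=1000, seed=0):
    """Gradient ascent on a weighted logistic likelihood (illustrative only).

    Y    : symmetric 0/1 float matrix, 1 = known SL pair, 0 = unknown
    c    : assumed importance weight for known pairs (unknown pairs get 1)
    lam  : L2 penalty on the latent vectors
    """
    rng = np.random.default_rng(seed)
    n = Y.shape[0]
    U = 0.1 * rng.standard_normal((n, rank))   # one latent vector per gene
    W = 1.0 + (c - 1.0) * Y                    # importance weights
    np.fill_diagonal(W, 0.0)                   # ignore self-pairs
    for _ in range(epochs):
        P = sigmoid(U @ U.T)                   # predicted pair probabilities
        G = W * (Y - P)                        # gradient w.r.t. the logits
        U += lr * ((G + G.T) @ U - lam * U)    # ascent step with L2 shrinkage
    return sigmoid(U @ U.T)


# Toy data: six genes, three known SL pairs (hypothetical).
Y = np.zeros((6, 6))
for i, j in [(0, 1), (2, 3), (4, 5)]:
    Y[i, j] = Y[j, i] = 1.0
P = fit_logistic_mf(Y)
```

On this toy matrix the fitted probabilities for the known pairs end up higher on average than those for the unknown pairs, which is the behaviour the importance weighting is meant to encourage.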
|
33
|
Sarker B, Ritchie DW, Aridhi S. GrAPFI: predicting enzymatic function of proteins from domain similarity graphs. BMC Bioinformatics 2020; 21:168. [PMID: 32349654 PMCID: PMC7191693 DOI: 10.1186/s12859-020-3460-7] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 03/13/2019] [Accepted: 03/19/2020] [Indexed: 01/20/2023] Open
Abstract
An amendment to this paper has been published and can be accessed via the original article.
Affiliation(s)
- Bishnu Sarker
- University of Lorraine, CNRS, Inria, LORIA, F-54000 Nancy, France
- David W Ritchie
- University of Lorraine, CNRS, Inria, LORIA, F-54000 Nancy, France
- Sabeur Aridhi
- University of Lorraine, CNRS, Inria, LORIA, F-54000 Nancy, France
|
34
|
Grbić M, Matić D, Kartelj A, Vračević S, Filipović V. A three-phase method for identifying functionally related protein groups in weighted PPI networks. Comput Biol Chem 2020; 86:107246. [PMID: 32339914 DOI: 10.1016/j.compbiolchem.2020.107246] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 11/11/2019] [Revised: 01/27/2020] [Accepted: 03/03/2020] [Indexed: 01/17/2023]
Abstract
Identifying significant protein groups is of great importance for further understanding protein functions. This paper introduces a novel three-phase heuristic method for identifying such groups in weighted PPI networks. In the first phase a variable neighborhood search (VNS) algorithm is applied on a weighted PPI network, in order to support protein complexes by adding a minimum number of new PPIs. In the second phase proteins from different complexes are merged into larger protein groups. In the third phase these groups are expanded by a number of 2-level neighbor proteins, favoring proteins that have higher average gene co-expression with the base group proteins. Experimental results show that: (i) the proposed VNS algorithm outperforms the existing approach described in literature and (ii) the above-mentioned three-phase method identifies protein groups with very high statistical significance.
Affiliation(s)
- Milana Grbić
- University of Banjaluka, Faculty of Natural Sciences and Mathematics, Mladena Stojanovića 2, 78000 Banjaluka, Bosnia and Herzegovina
- Dragan Matić
- University of Banjaluka, Faculty of Natural Sciences and Mathematics, Mladena Stojanovića 2, 78000 Banjaluka, Bosnia and Herzegovina
- Aleksandar Kartelj
- University of Belgrade, Faculty of Mathematics, Studentski trg 16/IV 11 000, Belgrade, Serbia
- Savka Vračević
- University of Banjaluka, Faculty of Natural Sciences and Mathematics, Mladena Stojanovića 2, 78000 Banjaluka, Bosnia and Herzegovina
- Vladimir Filipović
- University of Belgrade, Faculty of Mathematics, Studentski trg 16/IV 11 000, Belgrade, Serbia
|
35
|
Khan IK, Jain A, Rawi R, Bensmail H, Kihara D. Prediction of protein group function by iterative classification on functional relevance network. Bioinformatics 2020; 35:1388-1394. [PMID: 30192921 DOI: 10.1093/bioinformatics/bty787] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 06/07/2018] [Revised: 08/28/2018] [Accepted: 09/04/2018] [Indexed: 11/14/2022] Open
Abstract
MOTIVATION Biological experiments including proteomics and transcriptomics approaches often reveal sets of proteins that are most likely to be involved in a disease/disorder. To understand the functional nature of a set of proteins, it is important to capture the function of the proteins as a group, even in cases where function of individual proteins is not known. In this work, we propose a model that takes groups of proteins found to work together in a certain biological context, integrates them into functional relevance networks, and subsequently employs an iterative inference on graphical models to identify group functions of the proteins, which are then extended to predict function of individual proteins. RESULTS The proposed algorithm, iterative group function prediction (iGFP), depicts proteins as a graph that represents functional relevance of proteins considering their known functional, proteomics and transcriptional features. Proteins in the graph will be clustered into groups by their mutual functional relevance, which is iteratively updated using a probabilistic graphical model, the conditional random field. iGFP showed robust accuracy even when a substantial amount of GO annotations were missing. The perspective of 'group' function annotation opens up novel approaches for understanding functional nature of proteins in biological systems. AVAILABILITY AND IMPLEMENTATION http://kiharalab.org/iGFP/. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Ishita K Khan
- Department of Computer Science, Purdue University, West Lafayette, IN, USA; eBay Search Science, San Jose, CA, USA
- Aashish Jain
- Department of Computer Science, Purdue University, West Lafayette, IN, USA
- Reda Rawi
- Qatar Computing Research Institute, Hamad Bin Khalifa University, Doha, Qatar; Vaccine Research Center, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD, USA
- Halima Bensmail
- Qatar Computing Research Institute, Hamad Bin Khalifa University, Doha, Qatar
- Daisuke Kihara
- Department of Computer Science, Purdue University, West Lafayette, IN, USA; Department of Biological Sciences, Purdue University, West Lafayette, IN, USA
|
36
|
Handling Noise in Protein Interaction Networks. Biomed Res Int 2019; 2019:8984248. [PMID: 31828144 PMCID: PMC6885184 DOI: 10.1155/2019/8984248] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 07/17/2019] [Accepted: 09/23/2019] [Indexed: 12/22/2022]
Abstract
Protein-protein interactions (PPIs) can be conveniently represented as networks, allowing the use of graph theory for their study. Network topology studies may reveal patterns associated with specific organisms. Here, we propose a new methodology to denoise PPI networks and predict missing links solely based on the network topology, the organization measurement (OM) method. The OM methodology was applied in the denoising of the PPI networks of two Saccharomyces cerevisiae datasets (Yeast and CS2007) and one Homo sapiens dataset (Human). To evaluate the denoising capabilities of the OM methodology, two strategies were applied. The first strategy compared its application in random networks and in the reference set networks, while the second strategy perturbed the networks with the gradual random addition and removal of edges. The application of the OM methodology to the Yeast and Human reference sets achieved an AUC of 0.95 and 0.87, in Yeast and Human networks, respectively. The random removal of 80% of the Yeast and Human reference set interactions resulted in an AUC of 0.71 and 0.62, whereas the random addition of 80% interactions resulted in an AUC of 0.75 and 0.72, respectively. Applying the OM methodology to the CS2007 dataset yields an AUC of 0.99. We also perturbed the network of the CS2007 dataset by randomly inserting and removing edges in the same proportions previously described. The false positives identified and removed from the network varied from 97%, when inserting 20% more edges, to 89%, when 80% more edges were inserted. The true positives identified and inserted in the network varied from 95%, when removing 20% of the edges, to 40%, after the random deletion of 80% edges. The OM methodology is sensitive to the topological structure of the biological networks. The obtained results suggest that the present approach can efficiently be used to denoise PPI networks.
|
37
|
Huang G. Computational Models or Methods for Protein Function Prediction. Curr Proteomics 2019. [DOI: 10.2174/157016461605190510114117] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/22/2022]
Affiliation(s)
- Guohua Huang
- Provincial Key Laboratory of Informational Service for Rural Area of Southwestern Hunan Shaoyang University Shaoyang, Shaoyang, Hunan 422000, China
|
38
|
Chen W, Li W, Huang G, Flavel M. The Applications of Clustering Methods in Predicting Protein Functions. Curr Proteomics 2019. [DOI: 10.2174/1570164616666181212114612] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 11/22/2022]
Abstract
Background: The understanding of protein function is essential to the study of biological processes. However, the prediction of protein function has been a difficult task for bioinformatics to overcome. This has resulted in many scholars focusing on the development of computational methods to address this problem.
Objective: In this review, we introduce the recently developed computational methods of protein function prediction and assess the validity of these methods. We then introduce the applications of clustering methods in predicting protein functions.
Affiliation(s)
- Weiyang Chen
- College of Information, Qilu University of Technology (Shandong Academy of Sciences), Jinan, China
- Weiwei Li
- College of Information, Qilu University of Technology (Shandong Academy of Sciences), Jinan, China
- Guohua Huang
- College of Information Engineering, Shaoyang University, Shaoyang, Hunan 422000, China
- Matthew Flavel
- School of Life Sciences, La Trobe University, Bundoora, Vic 3083, Australia
|
39
|
Zhao B, Zhao Y, Zhang X, Zhang Z, Zhang F, Wang L. An iteration method for identifying yeast essential proteins from heterogeneous network. BMC Bioinformatics 2019; 20:355. [PMID: 31234779 PMCID: PMC6591974 DOI: 10.1186/s12859-019-2930-2] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 02/19/2019] [Accepted: 06/04/2019] [Indexed: 02/02/2023] Open
Abstract
BACKGROUND Essential proteins are distinctly important for an organism's survival and development and crucial to disease analysis and drug design as well. Large-scale protein-protein interaction (PPI) data sets exist in Saccharomyces cerevisiae, which provides us with a valuable opportunity to identify essential proteins from PPI networks. Many network topology-based computational methods have been designed to detect essential proteins. However, these methods are limited by the completeness of available PPI data. To break out of these restraints, some computational methods have been proposed by integrating PPI networks and multi-source biological data. Despite the progress in the research of multiple data fusion, it is still challenging to improve the prediction accuracy of the computational methods. RESULTS In this paper, we design a novel iterative model for essential protein prediction, named Randomly Walking in the Heterogeneous Network (RWHN). In RWHN, a weighted protein-protein interaction network and a domain-domain association network are first constructed from the original PPI network and the known protein-domain association network. We then establish a new heterogeneous matrix by combining the two constructed networks with the protein-domain association network. Based on the heterogeneous matrix, a transition probability matrix is established by normalization. Finally, an improved PageRank algorithm is applied to the heterogeneous network for essential protein prediction. To reduce the influence of false negatives, information on orthologous proteins and the subcellular localization of proteins is integrated to initialize the score vector of proteins. In RWHN, the topological, conservation and functional features of essential proteins are all taken into account in the prediction process. The experimental results show that RWHN clearly outperforms ten other competing methods in predicting essential proteins. CONCLUSIONS We demonstrated that integrating multi-source data into a heterogeneous network can preserve the complex relationship among multiple biological data and improve the prediction accuracy of essential proteins. RWHN, our proposed method, is effective for the prediction of essential proteins.
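The pipeline in this abstract (normalise an association matrix into transition probabilities, then iterate a PageRank-style walk from a prior score vector) can be sketched in a few lines of NumPy. This shows only the generic random walk with restart, not RWHN itself: the heterogeneous matrix construction and the authors' "improved" PageRank are not described here in enough detail to reproduce, and the names `random_walk_scores` and `alpha` and the toy star network are assumptions.

```python
import numpy as np


def random_walk_scores(A, init, alpha=0.85, tol=1e-10, max_iter=1000):
    """Random walk with restart on a weighted adjacency matrix.

    A     : non-negative (n, n) association matrix
    init  : non-negative prior scores (e.g. from orthology or localization data)
    alpha : assumed probability of following an edge rather than restarting
    """
    A = np.asarray(A, dtype=float)
    col_sums = A.sum(axis=0)
    col_sums[col_sums == 0] = 1.0          # avoid division by zero for isolated nodes
    T = A / col_sums                       # column-normalised transition matrix
    p0 = np.asarray(init, dtype=float)
    p0 = p0 / p0.sum()                     # prior as a probability vector
    p = p0.copy()
    for _ in range(max_iter):
        nxt = alpha * (T @ p) + (1.0 - alpha) * p0
        if np.abs(nxt - p).sum() < tol:    # L1 convergence check
            return nxt
        p = nxt
    return p


# Toy star network: node 0 interacts with nodes 1, 2 and 3.
A = np.zeros((4, 4))
for leaf in (1, 2, 3):
    A[0, leaf] = A[leaf, 0] = 1.0
scores = random_walk_scores(A, init=np.ones(4))
```

With a uniform prior the hub of the star receives the highest score; a non-uniform `init` would bias the ranking toward proteins with stronger prior evidence.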
Affiliation(s)
- Bihai Zhao
- College of Computer Engineering and Applied Mathematics, Changsha University, Changsha, Hunan 410022 People’s Republic of China
- Hunan Provincial Key Laboratory of Nutrition and Quality Control of Aquatic Animals, Department of Biological and Environmental Engineering, Changsha University, Changsha, Hunan 410022 China
- Yulin Zhao
- College of Computer Engineering and Applied Mathematics, Changsha University, Changsha, Hunan 410022 People’s Republic of China
- Xiaoxia Zhang
- College of Computer Engineering and Applied Mathematics, Changsha University, Changsha, Hunan 410022 People’s Republic of China
- Zhihong Zhang
- College of Computer Engineering and Applied Mathematics, Changsha University, Changsha, Hunan 410022 People’s Republic of China
- Fan Zhang
- College of Computer Engineering and Applied Mathematics, Changsha University, Changsha, Hunan 410022 People’s Republic of China
- Lei Wang
- College of Computer Engineering and Applied Mathematics, Changsha University, Changsha, Hunan 410022 People’s Republic of China
- College of Information Engineering, Xiangtan University, Xiangtan, 411105 Hunan China
|
40
|
Saha S, Chatterjee P, Basu S, Nasipuri M, Plewczynski D. FunPred 3.0: improved protein function prediction using protein interaction network. PeerJ 2019; 7:e6830. [PMID: 31198622 PMCID: PMC6535044 DOI: 10.7717/peerj.6830] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 07/20/2018] [Accepted: 03/21/2019] [Indexed: 11/23/2022] Open
Abstract
Proteins are the most versatile macromolecules in living systems and perform crucial biological functions. With the advent of the post-genomic era, next generation sequencing is done routinely at the population scale for a variety of species. The challenging problem is to determine, at scale, the functions of proteins that are not yet characterized by detailed experimental studies. Identification of protein functions experimentally is a laborious and time-consuming task involving many resources. We therefore propose an automated protein function prediction methodology using in silico algorithms trained on carefully curated experimental datasets. We present the improved protein function prediction tool FunPred 3.0, an extended version of our previous methodology FunPred 2, which exploits neighborhood properties in the protein–protein interaction network (PPIN) and physicochemical properties of amino acids. Our method is validated using the available functional annotations in the PPIN of Saccharomyces cerevisiae in the latest Munich Information Center for Protein Sequences (MIPS) dataset. The PPIN data of S. cerevisiae in the MIPS dataset includes 4,554 unique proteins in 13,528 protein–protein interactions after the elimination of the self-replicating and the self-interacting protein pairs. Using the developed FunPred 3.0 tool, we are able to achieve mean precision, recall and F-score values of 0.55, 0.82 and 0.66, respectively. FunPred 3.0 is then used to predict the functions of unpredicted protein pairs (incomplete and missing functional annotations) in the MIPS dataset of S. cerevisiae. The method is also capable of predicting the subcellular localization of proteins along with their corresponding functions. The code and the complete prediction results are freely available at: https://github.com/SovanSaha/FunPred-3.0.git.
Affiliation(s)
- Sovan Saha
- Department of Computer Science and Engineering, Dr. Sudhir Chandra Sur Degree Engineering College, Kolkata, West Bengal, India
- Piyali Chatterjee
- Department of Computer Science and Engineering, Netaji Subhash Engineering College, Kolkata, India
- Subhadip Basu
- Department of Computer Science and Engineering, Jadavpur University, Kolkata, West Bengal, India
- Mita Nasipuri
- Department of Computer Science and Engineering, Jadavpur University, Kolkata, West Bengal, India
- Dariusz Plewczynski
- Laboratory of Functional and Structural Genomics, Centre of New Technologies, University of Warsaw, Warsaw, Poland; Faculty of Mathematics and Information Science, Warsaw University of Technology, Warsaw, Poland
|
41
|
Ye W, Ji G, Ye P, Long Y, Xiao X, Li S, Su Y, Wu X. scNPF: an integrative framework assisted by network propagation and network fusion for preprocessing of single-cell RNA-seq data. BMC Genomics 2019; 20:347. [PMID: 31068142 PMCID: PMC6505295 DOI: 10.1186/s12864-019-5747-5] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 12/21/2018] [Accepted: 04/29/2019] [Indexed: 12/15/2022] Open
Abstract
Background Single-cell RNA-sequencing (scRNA-seq) is fast becoming a powerful tool for profiling genome-scale transcriptomes of individual cells and capturing transcriptome-wide cell-to-cell variability. However, scRNA-seq technologies suffer from high levels of technical noise and variability, hindering reliable quantification of lowly and moderately expressed genes. Since most downstream analyses on scRNA-seq, such as cell type clustering and differential expression analysis, rely on the gene-cell expression matrix, preprocessing of scRNA-seq data is a critical preliminary step in the analysis of scRNA-seq data. Results We presented scNPF, an integrative scRNA-seq preprocessing framework assisted by network propagation and network fusion, for recovering gene expression loss, correcting gene expression measurements, and learning similarities between cells. scNPF leverages the context-specific topology inherent in the given data and the prior knowledge derived from publicly available molecular gene-gene interaction networks to augment gene-gene relationships in a data-driven manner. We have demonstrated the great potential of scNPF in scRNA-seq preprocessing for accurately recovering gene expression values and learning cell similarity networks. Comprehensive evaluation of scNPF across a wide spectrum of scRNA-seq data sets showed that scNPF achieved comparable or higher performance than the competing approaches according to various metrics of internal validation and clustering accuracy. We have made scNPF an easy-to-use R package, which can be used as a versatile preprocessing plug-in for most existing scRNA-seq analysis pipelines or tools. Conclusions scNPF is a universal tool for preprocessing of scRNA-seq data, which jointly incorporates the global topology of prior interaction networks and the context-specific information encapsulated in the scRNA-seq data to capture both shared and complementary knowledge from diverse data sources. scNPF could be used to recover gene signatures and learn cell-to-cell similarities from emerging scRNA-seq data to facilitate downstream analyses such as dimension reduction, cell type clustering, and visualization. Electronic supplementary material The online version of this article (10.1186/s12864-019-5747-5) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Wenbin Ye
- Department of Automation, Xiamen University, Xiamen, 361005, China; Xiamen Research Institute of National Center of Healthcare Big Data, Xiamen, China
- Guoli Ji
- Department of Automation, Xiamen University, Xiamen, 361005, China; Xiamen Research Institute of National Center of Healthcare Big Data, Xiamen, China; Innovation Center for Cell Biology, Xiamen University, Xiamen, 361005, China
- Pengchao Ye
- Department of Automation, Xiamen University, Xiamen, 361005, China; Xiamen Research Institute of National Center of Healthcare Big Data, Xiamen, China
- Yuqi Long
- Software Quality Testing Engineering Research Center, China Electronic Product Reliability and Environmental Testing Research Institute, Guangzhou, 510610, China
- Xuesong Xiao
- Department of Automation, Xiamen University, Xiamen, 361005, China; Xiamen Research Institute of National Center of Healthcare Big Data, Xiamen, China
- Shuchao Li
- Department of Automation, Xiamen University, Xiamen, 361005, China; Xiamen Research Institute of National Center of Healthcare Big Data, Xiamen, China
- Yaru Su
- College of Mathematics and Computer Science, Fuzhou University, Fuzhou, 350116, China
- Xiaohui Wu
- Department of Automation, Xiamen University, Xiamen, 361005, China; Xiamen Research Institute of National Center of Healthcare Big Data, Xiamen, China; Innovation Center for Cell Biology, Xiamen University, Xiamen, 361005, China
|
42
|
Li SS, Zhao XB, Tian JM, Wang HR, Wei TH. Prediction of seed gene function in progressive diabetic neuropathy by a network-based inference method. Exp Ther Med 2019; 17:4176-4182. [PMID: 31007748 PMCID: PMC6468912 DOI: 10.3892/etm.2019.7441] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/06/2018] [Accepted: 03/07/2019] [Indexed: 11/07/2022] Open
Abstract
The guilt by association (GBA) algorithm has been widely used to statistically predict gene functions, and a network-based approach increases the confidence and veracity of identifying molecular signatures for diseases. This work proposed a network-based GBA method, integrating the GBA algorithm with a network, to identify seed gene functions for progressive diabetic neuropathy (PDN). The prediction of seed gene functions comprised three steps: i) preparing gene lists and sets; ii) constructing a co-expression matrix (CEM) on gene lists by the Spearman correlation coefficient (SCC) method; and iii) predicting gene functions by the GBA algorithm. Ultimately, seed gene functions were selected according to the area under the receiver operating characteristic curve (AUC) index. A total of 79 differentially expressed genes (DEGs) and 40 background gene ontology (GO) terms were regarded as gene lists and sets for the subsequent analyses, respectively. The results obtained from the network-based GBA approach showed that 27.5% of all gene sets had good classification performance with AUC >0.5. Most significantly, 3 gene sets with AUC >0.6 were denoted as seed gene functions for PDN, including binding, molecular function and regulation of the metabolic process. In summary, we predicted 3 seed gene functions for PDN compared with non-progressors utilizing the network-based GBA algorithm. The findings provide insights into the pathological and molecular mechanisms underlying PDN.
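The three steps listed in this abstract (gene lists and sets, a Spearman co-expression matrix, GBA scoring judged by AUC) can be sketched with NumPy alone. This is a simplified illustration rather than the authors' procedure: it scores each gene by its summed co-expression with the members of one gene set and omits the cross-validation that GBA analyses normally use. The function names and the toy expression matrix are hypothetical, and the ranking step assumes no tied expression values.

```python
import numpy as np


def spearman_matrix(expr):
    """Spearman correlation between rows (genes); assumes no tied values."""
    ranks = expr.argsort(axis=1).argsort(axis=1).astype(float)
    return np.corrcoef(ranks)


def auc(scores, labels):
    """Probability that a random member outscores a random non-member."""
    pos, neg = scores[labels], scores[~labels]
    wins = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (wins + 0.5 * ties) / (len(pos) * len(neg))


def gba_auc(expr, members):
    """Score genes by co-expression with set members, then compute the AUC."""
    cem = spearman_matrix(expr)
    np.fill_diagonal(cem, 0.0)               # a gene does not vote for itself
    scores = cem[:, members].sum(axis=1)     # summed correlation with the set
    return auc(scores, members)


# Toy data: genes 0-2 rise across samples, genes 3-5 fall (hypothetical values).
expr = np.array([
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [0.5, 1.5, 1.8, 4.0, 9.0],
    [2.0, 2.1, 2.5, 3.0, 7.0],
    [9.0, 7.0, 5.0, 3.0, 1.0],
    [8.0, 6.5, 6.0, 2.0, 0.5],
    [5.0, 4.0, 3.5, 3.0, 1.0],
])
members = np.array([True, True, True, False, False, False])
result = gba_auc(expr, members)
```

In this toy case the set members are perfectly co-expressed with one another, so the AUC is 1.0; real gene sets would typically fall much closer to the 0.5 to 0.6 range the abstract reports.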
Affiliation(s)
- Shan-Shan Li
- Department of Endocrinology, Linyi People's Hospital, Linyi, Shandong 276000, P.R. China
- Xin-Bo Zhao
- Department of Endocrinology, Linyi People's Hospital, Linyi, Shandong 276000, P.R. China
- Jia-Mei Tian
- Department of Pediatrics, Linyi People's Hospital, Linyi, Shandong 276000, P.R. China
- Hao-Ren Wang
- Department of Medicine, Linyi Luozhuang Central Hospital, Linyi, Shandong 276000, P.R. China
- Tong-Huan Wei
- Department of Medicine, People's Hospital of Linyi High-Tech Industrial Development Zone, Linyi, Shandong 276000, P.R. China
|
43
|
Kc K, Li R, Cui F, Yu Q, Haake AR. GNE: a deep learning framework for gene network inference by aggregating biological information. BMC Syst Biol 2019; 13:38. [PMID: 30953525 PMCID: PMC6449883 DOI: 10.1186/s12918-019-0694-y] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Indexed: 11/24/2022]
Abstract
Background The topological landscape of gene interaction networks provides a rich source of information for inferring functional patterns of genes or proteins. However, it is still a challenging task to aggregate heterogeneous biological information such as gene expression and gene interactions to achieve more accurate inference for prediction and discovery of new gene interactions. In particular, how to generate a unified vector representation to integrate diverse input data is a key challenge addressed here. Results We propose a scalable and robust deep learning framework to learn embedded representations to unify known gene interactions and gene expression for gene interaction predictions. These low-dimensional embeddings derive deeper insights into the structure of rapidly accumulating and diverse gene interaction networks and greatly simplify downstream modeling. We compare the predictive power of our deep embeddings to strong baselines. The results suggest that our deep embeddings achieve significantly more accurate predictions. Moreover, a set of novel gene interaction predictions is validated by up-to-date literature-based database entries. Conclusion The proposed model demonstrates the importance of integrating heterogeneous information about genes for gene network inference. GNE is freely available under the GNU General Public License and can be downloaded from GitHub (https://github.com/kckishan/GNE). Electronic supplementary material The online version of this article (10.1186/s12918-019-0694-y) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Kishan Kc
- Golisano College of Computing and Information Sciences, Rochester Institute of Technology, 20 Lomb Memorial Drive, Rochester, New York, 14623, USA
- Rui Li
- Golisano College of Computing and Information Sciences, Rochester Institute of Technology, 20 Lomb Memorial Drive, Rochester, New York, 14623, USA
- Feng Cui
- Thomas H. Gosnell School of Life Sciences, Rochester Institute of Technology, 84 Lomb Memorial Drive, Rochester, New York, 14623, USA
- Qi Yu
- Golisano College of Computing and Information Sciences, Rochester Institute of Technology, 20 Lomb Memorial Drive, Rochester, New York, 14623, USA
- Anne R Haake
- Golisano College of Computing and Information Sciences, Rochester Institute of Technology, 20 Lomb Memorial Drive, Rochester, New York, 14623, USA
|
44
|
Yang P, Yu S, Cheng L, Ning K. Meta-network: optimized species-species network analysis for microbial communities. BMC Genomics 2019; 20:187. [PMID: 30967118 PMCID: PMC6457071 DOI: 10.1186/s12864-019-5471-1] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Indexed: 12/13/2022] Open
Abstract
Background The explosive growth of microbiome data provides ample opportunities to gain a better understanding of the microbes and their interactions in microbial communities. Given these massive data, optimized data mining methods become important and necessary to perform deep and comprehensive analysis. Among the various priorities for microbiome data mining, the examination of species-species co-occurrence patterns has become one of the key themes in urgent need of attention. Results Hence, in this work, we propose the Meta-Network framework to study microbial communities in depth. Rooted in a loose definition of the network (two species co-exist in certain samples rather than in all samples) as well as association rule mining (mining more complex forms of correlation such as indirect correlation and mutual information), this framework outperforms other methods in restoring the microbial communities, based on two cohorts of microbial communities: (a) the loose definition strategy is capable of generating more reasonable relationships among species in the species-species co-occurrence network; (b) important species-species co-occurrence patterns that could not be identified by other existing approaches could be successfully generated by association rule mining. Conclusions Results have shown that the species-species co-occurrence networks we generated are much more informative than those based on traditional methods. Meta-Network has consistently constructed more meaningful networks with biologically important clusters and hubs, and provides a general approach towards deciphering species-species co-occurrence networks. Electronic supplementary material The online version of this article (10.1186/s12864-019-5471-1) contains supplementary material, which is available to authorized users.
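The combination of a loose co-occurrence definition with association rule mining can be illustrated with a minimal sketch. It is not the Meta-Network implementation: the function name, the thresholds, and the toy samples below are assumptions made for illustration, and only the standard support and confidence measures of association rule mining are used.

```python
from itertools import combinations

def cooccurrence_rules(samples, min_support=0.5, min_confidence=0.8):
    """Mine pairwise rules a -> b from presence/absence data.

    samples: list of sets, each holding the species present in one sample.
    A pair is kept when it co-occurs in at least `min_support` of the samples
    (a "loose" definition: not necessarily in all of them).
    Returns a list of (a, b, support, confidence) tuples.
    """
    n = len(samples)
    species = sorted(set().union(*samples))
    # Number of samples in which each species is present.
    count = {s: sum(s in smp for smp in samples) for s in species}
    rules = []
    for a, b in combinations(species, 2):
        both = sum(a in smp and b in smp for smp in samples)
        support = both / n
        if support < min_support:
            continue
        # Confidence is directional, so check a -> b and b -> a separately.
        for x, y in ((a, b), (b, a)):
            confidence = both / count[x]
            if confidence >= min_confidence:
                rules.append((x, y, support, confidence))
    return rules

# Hypothetical toy data: four samples, three species.
samples = [{"A", "B"}, {"A", "B", "C"}, {"A", "C"}, {"B"}]
rules = cooccurrence_rules(samples, min_support=0.5, min_confidence=0.6)
```

In this toy input the pair B and C co-occurs in only one of four samples, so it falls below the support threshold and contributes no edge, while the rule C -> A has confidence 1.0 because A is present in every sample that contains C.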
Affiliation(s)
- Pengshuo Yang
- Key Laboratory of Molecular Biophysics of the Ministry of Education, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, 430074, Hubei, China
| | - Shaojun Yu
- Key Laboratory of Molecular Biophysics of the Ministry of Education, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, 430074, Hubei, China
| | - Lin Cheng
- Department of Engineering, Trinity College, 300 Summit Street, Hartford, CT, 06106, USA
| | - Kang Ning
- Key Laboratory of Molecular Biophysics of the Ministry of Education, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, 430074, Hubei, China.
| |
|
45
|
Pan ZG, Zhang XZ, Zhang ZM, Dong YJ. Optimal pathways involved in the treatment of sevoflurane or propofol for patients undergoing coronary artery bypass graft surgery. Exp Ther Med 2019; 17:3637-3643. [PMID: 30988747 PMCID: PMC6447764 DOI: 10.3892/etm.2019.7354] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 06/21/2018] [Accepted: 02/14/2019] [Indexed: 01/02/2023] Open
Abstract
The cardio-protection mechanisms of sevoflurane and propofol remain unclear in patients undergoing coronary artery bypass grafting (CABG). We designed the present study to identify the optimal pathways by integrating the differential co-expressed network (DCN)-based guilt by association (GBA) principle, based on the expression data of E-GEOD-4386 downloaded from EMBL-EBI. Differentially expressed genes (DEGs) were first identified, and then the DCN and sub-DCN were established. The seed pathways were predicted through the GBA principle using the area under the curve (AUC) for pathway categories, and the pathway terms with AUC >0.9 were defined as the seed pathways. KEGG pathway analysis was applied to the DEGs based on DAVID to detect significant pathways. The final optimal pathways were identified based on the traditional pathway analysis and the network-based pathway inference approach. There were 83 common, 99 sevoflurane-specific and 4 propofol-specific DEGs in the expression profile of atrial samples. Finally, 8 and 4 pathway terms having an AUC >0.9 were identified and determined as the seed pathways in the propofol and sevoflurane group, respectively. The TNF signaling pathway, the NF-κB signaling pathway, and the NOD-like receptor signaling pathway were the common optimal ones in these two groups. Only the pathway of cytokine-cytokine receptor interaction was unique to sevoflurane, and no pathway was specific to propofol. Our results suggested that sevoflurane and propofol might synergistically possess some cardio-protective properties in patients undergoing CABG.
Affiliation(s)
- Zhen-Guo Pan
- Department of Anesthesiology, The Second People's Hospital of Liaocheng, Linqing, Shandong 252600, P.R. China
| | - Xi-Zeng Zhang
- Department of Anesthesiology, The Second People's Hospital of Liaocheng, Linqing, Shandong 252600, P.R. China
| | - Zhi-Mei Zhang
- Department of Anesthesiology, The Second People's Hospital of Liaocheng, Linqing, Shandong 252600, P.R. China
| | - Yun-Jie Dong
- Department of Medical Administration, The Second People's Hospital of Liaocheng, Linqing, Shandong 252600, P.R. China
| |
|
46
|
Abstract
This chapter exploits network-based representations of proteins, called metagraphs, in protein-protein interaction networks to identify candidate disease-causing proteins. Protein-protein interaction (PPI) networks are effective tools for studying the functional roles of proteins in the development of various diseases. However, they are insufficient without the support of additional biological knowledge about proteins, such as their molecular functions and biological processes. To enhance PPI networks, we utilize biological properties of individual proteins as well. More specifically, we integrate keywords from the UniProt database describing protein properties into the PPI network and construct a novel heterogeneous PPI-Keyword (PPIK) network consisting of both proteins and keywords. As proteins with similar functional duties or involved in the same metabolic pathway tend to have similar topological characteristics, we propose to represent them with metagraphs. Compared to the traditional network motif or subgraph, a metagraph can capture the topological arrangements through not only the protein-protein interactions but also protein-keyword associations. We feed those novel metagraph representations into classifiers for disease protein prediction and conduct our experiments on three different PPI databases. The experiments show that the proposed method consistently increases disease protein prediction performance across various classifiers, by 15.3% in AUC on average. It outperforms the diffusion-based (e.g., RWR) and the module-based baselines by 13.8-32.9% in overall disease protein prediction. In breast cancer protein prediction, it outperforms RWR, PRINCE, and the module-based baselines by 6.6-14.2%. Finally, our predictions also exhibit better correlations with literature findings from the PubMed database.
|
47
|
Zhang L, Yu G, Guo M, Wang J. Predicting protein-protein interactions using high-quality non-interacting pairs. BMC Bioinformatics 2018; 19:525. [PMID: 30598096 PMCID: PMC6311908 DOI: 10.1186/s12859-018-2525-3] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Indexed: 01/22/2023] Open
Abstract
BACKGROUND Identifying protein-protein interactions (PPIs) is of paramount importance for understanding cellular processes. Machine learning-based approaches have been developed to predict PPIs, but the effectiveness of these approaches is unsatisfactory. One major reason is that they randomly choose non-interacting protein pairs (negative samples) or heuristically select low-quality non-interacting pairs. RESULTS To boost the effectiveness of predicting PPIs, we propose two novel approaches (NIP-SS and NIP-RW) to generate high-quality non-interacting pairs based on sequence similarity and random walk, respectively. Specifically, the known PPIs collected from public databases are used to generate the positive samples. NIP-SS then selects the top-m dissimilar protein pairs as negative examples and controls the degree distribution of selected proteins to construct the negative dataset. NIP-RW performs a random walk on the PPI network to update the adjacency matrix of the network, and then selects protein pairs not connected in the updated network as negative samples. Next, we use the auto covariance (AC) descriptor to encode the feature information of amino acid sequences. After that, we employ deep neural networks (DNNs) to predict PPIs based on the extracted features and the positive and negative examples. Extensive experiments show that NIP-SS and NIP-RW can generate negative samples of higher quality than existing strategies and thus enable more accurate prediction. CONCLUSIONS The experimental results prove that negative datasets constructed by NIP-SS and NIP-RW can reduce the bias and have good generalization ability. NIP-SS and NIP-RW can be used as a plugin to boost the effectiveness of PPI prediction. Codes and datasets are available at http://mlda.swu.edu.cn/codes.php?name=NIP.
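The auto covariance (AC) encoding step can be sketched as follows. This is a minimal illustration of the commonly used AC definition for a single physicochemical property, not the authors' code; the function name and the toy profile are assumptions. For a profile x_1..x_n with mean m, the value at a given lag is the average of (x_i - m)(x_{i+lag} - m) over i = 1..n-lag.

```python
def auto_covariance(profile, max_lag):
    """Auto covariance of one property profile along a sequence.

    profile: list of numeric property values, one per residue
             (in practice usually normalized beforehand).
    max_lag: largest residue distance to consider; must be < len(profile).
    Returns a fixed-length vector [AC(1), ..., AC(max_lag)], so sequences
    of different lengths map to feature vectors of the same size.
    """
    n = len(profile)
    if max_lag >= n:
        raise ValueError("max_lag must be smaller than the sequence length")
    mean = sum(profile) / n
    centered = [x - mean for x in profile]
    return [
        sum(centered[i] * centered[i + lag] for i in range(n - lag)) / (n - lag)
        for lag in range(1, max_lag + 1)
    ]

# Hypothetical property profile that alternates every residue.
ac = auto_covariance([1.0, -1.0, 1.0, -1.0], max_lag=2)
```

For the alternating toy profile, the lag-1 value is negative (adjacent residues differ) and the lag-2 value is positive (residues two apart agree). Repeating the computation for several properties and concatenating the results gives the kind of fixed-length vector that a neural network classifier can consume.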
Affiliation(s)
- Long Zhang
- College of Computer and Information Sciences, Southwest University, Chongqing, China
| | - Guoxian Yu
- College of Computer and Information Sciences, Southwest University, Chongqing, China
| | - Maozu Guo
- School of Electrical and Information Engineering, Beijing University of Civil Engineering and Architecture, Beijing, China; Beijing Key Laboratory of Intelligent Processing for Building Big Data, Beijing, China
| | - Jun Wang
- College of Computer and Information Sciences, Southwest University, Chongqing, China.
| |
|
48
|
Integrating Multiple Interaction Networks for Gene Function Inference. Molecules 2018; 24:molecules24010030. [PMID: 30577643 PMCID: PMC6337127 DOI: 10.3390/molecules24010030] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 11/21/2018] [Revised: 12/19/2018] [Accepted: 12/20/2018] [Indexed: 01/17/2023] Open
Abstract
In the past few decades, the amount and variety of genomic and proteomic data available have increased dramatically. Molecular or functional interaction networks are usually constructed from high-throughput data, and the topological structure of these interaction networks provides a wealth of information for inferring the function of genes or proteins. Analyzing such association networks is a widely used way to mine functional information about genes or proteins. However, how to combine multiple heterogeneous networks to achieve more accurate predictions remains an urgent but unresolved challenge. In this paper, we present a method named ReprsentConcat to improve function inference by integrating multiple interaction networks. The low-dimensional representation of each node in each network is extracted, then these representations from multiple networks are concatenated and fed to gcForest, which augments feature vectors by cascading and automatically determines the number of cascade levels. We experimentally compare ReprsentConcat with a state-of-the-art method, showing that it achieves competitive results on the yeast and human datasets. Moreover, it is robust to hyperparameters, including the number of dimensions.
|
49
|
Mezni H, Aridhi S, Hadjali A. The uncertain cloud: State of the art and research challenges. Int J Approx Reason 2018. [DOI: 10.1016/j.ijar.2018.09.009] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Indexed: 11/30/2022]
|
50
|
Cannistraci CV. Modelling Self-Organization in Complex Networks Via a Brain-Inspired Network Automata Theory Improves Link Reliability in Protein Interactomes. Sci Rep 2018; 8:15760. [PMID: 30361555 PMCID: PMC6202355 DOI: 10.1038/s41598-018-33576-8] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 03/30/2017] [Accepted: 09/24/2018] [Indexed: 01/14/2023] Open
Abstract
Protein interactomes are epitomes of incomplete and noisy networks. Methods for assessing link-reliability using exclusively topology are valuable in network biology, and their investigation facilitates the general understanding of topological mechanisms and models to draw and correct complex network connectivity. Here, I revise and extend the local-community-paradigm (LCP). Initially detected in brain-network topological self-organization and afterward generalized to any complex network, the LCP is a theory to model local-topology-dependent link-growth in complex networks using network automata. Four novel LCP-models are compared versus baseline local-topology-models. It emerges that the reliability of an interaction between two proteins is higher: (i) if their common neighbours are isolated in a complex (local-community) that has low tendency to interact with other external proteins; (ii) if they have a low propensity to link with other proteins external to the local-community. These two rules are mathematically combined in C1*: a proposed mechanistic model that, in fact, outperforms the others. This theoretical study elucidates basic topological rules behind self-organization principia of protein interactomes and offers the conceptual basis to extend this theory to any class of complex networks. The link-reliability improvement, based on the mere topology, can impact many applied domains such as systems biology and network medicine.
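The topological quantities behind the two rules can be sketched as follows. This is not the C1* model, whose formula is not given in the abstract; it only counts, for a candidate interaction, the common neighbours (the local community), the links inside that community, and the links leaving it. The function name, the adjacency format, and the toy network are assumptions.

```python
def local_community_counts(adj, u, v):
    """Count local-community quantities for a candidate link (u, v).

    adj: dict mapping each node to the set of its neighbours (undirected).
    Returns (common_neighbours, internal_links, external_links), where
    internal links connect two common neighbours and external links go from
    a common neighbour to a node outside the community other than u and v.
    """
    common = (adj[u] & adj[v]) - {u, v}
    # Each internal link is seen from both of its endpoints, hence // 2.
    internal = sum(len(adj[z] & common) for z in common) // 2
    external = sum(len(adj[z] - common - {u, v}) for z in common)
    return len(common), internal, external

# Hypothetical toy interactome: a and b are common neighbours of u and v,
# a and b interact with each other, and b has one external partner x.
adj = {
    "u": {"a", "b"},
    "v": {"a", "b"},
    "a": {"u", "v", "b"},
    "b": {"u", "v", "a", "x"},
    "x": {"b"},
}
counts = local_community_counts(adj, "u", "v")
```

Under the rules described in the abstract, a candidate interaction would be considered more reliable when the local community is well connected internally and has few external links.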
Affiliation(s)
- Carlo Vittorio Cannistraci
- Biomedical Cybernetics Group, Biotechnology Center (BIOTEC), Center for Molecular and Cellular Bioengineering (CMCB), Center for Systems Biology Dresden (CSBD), Department of Physics, Technische Universität Dresden, Tatzberg 47/49, 01307, Dresden, Germany.
- Brain bio-inspired computing (BBC) lab, IRCCS Centro Neurolesi "Bonino Pulejo", Messina, Italy.
| |
|