1
Guo H, Lv X, Li Y, Li M. Attention-based GCN integrates multi-omics data for breast cancer subtype classification and patient-specific gene marker identification. Brief Funct Genomics 2023; 22:463-474. [PMID: 37114942 DOI: 10.1093/bfgp/elad013] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/20/2022] [Revised: 02/16/2023] [Accepted: 03/17/2023] [Indexed: 04/29/2023]
Abstract
Breast cancer is a heterogeneous disease and can be divided into several subtypes with unique prognostic and molecular characteristics. The classification of breast cancer subtypes plays an important role in the precision treatment and prognosis of breast cancer. Benefitting from the relation-aware ability of a graph convolution network (GCN), we present a multi-omics integrative method, the attention-based GCN (AGCN), for breast cancer molecular subtype classification using messenger RNA expression, copy number variation and deoxyribonucleic acid methylation multi-omics data. In extensive comparative studies, our AGCN models outperform state-of-the-art methods under different experimental conditions, and both the attention mechanisms and the graph convolution subnetwork play an important role in accurate cancer subtype classification. The layer-wise relevance propagation (LRP) algorithm is used to interpret model decisions and can identify patient-specific important biomarkers that are reported to be related to the occurrence and development of breast cancer. Our results highlight the effectiveness of the GCN and attention mechanisms in multi-omics integrative analysis, and the implementation of the LRP algorithm can provide biologically reasonable insights into model decisions.
Affiliation(s)
- Hui Guo
  - College of Chemistry at Sichuan University
- Xiang Lv
  - College of Chemistry at Sichuan University
- Yizhou Li
  - College of Cyber Science and Engineering at Sichuan University
2
Hu S, Luo Y, Zhang Z, Xiong H, Yan W, Jiang M, Zhao B. Protein function annotation based on heterogeneous biological networks. BMC Bioinformatics 2022; 23:493. [DOI: 10.1186/s12859-022-05057-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/03/2022] [Accepted: 11/15/2022] [Indexed: 11/19/2022]
Abstract
Background
Accurate annotation of protein function is key to understanding life at the molecular level and has great implications for biomedicine and pharmaceuticals. The rapid development of high-throughput technologies has generated huge amounts of protein–protein interaction (PPI) data, prompting the emergence of computational methods to determine protein function. Because PPI data are plagued by hidden errors and noise, these computational methods have come to focus on predicting functions by integrating the topology of protein interaction networks with multi-source biological data. Although these methods have improved markedly, it is still challenging to build a suitable network model for integrating multiplex biological data.
Results
In this paper, we constructed a heterogeneous biological network by integrating original protein interaction networks, protein-domain association data and protein complexes. To demonstrate the effectiveness of the heterogeneous biological network, we applied a propagation algorithm to this network and proposed a novel iterative model, named Propagate on Heterogeneous Biological Networks (PHN), to score functions from all functional partners and rank them in descending order. Finally, we selected the top L predicted functions as candidates to annotate the target protein. Our comprehensive experimental results demonstrated that PHN outperformed seven other competing approaches using cross-validation. Experimental results indicated that PHN performs significantly better than competing methods and improves the Area Under the Receiver-Operating Curve (AUROC) in Biological Process (BP), Molecular Function (MF) and Cellular Components (CC) by no less than 33%, 15% and 28%, respectively.
Conclusions
We demonstrated that integrating multi-source data into a heterogeneous biological network can preserve the complex relationship among multiplex biological data and improve the prediction accuracy of protein function by getting rid of the constraints of errors in PPI networks effectively. PHN, our proposed method, is effective for protein function prediction.
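The final scoring step described above (score functions from functional partners, rank in descending order, keep the top L) can be illustrated with a minimal sketch. It assumes that each partner already carries a similarity score and that a function's score is the sum of the scores of the partners annotated with it. The abstract does not state PHN's actual scoring rule, so the aggregation, the function name `rank_functions`, and the toy identifiers are all hypothetical.

```python
from collections import defaultdict


def rank_functions(partner_scores, annotations, top_l):
    """Return the top_l functions, ranked by summed partner scores.

    partner_scores: partner protein -> similarity score to the target
    annotations:    partner protein -> set of known functions
    Summing scores is an assumed aggregation, not PHN's published rule.
    """
    totals = defaultdict(float)
    for protein, score in partner_scores.items():
        for function in annotations.get(protein, ()):
            totals[function] += score
    # Sort by descending score; break ties by name for a stable order.
    ranked = sorted(totals.items(), key=lambda item: (-item[1], item[0]))
    return [function for function, _ in ranked[:top_l]]


# Toy data with placeholder protein and function names.
partner_scores = {"P1": 0.9, "P2": 0.5, "P3": 0.2}
annotations = {
    "P1": {"function_a", "function_b"},
    "P2": {"function_b"},
    "P3": {"function_c"},
}
top_functions = rank_functions(partner_scores, annotations, top_l=2)
```

Here `function_b` is supported by two partners and so ranks ahead of `function_a`, while `function_c` falls outside the top L = 2.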
3
Sengupta K, Saha S, Halder AK, Chatterjee P, Nasipuri M, Basu S, Plewczynski D. PFP-GO: Integrating protein sequence, domain and protein-protein interaction information for protein function prediction using ranked GO terms. Front Genet 2022; 13:969915. [PMID: 36246645 PMCID: PMC9556876 DOI: 10.3389/fgene.2022.969915] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/15/2022] [Accepted: 08/31/2022] [Indexed: 11/13/2022]
Abstract
Protein function prediction is gradually emerging as an essential field in biological and computational studies. Although computational approaches have gained a significant foothold, computational information gathered from multiple sources has been observed to have more influence than information derived from a single source. Considering this, a methodology, PFP-GO, is proposed in which heterogeneous sources such as protein sequence, protein domain, and protein-protein interaction network are processed separately to rank each individual functional GO term. Based on this ranking, GO terms are propagated to the target proteins. While protein sequence contributes sequence-based information, protein domains and protein-protein interaction networks embed structural/functional and topological information, respectively, during the GO ranking phase. PFP-GO is evaluated using Precision, Recall, and F-Score, and was found to perform reasonably better than existing state-of-the-art methods. PFP-GO achieved an overall Precision, Recall, and F-Score of 0.67, 0.58, and 0.62, respectively. Furthermore, we check some of the top-ranked GO terms predicted by PFP-GO through multilayer network propagation that affect the 3D structure of the genome. The complete source code of PFP-GO is freely available at https://sites.google.com/view/pfp-go/.
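As a quick consistency check on the reported figures, the F-Score can be recomputed as the harmonic mean of Precision and Recall. This sketch assumes the standard F1 definition; the abstract does not say whether the reported F-Score was computed from the overall Precision and Recall or averaged per protein, so agreement to two decimals is indicative only.

```python
def f_score(precision, recall):
    """Harmonic mean of precision and recall (the standard F1 definition)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values reported in the abstract above.
pfp_go_f = f_score(0.67, 0.58)
print(round(pfp_go_f, 2))  # 0.62
```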
Affiliation(s)
- Kaustav Sengupta
  - Laboratory of Functional and Structural Genomics, Center of New Technologies, University of Warsaw, Warsaw, Poland
  - Department of Computer Science and Engineering, Jadavpur University, Kolkata, India
  - Laboratory of Bioinformatics and Computational Genomics, Faculty of Mathematics and Information Science, Warsaw University of Technology, Warsaw, Poland
- Sovan Saha
  - Department of Computer Science and Engineering, Institute of Engineering and Management, Kolkata, West Bengal, India
- Anup Kumar Halder
  - Laboratory of Functional and Structural Genomics, Center of New Technologies, University of Warsaw, Warsaw, Poland
  - Laboratory of Bioinformatics and Computational Genomics, Faculty of Mathematics and Information Science, Warsaw University of Technology, Warsaw, Poland
- Piyali Chatterjee
  - Department of Computer Science and Engineering, Netaji Subhash Engineering College, Kolkata, India
- Mita Nasipuri
  - Department of Computer Science and Engineering, Jadavpur University, Kolkata, India
- Subhadip Basu
  - Department of Computer Science and Engineering, Jadavpur University, Kolkata, India
- Dariusz Plewczynski
  - Laboratory of Functional and Structural Genomics, Center of New Technologies, University of Warsaw, Warsaw, Poland
  - Laboratory of Bioinformatics and Computational Genomics, Faculty of Mathematics and Information Science, Warsaw University of Technology, Warsaw, Poland
- *Correspondence: Subhadip Basu, Dariusz Plewczynski
4
Mansoor M, Nauman M, Rehman HU, Omar M. Gene Ontology Capsule GAN: an improved architecture for protein function prediction. PeerJ Comput Sci 2022; 8:e1014. [PMID: 36092003 PMCID: PMC9454774 DOI: 10.7717/peerj-cs.1014] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 04/11/2022] [Accepted: 05/31/2022] [Indexed: 06/15/2023]
Abstract
Proteins are the core of all functions pertaining to living things. They consist of an extended amino acid chain folding into a three-dimensional shape that dictates their behavior. Currently, convolutional neural networks (CNNs) have been pivotal in predicting protein functions based on protein sequences. While it is a technology crucial to the niche, the computation cost and translational invariance associated with CNN make it impossible to detect spatial hierarchies between complex and simpler objects. Therefore, this research utilizes capsule networks to capture spatial information as opposed to CNNs. Since capsule networks focus on hierarchical links, they have a lot of potential for solving structural biology challenges. In comparison to the standard CNNs, our results exhibit an improvement in accuracy. Gene Ontology Capsule GAN (GOCAPGAN) achieved an F1 score of 82.6%, a precision score of 90.4% and recall score of 76.1%.
Affiliation(s)
- Musadaq Mansoor
  - National University of Computer and Emerging Sciences, Islamabad, Peshawar, KPK, Pakistan
- Mohammad Nauman
  - National University of Computer and Emerging Sciences, Islamabad, Peshawar, KPK, Pakistan
- Hafeez Ur Rehman
  - National University of Computer and Emerging Sciences, Islamabad, Peshawar, KPK, Pakistan
- Maryam Omar
  - National University of Computer and Emerging Sciences, Islamabad, Peshawar, KPK, Pakistan
5
Hu S, Zhang Z, Xiong H, Jiang M, Luo Y, Yan W, Zhao B. A tensor-based bi-random walks model for protein function prediction. BMC Bioinformatics 2022; 23:199. [PMID: 35637427 PMCID: PMC9150346 DOI: 10.1186/s12859-022-04747-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/25/2022] [Accepted: 05/24/2022] [Indexed: 11/26/2022]
Abstract
Background
The accurate characterization of protein functions is critical to understanding life at the molecular level and has a huge impact on biomedicine and pharmaceuticals. Computational prediction of protein function has been studied for decades. Because protein–protein interaction (PPI) networks are plagued by noise and errors, researchers have in recent years turned to fusing multi-omics data. A data model that appropriately integrates network topologies with biological data and preserves their intrinsic characteristics is still a bottleneck and an aspirational goal for protein function prediction.
Results
In this paper, we propose the RWRT (Random Walks with Restart on Tensor) method to accomplish protein function prediction by applying bi-random walks on the tensor. RWRT first constructs a functional similarity tensor by combining protein interaction networks with multi-omics data derived from domain annotation and protein complex information. RWRT then extends the bi-random walks algorithm from a two-dimensional matrix to the tensor for scoring functional similarity between proteins. Finally, RWRT filters out possible pretenders based on the concept of cohesiveness coefficient and annotates target proteins with functions of the remaining functional partners. Experimental results indicate that RWRT performs significantly better than the state-of-the-art methods and improves the area under the receiver-operating curve (AUROC) by no less than 18%.
Conclusions
The functional similarity tensor offers us an alternative, in that it is a collection of networks sharing the same nodes; however, the edges belong to different categories or represent interactions of different nature. We demonstrate that the tensor-based random walk model can not only discover more partners with similar functions but also effectively escape the constraints of errors in protein interaction networks. We believe that the performance of function prediction depends greatly on whether we can extract and exploit proper functional similarity information on protein correlations.
Supplementary Information
The online version contains supplementary material available at 10.1186/s12859-022-04747-2.
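For readers unfamiliar with the building block that RWRT extends, the following is a minimal sketch of an ordinary random walk with restart on a single adjacency matrix. It is not the tensor-based bi-random walk of the paper: the column normalisation, the restart probability of 0.3, the convergence tolerance, and the toy path graph are all assumptions chosen for illustration.

```python
def random_walk_with_restart(adjacency, seed, restart=0.3, tol=1e-10, max_iter=1000):
    """Stationary visiting probabilities of a walker that restarts at `seed`.

    adjacency: square list-of-lists of non-negative edge weights
    seed:      index of the node the walker restarts from
    """
    n = len(adjacency)
    # Column-normalise so each column holds transition probabilities.
    col_sums = [sum(adjacency[i][j] for i in range(n)) for j in range(n)]
    transition = [
        [adjacency[i][j] / col_sums[j] if col_sums[j] else 0.0 for j in range(n)]
        for i in range(n)
    ]
    start = [1.0 if i == seed else 0.0 for i in range(n)]
    scores = start[:]
    for _ in range(max_iter):
        # p_next = (1 - restart) * W p + restart * p_start
        updated = [
            (1 - restart) * sum(transition[i][j] * scores[j] for j in range(n))
            + restart * start[i]
            for i in range(n)
        ]
        if sum(abs(a - b) for a, b in zip(updated, scores)) < tol:
            return updated
        scores = updated
    return scores


# Toy path graph 0 - 1 - 2 - 3, walker restarting at node 0.
path_graph = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
scores = random_walk_with_restart(path_graph, seed=0)
```

On this toy graph the scores decrease with distance from the seed, which is the property that lets such walks rank candidate functional partners by network proximity.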
Affiliation(s)
- Sai Hu
  - College of Computer Engineering and Applied Mathematics, Changsha University, Changsha, 410022, Hunan, China
- Zhihong Zhang
  - College of Computer Engineering and Applied Mathematics, Changsha University, Changsha, 410022, Hunan, China
  - Hunan Provincial Key Laboratory of Industrial Internet Technology and Security, Changsha University, Changsha, 410022, Hunan, China
- Huijun Xiong
  - College of Computer Engineering and Applied Mathematics, Changsha University, Changsha, 410022, Hunan, China
- Meiping Jiang
  - Department of Ultrasound, Hunan Provincial Maternal and Child Health Care Hospital, Changsha, 410008, Hunan, China
  - NHC Key Laboratory of Birth Defect for Research and Prevention, Hunan Provincial Maternal and Child Health Care Hospital, Changsha, 410100, Hunan, China
- Yingchun Luo
  - Department of Ultrasound, Hunan Provincial Maternal and Child Health Care Hospital, Changsha, 410008, Hunan, China
  - NHC Key Laboratory of Birth Defect for Research and Prevention, Hunan Provincial Maternal and Child Health Care Hospital, Changsha, 410100, Hunan, China
- Wei Yan
  - College of Computer Engineering and Applied Mathematics, Changsha University, Changsha, 410022, Hunan, China
- Bihai Zhao
  - College of Computer Engineering and Applied Mathematics, Changsha University, Changsha, 410022, Hunan, China
  - Hunan Provincial Key Laboratory of Industrial Internet Technology and Security, Changsha University, Changsha, 410022, Hunan, China
6
Gu S, Jiang M, Guzzi PH, Milenković T. Modeling multi-scale data via a network of networks. Bioinformatics 2022; 38:2544-2553. [PMID: 35238343 PMCID: PMC9048659 DOI: 10.1093/bioinformatics/btac133] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Received: 05/25/2021] [Revised: 02/01/2022] [Accepted: 02/28/2022] [Indexed: 11/12/2022]
Abstract
Motivation
Node label prediction and graph label prediction are prominent network science tasks. Data analyzed in these tasks are sometimes related: entities represented by nodes in a higher-level (higher scale) network can themselves be modeled as networks at a lower level. We argue that systems involving such entities should be integrated with a 'network of networks' (NoN) representation. Then, we ask whether entity label prediction using multi-level NoN data via our proposed approaches is more accurate than using each of single-level node and graph data alone, i.e. than traditional node label prediction on the higher-level network and graph label prediction on the lower-level networks. To obtain data, we develop the first synthetic NoN generator and construct a real biological NoN. We evaluate the accuracy of the considered approaches when predicting artificial labels from the synthetic NoNs and proteins' functions from the biological NoN.
Results
For the synthetic NoNs, our NoN approaches outperform or are as good as node- and network-level ones depending on the NoN properties. For the biological NoN, our NoN approaches outperform the single-level approaches for just under half of the protein functions, and for 30% of the functions, only our NoN approaches make meaningful predictions, while node- and network-level ones achieve random accuracy. So, NoN-based data integration is important.
Availability and implementation
The software and data are available at https://nd.edu/~cone/NoNs.
Supplementary information
Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Shawn Gu
  - Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN 46556, USA
- Meng Jiang
  - Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN 46556, USA
- Pietro Hiram Guzzi
  - Department of Surgical and Medical Sciences, University Magna Graecia of Catanzaro, Catanzaro 88100, Italy
- Tijana Milenković
  - Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN 46556, USA
7
Halder AK, Bandyopadhyay SS, Chatterjee P, Nasipuri M, Plewczynski D, Basu S. JUPPI: A Multi-Level Feature Based Method for PPI Prediction and a Refined Strategy for Performance Assessment. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2022; 19:531-542. [PMID: 32750875 DOI: 10.1109/tcbb.2020.3004970] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 06/11/2023]
Abstract
Over the years, several methods have been proposed for computational PPI prediction, each with a different performance evaluation strategy. When benchmarking performance scores, most of these methods suffer from poorly designed cross-validation strategies, ad hoc selection of positive/negative samples, etc. To address these issues, our proposed multi-level feature based PPI prediction approach (JUPPI), which uses sequence, domain and GO information as features, introduces a refined evaluation strategy. During the evaluation process, we first extract high-quality negative data using three-stage filtering, and then introduce a pair-input based cross-validation strategy with three difficulty levels for test-set predictions. Our proposed evaluation strategy reduces the component-level overlapping issue in test sets. The performance of JUPPI is compared with that of state-of-the-art approaches in this domain and tested on six independent PPI datasets. On almost all the datasets, JUPPI outperforms the state-of-the-art not only at the human proteome level for PPI prediction, but also for prediction of interactors for intrinsically disordered human proteins. The JUPPI tool and the developed datasets (JUPPId) are available in the public domain for academic use at https://figshare.com/projects/JUPPI_A_Multi-level_Feature_Based_Method_for_PPI_Prediction_and_a_Refined_Strategy_for_Performance_Assessment/81656, along with supplementary materials, which can be found on the Computer Society Digital Library at http://doi.ieeecomputersociety.org/10.1109/TCBB.2020.3004970.
8
Wang Y, Zhang H, Zhong H, Xue Z. Protein domain identification methods and online resources. Comput Struct Biotechnol J 2021; 19:1145-1153. [PMID: 33680357 PMCID: PMC7895673 DOI: 10.1016/j.csbj.2021.01.041] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Received: 11/08/2020] [Revised: 01/25/2021] [Accepted: 01/26/2021] [Indexed: 01/03/2023]
Abstract
Protein domains are the basic units of proteins that can fold, function, and evolve independently. Knowledge of protein domains is critical for protein classification, understanding their biological functions, annotating their evolutionary mechanisms and protein design. Thus, over the past two decades, a number of protein domain identification approaches have been developed, and a variety of protein domain databases have also been constructed. This review divides protein domain prediction methods into two categories, namely sequence-based and structure-based. These methods are introduced in detail, and their advantages and limitations are compared. Furthermore, this review also provides a comprehensive overview of popular online protein domain sequence and structure databases. Finally, we discuss potential improvements of these prediction methods.
Affiliation(s)
- Yan Wang
  - Institute of Medical Artificial Intelligence, Binzhou Medical College, Yantai, Shandong 264003, China
  - School of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China
- Hang Zhang
  - School of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China
- Haolin Zhong
  - School of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China
- Zhidong Xue
  - School of Software Engineering, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China
9
NPF: network propagation for protein function prediction. BMC Bioinformatics 2020; 21:355. [PMID: 32787776 PMCID: PMC7430911 DOI: 10.1186/s12859-020-03663-7] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 02/29/2020] [Accepted: 07/14/2020] [Indexed: 11/29/2022]
Abstract
Background
The accurate annotation of protein functions is of great significance in elucidating the phenomena of life, treating disease and developing new medicines. Various methods have been developed to facilitate the prediction of these functions by combining protein interaction networks (PINs) with multi-omics data. However, it is still challenging to make full use of multiple biological data to improve the performance of function annotation.
Results
We presented NPF (Network Propagation for Functions prediction), an integrative protein function prediction framework assisted by network propagation and functional module detection, for discovering interacting partners with similar functions to target proteins. NPF leverages knowledge of the protein interaction network architecture and multi-omics data, such as domain annotation and protein complex information, to augment protein-protein functional similarity in a propagation manner. We have verified the great potential of NPF for accurately inferring protein functions. In a comprehensive evaluation, NPF delivered better performance than other competing methods in terms of leave-one-out cross-validation and ten-fold cross-validation.
Conclusions
We demonstrated that network propagation, together with multi-omics data, can discover more partners with similar functions and is not constrained by the “small-world” feature of protein interaction networks. We conclude that the performance of function prediction depends greatly on whether we can extract and exploit proper functional similarity information from protein correlations.
10
Saha S, Prasad A, Chatterjee P, Basu S, Nasipuri M. Protein function prediction from dynamic protein interaction network using gene expression data. J Bioinform Comput Biol 2020; 17:1950025. [PMID: 31617461 DOI: 10.1142/s0219720019500252] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Indexed: 11/18/2022]
Abstract
Computational prediction of functional annotation of proteins is an uphill task. There is an ever increasing gap between functional characterization of protein sequences and deluge of protein sequences generated by large-scale sequencing projects. The dynamic nature of protein interactions is frequently observed which is mostly influenced by any new change of state or change in stimuli. Functional characterization of proteins can be inferred from their interactions with each other, which is dynamic in nature. In this work, we have used a dynamic protein-protein interaction network (PPIN), time course gene expression data and protein sequence information for prediction of functional annotation of proteins. During progression of a particular function, it has also been observed that not all the proteins are active at all time points. For unannotated active proteins, our proposed methodology explores the dynamic PPIN consisting of level-1 and level-2 neighboring proteins at different time points, filtered by Damerau-Levenshtein edit distance to estimate the similarity between two protein sequences and coefficient variation methods to assess the strength of an edge in a network. Finally, from the filtered dynamic PPIN, at each time point, functional annotations of the level-2 proteins are assigned to the unknown and unannotated active proteins through the level-1 neighbor, following a bottom-up strategy. Our proposed methodology achieves an average precision, recall and F-Score of 0.59, 0.76 and 0.61 respectively, which is significantly higher than the reported state-of-the-art methods.
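The sequence filter mentioned above relies on the Damerau-Levenshtein edit distance, which counts insertions, deletions, substitutions and transpositions of adjacent characters. The sketch below implements the restricted variant (often called optimal string alignment), in which no substring is edited more than once. The abstract does not say which variant or which threshold the authors used, so this is an illustrative assumption, and the short strings are toy inputs rather than real protein sequences.

```python
def osa_distance(a, b):
    """Restricted Damerau-Levenshtein (optimal string alignment) distance."""
    rows, cols = len(a) + 1, len(b) + 1
    dist = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        dist[i][0] = i  # delete all of a[:i]
    for j in range(cols):
        dist[0][j] = j  # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # match or substitution
            )
            # Transposition of two adjacent characters.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                dist[i][j] = min(dist[i][j], dist[i - 2][j - 2] + 1)
    return dist[-1][-1]


# "MKV" and "MVK" differ by a single adjacent swap.
swap_distance = osa_distance("MKV", "MVK")
```

A plain Levenshtein distance would score the same pair as 2 (two substitutions), which is why a transposition-aware distance gives a tighter similarity estimate for sequences that differ by local swaps.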
Affiliation(s)
- Sovan Saha
  - Department of Computer Science & Engineering, Dr. Sudhir Chandra Sur Degree Engineering College, 540, Dum Dum Road, Near Dum Dum Jn. Station, Surermath, Kolkata 700074, India
- Abhimanyu Prasad
  - Department of Computer Science & Engineering, Dr. Sudhir Chandra Sur Degree Engineering College, 540, Dum Dum Road, Near Dum Dum Jn. Station, Surermath, Kolkata 700074, India
- Piyali Chatterjee
  - Department of Computer Science & Engineering, Netaji Subhash Engineering College, Techno City, Panchpota, Garia, Kolkata 700152, India
- Subhadip Basu
  - Department of Computer Science & Engineering, Jadavpur University, 188, Raja S.C. Mallick Road, Kolkata 700032, India
- Mita Nasipuri
  - Department of Computer Science & Engineering, Netaji Subhash Engineering College, Techno City, Panchpota, Garia, Kolkata 700152, India
11
Hong J, Luo Y, Zhang Y, Ying J, Xue W, Xie T, Tao L, Zhu F. Protein functional annotation of simultaneously improved stability, accuracy and false discovery rate achieved by a sequence-based deep learning. Brief Bioinform 2019; 21:1437-1447. [PMID: 31504150 PMCID: PMC7412958 DOI: 10.1093/bib/bbz081] [Citation(s) in RCA: 90] [Impact Index Per Article: 18.0] [Received: 05/06/2019] [Revised: 05/27/2019] [Accepted: 06/10/2019] [Indexed: 11/12/2022]
Abstract
Functional annotation of protein sequences with high accuracy has become one of the most important issues in modern biomedical studies, and computational approaches that significantly accelerate analysis and enhance accuracy are greatly desired. Although a variety of methods have been developed to elevate protein annotation accuracy, their ability to control false annotation rates remains either limited or not systematically evaluated. In this study, a protein encoding strategy, together with a deep learning algorithm, was proposed to control the false discovery rate in protein function annotation, and its performance was systematically compared with that of the traditional similarity-based and de novo approaches. Based on a comprehensive assessment from multiple perspectives, the proposed strategy and algorithm were found to perform better in both prediction stability and annotation accuracy compared with other de novo methods. Moreover, an in-depth assessment revealed that it possessed an improved capacity for controlling the false discovery rate compared with traditional methods. All in all, this study not only provided a comprehensive analysis of the performance of the newly proposed strategy but also provided a tool for researchers in the field of protein function annotation.
Affiliation(s)
- Jiajun Hong
  - Key Laboratory of Elemene Class Anti-cancer Chinese Medicine of Zhejiang Province, School of Medicine, Hangzhou Normal University, Hangzhou, China
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China
- Yongchao Luo
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China
- Yang Zhang
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China
  - School of Pharmaceutical Sciences, Chongqing University, Chongqing, China
- Junbiao Ying
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China
- Weiwei Xue
  - School of Pharmaceutical Sciences, Chongqing University, Chongqing, China
- Tian Xie
  - Key Laboratory of Elemene Class Anti-cancer Chinese Medicine of Zhejiang Province, School of Medicine, Hangzhou Normal University, Hangzhou, China
- Lin Tao
  - Key Laboratory of Elemene Class Anti-cancer Chinese Medicine of Zhejiang Province, School of Medicine, Hangzhou Normal University, Hangzhou, China
- Feng Zhu
  - Key Laboratory of Elemene Class Anti-cancer Chinese Medicine of Zhejiang Province, School of Medicine, Hangzhou Normal University, Hangzhou, China
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China
12
Saha S, Chatterjee P, Basu S, Nasipuri M, Plewczynski D. FunPred 3.0: improved protein function prediction using protein interaction network. PeerJ 2019; 7:e6830. [PMID: 31198622 PMCID: PMC6535044 DOI: 10.7717/peerj.6830] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 07/20/2018] [Accepted: 03/21/2019] [Indexed: 11/23/2022]
Abstract
Proteins are the most versatile macromolecules in living systems and perform crucial biological functions. With the advent of the post-genomic era, next-generation sequencing is done routinely at the population scale for a variety of species. The challenging problem is to determine, at scale, the functions of proteins that are not yet characterized by detailed experimental studies. Identifying protein functions experimentally is a laborious and time-consuming task involving many resources. We therefore propose an automated protein function prediction methodology using in silico algorithms trained on carefully curated experimental datasets. We present the improved protein function prediction tool FunPred 3.0, an extended version of our previous methodology FunPred 2, which exploits neighborhood properties in the protein–protein interaction network (PPIN) and physicochemical properties of amino acids. Our method is validated using the available functional annotations in the PPIN of Saccharomyces cerevisiae in the latest Munich Information Center for Protein Sequences (MIPS) dataset. The PPIN data of S. cerevisiae in the MIPS dataset includes 4,554 unique proteins in 13,528 protein–protein interactions after the elimination of the self-replicating and the self-interacting protein pairs. Using the developed FunPred 3.0 tool, we are able to achieve mean precision, recall and F-score values of 0.55, 0.82 and 0.66, respectively. FunPred 3.0 is then used to predict the functions of unpredicted protein pairs (incomplete and missing functional annotations) in the MIPS dataset of S. cerevisiae. The method is also capable of predicting the subcellular localization of proteins along with their corresponding functions. The code and the complete prediction results are freely available at: https://github.com/SovanSaha/FunPred-3.0.git.
Collapse
Affiliation(s)
- Sovan Saha
- Department of Computer Science and Engineering, Dr. Sudhir Chandra Sur Degree Engineering College, Kolkata, West Bengal, India
| | - Piyali Chatterjee
- Department of Computer Science and Engineering, Netaji Subhash Engineering College, Kolkata, India
| | - Subhadip Basu
- Department of Computer Science and Engineering, Jadavpur University, Kolkata, West Bengal, India
| | - Mita Nasipuri
- Department of Computer Science and Engineering, Jadavpur University, Kolkata, West Bengal, India
| | - Dariusz Plewczynski
- Laboratory of Functional and Structural Genomics, Centre of New Technologies, University of Warsaw, Warsaw, Poland.,Faculty of Mathematics and Information Science, Warsaw University of Technology, Warsaw, Poland
| |
|
13
|
Zhang F, Song H, Zeng M, Li Y, Kurgan L, Li M. DeepFunc: A Deep Learning Framework for Accurate Prediction of Protein Functions from Protein Sequences and Interactions. Proteomics 2019; 19:e1900019. [PMID: 30941889 DOI: 10.1002/pmic.201900019] [Citation(s) in RCA: 52] [Impact Index Per Article: 10.4] [Received: 01/12/2019] [Revised: 03/18/2019] [Indexed: 01/06/2023]
Abstract
Annotation of protein functions plays an important role in understanding life at the molecular level. High-throughput sequencing produces massive numbers of raw protein sequences, and only about 1% of them have been manually annotated with functions. Experimental annotation of functions is expensive and time-consuming and does not keep up with the rapid growth in the number of sequences. This motivates the development of computational approaches that predict protein functions. A novel deep learning framework, DeepFunc, is proposed which accurately predicts protein functions from protein sequence- and network-derived information. More precisely, DeepFunc uses a long and sparse binary vector to encode information concerning domains, families, and motifs that are collected from the InterPro tool and associated with the input protein sequence. This vector is processed with two neural layers to obtain a low-dimensional vector, which is combined with topological information extracted from protein-protein interactions (PPIs) and functional linkages. The combined information is processed by a deep neural network that predicts protein functions. DeepFunc is empirically and comparatively tested on a benchmark testing dataset and the Critical Assessment of protein Function Annotation algorithms (CAFA) 3 dataset. The experimental results demonstrate that DeepFunc outperforms current methods on the testing dataset and that it secures the highest Fmax = 0.54 and AUC = 0.94 on the CAFA3 dataset.
Affiliation(s)
- Fuhao Zhang
- School of Computer Science and Engineering, Central South University, Changsha, 410083, P. R. China
| | - Hong Song
- School of Computer Science and Engineering, Central South University, Changsha, 410083, P. R. China
| | - Min Zeng
- School of Computer Science and Engineering, Central South University, Changsha, 410083, P. R. China
| | - Yaohang Li
- School of Computer Science and Engineering, Central South University, Changsha, 410083, P. R. China.,Department of Computer Science, Old Dominion University, Norfolk, VA, 23529, USA
| | - Lukasz Kurgan
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA, 23284, USA
| | - Min Li
- School of Computer Science and Engineering, Central South University, Changsha, 410083, P. R. China
| |
|
14
|
Asgharzadeh MR, Pourseif MM, Barar J, Eskandani M, Jafari Niya M, Mashayekhi MR, Omidi Y. Functional expression and impact of testis-specific gene antigen 10 in breast cancer: a combined in vitro and in silico analysis. Bioimpacts 2019; 9:145-159. [PMID: 31508330 PMCID: PMC6726749 DOI: 10.15171/bi.2019.19] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 09/25/2018] [Revised: 02/20/2019] [Accepted: 03/02/2019] [Indexed: 12/15/2022]
Abstract
Introduction: Testis-specific gene antigen 10 (TSGA10) is a little-studied gene involved in poorly understood biological pathways of different cancers. Here, we investigated TSGA10 expression using different concentrations of glucose under hypoxia, as well as its interaction with the hypoxia-inducible factor 1 (HIF-1). Methods: The breast cancer MDA-MB-231 and MCF-7 cells were cultured with different concentrations of glucose (5.5, 11.0 and 25.0 mM) under normoxia/hypoxia for 24, 48, and 72 hours and examined for HIF-1α expression and cell migration by Western blotting and scratch assays. qPCR was employed to analyze the expression of TSGA10. Three-dimensional (3D) structure modeling and energy minimization of the interacting domain of TSGA10 were performed with MODELLER v9.17 and Swiss-PDB viewer v4.1.0/UCSF Chimera v1.11. UCSF Chimera v1.13.1 and Hex 6.0 were used for the molecular docking simulation. Cytoscape v3.7.1 and STRING v11.0 were used for protein-protein interaction (PPI) network analysis. The HIF-1α-related hypoxia pathways were obtained from the BioModels database and reconstructed in CellDesigner v4.4.2. Results: The increased expression of TSGA10 was found to be significantly associated with reduced metastasis in the MDA-MB-231 cells, where an inverse relationship was seen between the TSGA10 mRNA level and cellular migration; this relationship was not seen in the MCF-7 cells. The C-terminal domain of TSGA10 interacted with HIF-1α with high affinity, resulting in a PPI network with 10 key nodes (HIF-1α, VEGFA, HSP90AA1, AKT1, ARNT, TP53, TSGA10, VHL, JUN, and EGFR). Conclusions: Collectively, TSGA10 functional expression alters under hyper-/hypo-glycemia and hypoxia, which indicates its importance as a candidate bio-target for cancer therapy.
Affiliation(s)
- Mohammad Reza Asgharzadeh
- Research Center for Pharmaceutical Nanotechnology, Biomedicine Institute, Tabriz University of Medical Sciences, Tabriz, Iran.,Department of Biology, Fars Science and Research Branch, Islamic Azad University, Marvdasht, Iran.,Department of Biology, Marvdasht Branch, Islamic Azad University, Marvdasht, Iran
| | - Mohammad M Pourseif
- Research Center for Pharmaceutical Nanotechnology, Biomedicine Institute, Tabriz University of Medical Sciences, Tabriz, Iran
| | - Jaleh Barar
- Research Center for Pharmaceutical Nanotechnology, Biomedicine Institute, Tabriz University of Medical Sciences, Tabriz, Iran.,Department of Pharmaceutics, Faculty of Pharmacy, Tabriz University of Medical Sciences, Tabriz, Iran
| | - Morteza Eskandani
- Research Center for Pharmaceutical Nanotechnology, Biomedicine Institute, Tabriz University of Medical Sciences, Tabriz, Iran
| | - Mojtaba Jafari Niya
- Department of Biology, Fars Science and Research Branch, Islamic Azad University, Marvdasht, Iran.,Department of Biology, Marvdasht Branch, Islamic Azad University, Marvdasht, Iran
| | | | - Yadollah Omidi
- Research Center for Pharmaceutical Nanotechnology, Biomedicine Institute, Tabriz University of Medical Sciences, Tabriz, Iran.,Department of Pharmaceutics, Faculty of Pharmacy, Tabriz University of Medical Sciences, Tabriz, Iran
| |
|
15
|
Abstract
This chapter exploits network-based representations of proteins, called metagraphs, in protein-protein interaction networks to identify candidate disease-causing proteins. Protein-protein interaction (PPI) networks are effective tools in studying the functional roles of proteins in the development of various diseases. However, they are insufficient without the support of additional biological knowledge for proteins such as their molecular functions and biological processes. To enhance PPI networks, we utilize biological properties of individual proteins as well. More specifically, we integrate keywords from the UniProt database describing protein properties into the PPI network and construct a novel heterogeneous PPI-Keyword (PPIK) network consisting of both proteins and keywords. As proteins with similar functional duties or involved in the same metabolic pathway tend to have similar topological characteristics, we propose to represent them with metagraphs. Compared to the traditional network motif or subgraph, a metagraph can capture the topological arrangements through not only the protein-protein interactions but also protein-keyword associations. We feed those novel metagraph representations into classifiers for disease protein prediction and conduct our experiments on three different PPI databases. The results show that the proposed method consistently increases disease protein prediction performance across various classifiers, by 15.3% in AUC on average. It outperforms the diffusion-based (e.g., RWR) and the module-based baselines by 13.8-32.9% in overall disease protein prediction. In breast cancer protein prediction, it outperforms RWR, PRINCE, and the module-based baselines by 6.6-14.2%. Finally, our predictions also exhibit better correlations with literature findings from the PubMed database.
|
16
|
Castilla IA, Woods DF, Reen FJ, O'Gara F. Harnessing Marine Biocatalytic Reservoirs for Green Chemistry Applications through Metagenomic Technologies. Mar Drugs 2018; 16:E227. [PMID: 29973493 PMCID: PMC6071119 DOI: 10.3390/md16070227] [Citation(s) in RCA: 20] [Impact Index Per Article: 3.3] [Received: 05/23/2018] [Revised: 06/13/2018] [Accepted: 06/22/2018] [Indexed: 01/24/2023] Open
Abstract
In a demanding commercial world, large-scale chemical processes have been widely utilised to satisfy consumer-related needs. Chemical industries are key to promoting economic growth and meeting the requirements of a sustainable industrialised society. The market need for diverse commodities produced by the chemical industry is rapidly expanding globally. Accompanying this demand is an increased threat to the environment and to human health, due to waste produced by increased industrial production. This increased demand has underscored the necessity to increase reaction efficiencies, in order to reduce costs and increase profits. The discovery of novel biocatalysts is a key method aimed at combating these difficulties. Metagenomic technology, as a tool for uncovering novel biocatalysts, has great potential and applicability and has already delivered many successful achievements. In this review, we discuss recent developments and achievements in the field of biocatalysis. We highlight how green chemistry principles, through the application of biocatalysis, can be successfully promoted and implemented in various industrial sectors. In addition, we demonstrate how two novel lipases/esterases were mined from the marine environment by metagenomic analysis. Collectively, these improvements can result in increased efficiency, decreased energy consumption, reduced waste and cost savings for the chemical industry.
Affiliation(s)
- Ignacio Abreu Castilla
- BIOMERIT Research Centre, School of Microbiology, University College Cork, T12 K8AF Cork, Ireland.
| | - David F Woods
- BIOMERIT Research Centre, School of Microbiology, University College Cork, T12 K8AF Cork, Ireland.
| | - F Jerry Reen
- School of Microbiology, University College Cork, T12 K8AF Cork, Ireland.
| | - Fergal O'Gara
- BIOMERIT Research Centre, School of Microbiology, University College Cork, T12 K8AF Cork, Ireland.
- Telethon Kids Institute, Perth, WA 6008, Australia.
- Human Microbiome Programme, School of Pharmacy and Biomedical Sciences, Curtin Health Innovation Research Institute, Curtin University, Perth, WA 6102, Australia.
| |
|
17
|
Chu Y, Xiao S, Su H, Liao B, Zhang J, Xu J, Chen S. Genome-wide characterization and analysis of bHLH transcription factors in Panax ginseng. Acta Pharm Sin B 2018; 8:666-677. [PMID: 30109190 PMCID: PMC6089850 DOI: 10.1016/j.apsb.2018.04.004] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.7] [Received: 01/26/2018] [Revised: 02/24/2018] [Accepted: 03/14/2018] [Indexed: 11/24/2022] Open
Abstract
Ginseng (Panax ginseng C.A. Meyer) is one of the best-selling herbal medicines, with ginsenosides as its main pharmacologically active constituents. Although extensive chemical and pharmaceutical studies of these compounds have been performed, genome-wide studies of the basic helix-loop-helix (bHLH) transcription factors of ginseng are still limited. The bHLH transcription factor family is one of the largest transcription factor families found in eukaryotic organisms, and these proteins are involved in a myriad of regulatory processes. In our study, 169 bHLH transcription factor genes were identified in the genome of P. ginseng, and phylogenetic analysis indicated that these PGbHLHs could be classified into 24 subfamilies. A total of 21 RNA-seq data sets, including two sequencing libraries for jasmonate (JA)-responsive and 19 reported libraries for organ-specific expression analyses were constructed. Through a combination of gene-specific expression patterns and chemical contents, 6 PGbHLH genes from 4 subfamilies were revealed to be potentially involved in the regulation of ginsenoside biosynthesis. These 6 PGbHLHs, which had distinct target genes, were further divided into two groups depending on the absence of MYC-N structure. Our results would provide a foundation for understanding the molecular basis and regulatory mechanisms of bHLH transcription factor action in P. ginseng.
Affiliation(s)
- Yang Chu
- Institute of Chinese Materia Medica, China Academy of Chinese Medical Sciences, Beijing 100700, China
| | - Shuiming Xiao
- Institute of Chinese Materia Medica, China Academy of Chinese Medical Sciences, Beijing 100700, China
| | - He Su
- Institute of Chinese Materia Medica, China Academy of Chinese Medical Sciences, Beijing 100700, China
- Guangdong Provincial Hospital of Chinese Medicine, Guangzhou 510006, China
| | - Baosheng Liao
- Institute of Chinese Materia Medica, China Academy of Chinese Medical Sciences, Beijing 100700, China
| | - Jingjing Zhang
- Institute of Chinese Materia Medica, China Academy of Chinese Medical Sciences, Beijing 100700, China
- College of Pharmacy, Hubei University of Chinese Medicine, Wuhan 430065, China
| | - Jiang Xu
- Institute of Chinese Materia Medica, China Academy of Chinese Medical Sciences, Beijing 100700, China
- Corresponding authors.
| | - Shilin Chen
- Institute of Chinese Materia Medica, China Academy of Chinese Medical Sciences, Beijing 100700, China
- Corresponding authors.
| |
|
18
|
Disease gene classification with metagraph representations. Methods 2017; 131:83-92. [DOI: 10.1016/j.ymeth.2017.06.036] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Received: 04/15/2017] [Revised: 06/23/2017] [Accepted: 06/30/2017] [Indexed: 12/28/2022] Open
|
19
|
Peng W, Li M, Chen L, Wang L. Predicting Protein Functions by Using Unbalanced Random Walk Algorithm on Three Biological Networks. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2017; 14:360-369. [PMID: 28368814 DOI: 10.1109/tcbb.2015.2394314] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.1] [Indexed: 06/07/2023]
Abstract
As the gap between sequence data and their functional annotations grows ever wider, many computational methods have been proposed to annotate functions for unknown proteins. However, designing effective methods that make good use of various biological resources remains a big challenge for researchers because of the functional diversity of proteins. In this work, we propose a new method named ThrRW, which takes several steps of random walking on three different biological networks, namely the protein interaction network (PIN), the domain co-occurrence network (DCN), and the functional interrelationship network (FIN), so as to infer functional information from neighbors in the corresponding networks. Because the three networks differ in topology and structure, the number of walking steps differs among them. In the course of walking, the functional information is transferred from one network to another according to the associations between the nodes in different networks. Experimental results on S. cerevisiae data show that our method achieves better prediction performance than both the methods that consider PIN data and GO term similarities and the methods that use PIN data and protein domain information, which verifies the effectiveness of our method in integrating multiple biological data sources.
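The idea of taking a fixed number of random-walk steps on a network can be sketched as follows. This is a generic illustration on a single toy network, not the ThrRW algorithm itself: the function name `walk_steps`, the uniform transition probabilities, and the protein identifiers are all assumptions, and the transfer of scores between the three networks is not shown.

```python
def walk_steps(adj, scores, steps):
    """Spread scores over a graph for a fixed number of random-walk steps.

    adj: dict mapping each node to a list of its neighbours.
    scores: dict mapping some nodes to an initial score (e.g. 1.0 for a
            protein annotated with the function of interest).
    """
    current = {node: scores.get(node, 0.0) for node in adj}
    for _ in range(steps):
        nxt = {node: 0.0 for node in adj}
        for node, neighbours in adj.items():
            if not neighbours:
                nxt[node] += current[node]  # isolated node keeps its score
                continue
            share = current[node] / len(neighbours)  # uniform transition
            for nb in neighbours:
                nxt[nb] += share
        current = nxt
    return current


# Hypothetical toy interaction network: P1 - P2 - P3, with P1 annotated.
toy_network = {"P1": ["P2"], "P2": ["P1", "P3"], "P3": ["P2"]}
print(walk_steps(toy_network, {"P1": 1.0}, steps=2))
```

Using a different `steps` value per network is one simple way to reflect the statement that the number of walking steps differs among the networks.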
|
20
|
Folador EL, de Carvalho PVSD, Silva WM, Ferreira RS, Silva A, Gromiha M, Ghosh P, Barh D, Azevedo V, Röttger R. In silico identification of essential proteins in Corynebacterium pseudotuberculosis based on protein-protein interaction networks. BMC Systems Biology 2016; 10:103. [PMID: 27814699 PMCID: PMC5097352 DOI: 10.1186/s12918-016-0346-4] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.3] [Received: 06/07/2016] [Accepted: 10/18/2016] [Indexed: 12/27/2022]
Abstract
Background: Corynebacterium pseudotuberculosis (Cp) is a gram-positive bacterium that is classified into equi and ovis serovars. The serovar ovis is the etiological agent of caseous lymphadenitis, a chronic infection affecting sheep and goats, causing economic losses due to carcass condemnation and decreased production of meat, wool, and milk. Current diagnosis or treatment protocols are not fully effective and thus require further research into Cp pathogenesis. Results: Here, we mapped known protein-protein interactions (PPI) from various species to nine Cp strains to reconstruct parts of the potential Cp interactome and to identify potentially essential proteins serving as putative drug targets. On average, we predict 16,669 interactions for each of the nine strains (with 15,495 interactions shared among all strains). An in silico sanity check suggests that the potential networks were not formed by spurious interactions but have a strong biological bias. With the inferred Cp networks we identify 181 essential proteins, among which 41 are non-host homologous. Conclusions: The list of candidate interactions of the Cp strains lays the basis for developing novel hypotheses and designing corresponding wet-lab studies. The non-host homologous essential proteins are attractive targets for therapeutic and diagnostic purposes. They allow for searching of small molecule inhibitors of binding interactions, enabling modern drug discovery. Overall, the predicted Cp PPI networks form a valuable and versatile tool for researchers interested in Corynebacterium pseudotuberculosis.
Affiliation(s)
- Edson Luiz Folador
- Department of General Biology, Instituto de Ciências Biológicas (ICB), Federal University of Minas Gerais (UFMG), Belo Horizonte, Brazil.,Institute of Biological Sciences, Federal University of Para, Belém, PA, Brazil.,Biotechnology Center (CBiotec), Federal University of Paraiba (UFPB), João Pessoa, Brazil
| | - Paulo Vinícius Sanches Daltro de Carvalho
- Department of General Biology, Instituto de Ciências Biológicas (ICB), Federal University of Minas Gerais (UFMG), Belo Horizonte, Brazil.,Department of Mathematics and Computer Science, University of Southern Denmark, Odense, Denmark
| | - Wanderson Marques Silva
- Department of General Biology, Instituto de Ciências Biológicas (ICB), Federal University of Minas Gerais (UFMG), Belo Horizonte, Brazil
| | - Rafaela Salgado Ferreira
- Department of Biochemistry and Immunology, Federal University of Minas Gerais (UFMG), Belo Horizonte, Brazil
| | - Artur Silva
- Institute of Biological Sciences, Federal University of Para, Belém, PA, Brazil
| | - Michael Gromiha
- Department of Biotechnology, Indian Institute of Technology (IIT) Madras, Tamilnadu, India
| | - Preetam Ghosh
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA, USA
| | - Debmalya Barh
- Centre for Genomics and Applied Gene Technology, Institute of Integrative Omics and Applied Biotechnology (IIOAB), Nonakuri, Purba Medinipur, West Bengal, India
| | - Vasco Azevedo
- Department of General Biology, Instituto de Ciências Biológicas (ICB), Federal University of Minas Gerais (UFMG), Belo Horizonte, Brazil
| | - Richard Röttger
- Department of Mathematics and Computer Science, University of Southern Denmark, Odense, Denmark.
| |
|
21
|
Zhao B, Hu S, Li X, Zhang F, Tian Q, Ni W. An efficient method for protein function annotation based on multilayer protein networks. Hum Genomics 2016; 10:33. [PMID: 27678214 PMCID: PMC5039885 DOI: 10.1186/s40246-016-0087-x] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.9] [Received: 07/18/2016] [Accepted: 09/14/2016] [Indexed: 12/31/2022] Open
Abstract
Background: Accurate annotation of protein functions is still a big challenge for understanding life in the post-genomic era. Many computational methods based on protein-protein interaction (PPI) networks have been proposed to predict the function of proteins. However, the precision of these predictions still needs to be improved, due to the incompleteness of and noise in PPI networks. Integrating network topology and biological information could improve the accuracy of protein function prediction and may also lead to the discovery of multiple interaction types between proteins. Current algorithms generate a single network, which is obtained using a weighted sum of all types of protein interactions. Method: The influences of different types of interactions on the prediction of protein functions are not the same. To address this, we construct multilayer protein networks (MPN) by integrating PPI networks, the domains of proteins, and information on protein complexes. In the MPN, there is more than one type of connection between pairwise proteins. Different types of connections reflect different roles and importance in protein function prediction. Based on the MPN, we propose a new protein function prediction method, named function prediction based on multilayer protein networks (FP-MPN). Given an un-annotated protein, the FP-MPN method visits each layer of the MPN in turn and generates a set of candidate neighbors with known functions. A set of predicted functions for the testing protein is then formed, and all of these functions are scored and sorted. Each layer carries a different importance in the prediction of protein functions. A number of top-ranking functions are selected to annotate the unknown protein. Conclusions: The method proposed in this paper was a better predictor on Saccharomyces cerevisiae protein data than previously used function prediction methods. The proposed FP-MPN method takes the different roles of connections in protein function prediction into account and reduces artificial noise by introducing biological information.
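The layer-by-layer procedure described above (visit each layer, collect annotated neighbours, score and sort candidate functions, keep the top-ranking ones) can be sketched roughly as below. This is a hedged illustration rather than the published FP-MPN scoring scheme: the function name `predict_functions`, the additive per-layer weights, the layer names, and the toy data are all assumptions.

```python
from collections import defaultdict


def predict_functions(protein, layers, annotations, layer_weights, top_n=2):
    """Rank candidate functions for `protein` by weighted neighbour votes.

    layers: dict layer_name -> {protein: [neighbours in that layer]}.
    annotations: dict protein -> set of known functions.
    layer_weights: dict layer_name -> weight (assumed; default 1.0).
    """
    scores = defaultdict(float)
    for name, adj in layers.items():
        weight = layer_weights.get(name, 1.0)
        for neighbour in adj.get(protein, ()):
            for func in annotations.get(neighbour, ()):
                scores[func] += weight  # each annotated neighbour votes
    # Sort by descending score, then alphabetically for a stable order.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [func for func, _ in ranked[:top_n]]


# Hypothetical three-layer network around an un-annotated protein "Q".
layers = {
    "ppi": {"Q": ["A", "B"]},
    "domain": {"Q": ["C"]},
    "complex": {"Q": ["A"]},
}
annotations = {"A": {"transport"}, "B": {"kinase"}, "C": {"kinase", "binding"}}
weights = {"ppi": 1.0, "domain": 0.5, "complex": 2.0}
print(predict_functions("Q", layers, annotations, weights))
```

Keeping the layers separate, instead of merging them into one weighted network first, lets each connection type contribute with its own importance.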
Affiliation(s)
- Bihai Zhao
- Department of Mathematics and Computing Science, Changsha University, Changsha, Hunan, 410022, China
| | - Sai Hu
- Department of Mathematics and Computing Science, Changsha University, Changsha, Hunan, 410022, China.
| | - Xueyong Li
- Department of Mathematics and Computing Science, Changsha University, Changsha, Hunan, 410022, China
| | - Fan Zhang
- Department of Mathematics and Computing Science, Changsha University, Changsha, Hunan, 410022, China
| | - Qinglong Tian
- Department of Mathematics and Computing Science, Changsha University, Changsha, Hunan, 410022, China
| | - Wenyin Ni
- Department of Mathematics and Computing Science, Changsha University, Changsha, Hunan, 410022, China.
| |
|
22
|
Zhao B, Wang J, Li M, Li X, Li Y, Wu FX, Pan Y. A New Method for Predicting Protein Functions From Dynamic Weighted Interactome Networks. IEEE Trans Nanobioscience 2016; 15:131-9. [PMID: 26955047 DOI: 10.1109/tnb.2016.2536161] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.1] [Indexed: 11/07/2022]
Abstract
Automated annotation of protein function is challenging. As the number of sequenced genomes rapidly grows, the overwhelming majority of proteins can only be annotated computationally. Under new conditions or stimuli, not only the number and location of proteins would be changed, but also their interactions. This dynamic feature of protein interactions, however, was not considered in the existing function prediction algorithms. Taking the dynamic nature of protein interactions into consideration, we construct a dynamic weighted interactome network (DWIN) by integrating protein-protein interaction (PPI) network and time course gene expression data, as well as proteins' domain information and protein complex information. Then, we propose a new prediction approach that predicts protein functions from the constructed dynamic weighted interactome network. For an unknown protein, the proposed method visits dynamic networks at different time points and scores functions derived from all neighbors. Finally, the method selects top N functions from these ranked candidate functions to annotate the testing protein. Experiments on PPI datasets were conducted to evaluate the effectiveness of the proposed approach in predicting unknown protein functions. The evaluation results demonstrated that the proposed method outperforms other competing methods.
|
23
|
Wang D, Hou J. Explore the hidden treasure in protein-protein interaction networks - an iterative model for predicting protein functions. J Bioinform Comput Biol 2015; 13:1550026. [PMID: 26449174 DOI: 10.1142/s0219720015500262] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 11/18/2022]
Abstract
Protein-protein interaction networks constructed by high-throughput technologies provide opportunities for predicting protein functions. Many approaches and algorithms have been applied to PPI networks to predict the functions of unannotated proteins over recent decades. However, most existing algorithms and approaches do not consider unannotated proteins and their corresponding interactions in the prediction process. On the other hand, algorithms which make use of unannotated proteins have limited prediction performance. Moreover, current algorithms usually make one-off predictions. In this paper, we propose an iterative approach that utilizes unannotated proteins and their interactions in prediction. We conducted experiments to evaluate the performance and robustness of the proposed iterative approach. The iterative approach improved the prediction performance by up to 50%-80% when there was a high proportion of unannotated neighborhood proteins in the network. The iterative approach also showed robustness in various types of protein interaction networks. Importantly, our iterative approach introduces the idea of iteratively incorporating the interaction information of unannotated proteins into protein function prediction, and it can be applied to existing prediction algorithms to improve prediction performance.
Affiliation(s)
- Derui Wang
- School of Information Technology, Deakin University, 221 Burwood Highway Burwood, Victoria 3125, Australia
| | - Jingyu Hou
- School of Information Technology, Deakin University, 221 Burwood Highway Burwood, Victoria 3125, Australia
| |
|
24
|
Sekhwal MK, Li P, Lam I, Wang X, Cloutier S, You FM. Disease Resistance Gene Analogs (RGAs) in Plants. Int J Mol Sci 2015; 16:19248-90. [PMID: 26287177 PMCID: PMC4581296 DOI: 10.3390/ijms160819248] [Citation(s) in RCA: 144] [Impact Index Per Article: 16.0] [Received: 05/20/2015] [Revised: 08/01/2015] [Accepted: 08/06/2015] [Indexed: 12/12/2022] Open
Abstract
Plants have developed effective mechanisms to recognize and respond to infections caused by pathogens. Plant resistance gene analogs (RGAs), as resistance (R) gene candidates, have conserved domains and motifs that play specific roles in pathogen resistance. Well-known RGAs are nucleotide binding site leucine rich repeats, receptor like kinases, and receptor like proteins. Others include pentatricopeptide repeats and apoplastic peroxidases. RGAs can be detected using bioinformatics tools based on their conserved structural features. Thousands of RGAs have been identified from sequenced plant genomes. High-density genome-wide RGA genetic maps are useful for designing diagnostic markers and identifying quantitative trait loci (QTL) or markers associated with plant disease resistance. This review focuses on recent advances in structures and mechanisms of RGAs, and their identification from sequenced genomes using bioinformatics tools. Applications in enhancing fine mapping and cloning of plant disease resistance genes are also discussed.
Affiliation(s)
- Manoj Kumar Sekhwal
- Cereal Research Centre, Agriculture and Agri-Food Canada, Morden, MB R6M 1Y5, Canada.
| | - Pingchuan Li
- Cereal Research Centre, Agriculture and Agri-Food Canada, Morden, MB R6M 1Y5, Canada.
| | - Irene Lam
- Cereal Research Centre, Agriculture and Agri-Food Canada, Morden, MB R6M 1Y5, Canada.
| | - Xiue Wang
- National Key Laboratory of Crop Genetics and Germplasm Enhancement, Cytogenetics Institute, Nanjing Agricultural University, Nanjing 210095, China.
| | - Sylvie Cloutier
- Eastern Cereal and Oilseed Research Centre, Agriculture and Agri-Food Canada, Ottawa, ON K1A 0C6, Canada.
| | - Frank M You
- Cereal Research Centre, Agriculture and Agri-Food Canada, Morden, MB R6M 1Y5, Canada.
- Plant Science Department, University of Manitoba, Winnipeg, MB R3T 2N6, Canada.
| |
|
25
|
Du ZP, Wu BL, Xie JJ, Lin XH, Qiu XY, Zhan XF, Wang SH, Shen JH, Li EM, Xu LY. Network Analyses of Gene Expression following Fascin Knockdown in Esophageal Squamous Cell Carcinoma Cells. Asian Pac J Cancer Prev 2015. [DOI: 10.7314/apjcp.2015.16.13.5445] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Indexed: 11/10/2022] Open
|
26
|
Prioritization of orphan disease-causing genes using topological feature and GO similarity between proteins in interaction networks. Science China Life Sciences 2014; 57:1064-71. [PMID: 25326068 DOI: 10.1007/s11427-014-4747-6] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Received: 05/23/2014] [Accepted: 07/15/2014] [Indexed: 12/22/2022]
Abstract
Identification of disease-causing genes among a large number of candidates is a fundamental challenge in human disease studies. However, it is still time-consuming and laborious to determine the real disease-causing genes by biological experiments. With the advances of the high-throughput techniques, a large number of protein-protein interactions have been produced. Therefore, to address this issue, several methods based on protein interaction network have been proposed. In this paper, we propose a shortest path-based algorithm, named SPranker, to prioritize disease-causing genes in protein interaction networks. Considering the fact that diseases with similar phenotypes are generally caused by functionally related genes, we further propose an improved algorithm SPGOranker by integrating the semantic similarity of GO annotations. SPGOranker not only considers the topological similarity between protein pairs in a protein interaction network but also takes their functional similarity into account. The proposed algorithms SPranker and SPGOranker were applied to 1598 known orphan disease-causing genes from 172 orphan diseases and compared with three state-of-the-art approaches, ICN, VS and RWR. The experimental results show that SPranker and SPGOranker outperform ICN, VS, and RWR for the prioritization of orphan disease-causing genes. Importantly, for the case study of severe combined immunodeficiency, SPranker and SPGOranker predict several novel causal genes.
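A shortest-path-based prioritization of the kind described above might look like the following sketch. It is not the published SPranker scoring function, which the abstract does not specify: here candidates are simply scored by the sum of inverse shortest-path distances to known disease genes, and the names `bfs_distances` and `rank_candidates`, as well as the toy network, are assumptions.

```python
from collections import deque


def bfs_distances(adj, source):
    """Shortest-path lengths (in edges) from `source` in an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in adj.get(node, ()):
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


def rank_candidates(adj, seeds, candidates):
    """Order candidates by closeness to known disease genes (seeds)."""
    totals = {c: 0.0 for c in candidates}
    for seed in seeds:
        dist = bfs_distances(adj, seed)
        for c in candidates:
            # Unreachable candidates, or a candidate equal to the seed,
            # contribute nothing to the score.
            if dist.get(c, 0) > 0:
                totals[c] += 1.0 / dist[c]
    return sorted(candidates, key=lambda c: (-totals[c], c))


# Hypothetical network: seeds S1 and S2 both bind A; A - B - C is a chain.
network = {
    "S1": ["A"], "S2": ["A"],
    "A": ["S1", "S2", "B"], "B": ["A", "C"], "C": ["B"],
}
print(rank_candidates(network, seeds=["S1", "S2"], candidates=["C", "B", "A"]))
```

A GO-aware variant in the spirit of SPGOranker could multiply each distance term by a semantic similarity between the candidate and the seed; that extension is not shown here.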
|
27
|
Folador EL, Hassan SS, Lemke N, Barh D, Silva A, Ferreira RS, Azevedo V. An improved interolog mapping-based computational prediction of protein–protein interactions with increased network coverage. Integr Biol (Camb) 2014; 6:1080-7. [DOI: 10.1039/c4ib00136b] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.8] [Indexed: 12/22/2022]
Abstract
Automated and efficient methods that map ortholog interactions from several organisms and public databases (pDB) are needed to identify new interactions in an organism of interest (interolog mapping).
Affiliation(s)
- Edson Luiz Folador
- Department of General Biology
- Instituto de Ciências Biológicas (ICB)
- Federal University of Minas Gerais (UFMG)
- Belo Horizonte, Brazil
| | - Syed Shah Hassan
- Department of General Biology
- Instituto de Ciências Biológicas (ICB)
- Federal University of Minas Gerais (UFMG)
- Belo Horizonte, Brazil
| | - Ney Lemke
- Laboratory of Bioinformatic and Computational Biofisic
- Instituto de Biociência
- Universidade Estadual de São Paulo (UNESP)
- Botucatu, Brazil
| | - Debmalya Barh
- Centre for Genomics and Applied Gene Technology
- Institute of Integrative Omics and Applied Biotechnology (IIOAB)
- Purba Medinipur, India
| | - Artur Silva
- Instituto de Ciências Biológicas
- Universidade Federal do Para
- Belém, Brazil
| | - Rafaela Salgado Ferreira
- Department of Biochemistry and Immunology
- Federal University of Minas Gerais (UFMG)
- Belo Horizonte, Brazil
| | - Vasco Azevedo
- Department of General Biology
- Instituto de Ciências Biológicas (ICB)
- Federal University of Minas Gerais (UFMG)
- Belo Horizonte, Brazil
| |
|