1
Lu P, Tian J. ACDMBI: A deep learning model based on community division and multi-source biological information fusion predicts essential proteins. Comput Biol Chem 2024; 112:108115. [PMID: 38865861 DOI: 10.1016/j.compbiolchem.2024.108115] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/03/2024] [Revised: 05/15/2024] [Accepted: 05/28/2024] [Indexed: 06/14/2024]
Abstract
Accurately identifying essential proteins is vital for drug research and disease diagnosis. Traditional centrality methods and machine learning approaches often struggle to identify essential proteins accurately because they rely primarily on information derived from protein-protein interaction (PPI) networks. Although some researchers have integrated biological data with PPI networks to predict essential proteins, designing effective integration methods remains a challenge. This paper presents the ACDMBI model to address these issues. ACDMBI comprises two key modules: feature extraction and classification. Feature extraction draws on three distinct data sources. First, structural features of proteins are extracted from the PPI network through community division and then further optimized using Graph Convolutional Networks (GCN) and Graph Attention Networks (GAT). Second, protein features are extracted from gene expression data using Bidirectional Long Short-Term Memory networks (BiLSTM) and a multi-head self-attention mechanism. Third, protein features are derived by mapping subcellular localization data to a one-dimensional vector and processing it through fully connected layers. In the classification phase, the features extracted from the three data sources are integrated and passed to a multi-layer deep neural network (DNN) for protein classification. Experimental results on brewer's yeast data show the ACDMBI model's superior performance, with an AUC of 0.9533 and an AUPR of 0.9153. Ablation experiments further show that effectively integrating features from diverse biological information significantly boosts the model's performance.
Affiliation(s)
- Pengli Lu
  - School of Computer and Communication, Lanzhou University of Technology, Lanzhou 730050, China
- Jialong Tian
  - School of Computer and Communication, Lanzhou University of Technology, Lanzhou 730050, China
2
Pan L, Wang H, Yang B, Li W. A protein network refinement method based on module discovery and biological information. BMC Bioinformatics 2024; 25:157. [PMID: 38643108 PMCID: PMC11031909 DOI: 10.1186/s12859-024-05772-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/21/2024] [Accepted: 04/10/2024] [Indexed: 04/22/2024] Open
Abstract
BACKGROUND The identification of essential proteins helps in understanding the minimum requirements for cell survival and development, discovering drug targets, and preventing disease. Node ranking methods are now a common way to identify essential proteins, but the poor data quality of the underlying protein interaction network (PIN) limits their identification accuracy. Researchers have therefore constructed refined networks that take into account certain biological properties of interacting protein pairs to improve the performance of node ranking methods on the PIN. Studies show that proteins in a complex are more likely to be essential than proteins outside a complex. However, existing PIN refinement methods usually ignore modularity.
METHODS We propose a network refinement method based on module discovery and biological information. First, the maximal connected subgraph of the PIN is extracted and divided into modules using the Fast-unfolding algorithm. Then, critical modules are detected according to the orthologous information, subcellular localization information and topology information within each module. Finally, a more refined network (CM-PIN) is constructed from the identified critical modules.
RESULTS To evaluate the effectiveness of the proposed method, we used 12 typical node ranking methods (LAC, DC, DMNC, NC, TP, LID, CC, BC, PR, LR, PeC, WDC) to compare their overall performance on the CM-PIN with that on the S-PIN, D-PIN and RD-PIN. The experimental results showed that the CM-PIN was optimal in terms of the number of essential proteins identified, the precision-recall curve, the Jackknife method and other criteria, and that it can help to identify essential proteins more accurately.
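The first step of the method, extracting the maximal connected subgraph of the PIN, can be illustrated with a minimal sketch. It assumes the network is given as an undirected edge list; the function name `largest_connected_component` and the toy protein labels are hypothetical and not taken from the paper.

```python
from collections import defaultdict, deque

def largest_connected_component(edges):
    """Return the node set of the largest connected component of an undirected edge list."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen = set()
    best = set()
    for start in adj:
        if start in seen:
            continue
        # Breadth-first search collects every node reachable from `start`.
        component = {start}
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in component:
                    component.add(v)
                    queue.append(v)
        seen |= component
        if len(component) > len(best):
            best = component
    return best

# Hypothetical toy network: a triangle plus one isolated pair.
toy_edges = [("P1", "P2"), ("P2", "P3"), ("P3", "P1"), ("P4", "P5")]
core = largest_connected_component(toy_edges)
```

Module detection (for example with a Louvain-style algorithm such as Fast-unfolding) would then run on the subgraph induced by `core`; that step is not shown here.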
Affiliation(s)
- Li Pan
  - Hunan Institute of Science and Technology, Yueyang, 414006, China
  - Hunan Engineering Research Center of Multimodal Health Sensing and Intelligent Analysis, Yueyang, 414006, China
- Haoyue Wang
  - Hunan Institute of Science and Technology, Yueyang, 414006, China
- Bo Yang
  - Hunan Institute of Science and Technology, Yueyang, 414006, China
  - Hunan Engineering Research Center of Multimodal Health Sensing and Intelligent Analysis, Yueyang, 414006, China
- Wenbin Li
  - Hunan Institute of Science and Technology, Yueyang, 414006, China
3
Sun J, Pan L, Li B, Wang H, Yang B, Li W. A Construction Method of Dynamic Protein Interaction Networks by Using Relevant Features of Gene Expression Data. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2023; 20:2790-2801. [PMID: 37030714 DOI: 10.1109/tcbb.2023.3264241] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/19/2023]
Abstract
Essential proteins play an important role in various life activities and are considered to be a vital part of the organism. Gene expression data are an important resource for constructing dynamic protein-protein interaction networks (DPINs). Existing methods for constructing DPINs generally use all features (or the features in one cycle) of the gene expression data. However, the features observed at successive time points tend to be highly correlated, so the gene expression data contain redundant and irrelevant features, which degrade the quality of the constructed network and the predictive performance for essential proteins. To address this problem, we propose a method that constructs DPINs from selected relevant features rather than continuous and periodic features. We adopt an improved unsupervised feature selection method based on the Laplacian algorithm to remove irrelevant and redundant features from gene expression data, then integrate the chosen relevant features into the static protein-protein interaction network (SPIN) to construct a more concise and effective DPIN (FS-DPIN). To evaluate the effectiveness of the FS-DPIN, we apply 15 network-based centrality methods to the FS-DPIN and compare the results with those on the SPIN and the existing DPINs. The predictive performance of the 15 centrality methods is then validated in terms of sensitivity, specificity, positive predictive value, negative predictive value, F-measure, accuracy, Jackknife and AUPRC. The experimental results show that the FS-DPIN is superior to the existing DPINs in the identification accuracy of essential proteins.
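To make the feature selection step more concrete, here is a minimal sketch of the plain Laplacian score, a standard unsupervised criterion in which lower scores mark features that better preserve local structure. It is not the authors' improved variant. The fully connected heat-kernel similarity graph, the bandwidth `t`, the function names and the toy matrix (rows as genes, columns as time points) are all assumptions made for illustration.

```python
import math

def laplacian_scores(X, t=1.0):
    """Laplacian score per feature (column) of X; lower means better locality preservation."""
    n, m = len(X), len(X[0])
    # Heat-kernel similarity between every pair of samples (assumed graph construction).
    S = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d2 = sum((a - b) ** 2 for a, b in zip(X[i], X[j]))
            S[i][j] = S[j][i] = math.exp(-d2 / t)
    deg = [sum(row) for row in S]
    total = sum(deg)
    scores = []
    for r in range(m):
        f = [X[i][r] for i in range(n)]
        # Remove the degree-weighted mean so constant features carry no signal.
        mean = sum(fi * di for fi, di in zip(f, deg)) / total
        ft = [fi - mean for fi in f]
        # f^T L f equals the sum over pairs i < j of S_ij * (f_i - f_j)^2.
        num = sum(S[i][j] * (ft[i] - ft[j]) ** 2
                  for i in range(n) for j in range(i + 1, n))
        den = sum(di * fi * fi for fi, di in zip(ft, deg))
        scores.append(num / den if den > 0 else float("inf"))
    return scores

def select_features(scores, k):
    """Indices of the k features with the lowest Laplacian score."""
    return sorted(range(len(scores)), key=lambda r: scores[r])[:k]

# Hypothetical toy data: feature 0 separates two groups, feature 1 varies within groups.
toy_X = [[0.0, 0.0], [0.1, 1.0], [5.0, 0.1], [5.1, 0.9]]
toy_scores = laplacian_scores(toy_X)
kept = select_features(toy_scores, k=1)
```

In a DPIN setting, the retained columns would be the time points used to decide when each protein is active; how the selected features are mapped onto the static network is not covered by this sketch.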
4
Yellapu NK, Pei D, Nissen E, Thompson JA, Koestler DC. Comprehensive exploration of JQ1 and GSK2801 targets in breast cancer using network pharmacology and molecular modeling approaches. Comput Struct Biotechnol J 2023; 21:3224-3233. [PMID: 38213901 PMCID: PMC10781883 DOI: 10.1016/j.csbj.2023.06.003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/15/2023] [Revised: 06/02/2023] [Accepted: 06/02/2023] [Indexed: 01/13/2024] Open
Abstract
JQ1 and GSK2801 are bromodomain inhibitors (BDI) known to exhibit enhanced anti-cancer activity when combined with other agents. However, the molecular mechanisms underlying this enhanced activity remain unclear. We used network-pharmacology approaches to understand the shared molecular mechanisms behind the enhanced activity of JQ1 and GSK2801 when used together to treat breast cancer (BC). The gene targets of JQ1 and GSK2801 were intersected with known BC targets to derive their putative targets against BC. The key genes were explored through gene-ontology enrichment, protein-protein interaction (PPI) networking, survival analysis, and molecular modeling simulations. The genes CTSB, MAPK14, MET, PSEN2 and STAT3 were found to be common targets of both drugs. In total, 49 biological processes, five molecular functions and 61 metabolic pathways were similarly enriched for JQ1 and GSK2801 BC targets, among which several terms are related to cancer: the IL-17, TNF and JAK-STAT signaling pathways. Survival analyses revealed that all five putative synergistic targets are significantly associated with survival in BC (log-rank p < 0.05). Molecular modeling studies showed stable binding of JQ1 and GSK2801 to their targets. In conclusion, this study explored the possible molecular mechanisms behind the enhanced activity of JQ1 and GSK2801 against BC and suggests synergistic action through their similar BC targets and gene ontologies.
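The target-intersection step described in this abstract amounts to plain set operations. The sketch below assumes each target list is available as a collection of gene symbols; the function name and the placeholder symbols `G1` to `G6` are hypothetical and stand in for real target lists, which are not reproduced here.

```python
def shared_disease_targets(targets_a, targets_b, disease_genes):
    """Return (drug A disease targets, drug B disease targets, targets common to both)."""
    a = set(targets_a) & set(disease_genes)
    b = set(targets_b) & set(disease_genes)
    return a, b, a & b

# Placeholder gene symbols, used only to show the shape of the computation.
drug_a_targets = ["G1", "G2", "G3", "G4"]
drug_b_targets = ["G3", "G4", "G5"]
disease_targets = ["G2", "G3", "G4", "G6"]

a_hits, b_hits, common = shared_disease_targets(
    drug_a_targets, drug_b_targets, disease_targets
)
```

The `common` set is the kind of shortlist that would then go into enrichment, PPI and survival analyses.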
Affiliation(s)
- Nanda Kumar Yellapu
  - Department of Biostatistics & Data Science, University of Kansas, Medical Center, Kansas City, KS, USA
- Dong Pei
  - Department of Biostatistics & Data Science, University of Kansas, Medical Center, Kansas City, KS, USA
- Emily Nissen
  - Department of Biostatistics & Data Science, University of Kansas, Medical Center, Kansas City, KS, USA
- Jeffrey A. Thompson
  - Department of Biostatistics & Data Science, University of Kansas, Medical Center, Kansas City, KS, USA
- Devin C. Koestler
  - Department of Biostatistics & Data Science, University of Kansas, Medical Center, Kansas City, KS, USA
5
Liu P, Liu C, Mao Y, Guo J, Liu F, Cai W, Zhao F. Identification of essential proteins based on edge features and the fusion of multiple-source biological information. BMC Bioinformatics 2023; 24:203. [PMID: 37198530 DOI: 10.1186/s12859-023-05315-y] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 02/10/2023] [Accepted: 04/30/2023] [Indexed: 05/19/2023] Open
Abstract
BACKGROUND A major current focus in the analysis of protein-protein interaction (PPI) data is how to identify essential proteins. The massive amount of available PPI data warrants the design of efficient computational methods for identifying essential proteins. Previous studies have achieved considerable performance. However, because PPI data are noisy and structurally complex, further improving the performance of identification methods remains a challenge.
METHODS This paper proposes an identification method, named CTF, which identifies essential proteins based on edge features, including h-quasi-cliques and uv-triangle graphs, and the fusion of multiple-source information. We first design an edge-weight function, named EWCT, for computing the topological scores of proteins based on quasi-cliques and triangle graphs. Then, we generate an edge-weighted PPI network using EWCT and dynamic PPI data. Finally, we compute the essentiality of proteins by fusing the topological scores with three scores of biological information.
RESULTS We evaluated the performance of the CTF method by comparison with 16 other methods, such as MON, PeC, TEGS, and LBCC. The experimental results on three datasets of Saccharomyces cerevisiae show that CTF outperforms the state-of-the-art methods. Moreover, our results indicate that fusing other biological information helps improve identification accuracy.
Affiliation(s)
- Peiqiang Liu
  - School of Computer Science and Technology, Shandong Technology and Business University, Yantai, China
- Chang Liu
  - School of Computer Science and Technology, Shandong Technology and Business University, Yantai, China
- Yanyan Mao
  - School of Computer Science and Technology, Shandong Technology and Business University, Yantai, China
  - College of Oceanography and Space Informatics, China University of Petroleum (East China), Qingdao, China
- Junhong Guo
  - School of Computer Science and Technology, Shandong Technology and Business University, Yantai, China
- Fanshu Liu
  - School of Computer Science and Technology, Shandong Technology and Business University, Yantai, China
- Wangmin Cai
  - School of Computer Science and Technology, Shandong Technology and Business University, Yantai, China
- Feng Zhao
  - School of Computer Science and Technology, Shandong Technology and Business University, Yantai, China
6
Chen S, Huang C, Wang L, Zhou S. A disease-related essential protein prediction model based on the transfer neural network. Front Genet 2023; 13:1087294. [PMID: 36685976 PMCID: PMC9845409 DOI: 10.3389/fgene.2022.1087294] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/02/2022] [Accepted: 12/14/2022] [Indexed: 01/06/2023] Open
Abstract
Essential proteins play important roles in the development and survival of organisms, and their mutations have been shown to drive common internal diseases with high prevalence rates. Because traditional biological experiments are costly, this paper first designs an improved Transfer Neural Network (TNN) to extract raw features from multiple kinds of biological information about proteins, and then, based on the newly constructed TNN, designs a novel computational model called TNNM to infer essential proteins. Unlike a traditional Markov chain, the TNN uses the gradient descent algorithm to obtain the transition probability matrix automatically, which greatly improves the prediction accuracy of TNNM. Moreover, an additional antecedent memory coefficient and a bias term are introduced in the TNN, which further enhance both the robustness and the non-linear expression ability of TNNM. Finally, to evaluate the identification performance of TNNM, intensive experiments were carried out separately on two well-known public databases. The experimental results show that TNNM achieves better performance than representative state-of-the-art prediction models in terms of both predictive accuracy and the decline rate of accuracy. Therefore, TNNM may play an important role in key protein prediction in the future.
Affiliation(s)
- Sisi Chen
  - The First Hospital of Hunan University of Chinese Medicine, Changsha, Hunan, China
- Chiguo Huang
  - Big Data Innovation and Entrepreneurship Education Center of Hunan Province, Changsha University, Changsha, China
- Lei Wang
  - The First Hospital of Hunan University of Chinese Medicine, Changsha, Hunan, China
  - Big Data Innovation and Entrepreneurship Education Center of Hunan Province, Changsha University, Changsha, China
- Shunxian Zhou
  - The First Hospital of Hunan University of Chinese Medicine, Changsha, Hunan, China
  - Big Data Innovation and Entrepreneurship Education Center of Hunan Province, Changsha University, Changsha, China
  - College of Information Science and Engineering, Hunan Women’s University, Changsha, Hunan, China
Correspondence: Chiguo Huang; Lei Wang; Shunxian Zhou
7
Xue X, Zhang W, Fan A. Comparative analysis of gene ontology-based semantic similarity measurements for the application of identifying essential proteins. PLoS One 2023; 18:e0284274. [PMID: 37083829 PMCID: PMC10121005 DOI: 10.1371/journal.pone.0284274] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/09/2022] [Accepted: 03/28/2023] [Indexed: 04/22/2023] Open
Abstract
Identifying key proteins from protein-protein interaction (PPI) networks is one of the most fundamental and important tasks for computational biologists. However, the protein interactions obtained by high-throughput technology are characterized by a high false positive rate, which severely hinders the prediction accuracy of the current computational methods. In this paper, we propose a novel strategy to identify key proteins by constructing reliable PPI networks. Five Gene Ontology (GO)-based semantic similarity measurements (Jiang, Lin, Rel, Resnik, and Wang) are used to calculate the confidence scores for protein pairs under three annotation terms (Molecular function (MF), Biological process (BP), and Cellular component (CC)). The protein pairs with low similarity values are assumed to be low-confidence links, and the refined PPI networks are constructed by filtering the low-confidence links. Six topology-based centrality methods (the BC, DC, EC, NC, SC, and aveNC) are applied to test the performance of the measurements under the original network and refined network. We systematically compare the performance of the five semantic similarity metrics with the three GO annotation terms on four benchmark datasets, and the simulation results show that the performance of these centrality methods under refined PPI networks is relatively better than that under the original networks. Resnik with a BP annotation term performs best among all five metrics with the three annotation terms. These findings suggest the importance of semantic similarity metrics in measuring the reliability of the links between proteins and highlight the Resnik metric with the BP annotation term as a favourable choice.
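The refinement strategy in this abstract (drop low-confidence links, then rank nodes on what remains) can be sketched as follows. The confidence values, the 0.5 cutoff and the function names are hypothetical; in the study the confidence of a protein pair would come from a GO-based semantic similarity measure such as Resnik, which is not computed here.

```python
from collections import Counter

def refine_network(edge_confidence, threshold):
    """Keep only the edges whose confidence score reaches the threshold."""
    return [edge for edge, score in edge_confidence.items() if score >= threshold]

def degree_centrality(edges):
    """Degree of each node in an undirected edge list (absent nodes count as 0)."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return degree

# Hypothetical similarity-derived confidence scores for four interactions.
toy_confidence = {
    ("P1", "P2"): 0.9,
    ("P1", "P3"): 0.7,
    ("P2", "P3"): 0.2,
    ("P3", "P4"): 0.1,
}
refined = refine_network(toy_confidence, threshold=0.5)
dc = degree_centrality(refined)
```

Comparing rankings from `degree_centrality` on the original and refined edge lists mirrors, in miniature, the before-and-after comparison reported for the six centrality methods.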
Affiliation(s)
- Xiaoli Xue
  - School of Science, East China Jiaotong University, Nanchang, China
- Wei Zhang
  - School of Science, East China Jiaotong University, Nanchang, China
- Anjing Fan
  - School of Computer and Information Engineering, Anyang Normal University, Anyang, China
8
Rule-Based Pruning and In Silico Identification of Essential Proteins in Yeast PPIN. Cells 2022; 11:cells11172648. [PMID: 36078056 PMCID: PMC9454873 DOI: 10.3390/cells11172648] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/28/2022] [Revised: 08/18/2022] [Accepted: 08/22/2022] [Indexed: 11/25/2022] Open
Abstract
Proteins are vital for the significant cellular activities of living organisms. However, not all of them are essential. Identifying essential proteins through biological experiments is more laborious and time-consuming than using the computational approaches developed in recent times. However, practical implementation of conventional methods is sometimes challenging because they perform poorly in specific scenarios. Thus, more developed and efficient computational prediction models are required for essential protein identification. This research proposes an effective methodology capable of predicting essential proteins in a refined yeast protein–protein interaction network (PPIN). The rule-based refinement uses protein complex and local interaction density information derived from the neighborhood properties of proteins in the network. Identification and pruning of non-essential proteins are equally crucial here. In the initial phase, node and edge weights are applied to identify and discard the non-essential proteins from the interaction network. Three cut-off levels are considered for each node and edge weight when pruning the non-essential proteins. Once the PPIN has been filtered, the second phase applies two centrality-based approaches in succession, (1) local interaction density (LID) and (2) local interaction density with protein complex (LIDC), to identify the essential proteins in the yeast PPIN. Our proposed methodology achieves better performance than the existing state-of-the-art techniques.
9
Zhu X, Zhu Y, Tan Y, Chen Z, Wang L. An Iterative Method for Predicting Essential Proteins Based on Multifeature Fusion and Linear Neighborhood Similarity. Front Aging Neurosci 2022; 13:799500. [PMID: 35140599 PMCID: PMC8819145 DOI: 10.3389/fnagi.2021.799500] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 10/21/2021] [Accepted: 12/02/2021] [Indexed: 11/13/2022] Open
Abstract
Growing evidence has demonstrated that many biological processes are inseparable from the participation of key proteins. In this paper, a novel iterative method called linear neighborhood similarity-based protein multifeatures fusion (LNSPF) is proposed to identify potential key proteins based on multifeature fusion. In LNSPF, an original protein-protein interaction (PPI) network is first constructed from known protein-protein interaction data downloaded from benchmark databases, and topological features are extracted from it. Next, gene expression data are used to transform the original PPI network into a weighted PPI network based on the linear neighborhood similarity. After that, subcellular localization and homologous information of proteins are integrated to extract functional features for proteins. Then, based on both the functional and the topological features obtained above, an iterative method is designed and carried out to predict potential key proteins. Finally, to evaluate the predictive performance of LNSPF, extensive experiments were conducted, and comparisons between LNSPF and 15 state-of-the-art competing methods demonstrated that LNSPF achieves satisfactory recognition accuracy, markedly better than that achieved by each competing method.
Affiliation(s)
- Xianyou Zhu
  - College of Computer Science and Technology, Hengyang Normal University, Hengyang, China
- Yaocan Zhu
  - College of Computer Engineering and Applied Mathematics, Changsha University, Changsha, China
- Yihong Tan
  - College of Computer Engineering and Applied Mathematics, Changsha University, Changsha, China
- Zhiping Chen
  - College of Computer Science and Technology, Hengyang Normal University, Hengyang, China
  - College of Computer Engineering and Applied Mathematics, Changsha University, Changsha, China
- Lei Wang
  - College of Computer Engineering and Applied Mathematics, Changsha University, Changsha, China
10
Liu Y, Ye X, Yu CY, Shao W, Hou J, Feng W, Zhang J, Huang K. TPSC: a module detection method based on topology potential and spectral clustering in weighted networks and its application in gene co-expression module discovery. BMC Bioinformatics 2021; 22:111. [PMID: 34689740 PMCID: PMC8543836 DOI: 10.1186/s12859-021-03964-5] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 12/16/2020] [Accepted: 01/08/2021] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Gene co-expression networks are widely studied in the biomedical field, with algorithms such as WGCNA and lmQCM having been developed to detect co-expressed modules. However, these algorithms have limitations such as insufficient granularity and unbalanced module size, which prevent full acquisition of knowledge from data mining. In addition, it is difficult to incorporate prior knowledge in current co-expression module detection algorithms.
RESULTS In this paper, we propose a novel module detection algorithm based on topology potential and spectral clustering to detect co-expressed modules in gene co-expression networks. In tests on TCGA data, our method provides more complete coverage of genes, more balanced module size and finer granularity than current methods in detecting modules with significant overall survival difference. In addition, the proposed algorithm can identify modules by incorporating prior knowledge.
CONCLUSION In summary, we developed a method to obtain as much information as possible from networks, with increased input coverage and the ability to detect more size-balanced and granular modules. In addition, our method can integrate data from different sources. Our proposed method performs better than current methods, with complete coverage of input genes and finer granularity. Moreover, this method is designed not only for gene co-expression networks but can also be applied to any general fully connected weighted network.
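For readers unfamiliar with topology potential, the sketch below computes the commonly used Gaussian form, in which the potential of a node is the sum over other nodes of mass times exp(-(distance / sigma)^2). It is an illustration under assumptions, not the TPSC algorithm: it uses hop distance on an unweighted graph, unit node mass by default, and excludes the node's own contribution, whereas TPSC targets weighted networks and its exact formulation is not given in the abstract. All names are hypothetical.

```python
import math
from collections import deque

def hop_distances(adj, source):
    """Breadth-first hop distance from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def topology_potential(adj, sigma=1.0, mass=None):
    """Gaussian topology potential of every node (self term excluded by assumption)."""
    if mass is None:
        mass = {v: 1.0 for v in adj}
    potential = {}
    for v in adj:
        dist = hop_distances(adj, v)
        potential[v] = sum(
            mass[u] * math.exp(-((d / sigma) ** 2))
            for u, d in dist.items()
            if u != v
        )
    return potential

# Hypothetical toy graph: a three-node path A - B - C.
toy_adj = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
phi = topology_potential(toy_adj, sigma=1.0)
```

Nodes with locally maximal potential are natural candidates for module centres, and `sigma` controls how far each node's influence reaches.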
Affiliation(s)
- Yusong Liu
  - College of Intelligent Systems Science and Engineering, Harbin Engineering University, Harbin, 150001, Heilongjiang, China
  - Indiana University School of Medicine, Indianapolis, IN, 46202, USA
- Xiufen Ye
  - College of Intelligent Systems Science and Engineering, Harbin Engineering University, Harbin, 150001, Heilongjiang, China
- Christina Y Yu
  - Indiana University School of Medicine, Indianapolis, IN, 46202, USA
  - Department of Biomedical Informatics, The Ohio State University, Columbus, OH, 43210, USA
- Wei Shao
  - Indiana University School of Medicine, Indianapolis, IN, 46202, USA
- Jie Hou
  - College of Intelligent Systems Science and Engineering, Harbin Engineering University, Harbin, 150001, Heilongjiang, China
- Weixing Feng
  - College of Intelligent Systems Science and Engineering, Harbin Engineering University, Harbin, 150001, Heilongjiang, China
- Jie Zhang
  - Indiana University School of Medicine, Indianapolis, IN, 46202, USA
- Kun Huang
  - Indiana University School of Medicine, Indianapolis, IN, 46202, USA
  - Regenstrief Institute, Indianapolis, IN, 46202, USA
11
Li S, Zhang Z, Li X, Tan Y, Wang L, Chen Z. An iteration model for identifying essential proteins by combining comprehensive PPI network with biological information. BMC Bioinformatics 2021; 22:430. [PMID: 34496745 PMCID: PMC8425031 DOI: 10.1186/s12859-021-04300-7] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 08/04/2020] [Accepted: 07/08/2021] [Indexed: 11/10/2022] Open
Abstract
Background Essential proteins have a great impact on cell survival and development, and play important roles in disease analysis and new drug design. However, since identifying essential proteins through biological experiments is inefficient and costly, there is an urgent need for automated and accurate detection methods. In recent years, the recognition of essential proteins in protein-protein interaction (PPI) networks has become a research hotspot, and many computational models for predicting essential proteins have been proposed.
Results To achieve higher prediction performance, this paper proposes a new prediction model called TGSO. In TGSO, a protein aggregation degree network is first constructed by adopting the node density measurement method for complex networks. At the same time, a protein co-expression interaction network is constructed by combining the gene expression information with the network connectivity, and a protein co-localization interaction network is constructed based on the subcellular localization data. Then, these three newly constructed networks are integrated to obtain a comprehensive protein–protein interaction network. Finally, based on the homology information, scores are calculated iteratively for the proteins and used to estimate their importance. To evaluate the identification performance of TGSO, we compared it with 13 recent competitive methods on three yeast databases. The experimental results show that TGSO achieves identification accuracies of 94%, 82% and 72% among the top 1%, 5% and 10% of candidate proteins respectively, which are to some degree superior to those of these state-of-the-art competitive models.
Conclusions We constructed a comprehensive interaction network based on multi-source data to reduce the noise and errors in the initial PPI network, and combined it with iterative methods to improve the accuracy of essential protein prediction, which suggests that TGSO may be conducive to the future development of essential protein recognition.
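The "accuracy among the top 1%, 5% and 10% of candidates" metric reported above can be computed as in the following sketch. It assumes each protein has a single numeric score and that a set of known essential proteins is available; the rounding rule (ceiling, at least one candidate), the tie-breaking by name, and the toy scores are assumptions rather than details from the paper.

```python
import math

def top_percent_accuracy(scores, essential, percent):
    """Fraction of known essential proteins among the top `percent` % of ranked candidates."""
    k = max(1, math.ceil(len(scores) * percent / 100))
    # Highest score first; ties are broken by protein name for determinism.
    ranked = sorted(scores, key=lambda p: (-scores[p], p))
    top = ranked[:k]
    return sum(1 for p in top if p in essential) / k

# Hypothetical toy ranking: ten proteins with scores 10 down to 1.
toy_scores = {f"P{i}": 10 - i for i in range(10)}
toy_essential = {"P0", "P1", "P3"}

acc_top20 = top_percent_accuracy(toy_scores, toy_essential, percent=20)
acc_top50 = top_percent_accuracy(toy_scores, toy_essential, percent=50)
```

Here the top 20% holds two proteins, both essential, while the top 50% holds five proteins of which three are essential.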
Affiliation(s)
- Shiyuan Li
  - College of Computer Engineering and Applied Mathematics, Changsha University, Changsha, 410022, China
  - Hunan Province Key Laboratory of Industrial Internet Technology and Security, Changsha University, Changsha, 410022, China
- Zhen Zhang
  - College of Electronic Information and Electrical Engineering, Changsha University, Changsha, 410022, China
- Xueyong Li
  - College of Computer Engineering and Applied Mathematics, Changsha University, Changsha, 410022, China
  - Hunan Province Key Laboratory of Industrial Internet Technology and Security, Changsha University, Changsha, 410022, China
- Yihong Tan
  - College of Computer Engineering and Applied Mathematics, Changsha University, Changsha, 410022, China
  - Hunan Province Key Laboratory of Industrial Internet Technology and Security, Changsha University, Changsha, 410022, China
- Lei Wang
  - College of Computer Engineering and Applied Mathematics, Changsha University, Changsha, 410022, China
  - Hunan Province Key Laboratory of Industrial Internet Technology and Security, Changsha University, Changsha, 410022, China
- Zhiping Chen
  - College of Computer Engineering and Applied Mathematics, Changsha University, Changsha, 410022, China
  - Hunan Province Key Laboratory of Industrial Internet Technology and Security, Changsha University, Changsha, 410022, China
12
Zhong J, Tang C, Peng W, Xie M, Sun Y, Tang Q, Xiao Q, Yang J. A novel essential protein identification method based on PPI networks and gene expression data. BMC Bioinformatics 2021; 22:248. [PMID: 33985429 PMCID: PMC8120700 DOI: 10.1186/s12859-021-04175-8] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Received: 08/07/2020] [Accepted: 05/06/2021] [Indexed: 02/08/2023] Open
Abstract
Background Some proposed methods for identifying essential proteins achieve better results by using biological information. Gene expression data is generally used to identify essential proteins. However, gene expression data is prone to fluctuations, which may affect the accuracy of essential protein identification. Therefore, we propose an essential protein identification method based on gene expression and PPI network data that calculates the similarity of the "active" and "inactive" states of gene expression in a cluster of the PPI network. Our experiments show that the method can improve the accuracy of essential protein prediction.
Results In this paper, we propose a new measure named JDC, which is based on PPI network data and gene expression data. The JDC method offers a dynamic threshold method to binarize gene expression data. It then combines the degree centrality and the Jaccard similarity index to calculate the JDC score for each protein in the PPI network. We benchmark the JDC method on four organisms, and evaluate it using ROC analysis, modular analysis, jackknife analysis, overlapping analysis, top analysis, and accuracy analysis. The results show that JDC performs better than DC, IC, EC, SC, BC, CC, NC, PeC, and WDC. We also compare JDC with the NF-PIN and TS-PIN methods, which predict essential proteins through active PPI networks constructed from dynamic gene expression.
Conclusions We demonstrate that the new centrality measure, JDC, is more efficient than state-of-the-art prediction methods with the same input. The main ideas behind JDC are as follows: (1) Essential proteins are generally densely connected clusters in the PPI network. (2) Binarizing gene expression data can screen out fluctuations in gene expression profiles. (3) The essentiality of a protein depends on the similarity of the "active" and "inactive" states of gene expression in a cluster of the PPI network.
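The two ingredients named in this abstract, binarized expression profiles and the Jaccard index over network neighbors, can be combined in a simplified sketch. This is not the published JDC formula: the per-gene mean used as the threshold, the plain sum over neighbors, and every function name and toy value below are assumptions chosen only to show how the pieces fit together.

```python
from statistics import mean

def binarize(profile):
    """Mark a time point active (1) when expression exceeds the gene's own mean (assumed threshold)."""
    cut = mean(profile)
    return [1 if x > cut else 0 for x in profile]

def jaccard(a, b):
    """Jaccard index of the active time points of two binary profiles."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 0.0

def neighbor_jaccard_score(adj, expression):
    """Sum of Jaccard similarities between each protein and its network neighbors."""
    binary = {p: binarize(profile) for p, profile in expression.items()}
    return {
        p: sum(jaccard(binary[p], binary[q]) for q in nbrs)
        for p, nbrs in adj.items()
    }

# Hypothetical toy data: A and B are co-active, C is active at opposite time points.
toy_expression = {
    "A": [1, 5, 5, 1],
    "B": [0, 4, 4, 0],
    "C": [6, 1, 1, 6],
}
toy_adj = {"A": ["B", "C"], "B": ["A"], "C": ["A"]}
toy_result = neighbor_jaccard_score(toy_adj, toy_expression)
```

Because the score sums over neighbors, it grows with both degree and co-activity, which is the intuition behind pairing degree centrality with Jaccard similarity.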
Affiliation(s)
- Jiancheng Zhong
- School of Information Science and Engineering, Hunan Normal University, Changsha, 410081, China; Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Hunan Provincial Key Laboratory of Intelligent Computing and Language Information Processing, Changsha, 410083, China
| | - Chao Tang
- School of Information Science and Engineering, Hunan Normal University, Changsha, 410081, China
| | - Wei Peng
- College of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, 650500, Yunnan, China
| | - Minzhu Xie
- School of Information Science and Engineering, Hunan Normal University, Changsha, 410081, China
| | - Yusui Sun
- School of Information Science and Engineering, Hunan Normal University, Changsha, 410081, China
| | - Qiang Tang
- College of Engineering and Design, Hunan Normal University, Changsha, 410081, China
| | - Qiu Xiao
- School of Information Science and Engineering, Hunan Normal University, Changsha, 410081, China.
| | - Jiahong Yang
- School of Information Science and Engineering, Hunan Normal University, Changsha, 410081, China.
|
13
|
Liu Y, Ye X, Zhan X, Yu CY, Zhang J, Huang K. TPQCI: A topology potential-based method to quantify functional influence of copy number variations. Methods 2021; 192:46-56. [PMID: 33894380 DOI: 10.1016/j.ymeth.2021.04.015] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 07/23/2020] [Revised: 04/18/2021] [Accepted: 04/19/2021] [Indexed: 12/21/2022]
Abstract
Copy number variation (CNV) is a major type of chromosomal structural variation that plays important roles in many diseases, including cancers. Due to genome instability, a large number of CNV events can be detected in diseases such as cancer. Therefore, it is important to identify the functionally important CNVs in diseases, which still poses a challenge in genomics. One of the critical steps in solving the problem is to define the influence of CNV. In this paper, we provide a topology potential-based method, TPQCI, to quantify this kind of influence by integrating statistics, gene regulatory associations, and biological function information. We used this metric to detect functionally enriched genes on genomic segments with CNV in breast cancer and multiple myeloma and discovered biological functions influenced by CNV. Our results demonstrate that, by using our proposed TPQCI metric, we can detect disease-specific genes that are influenced by CNVs. The source code of TPQCI is provided on GitHub (https://github.com/usos/TPQCI).
Affiliation(s)
- Yusong Liu
- College of Intelligent Systems Science and Engineering, Harbin Engineering University, Harbin, Heilongjiang 150001, China; Indiana University School of Medicine, Indianapolis, IN 46202, USA
| | - Xiufen Ye
- College of Intelligent Systems Science and Engineering, Harbin Engineering University, Harbin, Heilongjiang 150001, China
| | - Xiaohui Zhan
- Indiana University School of Medicine, Indianapolis, IN 46202, USA; National-Regional Key Technology Engineering Laboratory for Medical Ultrasound, Guangdong Key Laboratory for Biomedical Measurements and Ultrasound Imaging, School of Biomedical Engineering, Health Science Center, Shenzhen University, Shenzhen, Guangdong 518037, China; Department of Bioinformatics, School of Basic Medicine, Chongqing Medical University, Chongqing 400016, China
| | - Christina Y Yu
- Indiana University School of Medicine, Indianapolis, IN 46202, USA; Department of Biomedical Informatics, The Ohio State University, Columbus, OH 43210, USA
| | - Jie Zhang
- Indiana University School of Medicine, Indianapolis, IN 46202, USA
| | - Kun Huang
- Indiana University School of Medicine, Indianapolis, IN 46202, USA; Regenstrief Institute, Indianapolis, IN 46202, USA.
|
14
|
Chakrapani HB, Chourasia S, Gupta S, Kumar D T, Doss C GP, Haldar R. Effective utilisation of influence maximization technique for the identification of significant nodes in breast cancer gene networks. Comput Biol Med 2021; 133:104378. [PMID: 33971587 DOI: 10.1016/j.compbiomed.2021.104378] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/04/2020] [Revised: 03/28/2021] [Accepted: 04/02/2021] [Indexed: 10/21/2022]
Abstract
BACKGROUND: Identifying the most important genes in a cancer gene network is a crucial step in understanding the disease's functional characteristics and finding an effective drug. METHOD: In this study, a popular influence maximization technique was applied to a large breast cancer gene network to identify the most influential genes computationally. The novel approach involved incorporating gene expression data and a protein-protein interaction network to create a customized pruned and weighted gene network. This was then provided to the influence maximization procedure. The weighted gene network was also processed through a widely accepted framework that identified essential proteins, in order to benchmark the proposed method. RESULTS: The proposed method's results matched the majority of the output from the benchmarked framework. The key takeaway from the experiment was that the influential genes identified by the proposed method that did not match the widely accepted framework were found to be very important by previous in-vivo studies on breast cancer. INTERPRETATION & CONCLUSION: The new findings generated by the proposed method give us reason to infer that influence maximization adds a more diversified approach to defining and identifying important genes and could be combined with other popular computational techniques for more relevant results.
Affiliation(s)
| | - Smruti Chourasia
- School of Computer Science and Engineering, Vellore Institute of Technology, Vellore, India
| | - Sibasish Gupta
- School of Computer Science and Engineering, Vellore Institute of Technology, Vellore, India
| | - Thirumal Kumar D
- Meenakshi Academy of Higher Education and Research, Chennai, India
| | - George Priya Doss C
- School of Bio Sciences and Technology, Vellore Institute of Technology, Vellore, India
| | - Rishin Haldar
- School of Computer Science and Engineering, Vellore Institute of Technology, Vellore, India.
|
15
|
Ahmed NM, Chen L, Li B, Liu W, Dai C. A random walk-based method for detecting essential proteins by integrating the topological and biological features of PPI network. Soft Comput 2021. [DOI: 10.1007/s00500-021-05780-8] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 12/01/2022]
|
16
|
Meng Z, Kuang L, Chen Z, Zhang Z, Tan Y, Li X, Wang L. Method for Essential Protein Prediction Based on a Novel Weighted Protein-Domain Interaction Network. Front Genet 2021; 12:645932. [PMID: 33815480 PMCID: PMC8010314 DOI: 10.3389/fgene.2021.645932] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 12/24/2020] [Accepted: 02/15/2021] [Indexed: 01/04/2023]
Abstract
In recent years, a number of computational models based on protein-protein interaction (PPI) networks have been proposed. However, due to false positives, false negatives, and the incompleteness of PPI networks, many challenges remain in designing computational models that infer key proteins with satisfactory predictive accuracy. This study proposes a prediction model called WPDINM for detecting key proteins based on a novel weighted protein-domain interaction (PDI) network. In WPDINM, a weighted PPI network is constructed first by combining the gene expression data of proteins with topological information extracted from the original PPI network. Simultaneously, a weighted domain-domain interaction (DDI) network is constructed based on the original PDI network. Next, by integrating the newly obtained weighted PPI network and weighted DDI network with the original PDI network, a weighted PDI network is constructed. Then, based on topological features and biological information, including the subcellular localization and orthologous information of proteins, a novel PageRank-based iterative algorithm is designed and run on the newly constructed weighted PDI network to estimate the criticality of proteins. Finally, to assess the prediction performance of WPDINM, we compared it with 12 competing measures. Experimental results show that WPDINM achieves predictive accuracy rates of 90.19%, 81.96%, 70.72%, 62.04%, 55.83%, and 51.13% in the top 1%, top 5%, top 10%, top 15%, top 20%, and top 25% of ranked proteins, respectively, which exceeds the prediction accuracy achieved by traditional state-of-the-art competing measures. Owing to this satisfactory identification performance, the WPDINM measure may contribute to the further development of key protein identification.
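The "top 1% ... top 25%" accuracy figures quoted above are a form of precision over the highest-ranked proteins. A minimal sketch of that evaluation follows; the function name `top_percent_precision`, the rounding rule for the cutoff, and the toy scores are assumptions for illustration, not details taken from the paper.

```python
import math


def top_percent_precision(scores, essential, percent):
    """Fraction of known essential proteins among the top `percent`% by score.

    Assumption: the cutoff is rounded up and never drops below one protein.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    k = max(1, math.ceil(len(ranked) * percent / 100))
    hits = sum(1 for protein in ranked[:k] if protein in essential)
    return hits / k


# Toy ranking of ten proteins; four are labelled essential.
scores = {"P1": 0.9, "P2": 0.8, "P3": 0.7, "P4": 0.4, "P5": 0.3,
          "P6": 0.25, "P7": 0.2, "P8": 0.1, "P9": 0.05, "P10": 0.01}
essential = {"P1", "P2", "P4", "P9"}

precision_at_20 = top_percent_precision(scores, essential, 20)  # top 2 proteins
precision_at_50 = top_percent_precision(scores, essential, 50)  # top 5 proteins
```

Precision typically falls as the cutoff widens, which matches the decreasing trend in the figures reported in the abstract.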
Affiliation(s)
- Zixuan Meng
- College of Computer, Xiangtan University, Xiangtan, China
| | - Linai Kuang
- College of Computer, Xiangtan University, Xiangtan, China
| | - Zhiping Chen
- College of Computer Engineering & Applied Mathematics, Changsha University, Changsha, China
| | - Zhen Zhang
- College of Computer Engineering & Applied Mathematics, Changsha University, Changsha, China
| | - Yihong Tan
- College of Computer Engineering & Applied Mathematics, Changsha University, Changsha, China
| | - Xueyong Li
- College of Computer Engineering & Applied Mathematics, Changsha University, Changsha, China
| | - Lei Wang
- College of Computer, Xiangtan University, Xiangtan, China
- College of Computer Engineering & Applied Mathematics, Changsha University, Changsha, China
|
17
|
Screening and identification of potential prognostic biomarkers in bladder urothelial carcinoma: Evidence from bioinformatics analysis. Gene Reports 2020. [DOI: 10.1016/j.genrep.2020.100658] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/24/2022]
|
18
|
Li G, Li M, Wang J, Li Y, Pan Y. United Neighborhood Closeness Centrality and Orthology for Predicting Essential Proteins. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2020; 17:1451-1458. [PMID: 30596582 DOI: 10.1109/tcbb.2018.2889978] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Indexed: 05/06/2023]
Abstract
Identifying essential proteins plays an important role in disease study, drug design, and understanding the minimal requirements for cellular life. Computational methods for essential protein discovery overcome the disadvantages of biological experimental methods, which are often time-consuming, expensive, and inefficient. The topological features of protein-protein interaction (PPI) networks are often used to design computational prediction methods, such as Degree Centrality (DC), Betweenness Centrality (BC), Closeness Centrality (CC), Subgraph Centrality (SC), Eigenvector Centrality (EC), Information Centrality (IC), and Neighborhood Centrality (NC). However, the prediction accuracy of these individual methods still has room for improvement. Studies show that additional information, such as orthologous relations, helps discover essential proteins. Many researchers have proposed methods that combine multiple information sources to improve prediction accuracy. In this study, we find that essential proteins appear in triangular structures in the PPI network significantly more often than nonessential ones. Based on this phenomenon, we propose a novel pure centrality measure called Neighborhood Closeness Centrality (NCC). Accordingly, we develop a new combination model, the Extended Pareto Optimality Consensus model (EPOC), to fuse NCC and orthology information, and propose a novel essential protein identification method, NCCO. Compared with seven existing classic centrality methods (DC, BC, IC, CC, SC, EC, and NC) and three consensus methods (PeC, ION, and CSC), our results on S. cerevisiae and E. coli datasets show that NCCO has clear advantages. As a consensus method, EPOC also yields better performance than the random walk model.
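The observation that essential proteins sit in triangles more often than nonessential ones can be checked on any PPI edge list by counting triangles per node. The sketch below does only that, alongside plain degree centrality for comparison. It is not the NCC formula, which the abstract does not give, and the helper names and toy edges are assumptions for illustration.

```python
from itertools import combinations


def build_adjacency(edges):
    """Undirected adjacency sets from an edge list, ignoring self-loops."""
    adjacency = {}
    for u, v in edges:
        if u == v:
            continue
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return adjacency


def degree_centrality(adjacency):
    """Degree Centrality (DC): the number of neighbours of each node."""
    return {node: len(nbrs) for node, nbrs in adjacency.items()}


def triangle_counts(adjacency):
    """Number of triangles each node takes part in.

    A node is in one triangle for every pair of its neighbours that are
    themselves connected.
    """
    return {
        node: sum(1 for a, b in combinations(sorted(nbrs), 2) if b in adjacency[a])
        for node, nbrs in adjacency.items()
    }


# Toy network with two triangles (A-B-C and C-D-E) that share node C.
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("D", "E"), ("C", "E")]
adjacency = build_adjacency(edges)
degrees = degree_centrality(adjacency)
triangles = triangle_counts(adjacency)
```

Comparing the triangle counts of proteins labelled essential against the rest is one simple way to test the claim on a given dataset.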
|
19
|
Li G, Li M, Peng W, Li Y, Pan Y, Wang J. A novel extended Pareto Optimality Consensus model for predicting essential proteins. J Theor Biol 2019; 480:141-149. [PMID: 31398315 DOI: 10.1016/j.jtbi.2019.08.005] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 02/07/2019] [Revised: 08/02/2019] [Accepted: 08/06/2019] [Indexed: 12/11/2022]
Abstract
Essential proteins have vital functions: when they are destroyed, cells die or stop reproducing. It is therefore very important to identify essential proteins among the large number of other proteins. Because biological experimental methods are time-consuming, expensive, and inefficient, computational methods have become increasingly popular for this task. Early computational methods relied mainly on protein-protein interaction (PPI) information, which limited their discovery capacity. Researchers have since developed methods that fuse multiple information sources to improve prediction accuracy. Given that essential proteins are more important and conserved during evolution, that their neighbors in PPI networks are likely to be essential, and that PPI data contain many false positives, whether a protein is essential can be assessed by the importance of the protein itself, the relevance of its neighbors, and the reliability of PPIs. The importance of neighbors and the reliability of PPIs can be further integrated into a neighborhood feature. In this study, orthologous information, the edge-clustering coefficient, and gene expression information are used to measure the importance of a protein itself, the importance of its neighbors, and the reliability of PPIs, respectively. Then, a novel expanded POC model, E_POC, is proposed to fuse the above information to discover essential proteins, and a weighted PPI network is constructed. The proteins ranked highest by weight are treated as candidate essential proteins. E_POC outperforms the existing classical methods on S. cerevisiae and E. coli data.
Affiliation(s)
- Gaoshi Li
- School of Computer Science and Engineering, Central South University, Changsha 410083, China; Guangxi Key Lab of Multi-source Information Mining & Security, Guangxi Normal University, Guilin, Guangxi 541004, China.
| | - Min Li
- School of Computer Science and Engineering, Central South University, Changsha 410083, China.
| | - Wei Peng
- Computer Center/ Faculty of Information Engineering and Automation of Kunming University of Science and Technology, Kunming, Yunnan 650093, China
| | - Yaohang Li
- Department of Computer Science, Old Dominion University, Norfolk, VA 23529, USA.
| | - Yi Pan
- Department of Computer Science, Georgia State University, Atlanta, GA 30302-4110, USA.
| | - Jianxin Wang
- School of Computer Science and Engineering, Central South University, Changsha 410083, China.
|
20
|
Liu J, Li Y, Zhang Y, Huo M, Sun X, Xu Z, Tan N, Du K, Wang Y, Zhang J, Wang W. A Network Pharmacology Approach to Explore the Mechanisms of Qishen Granules in Heart Failure. Med Sci Monit 2019; 25:7735-7745. [PMID: 31613871 PMCID: PMC6813758 DOI: 10.12659/msm.919768] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Indexed: 01/06/2023]
Abstract
This study aimed to investigate the intrinsic mechanisms of Qishen granules (QSG) in the treatment of heart failure (HF), and to provide new evidence and insights for its clinical application. Information on QSG ingredients was collected from Traditional Chinese medicine systems pharmacology (TCMSP), TCM@Taiwan, TCMID, and Batman, and input into SwissTargetPrediction to identify the compound targets. HF-related targets were retrieved from the Therapeutic Target Database (TTD), Disgenet-Gene, the Drugbank database, and the Online Mendelian Inheritance in Man (OMIM) database. The overlapping targets of QSG and HF were identified for pathway enrichment analysis using the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis. The protein-protein interaction (PPI) network of QSG-HF was constructed, followed by the generation of core targets, construction of core modules, and KEGG analysis of the core functional modules. There were 1909 potential targets predicted from the 243 bioactive compounds in QSG, which shared 129 common targets with HF-related targets. KEGG pathway analysis of the common targets indicated that QSG could regulate 23 representative pathways. In the QSG-HF PPI network analysis, 10 key targets were identified, including EDN1, AGT, CREB1, ACE, CXCR4, ADRBK1, AGTR1, BDKRB1, ADRB2, and F2. Further cluster and enrichment analysis suggested that neuroactive ligand-receptor interaction, the cGMP-PKG signaling pathway, renin secretion, vascular smooth muscle contraction, and the renin-angiotensin system might be core pathways of QSG for HF. Our study elucidated the possible mechanisms of QSG from a systemic and holistic perspective. The key targets and pathways will provide new insights for further research on the pharmacological mechanism of QSG.
Affiliation(s)
- Junjie Liu
- Dongzhimen Hospital, Beijing University of Chinese Medicine, Beijing, China (mainland)
| | - Yuan Li
- Dongzhimen Hospital, Beijing University of Chinese Medicine, Beijing, China (mainland)
| | - Yili Zhang
- School of Traditional Chinese Medicine, Beijing University of Chinese Medicine, Beijing, China (mainland)
| | - Mengqi Huo
- School of Chinese Materia Medica, Beijing University of Chinese Medicine, Beijing, China (mainland)
| | - Xiaoli Sun
- School of Traditional Chinese Medicine, Beijing University of Chinese Medicine, Beijing, China (mainland)
| | - Zixuan Xu
- Respiratory Department, Nanjing Pukou Hospital of Traditional Chinese Medicine, Nanjing, Jiangsu, China (mainland)
| | - Nannan Tan
- School of Traditional Chinese Medicine, Beijing University of Chinese Medicine, Beijing, China (mainland)
| | - Kangjia Du
- School of Traditional Chinese Medicine, Beijing University of Chinese Medicine, Beijing, China (mainland)
| | - Yong Wang
- School of Life Science, Beijing University of Chinese Medicine, Beijing, China (mainland)
| | - Jian Zhang
- School of Life Science, Beijing University of Chinese Medicine, Beijing, China (mainland)
| | - Wei Wang
- School of Traditional Chinese Medicine, Beijing University of Chinese Medicine, Beijing, China (mainland)
|
21
|
Sabetian S, Shamsir MS. Computer aided analysis of disease linked protein networks. Bioinformation 2019; 15:513-522. [PMID: 31485137 PMCID: PMC6704336 DOI: 10.6026/97320630015513] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 01/21/2019] [Revised: 04/16/2019] [Accepted: 04/17/2019] [Indexed: 12/26/2022]
Abstract
Proteins can interact in various ways, ranging from direct physical relationships to indirect interactions, forming a protein-protein interaction network. Identifying protein connections is critical for identifying various cellular pathways. Constructing and analyzing protein interaction networks is being developed as a powerful approach in network pharmacology for detecting unknown genes and proteins associated with diseases. Discovering drug targets to inform therapeutic decisions is an exciting outcome of studying disease networks. Protein connections may be identified by experimental approaches and by more recent computational approaches. Due to the difficulty of analyzing protein interactions in vivo, many researchers have encouraged improving computational methods for constructing protein interaction networks. This review covers the experimental and computational approaches, along with their advantages and disadvantages, for identifying new interactions in a molecular mechanism. Systematic analysis of complex biological systems, including network pharmacology and disease networks, is also discussed.
Affiliation(s)
- Soudabeh Sabetian
- Department of Biological and Health Sciences, Faculty of Bioscience and Medical Engineering, Universiti Teknologi Malaysia, 81310 Johor, Malaysia
- Infertility Research Center, Shiraz University, Shiraz 71454, Iran, Shiraz University of Medical Sciences, Shiraz, Iran
| | - Mohd Shahir Shamsir
- Department of Biological and Health Sciences, Faculty of Bioscience and Medical Engineering, Universiti Teknologi Malaysia, 81310 Johor, Malaysia
|
22
|
Lin Y, Zhang FZ, Xue K, Gao YZ, Guo FB. Identifying Bacterial Essential Genes Based on a Feature-Integrated Method. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2019; 16:1274-1279. [PMID: 28212095 DOI: 10.1109/tcbb.2017.2669968] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Indexed: 06/06/2023]
Abstract
Essential genes are those genes of an organism that are considered to be crucial for its survival. Identification of essential genes is therefore of great significance for advancing our understanding of the principles of cellular life. We have developed a novel computational method that can effectively predict bacterial essential genes by extracting and integrating homologous features, a protein domain feature, gene intrinsic features, and network topological features. By performing principal component regression (PCR) analysis for Escherichia coli MG1655, we established a classification model with an average area under the curve (AUC) value of 0.992 in ten repetitions of 5-fold cross-validation. Furthermore, when applying this new model to a distantly related organism, Streptococcus pneumoniae TIGR4, we still obtained a reliable AUC value of 0.788. These results indicate that our feature-integrated approach could have practical applications in accurately investigating essential genes across a broad range of bacterial species, and could also provide helpful guidelines for the minimal cell.
|
23
|
Li M, Ni P, Chen X, Wang J, Wu FX, Pan Y. Construction of Refined Protein Interaction Network for Predicting Essential Proteins. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2019; 16:1386-1397. [PMID: 28186903 DOI: 10.1109/tcbb.2017.2665482] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Indexed: 06/06/2023]
Abstract
Identification of essential proteins based on protein interaction networks (PINs) is an important and active topic in the post-genome era. A number of network-based essential protein discovery methods have been proposed. Generally, a static protein interaction network is constructed from the protein-protein interactions obtained from different experiments or databases. Unfortunately, most network-based essential protein discovery methods are sensitive to the reliability of the constructed PIN. In this paper, we propose a new method for constructing a refined PIN by using gene expression profiles and subcellular location information. The basic idea behind refining the PIN is that two proteins are more likely to physically interact with each other if they appear together at the same subcellular location and are active together at at least one time point in the cell cycle. The original static PIN is denoted S-PIN, while the final PIN refined by our method is denoted TS-PIN. To evaluate whether the constructed TS-PIN is more suitable for identifying essential proteins, 10 network-based essential protein discovery methods (DC, EC, SC, BC, CC, IC, LAC, NC, BN, and DMNC) are applied to it. TS-PIN is compared with two other networks, S-PIN and NF-APIN (a noise-filtered active PIN constructed by using gene expression data and S-PIN), on the prediction of essential proteins using these ten network-based methods. The comparison results show that all 10 network-based methods achieve better results when applied to TS-PIN than when applied to S-PIN and NF-APIN.
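The refinement idea stated in this abstract (keep an interaction only if both proteins share a subcellular location and are active together at some time point) can be sketched as an edge filter. This is a simplified illustration, not the authors' TS-PIN construction: the function name `refine_network`, the input layout, and the assumption that activity has already been binarized per time point are all hypothetical.

```python
def refine_network(edges, locations, active):
    """Keep edges whose endpoints co-localise and are co-active at least once.

    locations: protein -> set of subcellular compartments
    active:    protein -> list of 0/1 flags, one per time point
               (assumes expression was binarized beforehand)
    Proteins missing from either mapping lose all their edges.
    """
    kept = []
    for u, v in edges:
        same_place = locations.get(u, set()) & locations.get(v, set())
        same_time = any(a and b for a, b in zip(active.get(u, []), active.get(v, [])))
        if same_place and same_time:
            kept.append((u, v))
    return kept


# Toy static network with three interactions.
static_edges = [("A", "B"), ("A", "C"), ("B", "D")]
locations = {
    "A": {"nucleus"},
    "B": {"nucleus", "cytoplasm"},
    "C": {"mitochondrion"},
    "D": {"cytoplasm"},
}
active = {
    "A": [1, 1, 0],
    "B": [0, 1, 0],
    "C": [1, 1, 1],
    "D": [1, 0, 1],
}
refined_edges = refine_network(static_edges, locations, active)
```

Here A-C is dropped because the two proteins never share a compartment, and B-D is dropped because they are never active at the same time point, leaving only A-B.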
|
24
|
Alur VC, Raju V, Vastrad B, Vastrad C. Mining Featured Biomarkers Linked with Epithelial Ovarian Cancer Based on Bioinformatics. Diagnostics (Basel) 2019; 9:diagnostics9020039. [PMID: 30970615 PMCID: PMC6628368 DOI: 10.3390/diagnostics9020039] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/24/2019] [Revised: 03/31/2019] [Accepted: 04/05/2019] [Indexed: 11/16/2022]
Abstract
Epithelial ovarian cancer (EOC) is the 18th most common cancer worldwide and the 8th most common in women. The aim of this study was to identify potentially important and novel genes linked with EOC and to provide valid biological information for further research. The gene expression profiles of E-MTAB-3706, which contained four high-grade ovarian epithelial cancer samples, four normal fallopian tube samples and four normal ovarian epithelium samples, were downloaded from the ArrayExpress database. Pathway enrichment and Gene Ontology (GO) enrichment analyses of differentially expressed genes (DEGs) were performed, and the protein-protein interaction (PPI) network, microRNA-target gene regulatory network and transcription factor (TF)-target gene regulatory network for up- and down-regulated genes were analyzed using Cytoscape. In total, 552 DEGs were found, including 276 up-regulated and 276 down-regulated DEGs. Pathway enrichment analysis demonstrated that most DEGs were significantly enriched in chemical carcinogenesis, urea cycle, cell adhesion molecules and creatine biosynthesis. GO enrichment analysis showed that most DEGs were significantly enriched in translation, nucleosome, extracellular matrix organization and extracellular matrix. From the analysis of the PPI network, its modules, the microRNA-target gene regulatory network and the TF-target gene regulatory network for up- and down-regulated genes, the top hub genes such as E2F4, SRPK2, A2M, CDH1, MAP1LC3A, UCHL1, HLA-C (major histocompatibility complex, class I, C), VAT1, ECM1 and SNRPN (small nuclear ribonucleoprotein polypeptide N) were associated with the pathogenesis of EOC. The high expression levels of hub genes such as CEBPD (CCAAT enhancer binding protein delta) and MID2 in stages 3 and 4 were validated in the TCGA (The Cancer Genome Atlas) database. CEBPD and MID2 were associated with the worst overall survival rates in EOC.
In conclusion, the current study identified DEGs between normal and EOC samples, which could improve our understanding of the molecular mechanisms in the progression of EOC. These new key biomarkers might be used as therapeutic targets for EOC.
Affiliation(s)
- Varun Chandra Alur
- Department of Endocrinology, J.J. M Medical College, Davanagere, Karnataka 577004, India.
| | - Varshita Raju
- Department of Obstetrics and Gynecology, J.J. M Medical College, Davanagere, Karnataka 577004, India.
| | - Basavaraj Vastrad
- Department of Pharmaceutics, SET'S College of Pharmacy, Dharwad, Karnataka 580002, India.
| | - Chanabasayya Vastrad
- Biostatistics and Bioinformatics, Chanabasava Nilaya, Bharthinagar, Dharwad, Karnataka 580001, India.
|
25
|
Abstract
This chapter exploits network-based representations of proteins, called metagraphs, in protein-protein interaction networks to identify candidate disease-causing proteins. Protein-protein interaction (PPI) networks are effective tools for studying the functional roles of proteins in the development of various diseases. However, they are insufficient without the support of additional biological knowledge about proteins, such as their molecular functions and biological processes. To enhance PPI networks, we also utilize the biological properties of individual proteins. More specifically, we integrate keywords from the UniProt database describing protein properties into the PPI network and construct a novel heterogeneous PPI-Keyword (PPIK) network consisting of both proteins and keywords. As proteins with similar functional duties or involved in the same metabolic pathway tend to have similar topological characteristics, we propose to represent them with metagraphs. Compared to the traditional network motif or subgraph, a metagraph can capture topological arrangements through not only protein-protein interactions but also protein-keyword associations. We feed these metagraph representations into classifiers for disease protein prediction and conduct our experiments on three different PPI databases. The experiments show that the proposed method consistently increases disease protein prediction performance across various classifiers, by 15.3% in AUC on average. It outperforms the diffusion-based (e.g., RWR) and the module-based baselines by 13.8-32.9% in overall disease protein prediction. For breast cancer protein prediction, it outperforms RWR, PRINCE, and the module-based baselines by 6.6-14.2%. Finally, our predictions also exhibit better correlations with literature findings from the PubMed database.
|
26
|
Li X, Li W, Zeng M, Zheng R, Li M. Network-based methods for predicting essential genes or proteins: a survey. Brief Bioinform 2019; 21:566-583. [DOI: 10.1093/bib/bbz017] [Citation(s) in RCA: 55] [Impact Index Per Article: 11.0] [Received: 10/31/2018] [Revised: 01/21/2019] [Accepted: 01/22/2019] [Indexed: 12/14/2022]
Abstract
Genes that are thought to be critical for the survival of organisms or cells are called essential genes. The prediction of essential genes and their products (essential proteins) is of great value in exploring the mechanisms of complex diseases, studying the minimal genome required for living cells, and developing new drug targets. As laboratory methods are often complicated, costly and time-consuming, many computational methods have been proposed to identify essential genes/proteins at the network level, supported by a deeper understanding of network biology and the rapid development of biotechnologies. By analyzing the topological characteristics of essential genes/proteins in protein–protein interaction networks (PINs), integrating biological information and considering the dynamic features of PINs, network-based methods have proved effective in the identification of essential genes/proteins. In this paper, we survey the advanced methods for network-based prediction of essential genes/proteins and present the challenges and directions for future research.
Affiliation(s)
- Xingyi Li
- School of Computer Science and Engineering, Central South University, Changsha, Hunan, China
- Wenkai Li
- School of Computer Science and Engineering, Central South University, Changsha, Hunan, China
- Min Zeng
- School of Computer Science and Engineering, Central South University, Changsha, Hunan, China
- Ruiqing Zheng
- School of Computer Science and Engineering, Central South University, Changsha, Hunan, China
- Min Li
- School of Computer Science and Engineering, Central South University, Changsha, Hunan, China
|
27
|
Liu X, Hong Z, Liu J, Lin Y, Rodríguez-Patón A, Zou Q, Zeng X. Computational methods for identifying the critical nodes in biological networks. Brief Bioinform 2019; 21:486-497. [DOI: 10.1093/bib/bbz011] [Citation(s) in RCA: 46] [Impact Index Per Article: 9.2] [Received: 10/25/2018] [Revised: 12/03/2018] [Accepted: 01/11/2019] [Indexed: 12/28/2022]
Abstract
A biological network is complex. A group of critical nodes determines the quality and state of such a network. Increasing studies have shown that diseases and biological networks are closely and mutually related and that certain diseases are often caused by errors occurring in certain nodes in biological networks. Thus, studying biological networks and identifying critical nodes can help determine the key targets in treating diseases. The problem is how to find the critical nodes in a network efficiently and with low cost. Existing experimental methods in identifying critical nodes generally require much time, manpower and money. Accordingly, many scientists are attempting to solve this problem by researching efficient and low-cost computing methods. To facilitate calculations, biological networks are often modeled as several common networks. In this review, we classify biological networks according to the network types used by several kinds of common computational methods and introduce the computational methods used by each type of network.
Affiliation(s)
- Xiangrong Liu
- Department of Computer Science, Xiamen University, China
- Zengyan Hong
- Department of Computer Science, Xiamen University, China
- Juan Liu
- Department of Computer Science, Xiamen University, China
- Yuan Lin
- ITOP Section, DNB Bank ASA, Solheimsgaten, Bergen, Norway
- Alfonso Rodríguez-Patón
- Universidad Politécnica de Madrid (UPM) Campus Montegancedo s/n, Boadilla del Monte, Madrid, Spain
- Quan Zou
- Department of Computer Science, Xiamen University, China
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, China
- School of Computer Science and Technology, Tianjin University, Tianjin, China
|
28
|
Elahi A, Babamir SM. Identification of essential proteins based on a new combination of topological and biological features in weighted protein-protein interaction networks. IET Syst Biol 2018; 12:247-257. [PMID: 30472688 PMCID: PMC8687241 DOI: 10.1049/iet-syb.2018.5024] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 01/29/2018] [Revised: 04/23/2018] [Accepted: 04/30/2018] [Indexed: 02/01/2023]
Abstract
The identification of essential proteins in protein-protein interaction (PPI) networks is not only important in understanding the process of cellular life but also useful in diagnosis and drug design. The network topology-based centrality measures are sensitive to noise of network. Moreover, these measures cannot detect low-connectivity essential proteins. The authors have proposed a new method using a combination of topological centrality measures and biological features based on statistical analyses of essential proteins and protein complexes. With incomplete PPI networks, they face the challenge of false-positive interactions. To remove these interactions, the PPI networks are weighted by gene ontology. Furthermore, they use a combination of classifiers, including the newly proposed measures and traditional weighted centrality measures, to improve the precision of identification. This combination is evaluated using the logistic regression model in terms of significance levels. The proposed method has been implemented and compared to both previous and more recent efficient computational methods using six statistical standards. The results show that the proposed method is more precise in identifying essential proteins than the previous methods. This level of precision was obtained through the use of four different data sets: YHQ-W, YMBD-W, YDIP-W and YMIPS-W.
Affiliation(s)
- Abdolkarim Elahi
- Department of Software Engineering, University of Kashan, Kashan, Iran
|
29
|
Dong C, Jin YT, Hua HL, Wen QF, Luo S, Zheng WX, Guo FB. Comprehensive review of the identification of essential genes using computational methods: focusing on feature implementation and assessment. Brief Bioinform 2018; 21:171-181. [PMID: 30496347 DOI: 10.1093/bib/bby116] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Received: 06/13/2018] [Revised: 11/01/2018] [Accepted: 11/02/2018] [Indexed: 02/06/2023]
Abstract
Essential genes have attracted increasing attention in recent years due to the important functions of these genes in organisms. Among the methods used to identify essential genes, accurate and efficient computational methods can make up for the deficiencies of expensive and time-consuming experimental technologies. In this review, we have collected studies on essential gene prediction in prokaryotes and eukaryotes and summarized the five predominant types of features used in these studies. The five types of features are evolutionary conservation, domain information, network topology, sequence component and expression level. We have described how to implement the useful forms of these features and evaluated their performance based on the data of Escherichia coli MG1655, Bacillus subtilis 168 and human. The prerequisites and applicable range of these features are described. In addition, we have investigated the techniques used to weight features in various models. To facilitate researchers in the field, we refer to two online tools that are freely accessible and can be directly used to predict gene essentiality in prokaryotes and humans. This article provides a simple guide for the identification of essential genes in prokaryotes and eukaryotes.
Affiliation(s)
- Chuan Dong
- School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China
- Yan-Ting Jin
- School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China
- Hong-Li Hua
- School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China
- Qing-Feng Wen
- School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China
- Sen Luo
- School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China
- Wen-Xin Zheng
- School of Biomedical Engineering, Capital Medical University, Beijing, China
- Feng-Biao Guo
- School of Life Science and Technology, Center for Informational Biology, Intelligent Learning Institute for Science and Application, University of Electronic Science and Technology of China, Chengdu, China
|
30
|
Zeng P, Chen J, Meng Y, Zhou Y, Yang J, Cui Q. Defining Essentiality Score of Protein-Coding Genes and Long Noncoding RNAs. Front Genet 2018; 9:380. [PMID: 30356729 PMCID: PMC6189311 DOI: 10.3389/fgene.2018.00380] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.5] [Received: 07/25/2018] [Accepted: 08/27/2018] [Indexed: 12/16/2022]
Abstract
Measuring the essentiality of genes is critically important in biology and medicine. Here we proposed a computational method, GIC (Gene Importance Calculator), which can efficiently predict the essentiality of both protein-coding genes and long noncoding RNAs (lncRNAs) based on only sequence information. For identifying the essentiality of protein-coding genes, GIC outperformed well-established computational scores. In an independent mouse lncRNA dataset, GIC also achieved an exciting performance (AUC = 0.918). In contrast, the traditional computational methods are not applicable to lncRNAs. Moreover, we explored several potential applications of GIC score. Firstly, we revealed a correlation between gene GIC score and research hotspots of genes. Moreover, GIC score can be used to evaluate whether a gene in mouse is representative for its homolog in human by dissecting its cross-species difference. This is critical for basic medicine because many basic medical studies are performed in animal models. Finally, we showed that GIC score can be used to identify candidate genes from a transcriptomics study. GIC is freely available at http://www.cuilab.cn/gic/.
Affiliation(s)
- Pan Zeng
- School of Basic Medical Sciences, MOE Key Lab of Cardiovascular Sciences, Department of Biomedical Informatics, Department of Physiology and Pathophysiology, Centre for Noncoding RNA Medicine, Peking University, Beijing, China
- Ji Chen
- School of Basic Medical Sciences, MOE Key Lab of Cardiovascular Sciences, Department of Biomedical Informatics, Department of Physiology and Pathophysiology, Centre for Noncoding RNA Medicine, Peking University, Beijing, China
- Yuhong Meng
- School of Basic Medical Sciences, MOE Key Lab of Cardiovascular Sciences, Department of Biomedical Informatics, Department of Physiology and Pathophysiology, Centre for Noncoding RNA Medicine, Peking University, Beijing, China
- Yuan Zhou
- School of Basic Medical Sciences, MOE Key Lab of Cardiovascular Sciences, Department of Biomedical Informatics, Department of Physiology and Pathophysiology, Centre for Noncoding RNA Medicine, Peking University, Beijing, China
- Jichun Yang
- School of Basic Medical Sciences, MOE Key Lab of Cardiovascular Sciences, Department of Biomedical Informatics, Department of Physiology and Pathophysiology, Centre for Noncoding RNA Medicine, Peking University, Beijing, China
- Qinghua Cui
- School of Basic Medical Sciences, MOE Key Lab of Cardiovascular Sciences, Department of Biomedical Informatics, Department of Physiology and Pathophysiology, Centre for Noncoding RNA Medicine, Peking University, Beijing, China
|
31
|
Wang Z, Li Z, Yuan G, Sun Y, Rui X, Xiang X. Tracking the evolution of overlapping communities in dynamic social networks. Knowl Based Syst 2018. [DOI: 10.1016/j.knosys.2018.05.026] [Citation(s) in RCA: 42] [Impact Index Per Article: 7.0] [Indexed: 11/29/2022]
|
32
|
Alterations of 63 hub genes during lingual carcinogenesis in C57BL/6J mice. Sci Rep 2018; 8:12626. [PMID: 30135512 PMCID: PMC6105652 DOI: 10.1038/s41598-018-31103-3] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 05/17/2018] [Accepted: 08/08/2018] [Indexed: 12/18/2022]
Abstract
To identify potential biomarkers of lingual cancer, 75 female C57BL/6J mice were subjected to 16-week oral delivery of 4-nitroquinoline-1-oxide (4NQO; 50 mg/L), with 10 mice used as controls. Lingual mucosa samples representative of normal tissue (week 0) and early (week 12) and advanced (week 28) tumorigenesis were harvested for microarray and methylated DNA immunoprecipitation sequencing (MeDIP-Seq). Combined analysis with Short Time-series Expression Miner (STEM), the Cytoscape plugin cytoHubba, and screening of differentially expressed genes enabled identification of 63 hub genes predominantly altered in the early stage rather than the advanced stage. Validation of microarray results was carried out using qRT-PCR. Of 63 human orthologous genes, 35 correlated with human oral squamous cell carcinoma. KEGG analysis showed "pathways in cancer", involving 13 hub genes, as the leading KEGG term. Significant alterations in promoter methylation were confirmed at Tbp, Smad1, Smad4, Pdpk1, Camk2, Atxn3, and Cdh2. HDAC2, TBP, and EP300 scored ≥10 on Maximal Clique Centrality (MCC) in STEM profile 11 and were overexpressed in human tongue cancer samples. However, expression did not correlate with smoking status, tumor differentiation, or overall survival. These results highlight potentially useful candidate biomarkers for lingual cancer prevention, diagnosis, and treatment.
|
33
|
Lei X, Yang X. A new method for predicting essential proteins based on participation degree in protein complex and subgraph density. PLoS One 2018; 13:e0198998. [PMID: 29894517 PMCID: PMC5997351 DOI: 10.1371/journal.pone.0198998] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 03/24/2018] [Accepted: 05/30/2018] [Indexed: 12/11/2022]
Abstract
Essential proteins are crucial to living cells. Identification of essential proteins from protein-protein interaction (PPI) networks can be applied to pathway analysis and function prediction; furthermore, it can contribute to disease diagnosis and drug design. Some experimental and computational methods have been designed to identify essential proteins; however, the prediction precision remains to be improved. In this paper, we propose a new method for identifying essential proteins based on the Participation degree of a protein in protein Complexes and Subgraph Density, named PCSD. To test the performance of PCSD, four PPI datasets (DIP, Krogan, MIPS and Gavin) are used to conduct experiments. The experimental results demonstrate that PCSD achieves better performance in predicting essential proteins than competing methods including DC, SC, EC, IC, LAC, NC, WDC, PeC and UDoNC. Compared with the most recent method, LBCC, PCSD correctly predicts more essential proteins among given numbers of top-ranked proteins on the DIP dataset, which indicates that PCSD is very effective in discovering essential proteins in most cases.
Affiliation(s)
- Xiujuan Lei
- School of Computer Science, Shaanxi Normal University, Xi’an, China
- Xiaoqin Yang
- School of Computer Science, Shaanxi Normal University, Xi’an, China
|
34
|
Li M, Li W, Wu FX, Pan Y, Wang J. Identifying essential proteins based on sub-network partition and prioritization by integrating subcellular localization information. J Theor Biol 2018; 447:65-73. [PMID: 29571709 DOI: 10.1016/j.jtbi.2018.03.029] [Citation(s) in RCA: 27] [Impact Index Per Article: 4.5] [Received: 08/12/2017] [Revised: 03/19/2018] [Accepted: 03/20/2018] [Indexed: 01/07/2023]
Abstract
Essential proteins are important participants in various life activities and play a vital role in the survival and reproduction of living organisms. Identification of essential proteins from protein-protein interaction (PPI) networks has great significance to facilitate the study of human complex diseases, the design of drugs and the development of bioinformatics and computational science. Studies have shown that highly connected proteins in a PPI network tend to be essential. A series of computational methods have been proposed to identify essential proteins by analyzing topological structures of PPI networks. However, the high noise in the PPI data can degrade the accuracy of essential protein prediction. Moreover, proteins must be located in the appropriate subcellular localization to perform their functions, and only when the proteins are located in the same subcellular localization, it is possible that they can interact with each other. In this paper, we propose a new network-based essential protein discovery method based on sub-network partition and prioritization by integrating subcellular localization information, named SPP. The proposed method SPP was tested on two different yeast PPI networks obtained from DIP database and BioGRID database. The experimental results show that SPP can effectively reduce the effect of false positives in PPI networks and predict essential proteins more accurately compared with other existing computational methods DC, BC, CC, SC, EC, IC, NC.
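The localization constraint stated in this abstract (two proteins can interact only if they sit in the same subcellular compartment) can be illustrated with a small edge filter. This is a minimal sketch under stated assumptions, not the SPP algorithm itself: the function names `filter_by_localization` and `degree_centrality`, the protein identifiers and the compartment labels are all hypothetical, and the abstract does not say how SPP partitions or prioritizes sub-networks.

```python
def filter_by_localization(edges, localization):
    """Keep only interactions whose two proteins share at least one compartment."""
    kept = []
    for u, v in edges:
        # Assumption: a protein missing from the annotation map has no known
        # compartment, so every interaction involving it is dropped.
        if localization.get(u, set()) & localization.get(v, set()):
            kept.append((u, v))
    return kept


def degree_centrality(edges):
    """Count distinct interaction partners per protein in an undirected edge list."""
    neighbours = {}
    for u, v in edges:
        neighbours.setdefault(u, set()).add(v)
        neighbours.setdefault(v, set()).add(u)
    return {protein: len(partners) for protein, partners in neighbours.items()}


# Hypothetical toy network and annotations, for illustration only.
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]
localization = {
    "A": {"nucleus"},
    "B": {"nucleus", "cytoplasm"},
    "C": {"cytoplasm"},
    "D": {"mitochondrion"},
}

filtered = filter_by_localization(edges, localization)
print(filtered)                     # interactions that survive the filter
print(degree_centrality(filtered))  # degree (DC) on the filtered network
```

Filtering before ranking means that interactions between proteins with no shared compartment, which are more likely to be false positives, no longer contribute to any topology-based score computed afterwards.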
Affiliation(s)
- Min Li
- School of Information Science and Engineering, Central South University, Changsha 410083, China.
- Wenkai Li
- School of Information Science and Engineering, Central South University, Changsha 410083, China.
- Fang-Xiang Wu
- Division of Biomedical Engineering and Department of Mechanical Engineering, University of Saskatchewan, Saskatoon, SK S7N 5A9, Canada.
- Yi Pan
- Department of Computer Science, Georgia State University, Atlanta, GA 30302-4110, USA.
- Jianxin Wang
- School of Information Science and Engineering, Central South University, Changsha 410083, China.
|
35
|
Lei X, Zhang Y, Cheng S, Wu FX, Pedrycz W. Topology potential based seed-growth method to identify protein complexes on dynamic PPI data. Inf Sci (N Y) 2018. [DOI: 10.1016/j.ins.2017.10.013] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Indexed: 10/18/2022]
|
36
|
Disease gene classification with metagraph representations. Methods 2017; 131:83-92. [DOI: 10.1016/j.ymeth.2017.06.036] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Received: 04/15/2017] [Revised: 06/23/2017] [Accepted: 06/30/2017] [Indexed: 12/28/2022]
|
37
|
|
38
|
Qin C, Sun Y, Dong Y. A new computational strategy for identifying essential proteins based on network topological properties and biological information. PLoS One 2017; 12:e0182031. [PMID: 28753682 PMCID: PMC5533339 DOI: 10.1371/journal.pone.0182031] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Received: 02/15/2017] [Accepted: 07/11/2017] [Indexed: 12/26/2022]
Abstract
Essential proteins are the proteins that are indispensable to the survival and development of an organism. Deleting a single essential protein will cause lethality or infertility. Identifying and analysing essential proteins are key to understanding the molecular mechanisms of living cells. There are two types of methods for predicting essential proteins: experimental methods, which require considerable time and resources, and computational methods, which overcome the shortcomings of experimental methods. However, the prediction accuracy of computational methods for essential proteins requires further improvement. In this paper, we propose a new computational strategy named CoTB for identifying essential proteins based on a combination of topological properties, subcellular localization information and orthologous protein information. First, we introduce several topological properties of the protein-protein interaction (PPI) network. Second, we propose new methods for measuring orthologous information and subcellular localization and a new computational strategy that uses a random forest prediction model to obtain a probability score for the proteins being essential. Finally, we conduct experiments on four different Saccharomyces cerevisiae datasets. The experimental results demonstrate that our strategy for identifying essential proteins outperforms traditional computational methods and the most recently developed method, SON. In particular, our strategy improves the prediction accuracy to 89, 78, 79, and 85 percent on the YDIP, YMIPS, YMBD and YHQ datasets at the top 100 level, respectively.
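The "top 100 level" accuracies quoted in this abstract are usually read as the fraction of the 100 highest-ranked proteins that are known to be essential. The sketch below computes that fraction for a toy ranking. It assumes this reading of the metric; the function name `precision_at_k`, the scores and the essential set are hypothetical and are not taken from the CoTB paper.

```python
def precision_at_k(scores, essential, k):
    """Fraction of the k highest-scoring proteins that are in the essential set."""
    if k <= 0:
        raise ValueError("k must be positive")
    if not scores:
        raise ValueError("scores must not be empty")
    # Rank by descending score; ties are broken by protein name so the
    # ranking is deterministic.
    ranked = sorted(scores, key=lambda protein: (-scores[protein], protein))
    top = ranked[:k]
    hits = sum(1 for protein in top if protein in essential)
    return hits / len(top)


# Hypothetical predicted probabilities and gold-standard labels.
scores = {"P1": 0.91, "P2": 0.85, "P3": 0.40, "P4": 0.77, "P5": 0.12}
essential = {"P1", "P4", "P5"}

print(precision_at_k(scores, essential, 3))  # top 3 are P1, P2, P4; two are essential
```

With k = 100 and a genome-scale ranking, a value of 0.89 would correspond to the 89 percent figure reported for the YDIP dataset, again assuming the metric is defined this way.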
Affiliation(s)
- Chao Qin
- Beijing Key Lab of Traffic Data Analysis and Mining, School of Computer and Information Technology, Beijing Jiaotong University, Beijing, China
- Yongqi Sun
- Beijing Key Lab of Traffic Data Analysis and Mining, School of Computer and Information Technology, Beijing Jiaotong University, Beijing, China
- Yadong Dong
- Beijing Key Lab of Traffic Data Analysis and Mining, School of Computer and Information Technology, Beijing Jiaotong University, Beijing, China
|
39
|
Truong CD, Tran TD, Kwon YK. MORO: a Cytoscape app for relationship analysis between modularity and robustness in large-scale biological networks. BMC Syst Biol 2016; 10:122. [PMID: 28155725 PMCID: PMC5260057 DOI: 10.1186/s12918-016-0363-3] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Indexed: 12/19/2022]
Abstract
Background: Although there have been many studies revealing that dynamic robustness of a biological network is related to its modularity characteristics, no proper tool exists to investigate the relation between network dynamics and modularity. Results: Accordingly, we developed a novel Cytoscape app called MORO, which can conveniently analyze the relationship between network modularity and robustness. We employed an existing algorithm to analyze the modularity of directed graphs and a Boolean network model for robustness calculation. In particular, to ensure the robustness algorithm's applicability to large-scale networks, we implemented it as a parallel algorithm by using the OpenCL library. A batch-mode simulation function was also developed to verify whether an observed relationship between modularity and robustness is conserved in a large set of randomly structured networks. The app provides various visualization modes to better elucidate topological relations between modules, and tabular results of centrality and gene ontology enrichment analyses of modules. We tested the proposed app to analyze large signaling networks and showed an interesting relationship between network modularity and robustness. Conclusions: Our app can be a promising tool which efficiently analyzes the relationship between modularity and robustness in large signaling networks.
Affiliation(s)
- Cong-Doan Truong
- Department of IT Convergence, University of Ulsan, 93 Daehak-ro, Nam-gu, Ulsan, 680-749, Republic of Korea
- Tien-Dzung Tran
- Complex Network and Bioinformatics Group, Center for Research and Development, Hanoi University of Industry, Hanoi, Vietnam
- Yung-Keun Kwon
- Department of IT Convergence, University of Ulsan, 93 Daehak-ro, Nam-gu, Ulsan, 680-749, Republic of Korea.
|
40
|
Zhang W, Xu J, Li X, Zou X. A New Method for Identifying Essential Proteins by Measuring Co-Expression and Functional Similarity. IEEE Trans Nanobioscience 2016; 15:939-945. [DOI: 10.1109/tnb.2016.2625460] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Indexed: 11/06/2022]
|
41
|
Li G, Li M, Wang J, Wu J, Wu FX, Pan Y. Predicting essential proteins based on subcellular localization, orthology and PPI networks. BMC Bioinformatics 2016; 17 Suppl 8:279. [PMID: 27586883 PMCID: PMC5009824 DOI: 10.1186/s12859-016-1115-5] [Citation(s) in RCA: 48] [Impact Index Per Article: 6.0] [Indexed: 01/15/2023]
Abstract
Background: Essential proteins play an indispensable role in cellular survival and development. There have been a series of biological experimental methods for finding essential proteins; however, they are time-consuming, expensive and inefficient. In order to overcome the shortcomings of biological experimental methods, many computational methods have been proposed to predict essential proteins. The computational methods can be roughly divided into two categories, the topology-based methods and the sequence-based ones. The former use the topological features of protein-protein interaction (PPI) networks while the latter use the sequence features of proteins to predict essential proteins. Nevertheless, it is still challenging to improve the prediction accuracy of the computational methods. Results: Compared with nonessential proteins, essential proteins appear more frequently in certain subcellular locations and their evolution is more conservative. By integrating the information of subcellular localization, orthologous proteins and PPI networks, we propose a novel essential protein prediction method, named SON, in this study. The experimental results on S. cerevisiae data show that the prediction accuracy of SON clearly exceeds that of nine competing methods: DC, BC, IC, CC, SC, EC, NC, PeC and ION. Conclusions: We demonstrate that, by integrating the information of subcellular localization and orthologous proteins with PPI networks, the accuracy of predicting essential proteins can be improved. Our proposed method SON is effective for predicting essential proteins.
Affiliation(s)
- Gaoshi Li
- School of Information Science and Engineering, Central South University, Changsha, 410083, Hunan, People's Republic of China; Guangxi Key Lab of Multi-source Information Mining and Security, Guangxi Normal University, Guilin, 541004, Guangxi, People's Republic of China
- Min Li
- School of Information Science and Engineering, Central South University, Changsha, 410083, Hunan, People's Republic of China.
- Jianxin Wang
- School of Information Science and Engineering, Central South University, Changsha, 410083, Hunan, People's Republic of China.
- Jingli Wu
- Guangxi Key Lab of Multi-source Information Mining and Security, Guangxi Normal University, Guilin, 541004, Guangxi, People's Republic of China
- Fang-Xiang Wu
- School of Information Science and Engineering, Central South University, Changsha, 410083, Hunan, People's Republic of China; Department of Mechanical Engineering and Division of Biomedical Engineering, University of Saskatchewan, Saskatoon, S7N 5A9, SK, Canada
- Yi Pan
- School of Information Science and Engineering, Central South University, Changsha, 410083, Hunan, People's Republic of China; Department of Computer Science, Georgia State University, Atlanta, 30302-4110, GA, USA
|
42
|
Zhang X, Xiao W, Acencio ML, Lemke N, Wang X. An ensemble framework for identifying essential proteins. BMC Bioinformatics 2016; 17:322. [PMID: 27557880 PMCID: PMC4997703 DOI: 10.1186/s12859-016-1166-7] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Received: 03/03/2016] [Accepted: 08/09/2016] [Indexed: 11/10/2022]
Abstract
Background: Many centrality measures have been proposed to mine and characterize the correlations between network topological properties and protein essentiality. However, most of them show limited prediction accuracy, and the number of common predicted essential proteins by different methods is very small. Results: In this paper, an ensemble framework is proposed which integrates gene expression data and protein-protein interaction networks (PINs). It aims to improve the prediction accuracy of basic centrality measures. The idea behind this ensemble framework is that different protein-protein interactions (PPIs) may show different contributions to protein essentiality. Five standard centrality measures (degree centrality, betweenness centrality, closeness centrality, eigenvector centrality, and subgraph centrality) are integrated into the ensemble framework respectively. We evaluated the performance of the proposed ensemble framework using yeast PINs and gene expression data. The results show that it can considerably improve the prediction accuracy of the five centrality measures individually. It can also remarkably increase the number of common predicted essential proteins among those predicted by each centrality measure individually and enable each centrality measure to find more low-degree essential proteins. Conclusions: This paper demonstrates that it is valuable to differentiate the contributions of different PPIs for identifying essential proteins based on network topological characteristics. The proposed ensemble framework is a successful paradigm to this end.
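The idea that different interactions contribute differently to essentiality can be illustrated by weighting each edge with the co-expression of its two proteins before computing a centrality. The sketch below uses Pearson correlation clipped at zero as the edge weight and a weighted degree as the centrality. Both choices are assumptions made for illustration: the abstract does not state how the framework scores individual PPIs, and the names `pearson` and `weighted_degree` and the expression profiles are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant profile carries no co-expression signal
    return sxy / sqrt(sxx * syy)


def weighted_degree(edges, expression):
    """Sum of co-expression weights over each protein's interactions."""
    scores = {}
    for u, v in edges:
        # Assumption: negatively correlated pairs contribute nothing.
        weight = max(0.0, pearson(expression[u], expression[v]))
        scores[u] = scores.get(u, 0.0) + weight
        scores[v] = scores.get(v, 0.0) + weight
    return scores


# Hypothetical expression profiles over three time points.
expression = {"A": [1, 2, 3], "B": [2, 4, 6], "C": [3, 2, 1]}
edges = [("A", "B"), ("A", "C"), ("B", "C")]

print(weighted_degree(edges, expression))
```

In this toy network all three proteins have the same unweighted degree, yet only the co-expressed pair A and B keeps a non-zero score, which is the kind of separation an edge-weighting scheme is meant to provide.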
Affiliation(s)
- Xue Zhang
- Systems Biology Core, NHLBI, NIH, 9000 Rockville Pike, Bethesda, MD, 20892, USA
- Wangxin Xiao
- Department of Computer Science, XiangNan University, Eastern Wangxian Park, Chenzhou, Hunan, 423000, China.
- Marcio Luis Acencio
- Department of Physics and Biophysics, Institute of Biosciences of Botucatu, UNESP-São Paulo State University, CEP 18618-970, Botucatu, São Paulo, 510, Brazil; Department of Cancer Research and Molecular Medicine, Norwegian University of Science and Technology (NTNU), P.B. 8905, N-7491, Trondheim, Norway
- Ney Lemke
- Department of Physics and Biophysics, Institute of Biosciences of Botucatu, UNESP-São Paulo State University, CEP 18618-970, Botucatu, São Paulo, 510, Brazil
- Xujing Wang
- Systems Biology Core, NHLBI, NIH, 9000 Rockville Pike, Bethesda, MD, 20892, USA.
|
43
|
Li M, Tang Y, Wu X, Wang J, Wu FX, Pan Y. C-DEVA: Detection, evaluation, visualization and annotation of clusters from biological networks. Biosystems 2016; 150:78-86. [PMID: 27530307 DOI: 10.1016/j.biosystems.2016.08.004] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 03/30/2015] [Revised: 07/21/2016] [Accepted: 08/08/2016] [Indexed: 10/21/2022]
Abstract
With the progress of research on biological networks, plenty of excellent clustering algorithms have been proposed. Nevertheless, not only different algorithms but also the same algorithms with different characteristics result in different performances on the same biological networks. Therefore, it might be difficult for researchers to choose an appropriate clustering algorithm to use for a specific network. Here we present C-DEVA, a comprehensive platform for Detecting clusters from biological networks and for their Evaluation, Visualization and Annotation analysis. Ten clustering methods are provided in C-DEVA, covering different types of clustering algorithms, each type differing in principle. For the identified complexes, there are over ten popular and traditional bio-statistical measurements to assess them. Multi-source biological information has also been integrated in C-DEVA, such as biological functional annotations and gold standard complex sets, which are collected from the latest datasets in major databases or related papers. Furthermore, visualization analyses are available throughout the whole workflow, which endows C-DEVA with good usability and simple manipulation. To ensure extensibility, development interfaces are offered in C-DEVA for integrating new clustering and evaluation methods. Additionally, operations on the network, such as network randomization, are also supported. C-DEVA provides a complete tool for identifying clusters from biological networks. Multiple options are offered during the analysis process, including detection methods, evaluation metrics and visualization modules. In addition, researchers can customize the C-DEVA workflow according to the properties of their networks to find the best results. C-DEVA is released under the GNU General Public License (GPL), and the source code and binaries are freely available at https://github.com/cici333/c-deva.
Affiliation(s)
- Min Li: School of Information Science and Engineering, Central South University, Changsha, 410083, China.
- Yu Tang: School of Information Science and Engineering, Central South University, Changsha, 410083, China.
- Xuehong Wu: School of Information Science and Engineering, Central South University, Changsha, 410083, China.
- Jianxin Wang: School of Information Science and Engineering, Central South University, Changsha, 410083, China.
- Fang-Xiang Wu: School of Information Science and Engineering, Central South University, Changsha, 410083, China; Division of Biomedical Engineering and Department of Mechanical Engineering, University of Saskatchewan, Saskatoon, SK, S7N 5A9, Canada.
- Yi Pan: School of Information Science and Engineering, Central South University, Changsha, 410083, China; Department of Computer Science, Georgia State University, Atlanta, GA, 30302-4110, USA.
|
44
|
Abstract
Background: Protein complexes play an important role in biological processes. Recent experimental developments have resulted in the publication of many high-quality, large-scale protein-protein interaction (PPI) datasets, which provide abundant data for computational approaches to the prediction of protein complexes. However, the precision of protein complex prediction still needs to be improved because PPI networks are incomplete and noisy. Results: Complex and diverse relationships among proteins emerge after integrating multiple sources of biological information. Considering that different types of interactions do not carry the same weight in protein complex prediction, we construct a multi-relationship protein interaction network (MPIN) by integrating PPI network topology with gene ontology annotation information. Then, we design a novel algorithm named MINE (identifying protein complexes based on Multi-relationship protein Interaction NEtwork) to predict protein complexes with high cohesion and low coupling from MPIN. Conclusions: The experiments on yeast data show that MINE outperforms the current methods in terms of both accuracy and statistical significance.
|
45
|
Essential protein discovery based on a combination of modularity and conservatism. Methods 2016; 110:54-63. [PMID: 27402354 DOI: 10.1016/j.ymeth.2016.07.005] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.0] [Received: 04/15/2016] [Revised: 06/05/2016] [Accepted: 07/08/2016] [Indexed: 01/22/2023] Open
Abstract
Essential proteins are indispensable for the survival of a living organism and play important roles in the emerging field of synthetic biology. Many computational methods have been proposed to identify essential proteins by using the topological features of interactome networks. However, most of these methods ignore the intrinsic biological meaning of proteins. Research shows that essentiality is tied not only to the protein or gene itself, but also to the molecular modules to which that protein belongs. The results of this study reveal the modularity of essential proteins. In addition, essential proteins are more evolutionarily conserved than nonessential proteins and frequently bind each other. That is to say, conservatism is another important feature of essential proteins. Multiple networks are constructed by integrating protein-protein interaction (PPI) networks, time course gene expression data and protein domain information. Based on these networks, a new essential protein identification method is proposed that combines the modularity and conservatism of proteins. Experimental results show that the proposed method outperforms other essential protein identification methods in terms of the number of essential proteins among the top-ranked candidates.
|
46
|
Shang X, Wang Y, Chen B. Identifying essential proteins based on dynamic protein-protein interaction networks and RNA-Seq datasets. SCIENCE CHINA INFORMATION SCIENCES 2016; 59:070106. [DOI: 10.1007/s11432-016-5583-z] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.9] [Indexed: 09/09/2024]
|
47
|
Zhao B, Wang J, Li M, Li X, Li Y, Wu FX, Pan Y. A New Method for Predicting Protein Functions From Dynamic Weighted Interactome Networks. IEEE Trans Nanobioscience 2016; 15:131-9. [PMID: 26955047 DOI: 10.1109/tnb.2016.2536161] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.1] [Indexed: 11/07/2022]
Abstract
Automated annotation of protein function is challenging. As the number of sequenced genomes rapidly grows, the overwhelming majority of proteins can only be annotated computationally. Under new conditions or stimuli, not only the number and location of proteins change, but also their interactions. This dynamic feature of protein interactions, however, has not been considered in existing function prediction algorithms. Taking the dynamic nature of protein interactions into consideration, we construct a dynamic weighted interactome network (DWIN) by integrating the protein-protein interaction (PPI) network and time course gene expression data, as well as protein domain information and protein complex information. Then, we propose a new prediction approach that predicts protein functions from the constructed dynamic weighted interactome network. For an unknown protein, the proposed method visits dynamic networks at different time points and scores functions derived from all neighbors. Finally, the method selects the top N functions from these ranked candidate functions to annotate the test protein. Experiments on PPI datasets were conducted to evaluate the effectiveness of the proposed approach in predicting unknown protein functions. The evaluation results demonstrated that the proposed method outperforms other competing methods.
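The scoring step described in the Zhao et al. abstract (visit the network at each time point, score functions carried by neighbors, keep the top N) can be illustrated with a minimal sketch. The abstract does not give the scoring formula, so the weighted neighbor vote used here is an assumption, and the names `predict_functions`, `networks`, and `annotations` are hypothetical.

```python
from collections import defaultdict


def predict_functions(protein, networks, annotations, top_n=3):
    """Rank candidate functions for `protein` by a weighted neighbor vote.

    networks:    list of dicts, one per time point, mapping
                 protein -> {neighbor: edge weight} (assumed layout).
    annotations: dict mapping protein -> set of known function labels.
    The summed-weight score is an illustrative assumption, not the
    formula used in the cited paper.
    """
    scores = defaultdict(float)
    for network in networks:  # one weighted network per time point
        for neighbor, weight in network.get(protein, {}).items():
            for function in annotations.get(neighbor, ()):
                scores[function] += weight
    # Highest score first; ties broken alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [function for function, _ in ranked[:top_n]]


# Toy data: two time points, three annotated neighbors of P0.
toy_networks = [
    {"P0": {"P1": 1.0, "P2": 0.5}},
    {"P0": {"P2": 0.25, "P3": 0.5}},
]
toy_annotations = {"P1": {"F_a"}, "P2": {"F_b"}, "P3": {"F_a", "F_c"}}
toy_prediction = predict_functions("P0", toy_networks, toy_annotations, top_n=2)
print(toy_prediction)
```

In the toy data, `F_a` accumulates weight from neighbors at both time points, so it ranks ahead of `F_b` even though `P2` appears in both networks.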
|
48
|
ProSim: A Method for Prioritizing Disease Genes Based on Protein Proximity and Disease Similarity. BIOMED RESEARCH INTERNATIONAL 2015; 2015:213750. [PMID: 26339594 PMCID: PMC4538409 DOI: 10.1155/2015/213750] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 12/15/2014] [Accepted: 01/16/2015] [Indexed: 01/19/2023]
Abstract
Predicting disease genes for a particular genetic disease is very challenging in bioinformatics. Based on current research studies, this challenge can be tackled via network-based approaches. Furthermore, it has been highlighted that it is necessary to consider disease similarity along with the protein's proximity to disease genes in a protein-protein interaction (PPI) network in order to improve the accuracy of disease gene prioritization. In this study we propose a new algorithm called proximity disease similarity algorithm (ProSim), which takes both of the aforementioned properties into consideration, to prioritize disease genes. To illustrate the proposed algorithm, we have conducted six case studies, namely, prostate cancer, Alzheimer's disease, diabetes mellitus type 2, breast cancer, colorectal cancer, and lung cancer. We employed leave-one-out cross validation, mean enrichment, tenfold cross validation, and ROC curves to evaluate our proposed method and other existing methods. The results show that our proposed method outperforms existing methods such as PRINCE, RWR, and DADA.
|
49
|
Luo J, Qi Y. Identification of Essential Proteins Based on a New Combination of Local Interaction Density and Protein Complexes. PLoS One 2015; 10:e0131418. [PMID: 26125187 PMCID: PMC4488326 DOI: 10.1371/journal.pone.0131418] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.4] [Received: 01/20/2015] [Accepted: 06/02/2015] [Indexed: 11/18/2022] Open
Abstract
Background: Computational approaches have been used to predict essential proteins and are faster than expensive, time-consuming, laborious experimental approaches. However, the performance of such approaches is still poor, making practical applications of computational approaches difficult in some fields. Hence, the development of more suitable and efficient computing methods is necessary for the identification of essential proteins. Method: In this paper, we propose a new method for predicting essential proteins in a protein interaction network, local interaction density combined with protein complexes (LIDC), based on statistical analyses of essential proteins and protein complexes. First, we introduce a new local topological centrality, local interaction density (LID), of the yeast PPI network; second, we discuss a new integration strategy for multiple sources of biological information. The LIDC method was then developed through a combination of LID and protein complex information based on our new integration strategy. The purpose of LIDC is to discover important features of essential proteins and their neighbors in real protein complexes, thereby improving the efficiency of identification. Results: Experimental results based on three different protein-protein interaction (PPI) networks of Saccharomyces cerevisiae and Escherichia coli showed that LIDC outperformed classical topological centrality measures and some recent combinational methods. Moreover, on the MIPS datasets, LIDC improved performance over all nine reference methods (i.e., DC, BC, NC, LID, PeC, CoEWC, WDC, ION, and UC). Conclusions: LIDC is more effective for the prediction of essential proteins than other recently developed methods.
Affiliation(s)
- Jiawei Luo: College of Computer Science and Electronic Engineering, Hunan University, Changsha, China
- Yi Qi: College of Computer Science and Electronic Engineering, Hunan University, Changsha, China
|
50
|
Khuri S, Wuchty S. Essentiality and centrality in protein interaction networks revisited. BMC Bioinformatics 2015; 16:109. [PMID: 25880655 PMCID: PMC4411940 DOI: 10.1186/s12859-015-0536-x] [Citation(s) in RCA: 32] [Impact Index Per Article: 3.6] [Received: 11/18/2014] [Accepted: 03/13/2015] [Indexed: 12/29/2022] Open
Abstract
Background: Minimum dominating sets (MDSet) of protein interaction networks allow the control of underlying protein interaction networks through their topological placement. While essential proteins are enriched in MDSets, we hypothesize that the statistical properties of biological functions of essential genes are enhanced when we focus on essential MDSet proteins (e-MDSet). Results: Here, we determined minimum dominating sets of proteins (MDSet) in interaction networks of E. coli, S. cerevisiae and H. sapiens, defined as subsets of proteins whereby each remaining protein can be reached by a single interaction. We compared several topological and functional parameters of essential, MDSet, and essential MDSet (e-MDSet) proteins. In particular, we observed that their topological placement allowed e-MDSet proteins to provide a positive correlation between degree and lethality, connect more protein complexes, and have a stronger impact on network resilience than essential proteins alone. In comparison to essential proteins we further found that interactions between e-MDSet proteins appeared more frequently within complexes, while interactions of e-MDSet proteins between complexes were depleted. Finally, these e-MDSet proteins classified into functional groupings that play a central role in survival and adaptability. Conclusions: The determination of e-MDSet of an organism highlights a set of proteins that enhances the enrichment signals of biological functions of essential proteins. As a consequence, we surmise that e-MDSets may provide a new method of evaluating the core proteins of an organism.
Affiliation(s)
- Sawsan Khuri: Department of Computer Science, University of Miami, Coral Gables, FL, 33146, USA; Center for Computational Science, University of Miami, Coral Gables, FL, 33146, USA.
- Stefan Wuchty: Department of Computer Science, University of Miami, Coral Gables, FL, 33146, USA; Center for Computational Science, University of Miami, Coral Gables, FL, 33146, USA.
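The dominating-set definition in the Khuri and Wuchty abstract (every protein outside the set is one interaction away from a protein inside it) can be made concrete with a small sketch. Finding a true minimum dominating set is NP-hard in general, and the abstract does not say how the authors computed theirs, so the greedy heuristic below is a swapped-in approximation. The function names and the toy network are hypothetical.

```python
def greedy_dominating_set(adjacency):
    """Greedy approximation of a dominating set.

    adjacency: dict mapping each node to the set of its neighbors
               (assumed symmetric, with every node present as a key).
    This is a standard greedy heuristic, not necessarily the method
    used in the cited paper, and it is not guaranteed to be minimum.
    """
    undominated = set(adjacency)
    dominating = set()
    while undominated:
        # Pick the node that newly dominates the most nodes;
        # sorting makes tie-breaking deterministic.
        best = max(
            sorted(adjacency),
            key=lambda node: len(({node} | adjacency[node]) & undominated),
        )
        dominating.add(best)
        undominated -= {best} | adjacency[best]
    return dominating


def is_dominating(adjacency, candidate):
    """True if every node is in `candidate` or adjacent to a member of it."""
    return all(
        node in candidate or adjacency[node] & candidate for node in adjacency
    )


# Toy network: hub A with neighbors B, C, D, plus a pendant edge D-E.
toy_adjacency = {
    "A": {"B", "C", "D"},
    "B": {"A"},
    "C": {"A"},
    "D": {"A", "E"},
    "E": {"D"},
}
toy_mdset = greedy_dominating_set(toy_adjacency)
print(sorted(toy_mdset))
```

On the toy network the hub `A` alone leaves `E` undominated, so the heuristic adds `D`. Intersecting such a set with a list of known essential proteins would give the e-MDSet the abstract refers to.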
|