1
Lawson S, Donovan D, Lefevre J. An application of node and edge nonlinear hypergraph centrality to a protein complex hypernetwork. PLoS One 2024; 19:e0311433. [PMID: 39361678 PMCID: PMC11449304 DOI: 10.1371/journal.pone.0311433] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/05/2024] [Accepted: 09/12/2024] [Indexed: 10/05/2024] Open
Abstract
The use of graph centrality measures applied to biological networks, such as protein interaction networks, underpins much research into identifying key players within biological processes. This approach, however, is restricted to dyadic interactions, and it is well known that in many instances interactions are polyadic. In this study we illustrate the merit of using hypergraph centrality applied to a hypernetwork as an alternative. Specifically, we review and propose an extension to a recently introduced node and edge nonlinear hypergraph centrality model which provides mutually dependent node and edge centralities. A Saccharomyces cerevisiae protein complex hypernetwork is used as an example application, with nodes representing proteins and hyperedges representing protein complexes. The resulting rankings of the nodes and edges are considered to see if they provide insight into the essentiality of the proteins and complexes. We find that certain variations of the model predict essentiality more accurately and that the degree-based variation illustrates that the centrality-lethality rule extends to a hypergraph setting. In particular, through exploitation of the model's flexibility, we identify small sets of proteins densely populated with essential proteins. One of the key advantages of applying this model to a protein complex hypernetwork is that it also provides a classification method for protein complexes, unlike previous approaches, which are only concerned with classifying proteins.
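To make "mutually dependent node and edge centralities" concrete, here is a minimal sketch of one fixed-point iteration of that general kind. The abstract does not give the model's equations, so the update form used here (node scores x proportional to g(B f(y)), hyperedge scores y proportional to psi(B^T phi(x)), with B the node-by-hyperedge incidence structure) is an assumption, and the function name `hypergraph_centrality`, its parameters, and the toy data are hypothetical. With all four functions left as the identity this reduces to a linear, eigenvector-style centrality.

```python
def hypergraph_centrality(hyperedges, num_nodes,
                          f=lambda t: t, g=lambda t: t,
                          phi=lambda t: t, psi=lambda t: t,
                          max_iter=500, tol=1e-12):
    """Iterate x ~ g(B f(y)), y ~ psi(B^T phi(x)) until the scores settle.

    hyperedges: list of sets of node indices (e.g. one set per protein complex).
    The update form is an assumed sketch, not the paper's stated model.
    Assumes at least one non-empty hyperedge so the sums stay positive.
    """
    x = [1.0 / num_nodes] * num_nodes            # node (protein) scores
    y = [1.0 / len(hyperedges)] * len(hyperedges)  # hyperedge (complex) scores
    for _ in range(max_iter):
        # A node collects the transformed scores of the hyperedges it belongs to.
        fy = [f(t) for t in y]
        new_x = [0.0] * num_nodes
        for e, members in enumerate(hyperedges):
            for i in members:
                new_x[i] += fy[e]
        new_x = [g(t) for t in new_x]
        total = sum(new_x)
        new_x = [t / total for t in new_x]

        # A hyperedge collects the transformed scores of its member nodes.
        px = [phi(t) for t in new_x]
        new_y = [psi(sum(px[i] for i in members)) for members in hyperedges]
        total = sum(new_y)
        new_y = [t / total for t in new_y]

        change = (sum(abs(a - b) for a, b in zip(new_x, x))
                  + sum(abs(a - b) for a, b in zip(new_y, y)))
        x, y = new_x, new_y
        if change < tol:
            break
    return x, y


# Toy hypernetwork: 4 proteins, 3 complexes (made-up data).
complexes = [{0, 1, 2}, {0, 3}, {0, 1}]
node_scores, complex_scores = hypergraph_centrality(complexes, 4)
```

Passing nonlinear choices for `f`, `g`, `phi`, and `psi` (for example powers or logarithms) would give other variants of the same scheme; which choices the paper actually studies is not stated in the abstract.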
Affiliation(s)
- Sarah Lawson
  - ARC Centre of Excellence, Plant Success in Nature and Agriculture, School of Mathematics and Physics, The University of Queensland, Brisbane, Queensland, Australia
- Diane Donovan
  - ARC Centre of Excellence, Plant Success in Nature and Agriculture, School of Mathematics and Physics, The University of Queensland, Brisbane, Queensland, Australia
- James Lefevre
  - ARC Centre of Excellence, Plant Success in Nature and Agriculture, School of Mathematics and Physics, The University of Queensland, Brisbane, Queensland, Australia
2
Pan L, Wang H, Yang B, Li W. A protein network refinement method based on module discovery and biological information. BMC Bioinformatics 2024; 25:157. [PMID: 38643108 PMCID: PMC11031909 DOI: 10.1186/s12859-024-05772-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/21/2024] [Accepted: 04/10/2024] [Indexed: 04/22/2024] Open
Abstract
BACKGROUND The identification of essential proteins can help in understanding the minimum requirements for cell survival and development, in discovering drug targets, and in preventing disease. Node ranking methods are now a common way to identify essential proteins, but the poor data quality of the underlying protein interaction network (PIN) has hindered their identification accuracy. Researchers have therefore constructed refined networks by considering certain biological properties of interacting protein pairs to improve the performance of node ranking methods on the PIN. Studies show that proteins in a complex are more likely to be essential than proteins not present in a complex. However, refinement methods for PINs usually ignore modularity.
METHODS Based on this, we proposed a network refinement method based on module discovery and biological information. The idea is, first, to extract the maximal connected subgraph of the PIN and divide it into modules using the Fast-unfolding algorithm; then, to detect critical modules according to the orthologous information, subcellular localization information, and topology information within each module; and finally, to construct a more refined network (CM-PIN) from the identified critical modules.
RESULTS To evaluate the effectiveness of the proposed method, we used 12 typical node ranking methods (LAC, DC, DMNC, NC, TP, LID, CC, BC, PR, LR, PeC, WDC) to compare their overall performance on the CM-PIN with that on the S-PIN, D-PIN, and RD-PIN. The experimental results showed that the CM-PIN was optimal in terms of the number of essential proteins identified, the precision-recall curve, the Jackknifing method, and other criteria, and that it can help to identify essential proteins more accurately.
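The first step of the method described above, extracting the maximal connected subgraph of the PIN, can be sketched with a breadth-first search. This is only an illustration of that one step under simple assumptions (an undirected edge list with made-up protein names); the function names are hypothetical, and module detection, critical-module scoring, and the CM-PIN construction are not reproduced here.

```python
from collections import deque


def largest_connected_component(edges):
    """Return the node set of the largest connected component of an undirected graph."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)

    seen = set()
    best = set()
    for start in adjacency:
        if start in seen:
            continue
        # Breadth-first search collects every node reachable from `start`.
        component = {start}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbour in adjacency[node]:
                if neighbour not in component:
                    component.add(neighbour)
                    queue.append(neighbour)
        seen |= component
        if len(component) > len(best):
            best = component
    return best


def induced_edges(edges, nodes):
    """Keep only the interactions whose two endpoints both lie in `nodes`."""
    return [(u, v) for u, v in edges if u in nodes and v in nodes]


# Made-up interactions: one three-protein component and one isolated pair.
interactions = [("A", "B"), ("B", "C"), ("D", "E")]
core = largest_connected_component(interactions)
core_edges = induced_edges(interactions, core)
```

The resulting subgraph would then be handed to a community detection routine; the abstract names Fast-unfolding for that step.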
Affiliation(s)
- Li Pan
  - Hunan Institute of Science and Technology, Yueyang, 414006, China
  - Hunan Engineering Research Center of Multimodal Health Sensing and Intelligent Analysis, Yueyang, 414006, China
- Haoyue Wang
  - Hunan Institute of Science and Technology, Yueyang, 414006, China
- Bo Yang
  - Hunan Institute of Science and Technology, Yueyang, 414006, China
  - Hunan Engineering Research Center of Multimodal Health Sensing and Intelligent Analysis, Yueyang, 414006, China
- Wenbin Li
  - Hunan Institute of Science and Technology, Yueyang, 414006, China
3
Payra AK, Saha B, Ghosh A. MEM-FET: Essential protein prediction using membership feature and machine learning approach. Proteins 2024; 92:60-75. [PMID: 37638618 DOI: 10.1002/prot.26577] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/20/2022] [Revised: 02/21/2023] [Accepted: 08/08/2023] [Indexed: 08/29/2023]
Abstract
Proteins play key roles in many functions of daily life. A protein's functional roles are somewhat enhanced when it interacts with others, compared with acting alone. Identifying the essential proteins of an organism by observation in the wet lab is time-consuming and costly, although wet-lab results offer high reliability and accuracy on biological grounds. Essential protein prediction using computational approaches is an alternative. It is rapidly proving its significance and effectively reduces the experimental cost of the wet lab. Existing computational methods were implemented using protein-protein interaction networks (PPIN), sequence data, Gene Expression Dataset (GED), Gene Ontology (GO), orthologous groups, and subcellular localization datasets. Machine learning offers diverse categories of features that make it possible to model and predict the essential macromolecules of understudied organisms. A novel methodology, MEM-FET (membership feature), is proposed based on the following features: edge clustering coefficient, average clustering coefficient, subcellular localization, and Gene Ontology within a compartment of common neighbors. The accuracy (ACC) values of the predicted true positive (TP) essential proteins are 0.79, 0.74, 0.78, and 0.71 for the YHQ, YMIPS, YDIP, and YMBD datasets, respectively. An enriched set of essential proteins is also predicted using the MEM-FET algorithm. Ensemble ML also validated the proposed model with an accuracy of 60%. MEM-FET is reported to outperform other existing algorithms, with an ACC value of 80% for the yeast dataset.
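One of the features named above, the edge clustering coefficient, can be illustrated with a short sketch. The abstract does not define it, so the formula below is an assumption: a definition commonly used in the essential-protein literature, in which the number of triangles containing an edge (u, v) is divided by min(deg(u) - 1, deg(v) - 1). The function name and the toy network are hypothetical.

```python
def edge_clustering_coefficient(adjacency, u, v):
    """Triangles through edge (u, v) divided by the most that edge could support.

    adjacency maps each node to the set of its neighbours (undirected graph).
    The formula is an assumed, commonly used definition, not taken from the paper.
    """
    triangles = len(adjacency[u] & adjacency[v])  # common neighbours close a triangle
    possible = min(len(adjacency[u]) - 1, len(adjacency[v]) - 1)
    return triangles / possible if possible > 0 else 0.0


# Made-up network: a triangle A-B-C with a pendant protein D attached to C.
network = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C"},
}
ecc_ab = edge_clustering_coefficient(network, "A", "B")
ecc_cd = edge_clustering_coefficient(network, "C", "D")
```

Edges inside tightly knit neighbourhoods score high, while an edge to a degree-one protein scores zero under this convention.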
Affiliation(s)
- Anjan Kumar Payra
  - Department of Computer Science and Engineering, Dr. Sudhir Chandra Sur Degree Engineering College, Kolkata, India
- Banani Saha
  - Department of Computer Science and Engineering, University of Calcutta, Kolkata, India
- Anupam Ghosh
  - Department of Computer Science and Engineering, Netaji Subhash Engineering College, Kolkata, India
4
Payra AK, Saha B, Ghosh A. MM-CCNB: Essential protein prediction using MAX-MIN strategies and compartment of common neighboring approach. Computer Methods and Programs in Biomedicine 2023; 228:107247. [PMID: 36427433 DOI: 10.1016/j.cmpb.2022.107247] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/27/2022] [Revised: 10/16/2022] [Accepted: 11/14/2022] [Indexed: 06/16/2023]
Abstract
BACKGROUND AND OBJECTIVE Proteins are indispensable to the life of living organisms. Interacting protein pairs exhibit more functional activity than individual proteins, and this activity has been considered an essential measure in predicting their essentiality. Neighborhood approaches have been used frequently to predict essentiality scores. All paired neighbors of the essential proteins are nominated as candidate seeds for prediction. So far, Jaccard's coefficient has been limited to predicting functions, homologous groups, sequence analysis, etc. This motivates us to predict essential proteins efficiently using different computational approaches.
METHODS In our work, we proposed a modified Jaccard's coefficient to predict essential proteins. We have proposed a novel methodology for predicting essential proteins using MAX-MIN strategies and the modified Jaccard's coefficient approach.
RESULTS The performance of our proposed methodology has been analyzed for Saccharomyces cerevisiae datasets with an accuracy of more than 80%. The proposed algorithm achieves accuracies of 0.78, 0.74, 0.79, and 0.862 for the YDIP, YMIPS, YHQ, and YMBD datasets, respectively.
CONCLUSIONS There are several computational approaches in the existing state-of-the-art models of essential protein prediction. Our proposed methodology outperforms other existing models, viz. different centralities, local interaction density combined with protein complexes, the modified monkey algorithm, and the ortho_sim_loc method.
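For orientation, the standard Jaccard's coefficient applied to the neighbor sets of two proteins can be sketched as follows. This is the textbook coefficient only; the authors' modified version and their MAX-MIN strategy are not specified in the abstract and are not reproduced here. The function name and the toy network are hypothetical.

```python
def jaccard_coefficient(adjacency, u, v):
    """Standard Jaccard similarity of the neighbour sets of u and v."""
    neighbours_u = adjacency[u]
    neighbours_v = adjacency[v]
    union = neighbours_u | neighbours_v
    if not union:
        return 0.0  # two isolated nodes share nothing by convention
    return len(neighbours_u & neighbours_v) / len(union)


# Made-up network: a triangle A-B-C with a pendant protein D attached to C.
network = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C"},
}
similarity_ab = jaccard_coefficient(network, "A", "B")  # one shared neighbour out of three
similarity_cd = jaccard_coefficient(network, "C", "D")  # no shared neighbours
```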
Affiliation(s)
- Anjan Kumar Payra
  - Department of Computer Science & Engineering, Dr. Sudhir Chandra Sur Degree Engineering College, 540, Dum Dum Road, Near Dum Dum Jn. Station, Surermath, Kolkata 700074, India
- Banani Saha
  - Department of Computer Science & Engineering, University of Calcutta, Saltlake City, Kolkata 700073, India
- Anupam Ghosh
  - Department of Computer Science & Engineering, Netaji Subhash Engineering College, Techno City, Panchpota, Garia, Kolkata 700152, India
5
Rule-Based Pruning and In Silico Identification of Essential Proteins in Yeast PPIN. Cells 2022; 11:cells11172648. [PMID: 36078056 PMCID: PMC9454873 DOI: 10.3390/cells11172648] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/28/2022] [Revised: 08/18/2022] [Accepted: 08/22/2022] [Indexed: 11/25/2022] Open
Abstract
Proteins are vital for the significant cellular activities of living organisms. However, not all of them are essential. Identifying essential proteins through different biological experiments is relatively more laborious and time-consuming than the computational approaches used in recent times. However, practical implementation of conventional scientific methods sometimes becomes challenging due to poor performance impact in specific scenarios. Thus, more developed and efficient computational prediction models are required for essential protein identification. An effective methodology is proposed in this research, capable of predicting essential proteins in a refined yeast protein–protein interaction network (PPIN). The rule-based refinement is done using protein complex and local interaction density information derived from the neighborhood properties of proteins in the network. Identification and pruning of non-essential proteins are equally crucial here. In the initial phase, careful assessment is performed by applying node and edge weights to identify and discard the non-essential proteins from the interaction network. Three cut-off levels are considered for each node and edge weight for pruning the non-essential proteins. Once the PPIN has been filtered out, the second phase starts with two centralities-based approaches: (1) local interaction density (LID) and (2) local interaction density with protein complex (LIDC), which are successively implemented to identify the essential proteins in the yeast PPIN. Our proposed methodology achieves better performance in comparison to the existing state-of-the-art techniques.
6
Li S, Zhang Z, Li X, Tan Y, Wang L, Chen Z. An iteration model for identifying essential proteins by combining comprehensive PPI network with biological information. BMC Bioinformatics 2021; 22:430. [PMID: 34496745 PMCID: PMC8425031 DOI: 10.1186/s12859-021-04300-7] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 08/04/2020] [Accepted: 07/08/2021] [Indexed: 11/10/2022] Open
Abstract
Background Essential proteins have a great impact on cell survival and development, and play important roles in disease analysis and new drug design. However, since it is inefficient and costly to identify essential proteins through biological experiments, there is an urgent need for automated and accurate detection methods. In recent years, the recognition of essential proteins in protein interaction networks (PPI) has become a research hotspot, and many computational models for predicting essential proteins have been proposed.
Results To achieve higher prediction performance, a new prediction model called TGSO is proposed in this paper. In TGSO, a protein aggregation degree network is first constructed by adopting the node density measurement method for complex networks. At the same time, a protein co-expression interaction network is constructed by combining gene expression information with network connectivity, and a protein co-localization interaction network is constructed from subcellular localization data. These three newly constructed networks are then integrated to obtain a comprehensive protein–protein interaction network. Finally, based on homology information, scores are calculated iteratively for the proteins and used to estimate their importance. To evaluate the identification performance of TGSO, we compared it with 13 recent competitive methods on three yeast databases. The experimental results show that TGSO achieves identification accuracies of 94%, 82% and 72% among the top 1%, 5% and 10% of candidate proteins, respectively, which are to some degree superior to those of the state-of-the-art competing models.
Conclusions We constructed a comprehensive interaction network based on multi-source data to reduce the noise and errors in the initial PPI, and combined it with iterative methods to improve the accuracy of essential protein prediction, which suggests that TGSO may also be conducive to the future development of essential protein recognition.
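Figures such as "94% out of the top 1% of candidate proteins" are usually read as the fraction of truly essential proteins among the highest-scored candidates. The sketch below shows that calculation under this assumed reading; the function name, the rounding rule for the cut-off, and the scores are hypothetical and not taken from the paper.

```python
import math


def top_fraction_precision(scores, essential, fraction):
    """Share of known essential proteins among the top-scoring `fraction` of candidates.

    scores: dict mapping protein name to predicted essentiality score.
    essential: set of proteins known to be essential (the benchmark).
    The cut-off is rounded up and always keeps at least one protein (an assumption).
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    k = max(1, math.ceil(len(ranked) * fraction))
    hits = sum(1 for protein in ranked[:k] if protein in essential)
    return hits / k


# Made-up scores for four proteins, two of which are known to be essential.
predicted = {"P1": 0.9, "P2": 0.8, "P3": 0.4, "P4": 0.1}
known_essential = {"P1", "P3"}
precision_top_half = top_fraction_precision(predicted, known_essential, 0.5)
```

In this toy example the top half contains P1 and P2, of which only P1 is essential, so the precision is 0.5.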
Affiliation(s)
- Shiyuan Li
  - College of Computer Engineering and Applied Mathematics, Changsha University, Changsha, 410022, China
  - Hunan Province Key Laboratory of Industrial Internet Technology and Security, Changsha University, Changsha, 410022, China
- Zhen Zhang
  - College of Electronic Information and Electrical Engineering, Changsha University, Changsha, 410022, China
- Xueyong Li
  - College of Computer Engineering and Applied Mathematics, Changsha University, Changsha, 410022, China
  - Hunan Province Key Laboratory of Industrial Internet Technology and Security, Changsha University, Changsha, 410022, China
- Yihong Tan
  - College of Computer Engineering and Applied Mathematics, Changsha University, Changsha, 410022, China
  - Hunan Province Key Laboratory of Industrial Internet Technology and Security, Changsha University, Changsha, 410022, China
- Lei Wang
  - College of Computer Engineering and Applied Mathematics, Changsha University, Changsha, 410022, China
  - Hunan Province Key Laboratory of Industrial Internet Technology and Security, Changsha University, Changsha, 410022, China
- Zhiping Chen
  - College of Computer Engineering and Applied Mathematics, Changsha University, Changsha, 410022, China
  - Hunan Province Key Laboratory of Industrial Internet Technology and Security, Changsha University, Changsha, 410022, China
7
Payra AK, Saha B, Ghosh A. Ortho_Sim_Loc: Essential protein prediction using orthology and priority-based similarity approach. Comput Biol Chem 2021; 92:107503. [PMID: 33962168 DOI: 10.1016/j.compbiolchem.2021.107503] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/15/2020] [Revised: 04/02/2021] [Accepted: 04/21/2021] [Indexed: 10/21/2022]
Abstract
Proteins are the essential macromolecules of living organisms, but not all proteins can be considered essential in the relevant studies. The essentiality of a protein is therefore estimated by computational methods rather than biological experiments, which saves both time and effort. Researchers have already proposed various computational approaches that successfully select essential proteins with different biological significance. Most of the experimental approaches return more false negatives than others. To retain the prediction accuracy level, a novel methodology, "Ortho_Sim_Loc", has been proposed, which combines orthology, similarity (using clustering and priority-based GO annotation), and subcellular localization. Ortho_Sim_Loc can predict functionally enriched sets of essential proteins. The predicted results are validated against other existing methods, such as different centrality measures and LIDC. The validation results show that Ortho_Sim_Loc performs better than other existing computational approaches.
Affiliation(s)
- Anjan Kumar Payra
  - Department of Computer Science & Engineering, Dr. Sudhir Chandra Sur Degree Engineering College, 540, Dum Dum Road, Near Dum Dum Jn. Station, Surermath, Kolkata, 700074, India
- Banani Saha
  - Department of Computer Science & Engineering, University of Calcutta, Saltlake City, Kolkata, 700073, India
- Anupam Ghosh
  - Department of Computer Science & Engineering, Netaji Subhash Engineering College, Techno City, Panchpota, Garia, Kolkata, 700152, India
8
Wen QF, Liu S, Dong C, Guo HX, Gao YZ, Guo FB. Geptop 2.0: An Updated, More Precise, and Faster Geptop Server for Identification of Prokaryotic Essential Genes. Front Microbiol 2019; 10:1236. [PMID: 31214154 PMCID: PMC6558110 DOI: 10.3389/fmicb.2019.01236] [Citation(s) in RCA: 25] [Impact Index Per Article: 5.0] [Received: 03/02/2019] [Accepted: 05/17/2019] [Indexed: 12/16/2022] Open
Abstract
Geptop has performed effectively in the identification of prokaryotic essential genes since its first release in 2013. It estimates gene essentiality for prokaryotes based on orthology and phylogeny. Genome-scale essentiality data are now available for more prokaryotic species, and the information has been collected into public essential gene repositories such as DEG and OGEE. A faster and more accurate toolkit is needed to keep pace with the growing volume of prokaryotic genome data. We updated Geptop by adding more validated essentiality data to the reference set (from 19 to 37 species) and by introducing multi-process technology to accelerate computation. Compared with Geptop 1.0 and other gene essentiality prediction models, Geptop 2.0 generates more stable predictions and finishes the computation in a shorter time. The software is available both as an online server and as a downloadable standalone application. We hope that the improved Geptop 2.0 will facilitate research on gene essentiality and the development of novel antibacterial drugs. The gene essentiality prediction tool is available at http://cefg.uestc.cn/geptop.
Affiliation(s)
- Qing-Feng Wen
  - School of Life Sciences and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
- Shuo Liu
  - School of Life Sciences and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
- Chuan Dong
  - School of Life Sciences and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
- Hai-Xia Guo
  - School of Life Sciences and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
- Yi-Zhou Gao
  - School of Life Sciences and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
- Feng-Biao Guo
  - School of Life Sciences and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China