1
Saha S, Chatterjee P, Basu S, Nasipuri M. EPI-SF: essential protein identification in protein interaction networks using sequence features. PeerJ 2024; 12:e17010. [PMID: 38495766 PMCID: PMC10944162 DOI: 10.7717/peerj.17010] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/24/2023] [Accepted: 02/05/2024] [Indexed: 03/19/2024] Open
Abstract
Proteins are indispensable to an organism's viability, reproduction, and other fundamental physiological functions. Identifying essential proteins with conventional biological assays is slow, labor-intensive, and expensive, so computational methods are widely regarded as the fastest and most effective alternative. Although deep learning (DL) is a popular choice in machine learning (ML) applications, it is not adopted for this sequence-feature-based study because high-quality training sets of positive and negative samples are scarce; recent DL work on limited data is left as future work. Conventional ML techniques are used instead because of their superior performance compared to DL methods in this setting. Accordingly, a technique called EPI-SF is proposed, which uses ML to identify essential proteins within the protein-protein interaction network (PPIN). Because the protein sequence is the primary determinant of protein structure and function, relevant sequence features are first extracted from the proteins in the PPIN. These features are then used as input to several ML models: XGBoost classifier, AdaBoost classifier, logistic regression (LR), support vector machine (SVM), decision tree (DT), random forest (RF), and naïve Bayes (NB). The models were first evaluated on the yeast PPIN. Among them, the RF model was the most effective, with precision, recall, F1-score, and AUC values of 0.703, 0.720, 0.711, and 0.745, respectively. It also outperforms other state-of-the-art methods based on traditional centrality measures, such as betweenness centrality (BC) and closeness centrality (CC), as well as deep learning methods such as DeepEP, as shown in the results section. Given this performance, EPI-SF is then used to predict novel essential proteins in the human PPIN. Because viruses tend to target essential proteins involved in disease transmission within the human PPIN, the possible involvement of these proteins in COVID-19 and other related severe diseases is also assessed.
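As a minimal sketch of the kind of sequence feature such a pipeline might use, the snippet below computes amino-acid composition for a protein sequence and checks that the reported F1-score follows from the reported precision and recall. Amino-acid composition and the toy sequence are assumptions chosen for illustration; the abstract does not list the features EPI-SF actually extracts.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def amino_acid_composition(sequence: str) -> list[float]:
    """Return the fraction of each standard amino acid in `sequence`."""
    counts = Counter(sequence.upper())
    total = sum(counts[aa] for aa in AMINO_ACIDS)
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy sequence, not taken from the paper.
features = amino_acid_composition("MKTAYIAKQR")
print(len(features))

# RF precision and recall as reported in the abstract.
print(round(f1_score(0.703, 0.720), 3))  # 0.711, matching the reported F1-score
```

A 20-dimensional vector like this could be fed to any of the listed classifiers; the actual feature set and model settings would have to come from the paper itself.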
Affiliation(s)
- Sovan Saha
- Department of Computer Science & Engineering (Artificial Intelligence & Machine Learning), Techno Main Salt Lake, Kolkata, West Bengal, India
- Piyali Chatterjee
- Department of Computer Science & Engineering, Netaji Subhash Engineering College, Kolkata, West Bengal, India
- Subhadip Basu
- Department of Computer Science & Engineering, Jadavpur University, Kolkata, West Bengal, India
- Mita Nasipuri
- Department of Computer Science & Engineering, Jadavpur University, Kolkata, West Bengal, India
2
Bandyopadhyay SS, Halder AK, Saha S, Chatterjee P, Nasipuri M, Basu S. Assessment of GO-Based Protein Interaction Affinities in the Large-Scale Human-Coronavirus Family Interactome. Vaccines (Basel) 2023; 11:549. [PMID: 36992133 PMCID: PMC10059867 DOI: 10.3390/vaccines11030549] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/09/2023] [Revised: 02/19/2023] [Accepted: 02/23/2023] [Indexed: 03/03/2023] Open
Abstract
SARS-CoV-2 is a novel coronavirus that replicates by interacting with host proteins. Identifying virus-host protein-protein interactions could therefore help researchers better understand how the virus transmits disease and identify possible COVID-19 drugs. The International Committee on Taxonomy of Viruses has determined that nCoV is 89% genetically similar to the SARS-CoV responsible for the 2003 epidemic. This paper assesses the host-pathogen protein interaction affinity of the coronavirus family, which has 44 different variants. To this end, a GO-semantic scoring function based on Gene Ontology (GO) graphs is provided for determining the binding affinity of any two proteins at the organism level. Based on the availability of GO annotations for the proteins, 11 of the 44 viral variants are considered: SARS-CoV-2, SARS, MERS, Bat coronavirus HKU3, Bat coronavirus Rp3/2004, Bat coronavirus HKU5, Murine coronavirus, Bovine coronavirus, Rat coronavirus, Bat coronavirus HKU4, and Bat coronavirus 133/2005. The fuzzy scoring function has been applied to the entire host-pathogen network, with ~180 million potential interactions generated from 19,281 host proteins and around 242 viral proteins. Based on the estimated interaction affinity threshold, ~4.5 million potential level-one host-pathogen interactions are computed. The resulting host-pathogen interactome is also validated against state-of-the-art experimental networks. The study is further extended toward drug repurposing by analyzing the FDA-listed COVID drugs.
Affiliation(s)
- Soumyendu Sekhar Bandyopadhyay
- Department of Computer Science and Engineering, Jadavpur University, Kolkata 700032, India
- Department of Computer Science and Engineering, School of Engineering and Technology, Adamas University, Kolkata 700126, India
- Anup Kumar Halder
- Faculty of Mathematics and Information Sciences, Warsaw University of Technology, 00-662 Warsaw, Poland
- Sovan Saha
- Department of Computer Science and Engineering (Artificial Intelligence and Machine Learning), Techno Main Salt Lake, Sector V, Kolkata 700091, India
- Piyali Chatterjee
- Department of Computer Science and Engineering, Netaji Subhash Engineering College, Kolkata 700152, India
- Mita Nasipuri
- Department of Computer Science and Engineering, Jadavpur University, Kolkata 700032, India
- Subhadip Basu
- Department of Computer Science and Engineering, Jadavpur University, Kolkata 700032, India
3
Payra AK, Saha B, Ghosh A. MM-CCNB: Essential protein prediction using MAX-MIN strategies and compartment of common neighboring approach. Computer Methods and Programs in Biomedicine 2023; 228:107247. [PMID: 36427433 DOI: 10.1016/j.cmpb.2022.107247] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/27/2022] [Revised: 10/16/2022] [Accepted: 11/14/2022] [Indexed: 06/16/2023]
Abstract
BACKGROUND AND OBJECTIVE Proteins are indispensable to the life of living organisms. Interacting protein pairs exhibit more functional activity than individual proteins, and this activity has been considered an essential measure in predicting essentiality. Neighborhood approaches have been used frequently to predict essentiality scores: all paired neighbors of essential proteins are nominated as candidate seeds for prediction. So far, the use of Jaccard's coefficient has been limited to predicting functions, homologous groups, sequence analysis, etc. This motivates us to predict essential proteins efficiently using different computational approaches. METHODS We propose a novel methodology for predicting essential proteins using MAX-MIN strategies and a modified Jaccard's coefficient. RESULTS The performance of the proposed methodology has been analyzed on Saccharomyces cerevisiae datasets with an accuracy of more than 80%. The proposed algorithm achieves accuracies of 0.78, 0.74, 0.79, and 0.862 on the YDIP, YMIPS, YHQ, and YMBD datasets, respectively. CONCLUSIONS Several computational approaches exist in the state of the art for essential protein prediction. Our methodology outperforms other existing models, viz. different centralities, local interaction density combined with protein complexes, the modified monkey algorithm, and ortho_sim_loc methods.
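For orientation, the standard Jaccard coefficient on the neighbor sets of two proteins can be sketched as follows. This is the textbook definition, not the modified coefficient proposed in the paper (whose details the abstract does not give), and the toy network and protein names are hypothetical.

```python
def jaccard(neighbors_u: set[str], neighbors_v: set[str]) -> float:
    """Standard Jaccard coefficient: |intersection| / |union| of two neighbor sets."""
    union = neighbors_u | neighbors_v
    if not union:
        return 0.0  # two isolated proteins share nothing
    return len(neighbors_u & neighbors_v) / len(union)


# Hypothetical toy PPIN stored as an adjacency map.
ppin = {
    "P1": {"P2", "P3", "P4"},
    "P2": {"P1", "P3"},
    "P3": {"P1", "P2", "P4"},
    "P4": {"P1", "P3"},
}

# P1 and P3 share neighbors P2 and P4 out of four distinct neighbors.
score = jaccard(ppin["P1"], ppin["P3"])
print(score)  # 0.5
```

A higher score means two interacting proteins sit in a more densely shared neighborhood, which is the intuition neighborhood-based essentiality measures build on.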
Affiliation(s)
- Anjan Kumar Payra
- Department of Computer Science & Engineering, Dr. Sudhir Chandra Sur Degree Engineering College, 540, Dum Dum Road, Near Dum Dum Jn. Station, Surermath, Kolkata 700074, India
- Banani Saha
- Department of Computer Science & Engineering, University of Calcutta, Saltlake City, Kolkata 700073, India
- Anupam Ghosh
- Department of Computer Science & Engineering, Netaji Subhash Engineering College, Techno City, Panchpota, Garia, Kolkata 700152, India
4
Sengupta K, Saha S, Halder AK, Chatterjee P, Nasipuri M, Basu S, Plewczynski D. PFP-GO: Integrating protein sequence, domain and protein-protein interaction information for protein function prediction using ranked GO terms. Front Genet 2022; 13:969915. [PMID: 36246645 PMCID: PMC9556876 DOI: 10.3389/fgene.2022.969915] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/15/2022] [Accepted: 08/31/2022] [Indexed: 11/13/2022] Open
Abstract
Protein function prediction is gradually emerging as an essential field in biological and computational studies. Although computational approaches have gained a significant foothold, combining information from multiple sources has been observed to be more effective than relying on a single source. Considering this, a methodology, PFP-GO, is proposed in which heterogeneous sources (protein sequence, protein domain, and protein-protein interaction network) are processed separately to rank each individual functional GO term. Based on this ranking, GO terms are propagated to the target proteins. During GO ranking, the protein sequence contributes sequence-based information, while the protein domain and the protein-protein interaction network contribute structural/functional and topological information, respectively. PFP-GO is evaluated using Precision, Recall, and F-Score, and performs reasonably better than the existing state-of-the-art methods, achieving an overall Precision, Recall, and F-Score of 0.67, 0.58, and 0.62, respectively. Furthermore, we check some of the top-ranked GO terms predicted by PFP-GO through multilayer network propagation that affect the 3D structure of the genome. The complete source code of PFP-GO is freely available at https://sites.google.com/view/pfp-go/.
Affiliation(s)
- Kaustav Sengupta
- Laboratory of Functional and Structural Genomics, Center of New Technologies, University of Warsaw, Warsaw, Poland
- Department of Computer Science and Engineering, Jadavpur University, Kolkata, India
- Laboratory of Bioinformatics and Computational Genomics, Faculty of Mathematics and Information Science, Warsaw University of Technology, Warsaw, Poland
- Sovan Saha
- Department of Computer Science and Engineering, Institute of Engineering and Management, Kolkata, West Bengal, India
- Anup Kumar Halder
- Laboratory of Functional and Structural Genomics, Center of New Technologies, University of Warsaw, Warsaw, Poland
- Laboratory of Bioinformatics and Computational Genomics, Faculty of Mathematics and Information Science, Warsaw University of Technology, Warsaw, Poland
- Piyali Chatterjee
- Department of Computer Science and Engineering, Netaji Subhash Engineering College, Kolkata, India
- Mita Nasipuri
- Department of Computer Science and Engineering, Jadavpur University, Kolkata, India
- Subhadip Basu
- Department of Computer Science and Engineering, Jadavpur University, Kolkata, India
- *Correspondence: Subhadip Basu, Dariusz Plewczynski
- Dariusz Plewczynski
- Laboratory of Functional and Structural Genomics, Center of New Technologies, University of Warsaw, Warsaw, Poland
- Laboratory of Bioinformatics and Computational Genomics, Faculty of Mathematics and Information Science, Warsaw University of Technology, Warsaw, Poland
- *Correspondence: Subhadip Basu, Dariusz Plewczynski
5
Rule-Based Pruning and In Silico Identification of Essential Proteins in Yeast PPIN. Cells 2022; 11:cells11172648. [PMID: 36078056 PMCID: PMC9454873 DOI: 10.3390/cells11172648] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/28/2022] [Revised: 08/18/2022] [Accepted: 08/22/2022] [Indexed: 11/25/2022] Open
Abstract
Proteins are vital for the significant cellular activities of living organisms. However, not all of them are essential. Identifying essential proteins through biological experiments is more laborious and time-consuming than the computational approaches used in recent times. However, conventional computational methods can also be difficult to apply in practice because they perform poorly in specific scenarios. Thus, more developed and efficient computational prediction models are required for essential protein identification. An effective methodology is proposed in this research, capable of predicting essential proteins in a refined yeast protein–protein interaction network (PPIN). The rule-based refinement uses protein complex and local interaction density information derived from the neighborhood properties of proteins in the network. Identifying and pruning non-essential proteins is equally crucial here. In the first phase, node and edge weights are applied to identify and discard the non-essential proteins from the interaction network. Three cut-off levels are considered for each node and edge weight when pruning the non-essential proteins. Once the PPIN has been filtered, the second phase applies two centrality-based approaches in succession to identify the essential proteins in the yeast PPIN: (1) local interaction density (LID) and (2) local interaction density with protein complex (LIDC). The proposed methodology achieves better performance than the existing state-of-the-art techniques.
6
Saha S, Halder AK, Bandyopadhyay SS, Chatterjee P, Nasipuri M, Basu S. Computational modeling of human-nCoV protein-protein interaction network. Methods 2022; 203:488-497. [PMID: 34902553 PMCID: PMC8662836 DOI: 10.1016/j.ymeth.2021.12.003] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/27/2021] [Revised: 11/30/2021] [Accepted: 12/06/2021] [Indexed: 01/25/2023] Open
Abstract
The novel coronavirus (SARS-CoV-2) replicates within the host cell by interacting with the host proteins. Identifying virus-host protein-protein interactions could therefore help in understanding how the virus transmits disease and in identifying potential COVID-19 drugs. The International Committee on Taxonomy of Viruses (ICTV) has declared that nCoV is highly genetically similar (∼89% similarity) to the SARS-CoV responsible for the 2003 epidemic. With this hypothesis, the present work focuses on developing a computational model for the nCoV-Human protein interaction network, using the experimentally validated SARS-CoV-Human protein interactions. Initially, level-1 and level-2 human spreader proteins are identified in the SARS-CoV-Human interaction network using the Susceptible-Infected-Susceptible (SIS) model. These proteins are considered potential human targets for nCoV bait proteins. A gene-ontology-based fuzzy affinity function is used to construct the nCoV-Human protein interaction network at a ∼99.98% specificity threshold. This also identifies 37 level-1 human spreaders for COVID-19 in the human protein-interaction network. Subsequently, 2474 level-2 human spreaders are identified using the SIS model. The derived host-pathogen interaction network is finally validated using six potential FDA-listed drugs for COVID-19, with significant overlap between the known drug target proteins and the identified spreader proteins.
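The notion of level-1 and level-2 proteins can be illustrated with a breadth-first expansion from viral seed proteins. The sketch below only collects candidates by network distance; the paper's actual spreader selection relies on the SIS model and a fuzzy affinity threshold, which are not reproduced here. The function name and the toy network (V1 as a viral protein, H1 to H5 as human proteins) are hypothetical.

```python
def neighbors_by_level(graph: dict[str, set[str]], seeds: set[str],
                       max_level: int = 2) -> dict[int, set[str]]:
    """Group nodes by their hop distance (1..max_level) from the seed set."""
    seen = set(seeds)
    frontier = set(seeds)
    levels: dict[int, set[str]] = {}
    for level in range(1, max_level + 1):
        reached: set[str] = set()
        for node in frontier:
            reached.update(graph.get(node, ()))
        reached -= seen           # keep only nodes not reached at a lower level
        levels[level] = reached
        seen |= reached
        frontier = reached
    return levels


# Hypothetical toy host-pathogen network.
network = {
    "V1": {"H1", "H2"},
    "H1": {"V1", "H3"},
    "H2": {"V1", "H3", "H4"},
    "H3": {"H1", "H2", "H5"},
    "H4": {"H2"},
    "H5": {"H3"},
}

levels = neighbors_by_level(network, {"V1"})
print(sorted(levels[1]))  # ['H1', 'H2']
print(sorted(levels[2]))  # ['H3', 'H4']
```

In a real pipeline these level sets would be the candidate pools that a scoring step then filters down to the reported spreaders.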
Affiliation(s)
- Sovan Saha
- Department of Computer Science & Engineering, Institute of Engineering & Management, Salt Lake Electronics Complex, Kolkata 700091, West Bengal, India
- Anup Kumar Halder
- Department of Computer Science & Engineering, University of Engineering & Management, Kolkata 700156, West Bengal, India
- Soumyendu Sekhar Bandyopadhyay
- Department of Computer Science & Engineering, School of Engineering and Technology, Adamas University, Kolkata 700126, West Bengal, India; Department of Computer Science & Engineering, Jadavpur University, Jadavpur, Kolkata, West Bengal 700032, India
- Piyali Chatterjee
- Department of Computer Science & Engineering, Netaji Subhash Engineering College, Garia, Kolkata, West Bengal 700152, India
- Mita Nasipuri
- Department of Computer Science & Engineering, Jadavpur University, Jadavpur, Kolkata, West Bengal 700032, India
- Subhadip Basu
- Department of Computer Science & Engineering, Jadavpur University, Jadavpur, Kolkata, West Bengal 700032, India
7
Lu Y, Li Q, Li T. PPA-GCN: A Efficient GCN Framework for Prokaryotic Pathways Assignment. Front Genet 2022; 13:839453. [PMID: 35444686 PMCID: PMC9013948 DOI: 10.3389/fgene.2022.839453] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/20/2021] [Accepted: 03/17/2022] [Indexed: 11/17/2022] Open
Abstract
With the rapid development of sequencing technology, the number of completed microbial genomes has grown explosively. For a newly sequenced prokaryotic genome, gene functional annotation and metabolic pathway assignment are important foundations for all subsequent research. However, the overall assignment rate for gene metabolic pathways is lower than 48%, and it is even lower for newly sequenced prokaryotic genomes, which has become a bottleneck for subsequent research. A high-precision metabolic pathway assignment framework is therefore urgently needed. Here, we developed PPA-GCN, a prokaryotic pathways assignment framework based on a graph convolutional network, to assist functional pathway assignment using KEGG information and genomic characteristics. In the framework, gene synteny information was used to construct a network, and ideas from self-supervised learning were used to enhance the framework's learning ability. The framework is applicable to microbial genera with sufficient whole-genome sequences. To evaluate the assignment rate, genomes from three genera were used: Flavobacterium (65 genomes), Pseudomonas (100 genomes), and Staphylococcus (500 genomes). The initial functional pathway assignment rates of the three test genera were 27.7% (Flavobacterium), 49.5% (Pseudomonas), and 30.1% (Staphylococcus). PPA-GCN achieved excellent assignment rates of 84.8% (Flavobacterium), 77.0% (Pseudomonas), and 71.0% (Staphylococcus). PPA-GCN also proved to have strong fault tolerance. The framework provides novel insights into metabolic pathway assignment, is likely to inform future deep learning applications for interpreting functional annotations, and extends to all prokaryotic genera with sufficient genomes.
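To make the graph convolution idea concrete, here is a minimal sketch of one layer using the commonly used symmetric-normalized propagation rule, ReLU(D^(-1/2) (A + I) D^(-1/2) H W). Whether PPA-GCN uses exactly this rule is an assumption, since the abstract does not say; the three-gene synteny graph, the feature values, and the weights are all hypothetical.

```python
import math

Matrix = list[list[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain dense matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def gcn_layer(adjacency: Matrix, features: Matrix, weights: Matrix) -> Matrix:
    """One graph convolution: normalize (A + I), aggregate neighbors, transform, ReLU."""
    n = len(adjacency)
    # Add self-loops so each node keeps its own features.
    a_hat = [[adjacency[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    degree = [sum(row) for row in a_hat]
    # Symmetric normalization by the square roots of the node degrees.
    norm = [[a_hat[i][j] / math.sqrt(degree[i] * degree[j]) for j in range(n)]
            for i in range(n)]
    transformed = matmul(matmul(norm, features), weights)
    return [[max(0.0, value) for value in row] for row in transformed]


# Hypothetical synteny graph: gene 0 - gene 1 - gene 2 in genomic order.
adjacency = [[0.0, 1.0, 0.0],
             [1.0, 0.0, 1.0],
             [0.0, 1.0, 0.0]]
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # two toy features per gene
weights = [[0.5], [-0.25]]                        # maps 2 features to 1 output

print(gcn_layer(adjacency, features, weights))
```

Stacking such layers lets a gene's representation absorb information from genes further away along the synteny graph, which is what makes the approach suited to neighborhood-dependent pathway assignment.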
Affiliation(s)
- Yuntao Lu
- Key Laboratory of Freshwater Ecology and Biotechnology, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan, China; College of Advanced Agricultural Sciences, University of Chinese Academy of Sciences, Beijing, China
- Qi Li
- Key Laboratory of Freshwater Ecology and Biotechnology, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan, China
- Tao Li
- Key Laboratory of Freshwater Ecology and Biotechnology, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan, China
8
Meng H, Ruan J, Chen Y, Yan Z, Shi K, Li X, Yang P, Meng F. Investigation of Specific Proteins Related to Different Types of Coronary Atherosclerosis. Front Cardiovasc Med 2021; 8:758035. [PMID: 34746269 PMCID: PMC8569131 DOI: 10.3389/fcvm.2021.758035] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 08/13/2021] [Accepted: 09/27/2021] [Indexed: 12/28/2022] Open
Abstract
Objective: Coronary heart disease (CHD) is a complex disease caused by multifaceted interaction between genetic and environmental factors, which makes identifying the most likely disease candidate proteins and their associated risk markers a major challenge. Atherosclerosis presents as a broad spectrum of heart diseases, including stable coronary artery disease (SCAD) and acute myocardial infarction (AMI), which is the progressive stage of SCAD. Correct and prompt diagnosis of atherosclerosis is therefore imperative for treatment and prognosis. Methods: The current work looks for specific protein markers for the differential diagnosis of coronary atherosclerosis. Thirty male patients between 45 and 55 years of age diagnosed with atherosclerosis were analyzed by tandem mass tag (TMT) mass spectrometry. The study excluded those who were additionally diagnosed with hypertension or type 1 or type 2 diabetes. Mfuzz analysis was applied to select target proteins for precise and prompt diagnosis of atherosclerosis, most of which were related to high lipid metabolism. Parallel reaction monitoring (PRM) was used to verify the selected target proteins. Finally, the receiver operating characteristic (ROC) curve was calculated from a random forest experiment. Results: A total of 1,147 proteins were identified by TMT mass spectrometry, 907 of which were quantifiable. In the PRM study, six proteins related to the lipid metabolism pathway were selected for verification: ALB, SHBG, APOC2, APOC3, APOC4, and SAA4. Conclusion: Through the specific changes detected in these six proteins, our results support accurate diagnosis of atherosclerosis patients, especially in cases with varying types of the disease.
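The area under an ROC curve, as typically reported for a classifier such as a random forest, can be computed directly from predicted scores as the probability that a positive sample outranks a negative one. The following is a minimal sketch with hypothetical labels and scores; it does not use the study's data or reproduce its random forest model.

```python
def roc_auc(labels: list[int], scores: list[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count half."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


# Hypothetical classifier scores for six samples (1 = disease subtype A, 0 = subtype B).
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.2]

auc = roc_auc(labels, scores)
print(round(auc, 3))  # 0.889: 8 of the 9 positive-negative pairs are ordered correctly
```

The pairwise form is quadratic in the number of samples, which is fine for a cohort of this size; larger datasets would normally use a rank-based implementation.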
Affiliation(s)
- Heyu Meng, Jianjun Ruan, Yanqiu Chen, Zhaohan Yan, Kaiyao Shi, Xiangdong Li, Ping Yang, Fanbo Meng
- Jilin Provincial Precision Medicine Key Laboratory for Cardiovascular Genetic Diagnosis (Jilin Provincial Engineering Laboratory for Endothelial Function and Genetic Diagnosis of Cardiovascular Disease, Jilin Provincial Molecular Biology Research Center for Precision Medicine of Major Cardiovascular Disease, Jilin Provincial Cardiovascular Research Institute), Department of Cardiology, China-Japan Union Hospital of Jilin University, Changchun, China
9
Saha S, Chatterjee P, Nasipuri M, Basu S. Detection of spreader nodes in human-SARS-CoV protein-protein interaction network. PeerJ 2021; 9:e12117. [PMID: 34567845 PMCID: PMC8428263 DOI: 10.7717/peerj.12117] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 06/18/2021] [Accepted: 08/15/2021] [Indexed: 12/20/2022] Open
Abstract
The entire world is witnessing the coronavirus pandemic (COVID-19), caused by a novel coronavirus (n-CoV) known as Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2). SARS-CoV-2 promotes fatal chronic respiratory disease followed by multiple organ failure, ultimately putting an end to human life. The International Committee on Taxonomy of Viruses (ICTV) has reached a consensus that SARS-CoV-2 is highly genetically similar (up to 89%) to the Severe Acute Respiratory Syndrome Coronavirus (SARS-CoV), which had an outbreak in 2003. With this hypothesis, the current work focuses on identifying the spreader nodes in the SARS-CoV-human protein-protein interaction network (PPIN) to find possible lineage with the disease propagation pattern of the current pandemic. Various PPIN characteristics, such as edge ratio, neighborhood density, and node weight, have been explored to define a new feature, the spreadability index, by which spreader proteins and protein-protein interactions (in the form of network edges) are identified. Top spreader nodes with a high spreadability index have been validated by the Susceptible-Infected-Susceptible (SIS) disease model, first using a synthetic PPIN and then a SARS-CoV-human PPIN. The ranked edges highlight the path of entire disease propagation from SARS-CoV to the human PPIN (up to the level-2 neighborhood). The developed network attribute, the spreadability index, together with the generated SIS model, performs better than the existing state-of-the-art network centrality-based methodologies.
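A discrete-time SIS process on a network can be sketched in a few lines: at each step every infected node infects each susceptible neighbor with probability `beta` and then recovers (becoming susceptible again) with probability `gamma`. This is a generic SIS sketch, not the paper's implementation; the update order, the parameter values, and the toy network are assumptions.

```python
import random


def simulate_sis(graph: dict[str, set[str]], seeds: set[str], beta: float,
                 gamma: float, steps: int, rng: random.Random):
    """Run a discrete-time SIS process; return the final infected set and its size history."""
    infected = set(seeds)
    history = [len(infected)]
    for _ in range(steps):
        newly_infected = set()
        # Sorted iteration keeps runs reproducible for a given seed.
        for node in sorted(infected):
            for neighbor in sorted(graph[node]):
                if neighbor not in infected and rng.random() < beta:
                    newly_infected.add(neighbor)
        recovered = {node for node in sorted(infected) if rng.random() < gamma}
        infected = (infected - recovered) | newly_infected
        history.append(len(infected))
    return infected, history


# Hypothetical toy PPIN: a chain A - B - C - D.
chain = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}

final, history = simulate_sis(chain, {"A"}, beta=0.6, gamma=0.2,
                              steps=10, rng=random.Random(42))
print(history)
```

Repeating the simulation from different seed nodes and comparing how many nodes each seed infects is one simple way to check whether nodes ranked highly by an index really behave as spreaders.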
Affiliation(s)
- Sovan Saha
- Computer Science and Engineering, Institute of Engineering and Management, Kolkata, West Bengal, India
- Piyali Chatterjee
- Computer Science and Engineering, Netaji Subhash Engineering College, Kolkata, West Bengal, India
- Mita Nasipuri
- Computer Science and Engineering, Jadavpur University, Kolkata, West Bengal, India
- Subhadip Basu
- Computer Science and Engineering, Jadavpur University, Kolkata, West Bengal, India
10
Xiang Z, Gong W, Li Z, Yang X, Wang J, Wang H. Predicting Protein-Protein Interactions via Gated Graph Attention Signed Network. Biomolecules 2021; 11:799. [PMID: 34071437 PMCID: PMC8228288 DOI: 10.3390/biom11060799] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 05/01/2021] [Revised: 05/24/2021] [Accepted: 05/26/2021] [Indexed: 01/01/2023] Open
Abstract
Protein-protein interactions (PPIs) play a key role in signal transduction and pharmacogenomics, and hence, accurate PPI prediction is crucial. Graph structures have received increasing attention owing to their outstanding performance in machine learning. In practice, PPIs can be expressed as a signed network (i.e., graph structure), wherein the nodes in the network represent proteins, and edges represent the interactions (positive or negative effects) of protein nodes. PPI predictions can be realized by predicting the links of the signed network; therefore, the use of gated graph attention for signed networks (SN-GGAT) is proposed herein. First, the concept of graph attention network (GAT) is applied to signed networks, in which "attention" represents the weight of neighbor nodes, and GAT updates the node features through the weighted aggregation of neighbor nodes. Then, the gating mechanism is defined and combined with the balance theory to obtain the high-order relations of protein nodes to improve the attention effect, making the attention mechanism follow the principle of "low-order high attention, high-order low attention, different signs opposite". PPIs are subsequently predicted on the Saccharomyces cerevisiae core dataset and the Human dataset. The test results demonstrate that the proposed method exhibits strong competitiveness.
Affiliation(s)
- Zhijie Xiang
- School of Information Science and Engineering, Shandong Normal University, Jinan 250014, China
- Weijia Gong
- School of Information Science and Engineering, Shandong Normal University, Jinan 250014, China
- Zehui Li
- School of Information Science and Engineering, Shandong Normal University, Jinan 250014, China
- Xue Yang
- School of Information Science and Engineering, Shandong Normal University, Jinan 250014, China
- Jihua Wang
- School of Information Science and Engineering, Shandong Normal University, Jinan 250014, China
- Hong Wang
- School of Information Science and Engineering, Shandong Normal University, Jinan 250014, China
- Shandong Provincial Key Laboratory for Distributed Computer Software Novel Technology, Shandong Normal University, Jinan 250014, China
Collapse
|
11
|
smORFunction: a tool for predicting functions of small open reading frames and microproteins. BMC Bioinformatics 2020; 21:455. [PMID: 33054771 PMCID: PMC7559452 DOI: 10.1186/s12859-020-03805-x] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 07/22/2020] [Accepted: 10/08/2020] [Indexed: 12/14/2022] Open
Abstract
Background: A small open reading frame (smORF) is an open reading frame shorter than 100 codons. Microproteins, translated from smORFs, have been found to participate in a variety of biological processes such as muscle formation and contraction, cell proliferation, and immune activation. Although previous studies have collected and annotated a large number of smORFs, the functions of the vast majority remain unknown. It is thus increasingly important to develop computational methods to annotate the functions of these smORFs. Results: In this study, we collected 617,462 unique smORFs from three studies. The expression of smORF RNAs was estimated from reannotated microarray probes. Using a speed-optimized correlation algorithm, the functions of smORFs were predicted from their correlated genes with known functional annotations. When applied to 5 known microproteins from the literature, our method successfully predicted their functions. Further validation against the UniProt database showed that at least one function was predicted for 202 out of 270 microproteins. Conclusions: We developed a method, smORFunction, to provide function predictions of smORFs/microproteins in at most 265 models generated from 173 datasets, including 48 tissues/cells and 82 diseases (and normal). The tool is available at https://www.cuilab.cn/smorfunction.
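The idea of predicting a function from correlated, already-annotated genes can be sketched as follows. This is a simplified illustration and not the smORFunction implementation: it assumes Pearson correlation, a fixed number of top-correlated genes, and simple vote counting, none of which are specified in the abstract. The names `predict_functions`, `top_k`, and all expression values are hypothetical.

```python
import math
from collections import Counter


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    # A constant profile has no defined correlation; treat it as 0.
    return cov / (sd_x * sd_y) if sd_x and sd_y else 0.0


def predict_functions(query_expr, gene_expr, gene_functions, top_k=2):
    """Rank annotated genes by correlation with the query profile and
    count the functional annotations of the top_k most correlated genes.
    (Assumed voting scheme, for illustration only.)"""
    ranked = sorted(
        gene_expr,
        key=lambda gene: pearson(query_expr, gene_expr[gene]),
        reverse=True,
    )
    votes = Counter(
        function
        for gene in ranked[:top_k]
        for function in gene_functions.get(gene, [])
    )
    return votes.most_common()


# Made-up expression profiles across four samples.
smorf_profile = [1.0, 2.0, 3.0, 4.0]
gene_expr = {
    "geneA": [2.0, 4.0, 6.0, 8.0],   # perfectly correlated with the query
    "geneB": [1.0, 3.0, 2.0, 5.0],   # positively correlated
    "geneC": [4.0, 3.0, 2.0, 1.0],   # anti-correlated
}
gene_functions = {
    "geneA": ["muscle contraction"],
    "geneB": ["muscle contraction", "cell proliferation"],
    "geneC": ["immune activation"],
}
predictions = predict_functions(smorf_profile, gene_expr, gene_functions)
```

In this toy data the two best-correlated genes share one annotation, so that annotation receives the most votes; a real pipeline would more likely use a statistical enrichment test than raw counts.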
|
12
|
Khatun MS, Shoombuatong W, Hasan MM, Kurata H. Evolution of Sequence-based Bioinformatics Tools for Protein-protein Interaction Prediction. Curr Genomics 2020; 21:454-463. [PMID: 33093807 PMCID: PMC7536797 DOI: 10.2174/1389202921999200625103936] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Received: 01/17/2020] [Revised: 03/19/2020] [Accepted: 05/27/2020] [Indexed: 12/22/2022] Open
Abstract
Protein-protein interactions (PPIs) are the physical connections between two or more proteins via electrostatic forces or hydrophobic effects. Identifying PPIs is pivotal to understanding many biological processes, including protein function, disease incidence, and therapy design. The experimental identification of PPIs via high-throughput technology is time-consuming and expensive. Bioinformatics approaches are expected to overcome these limitations. In this review, our main goal is to provide an inclusive view of the existing sequence-based computational methods for predicting PPIs. We first briefly introduce the currently available PPI databases and then review the state-of-the-art bioinformatics approaches, their working principles, and their performance. Finally, we discuss the caveats and future perspectives of next-generation algorithms for the prediction of PPIs.
Affiliation(s)
- Md. Mehedi Hasan and Hiroyuki Kurata (corresponding authors)
- Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka 820-8502, Japan; Japan Society for the Promotion of Science, 5-3-1 Kojimachi, Chiyoda-ku, Tokyo 102-0083, Japan; Tel: +81-948-297-828
- Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka 820-8502, Japan; Biomedical Informatics R&D Center, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka 820-8502, Japan; Tel: +81-948-297-828
|
13
|
Saha S, Prasad A, Chatterjee P, Basu S, Nasipuri M. Protein function prediction from dynamic protein interaction network using gene expression data. J Bioinform Comput Biol 2020; 17:1950025. [PMID: 31617461 DOI: 10.1142/s0219720019500252] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Indexed: 11/18/2022]
Abstract
Computational prediction of the functional annotation of proteins is an uphill task. There is an ever-increasing gap between the functional characterization of protein sequences and the deluge of protein sequences generated by large-scale sequencing projects. Protein interactions are frequently observed to be dynamic, influenced mostly by changes of state or changes in stimuli. The functional characterization of proteins can be inferred from these dynamic interactions. In this work, we have used a dynamic protein-protein interaction network (PPIN), time-course gene expression data, and protein sequence information to predict the functional annotation of proteins. During the progression of a particular function, it has also been observed that not all proteins are active at all time points. For unannotated active proteins, our proposed methodology explores the dynamic PPIN consisting of level-1 and level-2 neighboring proteins at different time points, filtered by the Damerau-Levenshtein edit distance, which estimates the similarity between two protein sequences, and by coefficient of variation methods, which assess the strength of an edge in the network. Finally, from the filtered dynamic PPIN, at each time point, the functional annotations of the level-2 proteins are assigned to the unknown and unannotated active proteins through the level-1 neighbors, following a bottom-up strategy. Our proposed methodology achieves an average precision, recall, and F-score of 0.59, 0.76, and 0.61, respectively, which is significantly higher than the reported state-of-the-art methods.
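The two filtering quantities named in the abstract can be sketched in a few lines. The edit distance below is the restricted Damerau-Levenshtein variant (optimal string alignment), which counts insertions, deletions, substitutions, and adjacent transpositions; whether the authors used this variant or the unrestricted one is not stated, so treat it as an assumption. The coefficient of variation is computed here as population standard deviation divided by the mean; how it is mapped to edge strength in the paper is likewise not given. All names and example values are hypothetical.

```python
import statistics


def osa_distance(s, t):
    """Restricted Damerau-Levenshtein (optimal string alignment) distance
    between two sequences, e.g. protein sequences as strings."""
    rows, cols = len(s) + 1, len(t) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all of s[:i]
    for j in range(cols):
        d[0][j] = j  # insert all of t[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if s[i - 1] == t[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Adjacent transposition, e.g. "KV" <-> "VK".
            if (
                i > 1
                and j > 1
                and s[i - 1] == t[j - 2]
                and s[i - 2] == t[j - 1]
            ):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean."""
    mean = statistics.fmean(values)
    if mean == 0:
        raise ValueError("coefficient of variation is undefined for mean 0")
    return statistics.pstdev(values) / mean


# A single adjacent swap costs one edit under this distance.
swap_cost = osa_distance("MKV", "MVK")
# Made-up expression values across eight time points.
cv = coefficient_of_variation([2, 4, 4, 4, 5, 5, 7, 9])
```

A lower edit distance indicates more similar sequences, and a threshold on it could be used to keep or drop neighbor proteins; the threshold itself would have to come from the paper.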
Affiliation(s)
- Sovan Saha: Department of Computer Science & Engineering, Dr. Sudhir Chandra Sur Degree Engineering College, 540, Dum Dum Road, Near Dum Dum Jn. Station, Surermath, Kolkata 700074, India
- Abhimanyu Prasad: Department of Computer Science & Engineering, Dr. Sudhir Chandra Sur Degree Engineering College, 540, Dum Dum Road, Near Dum Dum Jn. Station, Surermath, Kolkata 700074, India
- Piyali Chatterjee: Department of Computer Science & Engineering, Netaji Subhash Engineering College, Techno City, Panchpota, Garia, Kolkata 700152, India
- Subhadip Basu: Department of Computer Science & Engineering, Jadavpur University, 188, Raja S.C. Mallick Road, Kolkata 700032, India
- Mita Nasipuri: Department of Computer Science & Engineering, Netaji Subhash Engineering College, Techno City, Panchpota, Garia, Kolkata 700152, India
|