1
|
Tahir ul Qamar M, Noor F, Guo YX, Zhu XT, Chen LL. Deep-HPI-pred: An R-Shiny applet for network-based classification and prediction of Host-Pathogen protein-protein interactions. Comput Struct Biotechnol J 2024; 23:316-329. [PMID: 38192372 PMCID: PMC10772389 DOI: 10.1016/j.csbj.2023.12.010] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/22/2023] [Revised: 12/11/2023] [Accepted: 12/12/2023] [Indexed: 01/10/2024] Open
Abstract
Host-pathogen interactions (HPIs) are vital in numerous biological activities and are intrinsically linked to the onset and progression of infectious diseases. HPIs are pivotal in the entire lifecycle of diseases: from the onset of pathogen introduction, navigating through the mechanisms that bypass host cellular defenses, to its subsequent proliferation inside the host. At the heart of these stages lies the synergy of proteins from both the host and the pathogen. By understanding these interlinking protein dynamics, we can gain crucial insights into how diseases progress and pave the way for stronger plant defenses and the swift formulation of countermeasures. In the framework of current study, we developed a web-based R/Shiny app, Deep-HPI-pred, that uses network-driven feature learning method to predict the yet unmapped interactions between pathogen and host proteins. Leveraging citrus and CLas bacteria training datasets as case study, we spotlight the effectiveness of Deep-HPI-pred in discerning Protein-protein interaction (PPIs) between them. Deep-HPI-pred use Multilayer Perceptron (MLP) models for HPI prediction, which is based on a comprehensive evaluation of topological features and neural network architectures. When subjected to independent validation datasets, the predicted models consistently surpassed a Matthews correlation coefficient (MCC) of 0.80 in host-pathogen interactions. Remarkably, the use of Eigenvector Centrality as the leading topological feature further enhanced this performance. Further, Deep-HPI-pred also offers relevant gene ontology (GO) term information for each pathogen and host protein within the system. This protein annotation data contributes an additional layer to our understanding of the intricate dynamics within host-pathogen interactions. In the additional benchmarking studies, the Deep-HPI-pred model has proven its robustness by consistently delivering reliable results across different host-pathogen systems, including plant-pathogens (accuracy of 98.4% and 97.9%), human-virus (accuracy of 94.3%), and animal-bacteria (accuracy of 96.6%) interactomes. These results not only demonstrate the model's versatility but also pave the way for gaining comprehensive insights into the molecular underpinnings of complex host-pathogen interactions. Taken together, the Deep-HPI-pred applet offers a unified web service for both identifying and illustrating interaction networks. Deep-HPI-pred applet is freely accessible at its homepage: https://cbi.gxu.edu.cn/shiny-apps/Deep-HPI-pred/ and at github: https://github.com/tahirulqamar/Deep-HPI-pred.
Collapse
Affiliation(s)
- Muhammad Tahir ul Qamar
- State Key Laboratory for Conservation and Utilization of Subtropical Agro-bioresources, College of Life Science and Technology, Guangxi University, Nanning 530004, China
| | - Fatima Noor
- Integrative Omics and Molecular Modeling Laboratory, Department of Bioinformatics and Biotechnology, Government College University Faisalabad (GCUF), Faisalabad 38000, Pakistan
| | - Yi-Xiong Guo
- National Key Laboratory of Crop Genetic Improvement, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
| | - Xi-Tong Zhu
- State Key Laboratory for Conservation and Utilization of Subtropical Agro-bioresources, College of Life Science and Technology, Guangxi University, Nanning 530004, China
| | - Ling-Ling Chen
- State Key Laboratory for Conservation and Utilization of Subtropical Agro-bioresources, College of Life Science and Technology, Guangxi University, Nanning 530004, China
| |
Collapse
|
2
|
Tang T, Li T, Li W, Cao X, Liu Y, Zeng X. Anti-symmetric framework for balanced learning of protein-protein interactions. Bioinformatics 2024; 40:btae603. [PMID: 39404784 PMCID: PMC11513017 DOI: 10.1093/bioinformatics/btae603] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/17/2024] [Revised: 09/13/2024] [Accepted: 10/12/2024] [Indexed: 10/29/2024] Open
Abstract
MOTIVATION Protein-protein interactions (PPIs) are essential for the regulation and facilitation of virtually all biological processes. Computational tools, particularly those based on deep learning, are preferred for the efficient prediction of PPIs. Despite recent progress, two challenges remain unresolved: (i) the imbalanced nature of PPI characteristics is often ignored and (ii) there exists a high computational cost associated with capturing long-range dependencies within protein data, typically exhibiting quadratic complexity relative to the length of the protein sequence. RESULT Here, we propose an anti-symmetric graph learning model, BaPPI, for the balanced prediction of PPIs and extrapolation of the involved patterns in PPI network. In BaPPI, the contextualized information of protein data is efficiently handled by an attention-free mechanism formed by recurrent convolution operator. The anti-symmetric graph convolutional network is employed to model the uneven distribution within PPI networks, aiming to learn a more robust and balanced representation of the relationships between proteins. Ultimately, the model is updated using asymmetric loss. The experimental results on classical baseline datasets demonstrate that BaPPI outperforms four state-of-the-art PPI prediction methods. In terms of Micro-F1, BaPPI exceeds the second-best method by 6.5% on SHS27K and 5.3% on SHS148K. Further analysis of the generalization ability and patterns of predicted PPIs also demonstrates our model's generalizability and robustness to the imbalanced nature of PPI datasets. AVAILABILITY AND IMPLEMENTATION The source code of this work is publicly available at https://github.com/ttan6729/BaPPI.
Collapse
Affiliation(s)
- Tao Tang
- School of Modern Posts, Nanjing University of Posts and Telecommunications, Nanjing 210023, China
| | - Tianyang Li
- School of Modern Posts, Nanjing University of Posts and Telecommunications, Nanjing 210023, China
| | - Weizhuo Li
- School of Modern Posts, Nanjing University of Posts and Telecommunications, Nanjing 210023, China
| | - Xiaofeng Cao
- School of Artificial Intelligence, Jilin University, Changchun 130012, China
| | - Yuansheng Liu
- College of Computer Science and Electronic Engineering, Hunan University, Changsha 410086, China
| | - Xiangxiang Zeng
- College of Computer Science and Electronic Engineering, Hunan University, Changsha 410086, China
| |
Collapse
|
3
|
Akbarzadeh S, Coşkun Ö, Günçer B. Studying protein-protein interactions: Latest and most popular approaches. J Struct Biol 2024; 216:108118. [PMID: 39214321 DOI: 10.1016/j.jsb.2024.108118] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/29/2024] [Revised: 08/20/2024] [Accepted: 08/23/2024] [Indexed: 09/04/2024]
Abstract
PPIs, or protein-protein interactions, are essential for many biological processes. According to the findings, abnormal PPIs have been linked to several diseases, such as cancer and infectious and neurological disorders. Consequently, focusing on PPIs is a path toward disease treatment and a crucial tool for producing novel medications. Many methods exist to investigate PPIs, including low- and high-throughput studies. Since many PPIs have been discovered using in vitro and in vivo experimental approaches, the use of computational methods to predict PPIs has grown due to the expanding scale of PPI data and the intrinsic complexity of interacting mechanisms. Recognizing PPI networks offers a systematic means of predicting protein functions, and pathways that are included. These investigations can help uncover the underlying molecular mechanisms of complex phenotypes and clarify the biological processes related to health and diseases. Therefore, our goal in this study is to provide an overview of the latest and most popular approaches for investigating PPIs. We also overview some important clinical approaches based on the PPIs and how these interactions can be targeted.
Collapse
Affiliation(s)
- Sama Akbarzadeh
- Department of Biophysics, Istanbul Faculty of Medicine, Istanbul University, Istanbul, Türkiye; Institute of Graduate Studies in Health Sciences, Istanbul University, Istanbul, Türkiye
| | - Özlem Coşkun
- Department of Biophysics, Faculty of Medicine, Çanakkale Onsekiz Mart University, Çanakkale, Türkiye
| | - Başak Günçer
- Department of Biophysics, Istanbul Faculty of Medicine, Istanbul University, Istanbul, Türkiye.
| |
Collapse
|
4
|
Sun DZ, Sun ZL, Liu M, Yong SH. LPI-SKMSC: Predicting LncRNA-Protein Interactions with Segmented k-mer Frequencies and Multi-space Clustering. Interdiscip Sci 2024; 16:378-391. [PMID: 38206558 DOI: 10.1007/s12539-023-00598-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/19/2023] [Revised: 11/25/2023] [Accepted: 12/05/2023] [Indexed: 01/12/2024]
Abstract
Long noncoding RNAs (lncRNAs) have significant regulatory roles in gene expression. Interactions with proteins are one of the ways lncRNAs play their roles. Since experiments to determine lncRNA-protein interactions (LPIs) are expensive and time-consuming, many computational methods for predicting LPIs have been proposed as alternatives. In the LPIs prediction problem, there commonly exists the imbalance in the distribution of positive and negative samples. However, there are few existing methods that give specific consideration to this problem. In this paper, we proposed a new clustering-based LPIs prediction method using segmented k-mer frequencies and multi-space clustering (LPI-SKMSC). It was dedicated to handling the imbalance of positive and negative samples. We constructed segmented k-mer frequencies to obtain global and local features of lncRNA and protein sequences. Then, the multi-space clustering was applied to LPI-SKMSC. The convolutional neural network (CNN)-based encoders were used to map different features of a sample to different spaces. It used multiple spaces to jointly constrain the classification of samples. Finally, the distances between the output features of the encoder and the cluster center in each space were calculated. The sum of distances in all spaces was compared with the cluster radius to predict the LPIs. We performed cross-validation on 3 public datasets and LPI-SKMSC showed the best performance compared to other existing methods. Experimental results showed that LPI-SKMSC could predict LPIs more effectively when faced with imbalanced positive and negative samples. In addition, we illustrated that our model was better at uncovering potential lncRNA-protein interaction pairs.
Collapse
Affiliation(s)
- Dian-Zheng Sun
- School of Electrical Engineering and Automation, Anhui University, Hefei, 230601, China
| | - Zhan-Li Sun
- School of Electrical Engineering and Automation, Anhui University, Hefei, 230601, China.
| | - Mengya Liu
- School of Computer Science and Technology, Anhui University, Hefei, 230601, China
| | - Shuang-Hao Yong
- School of Electrical Engineering and Automation, Anhui University, Hefei, 230601, China
| |
Collapse
|
5
|
Jin R, Ye Q, Wang J, Cao Z, Jiang D, Wang T, Kang Y, Xu W, Hsieh CY, Hou T. AttABseq: an attention-based deep learning prediction method for antigen-antibody binding affinity changes based on protein sequences. Brief Bioinform 2024; 25:bbae304. [PMID: 38960407 PMCID: PMC11221889 DOI: 10.1093/bib/bbae304] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/09/2023] [Revised: 04/15/2024] [Accepted: 06/11/2024] [Indexed: 07/05/2024] Open
Abstract
The optimization of therapeutic antibodies through traditional techniques, such as candidate screening via hybridoma or phage display, is resource-intensive and time-consuming. In recent years, computational and artificial intelligence-based methods have been actively developed to accelerate and improve the development of therapeutic antibodies. In this study, we developed an end-to-end sequence-based deep learning model, termed AttABseq, for the predictions of the antigen-antibody binding affinity changes connected with antibody mutations. AttABseq is a highly efficient and generic attention-based model by utilizing diverse antigen-antibody complex sequences as the input to predict the binding affinity changes of residue mutations. The assessment on the three benchmark datasets illustrates that AttABseq is 120% more accurate than other sequence-based models in terms of the Pearson correlation coefficient between the predicted and experimental binding affinity changes. Moreover, AttABseq also either outperforms or competes favorably with the structure-based approaches. Furthermore, AttABseq consistently demonstrates robust predictive capabilities across a diverse array of conditions, underscoring its remarkable capacity for generalization across a wide spectrum of antigen-antibody complexes. It imposes no constraints on the quantity of altered residues, rendering it particularly applicable in scenarios where crystallographic structures remain unavailable. The attention-based interpretability analysis indicates that the causal effects of point mutations on antibody-antigen binding affinity changes can be visualized at the residue level, which might assist automated antibody sequence optimization. We believe that AttABseq provides a fiercely competitive answer to therapeutic antibody optimization.
Collapse
Affiliation(s)
- Ruofan Jin
- College of Pharmaceutical Science, Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Zhejiang University, Yuhangtang Road 866, Hangzhou 310058, Zhejiang, China
- College of Life Science, Zhejiang University, Yuhangtang Road 866, Hangzhou 310058, Zhejiang, China
| | - Qing Ye
- College of Pharmaceutical Science, Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Zhejiang University, Yuhangtang Road 866, Hangzhou 310058, Zhejiang, China
| | - Jike Wang
- College of Pharmaceutical Science, Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Zhejiang University, Yuhangtang Road 866, Hangzhou 310058, Zhejiang, China
| | - Zheng Cao
- College of Computer Science and Technology, Zhejiang University, Yuhangtang Road 866, Hangzhou 310058, Zhejiang, China
| | - Dejun Jiang
- College of Pharmaceutical Science, Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Zhejiang University, Yuhangtang Road 866, Hangzhou 310058, Zhejiang, China
| | - Tianyue Wang
- College of Pharmaceutical Science, Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Zhejiang University, Yuhangtang Road 866, Hangzhou 310058, Zhejiang, China
| | - Yu Kang
- College of Pharmaceutical Science, Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Zhejiang University, Yuhangtang Road 866, Hangzhou 310058, Zhejiang, China
| | - Wanting Xu
- College of Pharmaceutical Science, Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Zhejiang University, Yuhangtang Road 866, Hangzhou 310058, Zhejiang, China
| | - Chang-Yu Hsieh
- College of Pharmaceutical Science, Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Zhejiang University, Yuhangtang Road 866, Hangzhou 310058, Zhejiang, China
| | - Tingjun Hou
- College of Pharmaceutical Science, Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Zhejiang University, Yuhangtang Road 866, Hangzhou 310058, Zhejiang, China
| |
Collapse
|
6
|
Ovek D, Keskin O, Gursoy A. ProInterVal: Validation of Protein-Protein Interfaces through Learned Interface Representations. J Chem Inf Model 2024; 64:2979-2987. [PMID: 38526504 PMCID: PMC11040718 DOI: 10.1021/acs.jcim.3c01788] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/06/2023] [Revised: 02/21/2024] [Accepted: 02/22/2024] [Indexed: 03/26/2024]
Abstract
Proteins are vital components of the biological world and serve a multitude of functions. They interact with other molecules through their interfaces and participate in crucial cellular processes. Disruption of these interactions can have negative effects on organisms, highlighting the importance of studying protein-protein interfaces for developing targeted therapies for diseases. Therefore, the development of a reliable method for investigating protein-protein interactions is of paramount importance. In this work, we present an approach for validating protein-protein interfaces using learned interface representations. The approach involves using a graph-based contrastive autoencoder architecture and a transformer to learn representations of protein-protein interaction interfaces from unlabeled data and then validating them through learned representations with a graph neural network. Our method achieves an accuracy of 0.91 for the test set, outperforming existing GNN-based methods. We demonstrate the effectiveness of our approach on a benchmark data set and show that it provides a promising solution for validating protein-protein interfaces.
Collapse
Affiliation(s)
- Damla Ovek
- KUIS
AI Center, Koç University, Istanbul 34450, Turkey
- Computer
Engineering, Koç University, Istanbul 34450, Turkey
| | - Ozlem Keskin
- Chemical
and Biological Engineering, Koç University, Istanbul 34450, Turkey
| | - Attila Gursoy
- Computer
Engineering, Koç University, Istanbul 34450, Turkey
| |
Collapse
|
7
|
Pan J, Zhang Z, Li Y, Yu J, You Z, Li C, Wang S, Zhu M, Ren F, Zhang X, Sun Y, Wang S. A microbial knowledge graph-based deep learning model for predicting candidate microbes for target hosts. Brief Bioinform 2024; 25:bbae119. [PMID: 38555472 PMCID: PMC10981679 DOI: 10.1093/bib/bbae119] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/18/2023] [Revised: 02/23/2024] [Accepted: 03/02/2024] [Indexed: 04/02/2024] Open
Abstract
Predicting interactions between microbes and hosts plays critical roles in microbiome population genetics and microbial ecology and evolution. How to systematically characterize the sophisticated mechanisms and signal interplay between microbes and hosts is a significant challenge for global health risks. Identifying microbe-host interactions (MHIs) can not only provide helpful insights into their fundamental regulatory mechanisms, but also facilitate the development of targeted therapies for microbial infections. In recent years, computational methods have become an appealing alternative due to the high risk and cost of wet-lab experiments. Therefore, in this study, we utilized rich microbial metagenomic information to construct a novel heterogeneous microbial network (HMN)-based model named KGVHI to predict candidate microbes for target hosts. Specifically, KGVHI first built a HMN by integrating human proteins, viruses and pathogenic bacteria with their biological attributes. Then KGVHI adopted a knowledge graph embedding strategy to capture the global topological structure information of the whole network. A natural language processing algorithm is used to extract the local biological attribute information from the nodes in HMN. Finally, we combined the local and global information and fed it into a blended deep neural network (DNN) for training and prediction. Compared to state-of-the-art methods, the comprehensive experimental results show that our model can obtain excellent results on the corresponding three MHI datasets. Furthermore, we also conducted two pathogenic bacteria case studies to further indicate that KGVHI has excellent predictive capabilities for potential MHI pairs.
Collapse
Affiliation(s)
- Jie Pan
- Key Laboratory of Resources Biology and Biotechnology in Western China, Ministry of Education, Provincial Key Laboratory of Biotechnology of Shaanxi Province, College of Life Sciences, Northwest University, Xi’an 710069, China
| | - Zhen Zhang
- Key Laboratory of Resources Biology and Biotechnology in Western China, Ministry of Education, Provincial Key Laboratory of Biotechnology of Shaanxi Province, College of Life Sciences, Northwest University, Xi’an 710069, China
| | - Ying Li
- Key Laboratory of Resources Biology and Biotechnology in Western China, Ministry of Education, Provincial Key Laboratory of Biotechnology of Shaanxi Province, College of Life Sciences, Northwest University, Xi’an 710069, China
| | - Jiaoyang Yu
- Key Laboratory of Resources Biology and Biotechnology in Western China, Ministry of Education, Provincial Key Laboratory of Biotechnology of Shaanxi Province, College of Life Sciences, Northwest University, Xi’an 710069, China
| | - Zhuhong You
- School of Computer Science, Northwestern Polytechnical University, Xi’an 710129, China
| | - Chenyu Li
- Key Laboratory of Resources Biology and Biotechnology in Western China, Ministry of Education, Provincial Key Laboratory of Biotechnology of Shaanxi Province, College of Life Sciences, Northwest University, Xi’an 710069, China
| | - Shixu Wang
- Key Laboratory of Resources Biology and Biotechnology in Western China, Ministry of Education, Provincial Key Laboratory of Biotechnology of Shaanxi Province, College of Life Sciences, Northwest University, Xi’an 710069, China
| | - Minghui Zhu
- Key Laboratory of Resources Biology and Biotechnology in Western China, Ministry of Education, Provincial Key Laboratory of Biotechnology of Shaanxi Province, College of Life Sciences, Northwest University, Xi’an 710069, China
| | - Fengzhi Ren
- North China Pharmaceutical Group, Shijiazhuang 050015, Hebei, China
- National Microbial Medicine Engineering & Research Center, Shijiazhuang 050015, Hebei, China
| | - Xuexia Zhang
- North China Pharmaceutical Group, Shijiazhuang 050015, Hebei, China
- National Microbial Medicine Engineering & Research Center, Shijiazhuang 050015, Hebei, China
| | - Yanmei Sun
- Key Laboratory of Resources Biology and Biotechnology in Western China, Ministry of Education, Provincial Key Laboratory of Biotechnology of Shaanxi Province, College of Life Sciences, Northwest University, Xi’an 710069, China
| | - Shiwei Wang
- Key Laboratory of Resources Biology and Biotechnology in Western China, Ministry of Education, Provincial Key Laboratory of Biotechnology of Shaanxi Province, College of Life Sciences, Northwest University, Xi’an 710069, China
| |
Collapse
|
8
|
Dang TH, Vu TA. xCAPT5: protein-protein interaction prediction using deep and wide multi-kernel pooling convolutional neural networks with protein language model. BMC Bioinformatics 2024; 25:106. [PMID: 38461247 PMCID: PMC10924985 DOI: 10.1186/s12859-024-05725-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/05/2024] [Accepted: 02/28/2024] [Indexed: 03/11/2024] Open
Abstract
BACKGROUND Predicting protein-protein interactions (PPIs) from sequence data is a key challenge in computational biology. While various computational methods have been proposed, the utilization of sequence embeddings from protein language models, which contain diverse information, including structural, evolutionary, and functional aspects, has not been fully exploited. Additionally, there is a significant need for a comprehensive neural network capable of efficiently extracting these multifaceted representations. RESULTS Addressing this gap, we propose xCAPT5, a novel hybrid classifier that uniquely leverages the T5-XL-UniRef50 protein large language model for generating rich amino acid embeddings from protein sequences. The core of xCAPT5 is a multi-kernel deep convolutional siamese neural network, which effectively captures intricate interaction features at both micro and macro levels, integrated with the XGBoost algorithm, enhancing PPIs classification performance. By concatenating max and average pooling features in a depth-wise manner, xCAPT5 effectively learns crucial features with low computational cost. CONCLUSION This study represents one of the initial efforts to extract informative amino acid embeddings from a large protein language model using a deep and wide convolutional network. Experimental results show that xCAPT5 outperforms recent state-of-the-art methods in binary PPI prediction, excelling in cross-validation on several benchmark datasets and demonstrating robust generalization across intra-species, cross-species, inter-species, and stringent similarity contexts.
Collapse
Affiliation(s)
- Thanh Hai Dang
- Faculty of Information Technology, VNU University of Engineering and Technology, 144 Xuan Thuy, Hanoi, 10000, Vietnam.
| | - Tien Anh Vu
- Faculty of Biology, VNU University of Science, 334 Nguyen Trai, Hanoi, 10000, Vietnam
| |
Collapse
|
9
|
Ma Y, Zhao Y, Ma Y. Kernel Bayesian nonlinear matrix factorization based on variational inference for human-virus protein-protein interaction prediction. Sci Rep 2024; 14:5693. [PMID: 38454139 PMCID: PMC10920681 DOI: 10.1038/s41598-024-56208-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/16/2023] [Accepted: 03/04/2024] [Indexed: 03/09/2024] Open
Abstract
Identification of potential human-virus protein-protein interactions (PPIs) contributes to the understanding of the mechanisms of viral infection and to the development of antiviral drugs. Existing computational models often have more hyperparameters that need to be adjusted manually, which limits their computational efficiency and generalization ability. Based on this, this study proposes a kernel Bayesian logistic matrix decomposition model with automatic rank determination, VKBNMF, for the prediction of human-virus PPIs. VKBNMF introduces auxiliary information into the logistic matrix decomposition and sets the prior probabilities of the latent variables to build a Bayesian framework for automatic parameter search. In addition, we construct the variational inference framework of VKBNMF to ensure the solution efficiency. The experimental results show that for the scenarios of paired PPIs, VKBNMF achieves an average AUPR of 0.9101, 0.9316, 0.8727, and 0.9517 on the four benchmark datasets, respectively, and for the scenarios of new human (viral) proteins, VKBNMF still achieves a higher hit rate. The case study also further demonstrated that VKBNMF can be used as an effective tool for the prediction of human-virus PPIs.
Collapse
Affiliation(s)
- Yingjun Ma
- School of Mathematics and Statistics, Xiamen University of Technology, Xiamen, China
| | - Yongbiao Zhao
- School of Computer, Central China Normal University, Wuhan, China
| | - Yuanyuan Ma
- School of Computer Engineering, Hubei University of Arts and Science, Xiangyang, China.
- Hubei Key Laboratory of Power System Design and Test for Electrical Vehicle, Hubei University of Arts and Science, Xiangyang, China.
| |
Collapse
|
10
|
Yang X, Wuchty S, Liang Z, Ji L, Wang B, Zhu J, Zhang Z, Dong Y. Multi-modal features-based human-herpesvirus protein-protein interaction prediction by using LightGBM. Brief Bioinform 2024; 25:bbae005. [PMID: 38279649 PMCID: PMC10818167 DOI: 10.1093/bib/bbae005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/20/2023] [Revised: 12/25/2023] [Accepted: 01/01/2021] [Indexed: 01/28/2024] Open
Abstract
The identification of human-herpesvirus protein-protein interactions (PPIs) is an essential and important entry point to understand the mechanisms of viral infection, especially in malignant tumor patients with common herpesvirus infection. While natural language processing (NLP)-based embedding techniques have emerged as powerful approaches, the application of multi-modal embedding feature fusion to predict human-herpesvirus PPIs is still limited. Here, we established a multi-modal embedding feature fusion-based LightGBM method to predict human-herpesvirus PPIs. In particular, we applied document and graph embedding approaches to represent sequence, network and function modal features of human and herpesviral proteins. Training our LightGBM models through our compiled non-rigorous and rigorous benchmarking datasets, we obtained significantly better performance compared to individual-modal features. Furthermore, our model outperformed traditional feature encodings-based machine learning methods and state-of-the-art deep learning-based methods using various benchmarking datasets. In a transfer learning step, we show that our model that was trained on human-herpesvirus PPI dataset without cytomegalovirus data can reliably predict human-cytomegalovirus PPIs, indicating that our method can comprehensively capture multi-modal fusion features of protein interactions across various herpesvirus subtypes. The implementation of our method is available at https://github.com/XiaodiYangpku/MultimodalPPI/.
Collapse
Affiliation(s)
- Xiaodi Yang
- Department of Hematology, Peking University First Hospital, Beijing, China
| | - Stefan Wuchty
- Department of Computer Science, University of Miami, Miami FL, 33146, USA
- Department of Biology, University of Miami, Miami FL, 33146, USA
- Institute of Data Science and Computation, University of Miami, Miami, FL 33146, USA
- Sylvester Comprehensive Cancer Center, University of Miami, Miami, FL 33136, USA
| | - Zeyin Liang
- Department of Hematology, Peking University First Hospital, Beijing, China
| | - Li Ji
- Department of Hematology, Peking University First Hospital, Beijing, China
| | - Bingjie Wang
- Department of Hematology, Peking University First Hospital, Beijing, China
| | - Jialin Zhu
- Department of Hematology, Peking University First Hospital, Beijing, China
| | - Ziding Zhang
- State Key Laboratory of Animal Biotech Breeding, College of Biological Sciences, China Agricultural University, Beijing 100193, China
| | - Yujun Dong
- Department of Hematology, Peking University First Hospital, Beijing, China
| |
Collapse
|
11
|
Chen HM, Liu JX, Liu D, Hao GF, Yang GF. Human-virus protein-protein interactions maps assist in revealing the pathogenesis of viral infection. Rev Med Virol 2024; 34:e2517. [PMID: 38282401 DOI: 10.1002/rmv.2517] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/11/2023] [Revised: 09/12/2023] [Accepted: 01/16/2024] [Indexed: 01/30/2024]
Abstract
Many significant viral infections have been recorded in human history, which have caused enormous negative impacts worldwide. Human-virus protein-protein interactions (PPIs) mediate viral infection and immune processes in the host. The identification, quantification, localization, and construction of human-virus PPIs maps are critical prerequisites for understanding the biophysical basis of the viral invasion process and characterising the framework for all protein functions. With the technological revolution and the introduction of artificial intelligence, the human-virus PPIs maps have been expanded rapidly in the past decade and shed light on solving complicated biomedical problems. However, there is still a lack of prospective insight into the field. In this work, we comprehensively review and compare the effectiveness, potential, and limitations of diverse approaches for constructing large-scale PPIs maps in human-virus, including experimental methods based on biophysics and biochemistry, databases of human-virus PPIs, computational methods based on artificial intelligence, and tools for visualising PPIs maps. The work aims to provide a toolbox for researchers, hoping to better assist in deciphering the relationship between humans and viruses.
Collapse
Affiliation(s)
- Hui-Min Chen
- National Key Laboratory of Green Pesticide, Central China Normal University, Wuhan, China
| | - Jia-Xin Liu
- National Key Laboratory of Green Pesticide, Central China Normal University, Wuhan, China
| | - Di Liu
- CAS Key Laboratory of Special Pathogens and Biosafety, Wuhan Institute of Virology, Center for Biosafety Mega-Science, Chinese Academy of Sciences, Wuhan, China
| | - Ge-Fei Hao
- National Key Laboratory of Green Pesticide, Central China Normal University, Wuhan, China
- National Key Laboratory of Green Pesticide, Key Laboratory of Green Pesticide and Agricultural Bioengineering, Ministry of Education, Center for Research and Development of Fine Chemicals, Guizhou University, Guiyang, China
| | - Guang-Fu Yang
- National Key Laboratory of Green Pesticide, Central China Normal University, Wuhan, China
| |
Collapse
|
12
|
Wu J, Liu B, Zhang J, Wang Z, Li J. DL-PPI: a method on prediction of sequenced protein-protein interaction based on deep learning. BMC Bioinformatics 2023; 24:473. [PMID: 38097937 PMCID: PMC10722729 DOI: 10.1186/s12859-023-05594-5] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/08/2023] [Accepted: 12/01/2023] [Indexed: 12/17/2023] Open
Abstract
PURPOSE Sequenced Protein-Protein Interaction (PPI) prediction represents a pivotal area of study in biology, playing a crucial role in elucidating the mechanistic underpinnings of diseases and facilitating the design of novel therapeutic interventions. Conventional methods for extracting features through experimental processes have proven to be both costly and exceedingly complex. In light of these challenges, the scientific community has turned to computational approaches, particularly those grounded in deep learning methodologies. Despite the progress achieved by current deep learning technologies, their effectiveness diminishes when applied to larger, unfamiliar datasets. RESULTS In this study, the paper introduces a novel deep learning framework, termed DL-PPI, for predicting PPIs based on sequence data. The proposed framework comprises two key components aimed at improving the accuracy of feature extraction from individual protein sequences and capturing relationships between proteins in unfamiliar datasets. 1. Protein Node Feature Extraction Module: To enhance the accuracy of feature extraction from individual protein sequences and facilitate the understanding of relationships between proteins in unknown datasets, the paper devised a novel protein node feature extraction module utilizing the Inception method. This module efficiently captures relevant patterns and representations within protein sequences, enabling more informative feature extraction. 2. Feature-Relational Reasoning Network (FRN): In the Global Feature Extraction module of our model, the paper developed a novel FRN that leveraged Graph Neural Networks to determine interactions between pairs of input proteins. The FRN effectively captures the underlying relational information between proteins, contributing to improved PPI predictions. DL-PPI framework demonstrates state-of-the-art performance in the realm of sequence-based PPI prediction.
Collapse
Affiliation(s)
- Jiahui Wu
- Faculty of Information Technology, Beijing University of Technology, Beijing, 100124, China
| | - Bo Liu
- School of Mathematical and Computational Sciences, Massey University, Auckland, 0745, New Zealand.
| | - Jidong Zhang
- Faculty of Information Technology, Beijing University of Technology, Beijing, 100124, China
| | - Zhihan Wang
- Faculty of Information Technology, Beijing University of Technology, Beijing, 100124, China
| | - Jianqiang Li
- Faculty of Information Technology, Beijing University of Technology, Beijing, 100124, China
| |
Collapse
|
13
|
Zhang Z, Lu C, Mo B, Bai K, Ge XY, Deng L, Peng Y. Prediction of mammalian virus cross-species transmission based on host proteins. Microbiol Spectr 2023; 11:e0536822. [PMID: 37754753 PMCID: PMC10581197 DOI: 10.1128/spectrum.05368-22] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/06/2023] [Accepted: 08/04/2023] [Indexed: 09/28/2023] Open
Abstract
Most emerging viruses are spilled over from mammals. Understanding the mechanism of virus cross-species transmission and identifying zoonotic viruses before their emergence are critical for the prevention and control of newly emerging viruses. This study systematically investigated the host proteins associated with the cross-species transmission of mammalian viruses based on 1,271 pairs of virus-mammal interactions including 382 viruses from 33 viral families and 73 mammal species from 11 orders. Numerous host proteins were found to contribute to the cross-species transmission of mammalian viruses. Host proteins potentially contributing to virus cross-species transmission are specific to viral families, and few overlaps of such host proteins are observed in different viral families. Based on these host proteins, the random-forest (RF) models were built to predict the cross-species transmission potential of mammalian viruses. Moderate performance was obtained when using all viruses together. However, when modeling by viral family, the performance of the RF models varied much among viral families. In 13 viral families such as Flaviviridae, Retroviridae, and Poxviridae, the AUC of the RF model was greater than 0.8. Finally, the contribution of virus receptors to cross-species transmission was evaluated, and the virus receptor was found to have a minor effect in predicting the cross-species transmission of mammalian viruses. The study deepens our understanding of the mechanism of virus cross-species transmission and provides a framework for predicting the cross-species transmission of mammalian viruses. IMPORTANCE Emerging viruses pose serious threats to humans. Understanding the mechanism of virus cross-species transmission and identifying zoonotic viruses before their emergence are critical for the prevention and control of emerging viruses. This study systematically identified host factors associated with cross-species transmission of mammalian viruses and further built machine-learning models for predicting cross-species transmission of the viruses based on host factors including virus receptors. The study not only deepens our understanding of the mechanism of virus cross-species transmission but also provides a framework for predicting the cross-species transmission of mammalian viruses based on host factors.
Collapse
Affiliation(s)
- Zheng Zhang
- Bioinformatics Center, College of Biology, Hunan Provincial Key Laboratory of Medical Virology, Hunan University, Changsha, Hunan, China
- Hunan Engineering and Technology Research Center for Agricultural Big Data Analysis & Decision-making, College of Plant Protection, Hunan Agricultural University, Changsha, Hunan, China
| | - Congyu Lu
- Bioinformatics Center, College of Biology, Hunan Provincial Key Laboratory of Medical Virology, Hunan University, Changsha, Hunan, China
| | - Bocheng Mo
- Hunan Engineering and Technology Research Center for Agricultural Big Data Analysis & Decision-making, College of Plant Protection, Hunan Agricultural University, Changsha, Hunan, China
| | - Kehan Bai
- Hunan Juyoubiotech Co., Ltd, Changsha, Hunan, China
| | - Xing-Yi Ge
- Bioinformatics Center, College of Biology, Hunan Provincial Key Laboratory of Medical Virology, Hunan University, Changsha, Hunan, China
| | - Li Deng
- Department of Internal Medicine-Neurology, The Third Hospital of Changsha, Changsha, Hunan, China
| | - Yousong Peng
- Bioinformatics Center, College of Biology, Hunan Provincial Key Laboratory of Medical Virology, Hunan University, Changsha, Hunan, China
| |
Collapse
|
14
|
Lee M. Recent Advances in Deep Learning for Protein-Protein Interaction Analysis: A Comprehensive Review. Molecules 2023; 28:5169. [PMID: 37446831 DOI: 10.3390/molecules28135169] [Citation(s) in RCA: 12] [Impact Index Per Article: 12.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/30/2023] [Revised: 06/30/2023] [Accepted: 06/30/2023] [Indexed: 07/15/2023] Open
Abstract
Deep learning, a potent branch of artificial intelligence, is steadily leaving its transformative imprint across multiple disciplines. Within computational biology, it is expediting progress in the understanding of Protein-Protein Interactions (PPIs), key components governing a wide array of biological functionalities. Hence, an in-depth exploration of PPIs is crucial for decoding the intricate biological system dynamics and unveiling potential avenues for therapeutic interventions. As the deployment of deep learning techniques in PPI analysis proliferates at an accelerated pace, there exists an immediate demand for an exhaustive review that encapsulates and critically assesses these novel developments. Addressing this requirement, this review offers a detailed analysis of the literature from 2021 to 2023, highlighting the cutting-edge deep learning methodologies harnessed for PPI analysis. Thus, this review stands as a crucial reference for researchers in the discipline, presenting an overview of the recent studies in the field. This consolidation helps elucidate the dynamic paradigm of PPI analysis, the evolution of deep learning techniques, and their interdependent dynamics. This scrutiny is expected to serve as a vital aid for researchers, both well-established and newcomers, assisting them in maneuvering the rapidly shifting terrain of deep learning applications in PPI analysis.
Collapse
Affiliation(s)
- Minhyeok Lee
- School of Electrical and Electronics Engineering, Chung-Ang University, Seoul 06974, Republic of Korea
| |
Collapse
|
15
|
Dalkıran A, Atakan A, Rifaioğlu AS, Martin MJ, Atalay RÇ, Acar AC, Doğan T, Atalay V. Transfer learning for drug-target interaction prediction. Bioinformatics 2023; 39:i103-i110. [PMID: 37387156 DOI: 10.1093/bioinformatics/btad234] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 04/19/2023] [Indexed: 07/01/2023] Open
Abstract
MOTIVATION Utilizing AI-driven approaches for drug-target interaction (DTI) prediction require large volumes of training data which are not available for the majority of target proteins. In this study, we investigate the use of deep transfer learning for the prediction of interactions between drug candidate compounds and understudied target proteins with scarce training data. The idea here is to first train a deep neural network classifier with a generalized source training dataset of large size and then to reuse this pre-trained neural network as an initial configuration for re-training/fine-tuning purposes with a small-sized specialized target training dataset. To explore this idea, we selected six protein families that have critical importance in biomedicine: kinases, G-protein-coupled receptors (GPCRs), ion channels, nuclear receptors, proteases, and transporters. In two independent experiments, the protein families of transporters and nuclear receptors were individually set as the target datasets, while the remaining five families were used as the source datasets. Several size-based target family training datasets were formed in a controlled manner to assess the benefit provided by the transfer learning approach. RESULTS Here, we present a systematic evaluation of our approach by pre-training a feed-forward neural network with source training datasets and applying different modes of transfer learning from the pre-trained source network to a target dataset. The performance of deep transfer learning is evaluated and compared with that of training the same deep neural network from scratch. We found that when the training dataset contains fewer than 100 compounds, transfer learning outperforms the conventional strategy of training the system from scratch, suggesting that transfer learning is advantageous for predicting binders to under-studied targets. AVAILABILITY AND IMPLEMENTATION The source code and datasets are available at https://github.com/cansyl/TransferLearning4DTI. Our web-based service containing the ready-to-use pre-trained models is accessible at https://tl4dti.kansil.org.
Collapse
Affiliation(s)
- Alperen Dalkıran
- Department of Computer Engineering, Middle East Technical University, Ankara 06800, Turkey
- Department of Computer Engineering, Adana Alparslan Türkeş Science and Technology University, Adana 01250, Turkey
| | - Ahmet Atakan
- Department of Computer Engineering, Middle East Technical University, Ankara 06800, Turkey
- Department of Computer Engineering, Erzincan Binali Yıldırım University, Erzincan 24002, Turkey
| | - Ahmet S Rifaioğlu
- Department of Computer Engineering, Iskenderun Technical University, Hatay 31200, Turkey
- Faculty of Medicine, Institute for Computational Biomedicine, Heidelberg University and Heidelberg University Hospital, Heidelberg 69120, Germany
| | - Maria J Martin
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Cambridge, Hinxton CB10 1SD, United Kingdom
| | - Rengül Çetin Atalay
- Faculty of Pulmonary and Critical Care Medicine, the University of Chicago, Chicago, IL, 60637, United States
| | - Aybar C Acar
- Cancer Systems Biology Laboratory (Kansil), Middle East Technical University, Ankara 06800, Turkey
| | - Tunca Doğan
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Cambridge, Hinxton CB10 1SD, United Kingdom
- Department of Computer Engineering, Hacettepe University, Ankara 06800, Turkey
| | - Volkan Atalay
- Department of Computer Engineering, Middle East Technical University, Ankara 06800, Turkey
| |
Collapse
|
16
|
Xie P, Zhuang J, Tian G, Yang J. Emvirus: An embedding-based neural framework for human-virus protein-protein interactions prediction. BIOSAFETY AND HEALTH 2023; 5:152-158. [PMID: 37362223 PMCID: PMC10166638 DOI: 10.1016/j.bsheal.2023.04.003] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/23/2023] [Revised: 04/23/2023] [Accepted: 04/23/2023] [Indexed: 06/28/2023] Open
Abstract
Human-virus protein-protein interactions (PPIs) play critical roles in viral infection. For example, the spike protein of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) binds primarily to human angiotensin-converting enzyme 2 (ACE2) protein to infect human cells. Thus, identifying and blocking these PPIs contribute to controlling and preventing viruses. However, wet-lab experiment-based identification of human-virus PPIs is usually expensive, labor-intensive, and time-consuming, which presents the need for computational methods. Many machine-learning methods have been proposed recently and achieved good results in predicting human-virus PPIs. However, most methods are based on protein sequence features and apply manually extracted features, such as statistical characteristics, phylogenetic profiles, and physicochemical properties. In this work, we present an embedding-based neural framework with convolutional neural network (CNN) and bi-directional long short-term memory unit (Bi-LSTM) architecture, named Emvirus, to predict human-virus PPIs (including human-SARS-CoV-2 PPIs). In addition, we conduct cross-viral experiments to explore the generalization ability of Emvirus. Compared to other feature extraction methods, Emvirus achieves better prediction accuracy.
Collapse
Affiliation(s)
- Pengfei Xie
- College of Transportation Engineering, Dalian Maritime University, Dalian 116026, China
| | - Jujuan Zhuang
- School of Science, Dalian Maritime University, Dalian 116026, China
| | - Geng Tian
- Geneis Beijing Co., Ltd., Beijing 100102, China
- Qingdao Geneis Institute of Big Data Mining and Precision Medicine, Qingdao 266000, China
| | - Jialiang Yang
- Geneis Beijing Co., Ltd., Beijing 100102, China
- Qingdao Geneis Institute of Big Data Mining and Precision Medicine, Qingdao 266000, China
| |
Collapse
|
17
|
Li F, Liu S, Li K, Zhang Y, Duan M, Yao Z, Zhu G, Guo Y, Wang Y, Huang L, Zhou F. EpiTEAmDNA: Sequence feature representation via transfer learning and ensemble learning for identifying multiple DNA epigenetic modification types across species. Comput Biol Med 2023; 160:107030. [PMID: 37196456 DOI: 10.1016/j.compbiomed.2023.107030] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/02/2023] [Revised: 04/21/2023] [Accepted: 05/10/2023] [Indexed: 05/19/2023]
Abstract
Methylation is a major DNA epigenetic modification for regulating the biological processes without altering the DNA sequence, and multiple types of DNA methylations have been discovered, including 6mA, 5hmC, and 4mC. Multiple computational approaches were developed to automatically identify the DNA methylation residues using machine learning or deep learning algorithms. The machine learning (ML) based methods are difficult to be transferred to the other predicting tasks of the DNA methylation sites using additional knowledge. Deep learning (DL) may facilitate the transfer learning of knowledge from similar tasks, but they are often ineffective on small datasets. This study proposes an integrated feature representation framework EpiTEAmDNA based on the strategies of transfer learning and ensemble learning, which is evaluated on multiple DNA methylation types across 15 species. EpiTEAmDNA integrates convolutional neural network (CNN) and conventional machine learning methods, and shows improved performances than the existing DL-based methods on small datasets when no additional knowledge is available. The experimental data suggests that the EpiTEAmDNA models may be further improved via transfer learning based on additional knowledge. The evaluation experiments on the independent test datasets also suggest that the proposed EpiTEAmDNA framework outperforms the existing models in most prediction tasks of the 3 DNA methylation types across 15 species. The source code, pre-trained global model, and the EpiTEAmDNA feature representation framework are freely available at http://www.healthinformaticslab.org/supp/.
Collapse
Affiliation(s)
- Fei Li
- Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin, 130012, China; College of Computer Science and Technology, Jilin University, Changchun, Jilin, 130012, China
| | - Shuai Liu
- Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin, 130012, China; College of Computer Science and Technology, Jilin University, Changchun, Jilin, 130012, China
| | - Kewei Li
- Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin, 130012, China; College of Computer Science and Technology, Jilin University, Changchun, Jilin, 130012, China
| | - Yaqi Zhang
- Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin, 130012, China; College of Computer Science and Technology, Jilin University, Changchun, Jilin, 130012, China
| | - Meiyu Duan
- Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin, 130012, China; College of Computer Science and Technology, Jilin University, Changchun, Jilin, 130012, China.
| | - Zhaomin Yao
- College of Medicine and Biological Information Engineering, Northeastern University, Shenyang, Liaoning, 110167, China
| | - Gancheng Zhu
- Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin, 130012, China; College of Computer Science and Technology, Jilin University, Changchun, Jilin, 130012, China
| | - Yutong Guo
- College of Life Sciences, Jilin University, Changchun, Jilin, 130012, China
| | - Ying Wang
- Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin, 130012, China; College of Computer Science and Technology, Jilin University, Changchun, Jilin, 130012, China
| | - Lan Huang
- Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin, 130012, China; College of Computer Science and Technology, Jilin University, Changchun, Jilin, 130012, China
| | - Fengfeng Zhou
- Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin, 130012, China; College of Computer Science and Technology, Jilin University, Changchun, Jilin, 130012, China.
| |
Collapse
|
18
|
Huang Y, Wuchty S, Zhou Y, Zhang Z. SGPPI: structure-aware prediction of protein-protein interactions in rigorous conditions with graph convolutional network. Brief Bioinform 2023; 24:6995378. [PMID: 36682013 DOI: 10.1093/bib/bbad020] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/09/2022] [Revised: 11/17/2022] [Accepted: 01/05/2023] [Indexed: 01/23/2023] Open
Abstract
While deep learning (DL)-based models have emerged as powerful approaches to predict protein-protein interactions (PPIs), the reliance on explicit similarity measures (e.g. sequence similarity and network neighborhood) to known interacting proteins makes these methods ineffective in dealing with novel proteins. The advent of AlphaFold2 presents a significant opportunity and also a challenge to predict PPIs in a straightforward way based on monomer structures while controlling bias from protein sequences. In this work, we established Structure and Graph-based Predictions of Protein Interactions (SGPPI), a structure-based DL framework for predicting PPIs, using the graph convolutional network. In particular, SGPPI focused on protein patches on the protein-protein binding interfaces and extracted the structural, geometric and evolutionary features from the residue contact map to predict PPIs. We demonstrated that our model outperforms traditional machine learning methods and state-of-the-art DL-based methods using non-representation-bias benchmark datasets. Moreover, our model trained on human dataset can be reliably transferred to predict yeast PPIs, indicating that SGPPI can capture converging structural features of protein interactions across various species. The implementation of SGPPI is available at https://github.com/emerson106/SGPPI.
Collapse
Affiliation(s)
- Yan Huang
- State Key Laboratory of Livestock and Poultry Biotechnology Breeding, College of Biological Sciences, China Agricultural University, Beijing 100193, China
- Department of Biomedical Informatics, Ministry of Education Key Laboratory of Molecular Cardiovascular Sciences, Center for Non-Coding RNA Medicine, School of Basic Medical Sciences, Peking University, Beijing 100191, China
| | - Stefan Wuchty
- Department of Computer Science, University of Miami, Coral Gables, FL 33146, USA
- Department of Biology, University of Miami, Coral Gables, FL 33146, USA
- Sylvester Comprehensive Cancer Center, University of Miami, Miami, FL 33136, USA
- Institute of Data Science and Computing, University of Miami, Coral Gables, FL 33146, USA
| | - Yuan Zhou
- Department of Biomedical Informatics, Ministry of Education Key Laboratory of Molecular Cardiovascular Sciences, Center for Non-Coding RNA Medicine, School of Basic Medical Sciences, Peking University, Beijing 100191, China
| | - Ziding Zhang
- State Key Laboratory of Livestock and Poultry Biotechnology Breeding, College of Biological Sciences, China Agricultural University, Beijing 100193, China
| |
Collapse
|
19
|
Soleymani F, Paquet E, Viktor HL, Michalowski W, Spinello D. ProtInteract: A deep learning framework for predicting protein-protein interactions. Comput Struct Biotechnol J 2023; 21:1324-1348. [PMID: 36817951 PMCID: PMC9929211 DOI: 10.1016/j.csbj.2023.01.028] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/05/2022] [Revised: 01/20/2023] [Accepted: 01/20/2023] [Indexed: 01/26/2023] Open
Abstract
Proteins mainly perform their functions by interacting with other proteins. Protein-protein interactions underpin various biological activities such as metabolic cycles, signal transduction, and immune response. However, due to the sheer number of proteins, experimental methods for finding interacting and non-interacting protein pairs are time-consuming and costly. We therefore developed the ProtInteract framework to predict protein-protein interaction. ProtInteract comprises two components: first, a novel autoencoder architecture that encodes each protein's primary structure to a lower-dimensional vector while preserving its underlying sequence attributes. This leads to faster training of the second network, a deep convolutional neural network (CNN) that receives encoded proteins and predicts their interaction under three different scenarios. In each scenario, the deep CNN predicts the class of a given encoded protein pair. Each class indicates different ranges of confidence scores corresponding to the probability of whether a predicted interaction occurs or not. The proposed framework features significantly low computational complexity and relatively fast response. The contributions of this work are twofold. First, ProtInteract assimilates the protein's primary structure into a pseudo-time series. Therefore, we leverage the nature of the time series of proteins and their physicochemical properties to encode a protein's amino acid sequence into a lower-dimensional vector space. This approach enables extracting highly informative sequence attributes while reducing computational complexity. Second, the ProtInteract framework utilises this information to identify protein interactions with other proteins based on its amino acid configuration. Our results suggest that the proposed framework performs with high accuracy and efficiency in predicting protein-protein interactions.
Collapse
Affiliation(s)
- Farzan Soleymani
- Department of Mechanical Engineering, University of Ottawa, Ottawa, ON K1N 6N5, Canada
| | - Eric Paquet
- National Research Council, 1200 Montreal Road, Ottawa, ON K1A 0R6, Canada,Corresponding author.
| | - Herna Lydia Viktor
- School of Electrical Engineering and Computer Science, University of Ottawa, ON K1N 6N5, Canada
| | | | - Davide Spinello
- Department of Mechanical Engineering, University of Ottawa, Ottawa, ON K1N 6N5, Canada
| |
Collapse
|
20
|
Rogers JR, Nikolényi G, AlQuraishi M. Growing ecosystem of deep learning methods for modeling protein-protein interactions. Protein Eng Des Sel 2023; 36:gzad023. [PMID: 38102755 DOI: 10.1093/protein/gzad023] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/10/2023] [Revised: 12/06/2023] [Accepted: 12/07/2023] [Indexed: 12/17/2023] Open
Abstract
Numerous cellular functions rely on protein-protein interactions. Efforts to comprehensively characterize them remain challenged however by the diversity of molecular recognition mechanisms employed within the proteome. Deep learning has emerged as a promising approach for tackling this problem by exploiting both experimental data and basic biophysical knowledge about protein interactions. Here, we review the growing ecosystem of deep learning methods for modeling protein interactions, highlighting the diversity of these biophysically informed models and their respective trade-offs. We discuss recent successes in using representation learning to capture complex features pertinent to predicting protein interactions and interaction sites, geometric deep learning to reason over protein structures and predict complex structures, and generative modeling to design de novo protein assemblies. We also outline some of the outstanding challenges and promising new directions. Opportunities abound to discover novel interactions, elucidate their physical mechanisms, and engineer binders to modulate their functions using deep learning and, ultimately, unravel how protein interactions orchestrate complex cellular behaviors.
Collapse
Affiliation(s)
- Julia R Rogers
- Department of Systems Biology, Columbia University, New York, NY 10032, USA
| | - Gergő Nikolényi
- Department of Systems Biology, Columbia University, New York, NY 10032, USA
| | | |
Collapse
|
21
|
Li X, Han P, Chen W, Gao C, Wang S, Song T, Niu M, Rodriguez-Patón A. MARPPI: boosting prediction of protein-protein interactions with multi-scale architecture residual network. Brief Bioinform 2023; 24:6887309. [PMID: 36502435 DOI: 10.1093/bib/bbac524] [Citation(s) in RCA: 14] [Impact Index Per Article: 14.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/04/2022] [Revised: 09/29/2022] [Accepted: 11/04/2022] [Indexed: 12/14/2022] Open
Abstract
Protein-protein interactions (PPIs) are a major component of the cellular biochemical reaction network. Rich sequence information and machine learning techniques reduce the dependence of exploring PPIs on wet experiments, which are costly and time-consuming. This paper proposes a PPI prediction model, multi-scale architecture residual network for PPIs (MARPPI), based on dual-channel and multi-feature. Multi-feature leverages Res2vec to obtain the association information between residues, and utilizes pseudo amino acid composition, autocorrelation descriptors and multivariate mutual information to achieve the amino acid composition and order information, physicochemical properties and information entropy, respectively. Dual channel utilizes multi-scale architecture improved ResNet network which extracts protein sequence features to reduce protein feature loss. Compared with other advanced methods, MARPPI achieves 96.03%, 99.01% and 91.80% accuracy in the intraspecific datasets of Saccharomyces cerevisiae, Human and Helicobacter pylori, respectively. The accuracy on the two interspecific datasets of Human-Bacillus anthracis and Human-Yersinia pestis is 97.29%, and 95.30%, respectively. In addition, results on specific datasets of disease (neurodegenerative and metabolic disorders) demonstrate the ability to detect hidden interactions. To better illustrate the performance of MARPPI, evaluations on independent datasets and PPIs network suggest that MARPPI can be used to predict cross-species interactions. The above shows that MARPPI can be regarded as a concise, efficient and accurate tool for PPI datasets.
Collapse
Affiliation(s)
- Xue Li
- School of Computer Science and Technology, China University of Petroleum, Qingdao 266580, China
| | - Peifu Han
- School of Computer Science and Technology, China University of Petroleum, Qingdao 266580, China
| | - Wenqi Chen
- School of Computer Science and Technology, China University of Petroleum, Qingdao 266580, China
| | - Changnan Gao
- School of Computer Science and Technology, China University of Petroleum, Qingdao 266580, China
| | - Shuang Wang
- School of Computer Science and Technology, China University of Petroleum, Qingdao 266580, China
| | - Tao Song
- School of Computer Science and Technology, China University of Petroleum, Qingdao 266580, China
| | - Muyuan Niu
- School of Computer Science and Technology, China University of Petroleum, Qingdao 266580, China
| | - Alfonso Rodriguez-Patón
- School of Computer Science and Technology, China University of Petroleum, Qingdao 266580, China
| |
Collapse
|
22
|
Iuchi H, Kawasaki J, Kubo K, Fukunaga T, Hokao K, Yokoyama G, Ichinose A, Suga K, Hamada M. Bioinformatics approaches for unveiling virus-host interactions. Comput Struct Biotechnol J 2023; 21:1774-1784. [PMID: 36874163 PMCID: PMC9969756 DOI: 10.1016/j.csbj.2023.02.044] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/17/2022] [Revised: 02/22/2023] [Accepted: 02/22/2023] [Indexed: 03/03/2023] Open
Abstract
The coronavirus disease-2019 (COVID-19) pandemic has elucidated major limitations in the capacity of medical and research institutions to appropriately manage emerging infectious diseases. We can improve our understanding of infectious diseases by unveiling virus-host interactions through host range prediction and protein-protein interaction prediction. Although many algorithms have been developed to predict virus-host interactions, numerous issues remain to be solved, and the entire network remains veiled. In this review, we comprehensively surveyed algorithms used to predict virus-host interactions. We also discuss the current challenges, such as dataset biases toward highly pathogenic viruses, and the potential solutions. The complete prediction of virus-host interactions remains difficult; however, bioinformatics can contribute to progress in research on infectious diseases and human health.
Collapse
Affiliation(s)
- Hitoshi Iuchi
- Waseda Research Institute for Science and Engineering, Waseda University, Tokyo 169-8555, Japan.,Computational Bio Big-Data Open Innovation Laboratory (CBBD-OIL), National Institute of Advanced Industrial Science and Technology (AIST), Tokyo 169-8555, Japan
| | - Junna Kawasaki
- Faculty of Science and Engineering, Waseda University, Okubo Shinjuku-ku, Tokyo 169-8555, Japan
| | - Kento Kubo
- Computational Bio Big-Data Open Innovation Laboratory (CBBD-OIL), National Institute of Advanced Industrial Science and Technology (AIST), Tokyo 169-8555, Japan.,School of Advanced Science and Engineering, Waseda University, Okubo Shinjuku-ku, Tokyo 169-8555, Japan
| | - Tsukasa Fukunaga
- Waseda Institute for Advanced Study, Waseda University, Nishi Waseda, Shinjuku-ku, Tokyo 169-0051, Japan
| | - Koki Hokao
- School of Advanced Science and Engineering, Waseda University, Okubo Shinjuku-ku, Tokyo 169-8555, Japan
| | - Gentaro Yokoyama
- Computational Bio Big-Data Open Innovation Laboratory (CBBD-OIL), National Institute of Advanced Industrial Science and Technology (AIST), Tokyo 169-8555, Japan.,School of Advanced Science and Engineering, Waseda University, Okubo Shinjuku-ku, Tokyo 169-8555, Japan
| | - Akiko Ichinose
- Waseda Research Institute for Science and Engineering, Waseda University, Tokyo 169-8555, Japan
| | - Kanta Suga
- School of Advanced Science and Engineering, Waseda University, Okubo Shinjuku-ku, Tokyo 169-8555, Japan
| | - Michiaki Hamada
- Waseda Research Institute for Science and Engineering, Waseda University, Tokyo 169-8555, Japan.,Computational Bio Big-Data Open Innovation Laboratory (CBBD-OIL), National Institute of Advanced Industrial Science and Technology (AIST), Tokyo 169-8555, Japan.,School of Advanced Science and Engineering, Waseda University, Okubo Shinjuku-ku, Tokyo 169-8555, Japan.,Graduate School of Medicine, Nippon Medical School, Tokyo 113-8602, Japan
| |
Collapse
|
23
|
Murakami Y, Mizuguchi K. Recent developments of sequence-based prediction of protein-protein interactions. Biophys Rev 2022; 14:1393-1411. [PMID: 36589735 PMCID: PMC9789376 DOI: 10.1007/s12551-022-01038-1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 12/08/2022] [Indexed: 12/25/2022] Open
Abstract
The identification of protein-protein interactions (PPIs) can lead to a better understanding of cellular functions and biological processes of proteins and contribute to the design of drugs to target disease-causing PPIs. In addition, targeting host-pathogen PPIs is useful for elucidating infection mechanisms. Although several experimental methods have been used to identify PPIs, these methods can yet to draw complete PPI networks. Hence, computational techniques are increasingly required for the prediction of potential PPIs, which have never been seen experimentally. Recent high-performance sequence-based methods have contributed to the construction of PPI networks and the elucidation of pathogenetic mechanisms in specific diseases. However, the usefulness of these methods depends on the quality and quantity of training data of PPIs. In this brief review, we introduce currently available PPI databases and recent sequence-based methods for predicting PPIs. Also, we discuss key issues in this field and present future perspectives of the sequence-based PPI predictions.
Collapse
Affiliation(s)
- Yoichi Murakami
- grid.440890.10000 0004 0640 9413Tokyo University of Information Sciences, 4-1 Onaridai, Wakaba-Ku, Chiba, 265-8501 Japan
| | - Kenji Mizuguchi
- grid.136593.b0000 0004 0373 3971Institute for Protein Research, Osaka University, 3-2 Yamadaoka, Suita-Shi, Osaka, 565-0871 Japan ,grid.482562.fNational Institutes of Biomedical Innovation, Health and Nutrition, 7-6-8 Saito Asagi, Ibaraki, Osaka 567-0085 Japan
| |
Collapse
|
24
|
Huang Y, Zhang Z, Zhou Y. AbAgIntPre: A deep learning method for predicting antibody-antigen interactions based on sequence information. Front Immunol 2022; 13:1053617. [PMID: 36618397 PMCID: PMC9813736 DOI: 10.3389/fimmu.2022.1053617] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/26/2022] [Accepted: 12/14/2022] [Indexed: 12/24/2022] Open
Abstract
Introduction Antibody-mediated immunity is an essential part of the immune system in vertebrates. The ability to specifically bind to antigens allows antibodies to be widely used in the therapy of cancers and other critical diseases. A key step in antibody therapeutics is the experimental identification of antibody-antigen interactions, which is generally time-consuming, costly, and laborious. Although some computational methods have been proposed to screen potential antibodies, the dependence on 3D structures still limits the application of these methods. Methods Here, we developed a deep learning-assisted prediction method (i.e., AbAgIntPre) for fast identification of antibody-antigen interactions that only relies on amino acid sequences. A Siamese-like convolutional neural network architecture was established with the amino acid composition encoding scheme for both antigens and antibodies. Results and Discussion The generic model of AbAgIntPre achieved satisfactory performance with the Area Under Curve (AUC) of 0.82 on a high-quality generic independent test dataset. Besides, this approach also showed competitive performance on the more specific SARS-CoV dataset. We expect that AbAgIntPre can serve as an important complement to traditional experimental methods for antibody screening and effectively reduce the workload of antibody design. The web server of AbAgIntPre is freely available at http://www.zzdlab.com/AbAgIntPre.
Collapse
Affiliation(s)
- Yan Huang
- State Key Laboratory of Agrobiotechnology, College of Biological Sciences, China Agricultural University, Beijing, China,Department of Biomedical Informatics, Key Laboratory of Molecular Cardiovascular Sciences of the Ministry of Education, School of Basic Medical Sciences, Peking University, Beijing, China
| | - Ziding Zhang
- State Key Laboratory of Agrobiotechnology, College of Biological Sciences, China Agricultural University, Beijing, China,*Correspondence: Ziding Zhang, ; Yuan Zhou,
| | - Yuan Zhou
- Department of Biomedical Informatics, Key Laboratory of Molecular Cardiovascular Sciences of the Ministry of Education, School of Basic Medical Sciences, Peking University, Beijing, China,*Correspondence: Ziding Zhang, ; Yuan Zhou,
| |
Collapse
|
25
|
Neumann D, Roy S, Minhas FUAA, Ben-Hur A. On the choice of negative examples for prediction of host-pathogen protein interactions. FRONTIERS IN BIOINFORMATICS 2022; 2:1083292. [PMID: 36591335 PMCID: PMC9798088 DOI: 10.3389/fbinf.2022.1083292] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/28/2022] [Accepted: 11/14/2022] [Indexed: 12/23/2022] Open
Abstract
As practitioners of machine learning in the area of bioinformatics we know that the quality of the results crucially depends on the quality of our labeled data. While there is a tendency to focus on the quality of positive examples, the negative examples are equally as important. In this opinion paper we revisit the problem of choosing negative examples for the task of predicting protein-protein interactions, either among proteins of a given species or for host-pathogen interactions and describe important issues that are prevalent in the current literature. The challenge in creating datasets for this task is the noisy nature of the experimentally derived interactions and the lack of information on non-interacting proteins. A standard approach is to choose random pairs of non-interacting proteins as negative examples. Since the interactomes of all species are only partially known, this leads to a very small percentage of false negatives. This is especially true for host-pathogen interactions. To address this perceived issue, some researchers have chosen to select negative examples as pairs of proteins whose sequence similarity to the positive examples is sufficiently low. This clearly reduces the chance for false negatives, but also makes the problem much easier than it really is, leading to over-optimistic accuracy estimates. We demonstrate the effect of this form of bias using a selection of recent protein interaction prediction methods of varying complexity, and urge researchers to pay attention to the details of generating their datasets for potential biases like this.
Collapse
Affiliation(s)
- Don Neumann
- Department Computer Science, Colorado State University, Fort Collins, CO, United States,*Correspondence: Don Neumann, ; Asa Ben-Hur,
| | - Soumyadip Roy
- Department Computer Science, Colorado State University, Fort Collins, CO, United States
| | | | - Asa Ben-Hur
- Department Computer Science, Colorado State University, Fort Collins, CO, United States,*Correspondence: Don Neumann, ; Asa Ben-Hur,
| |
Collapse
|
26
|
Asim MN, Fazeel A, Ibrahim MA, Dengel A, Ahmed S. MP-VHPPI: Meta predictor for viral host protein-protein interaction prediction in multiple hosts and viruses. Front Med (Lausanne) 2022; 9:1025887. [PMID: 36465911 PMCID: PMC9709337 DOI: 10.3389/fmed.2022.1025887] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/23/2022] [Accepted: 10/17/2022] [Indexed: 09/19/2023] Open
Abstract
Viral-host protein-protein interaction (VHPPI) prediction is essential to decoding molecular mechanisms of viral pathogens and host immunity processes that eventually help to control the propagation of viral diseases and to design optimized therapeutics. Multiple AI-based predictors have been developed to predict diverse VHPPIs across a wide range of viruses and hosts, however, these predictors produce better performance only for specific types of hosts and viruses. The prime objective of this research is to develop a robust meta predictor (MP-VHPPI) capable of more accurately predicting VHPPI across multiple hosts and viruses. The proposed meta predictor makes use of two well-known encoding methods Amphiphilic Pseudo-Amino Acid Composition (APAAC) and Quasi-sequence (QS) Order that capture amino acids sequence order and distributional information to most effectively generate the numerical representation of complete viral-host raw protein sequences. Feature agglomeration method is utilized to transform the original feature space into a more informative feature space. Random forest (RF) and Extra tree (ET) classifiers are trained on optimized feature space of both APAAC and QS order separate encoders and by combining both encodings. Further predictions of both classifiers are utilized to feed the Support Vector Machine (SVM) classifier that makes final predictions. The proposed meta predictor is evaluated over 7 different benchmark datasets, where it outperforms existing VHPPI predictors with an average performance of 3.07, 6.07, 2.95, and 2.85% in terms of accuracy, Mathews correlation coefficient, precision, and sensitivity, respectively. To facilitate the scientific community, the MP-VHPPI web server is available at https://sds_genetic_analysis.opendfki.de/MP-VHPPI/.
Collapse
Affiliation(s)
- Muhammad Nabeel Asim
- Department of Computer Science, Technical University of Kaiserslautern, Kaiserslautern, Germany
- German Research Center for Artificial Intelligence GmbH, Kaiserslautern, Germany
| | - Ahtisham Fazeel
- Department of Computer Science, Technical University of Kaiserslautern, Kaiserslautern, Germany
- German Research Center for Artificial Intelligence GmbH, Kaiserslautern, Germany
| | - Muhammad Ali Ibrahim
- Department of Computer Science, Technical University of Kaiserslautern, Kaiserslautern, Germany
- German Research Center for Artificial Intelligence GmbH, Kaiserslautern, Germany
| | - Andreas Dengel
- Department of Computer Science, Technical University of Kaiserslautern, Kaiserslautern, Germany
- German Research Center for Artificial Intelligence GmbH, Kaiserslautern, Germany
| | - Sheraz Ahmed
- German Research Center for Artificial Intelligence GmbH, Kaiserslautern, Germany
| |
Collapse
|
27
|
Cross-attention PHV: Prediction of human and virus protein-protein interactions using cross-attention-based neural networks. Comput Struct Biotechnol J 2022; 20:5564-5573. [PMID: 36249566 PMCID: PMC9546503 DOI: 10.1016/j.csbj.2022.10.012] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/09/2022] [Revised: 10/05/2022] [Accepted: 10/05/2022] [Indexed: 11/30/2022] Open
Abstract
Cross-attention PHV implements two key technologies: cross-attention mechanism and 1D-CNN. It accurately predicts PPIs between human and unknown influenza viruses/SARS-CoV-2. It extracts critical taxonomic and evolutionary differences responsible for PPI prediction.
Viral infections represent a major health concern worldwide. The alarming rate at which SARS-CoV-2 spreads, for example, led to a worldwide pandemic. Viruses incorporate genetic material into the host genome to hijack host cell functions such as the cell cycle and apoptosis. In these viral processes, protein–protein interactions (PPIs) play critical roles. Therefore, the identification of PPIs between humans and viruses is crucial for understanding the infection mechanism and host immune responses to viral infections and for discovering effective drugs. Experimental methods including mass spectrometry-based proteomics and yeast two-hybrid assays are widely used to identify human-virus PPIs, but these experimental methods are time-consuming, expensive, and laborious. To overcome this problem, we developed a novel computational predictor, named cross-attention PHV, by implementing two key technologies of the cross-attention mechanism and a one-dimensional convolutional neural network (1D-CNN). The cross-attention mechanisms were very effective in enhancing prediction and generalization abilities. Application of 1D-CNN to the word2vec-generated feature matrices reduced computational costs, thus extending the allowable length of protein sequences to 9000 amino acid residues. Cross-attention PHV outperformed existing state-of-the-art models using a benchmark dataset and accurately predicted PPIs for unknown viruses. Cross-attention PHV also predicted human–SARS-CoV-2 PPIs with area under the curve values >0.95. The Cross-attention PHV web server and source codes are freely available at https://kurata35.bio.kyutech.ac.jp/Cross-attention_PHV/ and https://github.com/kuratahiroyuki/Cross-Attention_PHV, respectively.
Collapse
Key Words
- 1D-CNN, One-dimensional-CNN
- AC, Accuracy
- AUC, Area under the curve
- CNN, Convolutional neural network
- Convolutional neural network
- DT, Decision tree
- F1, F1-score
- HV-PPIs, Human-virus PPIs
- HuV-PPI, Human–unknown virus PPI
- Human
- LR, Linear regression
- MCC, Matthews correlation coefficient
- PPIs, Protein-protein interactions
- Protein–protein interaction
- RF, Random forest
- SARS-CoV-2
- SARS-CoV-2, Severe acute respiratory syndrome coronavirus 2
- SN, Sensitivity
- SP, Specificity
- SVM, Support vector machine
- T-SNE, T-distributed stochastic neighbor embedding
- Virus
- W2V, Word2vec
- Word2vec
Collapse
|
28
|
Koca MB, Nourani E, Abbasoğlu F, Karadeniz İ, Sevilgen FE. Graph convolutional network based virus-human protein-protein interaction prediction for novel viruses. Comput Biol Chem 2022; 101:107755. [PMID: 36037723 DOI: 10.1016/j.compbiolchem.2022.107755] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/02/2022] [Revised: 07/07/2022] [Accepted: 08/10/2022] [Indexed: 11/03/2022]
Abstract
Computational identification of human-virus protein-protein interactions (PHIs) is a worthwhile step towards understanding infection mechanisms. Analysis of the PHI networks is important for the determination of pathogenic diseases. Prediction of these interactions is a popular problem since experimental detection of PHIs is both time-consuming and expensive. The available methods use biological features like amino acid sequences, molecular structure, or biological activities for prediction. Recent studies show that the topological properties of proteins in protein-protein interaction (PPI) networks increase the performance of the predictions. The basic network projections, random-walk-based models, or graph neural networks are used for generating topologically enriched (hybrid) protein embeddings. In this study, we propose a three-stage machine learning pipeline that generates and uses hybrid embeddings for PHI prediction. In the first stage, numerical features are extracted from the amino acid sequences using the Doc2Vec and Byte Pair Encoding method. The amino acid embeddings are used as node features while training a modified GraphSAGE model, which is an improved version of the graph convolutional network. Lastly, the hybrid protein embeddings are used for training a binary interaction classifier model that predicts whether there is an interaction between the given two proteins or not. The proposed method is evaluated with comprehensive experiments to test its functionality and compare it with the state-of-art methods. The experimental results on the benchmark dataset prove the efficiency of the proposed model by having a 3-23% better area under curve (AUC) score than its competitors.
Collapse
Affiliation(s)
- Mehmet Burak Koca
- Department of Computer Engineering, Faculty of Engineering, Gebze Technical University, Kocaeli, Turkey
| | - Esmaeil Nourani
- Department of Information Technology, Faculty of Computer Engineering and Information Technology, Azarbaijan Shahid Madani University, Tabriz, Iran
| | - Ferda Abbasoğlu
- Department of Computer Engineering, Faculty of Engineering, Gebze Technical University, Kocaeli, Turkey
| | - İlknur Karadeniz
- Department of Computer Engineering, Faculty of Engineering and Natural Sciences, Işık University, İstanbul, Turkey.
| | - Fatih Erdoğan Sevilgen
- Department of Computer Engineering, Faculty of Engineering, Gebze Technical University, Kocaeli, Turkey; Institute for Data Science and Artificial Intelligence, Boğaziçi University, İstanbul, Turkey
| |
Collapse
|
29
|
Yang X, Yang S, Ren P, Wuchty S, Zhang Z. Deep Learning-Powered Prediction of Human-Virus Protein-Protein Interactions. Front Microbiol 2022; 13:842976. [PMID: 35495666 PMCID: PMC9051481 DOI: 10.3389/fmicb.2022.842976] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/24/2021] [Accepted: 03/25/2022] [Indexed: 11/13/2022] Open
Abstract
Identifying human-virus protein-protein interactions (PPIs) is an essential step for understanding viral infection mechanisms and antiviral response of the human host. Recent advances in high-throughput experimental techniques enable the significant accumulation of human-virus PPI data, which have further fueled the development of machine learning-based human-virus PPI prediction methods. Emerging as a very promising method to predict human-virus PPIs, deep learning shows the powerful ability to integrate large-scale datasets, learn complex sequence-structure relationships of proteins and convert the learned patterns into final prediction models with high accuracy. Focusing on the recent progresses of deep learning-powered human-virus PPI predictions, we review technical details of these newly developed methods, including dataset preparation, deep learning architectures, feature engineering, and performance assessment. Moreover, we discuss the current challenges and potential solutions and provide future perspectives of human-virus PPI prediction in the coming post-AlphaFold2 era.
Collapse
Affiliation(s)
- Xiaodi Yang
- State Key Laboratory of Agrobiotechnology, College of Biological Sciences, China Agricultural University, Beijing, China
| | - Shiping Yang
- State Key Laboratory of Plant Physiology and Biochemistry, College of Biological Sciences, China Agricultural University, Beijing, China
| | - Panyu Ren
- State Key Laboratory of Agrobiotechnology, College of Biological Sciences, China Agricultural University, Beijing, China
| | - Stefan Wuchty
- Department of Computer Science, University of Miami, Miami, FL, United States
- Department of Biology, University of Miami, Miami, FL, United States
- Sylvester Comprehensive Cancer Center, University of Miami, Miami, FL, United States
| | - Ziding Zhang
- State Key Laboratory of Agrobiotechnology, College of Biological Sciences, China Agricultural University, Beijing, China
- *Correspondence: Ziding Zhang,
| |
Collapse
|
30
|
Hu X, Feng C, Ling T, Chen M. Deep learning frameworks for protein–protein interaction prediction. Comput Struct Biotechnol J 2022; 20:3223-3233. [PMID: 35832624 PMCID: PMC9249595 DOI: 10.1016/j.csbj.2022.06.025] [Citation(s) in RCA: 19] [Impact Index Per Article: 9.5] [Reference Citation Analysis] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/15/2022] [Revised: 05/27/2022] [Accepted: 06/12/2022] [Indexed: 11/26/2022] Open
|