1. Yang X, Wuchty S, Liang Z, Ji L, Wang B, Zhu J, Zhang Z, Dong Y. Multi-modal features-based human-herpesvirus protein-protein interaction prediction by using LightGBM. Brief Bioinform 2024; 25:bbae005. [PMID: 38279649 PMCID: PMC10818167 DOI: 10.1093/bib/bbae005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/20/2023] [Revised: 12/25/2023] [Accepted: 01/01/2021] [Indexed: 01/28/2024]
Abstract
The identification of human-herpesvirus protein-protein interactions (PPIs) is an essential entry point for understanding the mechanisms of viral infection, especially in patients with malignant tumors who have common herpesvirus infections. While natural language processing (NLP)-based embedding techniques have emerged as powerful approaches, the application of multi-modal embedding feature fusion to predict human-herpesvirus PPIs is still limited. Here, we established a multi-modal embedding feature fusion-based LightGBM method to predict human-herpesvirus PPIs. In particular, we applied document and graph embedding approaches to represent sequence, network and function modal features of human and herpesviral proteins. Training our LightGBM models on our compiled non-rigorous and rigorous benchmarking datasets, we obtained significantly better performance than with individual-modal features. Furthermore, our model outperformed traditional feature encoding-based machine learning methods and state-of-the-art deep learning-based methods on various benchmarking datasets. In a transfer learning step, we show that our model, trained on a human-herpesvirus PPI dataset without cytomegalovirus data, can reliably predict human-cytomegalovirus PPIs, indicating that our method can comprehensively capture multi-modal fusion features of protein interactions across various herpesvirus subtypes. The implementation of our method is available at https://github.com/XiaodiYangpku/MultimodalPPI/.
Affiliation(s)
- Xiaodi Yang
- Department of Hematology, Peking University First Hospital, Beijing, China
- Stefan Wuchty
- Department of Computer Science, University of Miami, Miami FL, 33146, USA
- Department of Biology, University of Miami, Miami FL, 33146, USA
- Institute of Data Science and Computation, University of Miami, Miami, FL 33146, USA
- Sylvester Comprehensive Cancer Center, University of Miami, Miami, FL 33136, USA
- Zeyin Liang
- Department of Hematology, Peking University First Hospital, Beijing, China
- Li Ji
- Department of Hematology, Peking University First Hospital, Beijing, China
- Bingjie Wang
- Department of Hematology, Peking University First Hospital, Beijing, China
- Jialin Zhu
- Department of Hematology, Peking University First Hospital, Beijing, China
- Ziding Zhang
- State Key Laboratory of Animal Biotech Breeding, College of Biological Sciences, China Agricultural University, Beijing 100193, China
- Yujun Dong
- Department of Hematology, Peking University First Hospital, Beijing, China
2. Ren P, Yang X, Wang T, Hou Y, Zhang Z. Proteome-wide prediction and analysis of the Cryptosporidium parvum protein-protein interaction network through integrative methods. Comput Struct Biotechnol J 2022; 20:2322-2331. [PMID: 35615014 PMCID: PMC9120227 DOI: 10.1016/j.csbj.2022.05.017] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 01/10/2022] [Revised: 05/08/2022] [Accepted: 05/09/2022] [Indexed: 11/03/2022]
Abstract
By combining a sequence embedding technique (i.e., Doc2Vec) and a di-peptide composition representation to convert protein sequences into feature vectors, we proposed a random forest (RF) classifier trained on the Plasmodium falciparum dataset for predicting Cryptosporidium parvum PPIs. A high-confidence Cryptosporidium parvum PPI network was identified by combining interolog mapping, domain-domain interaction-based inference, and the RF classifier. Some of the detected hub proteins and functional modules provide clues for an in-depth biological understanding of Cryptosporidium parvum.
As one of the most studied species of the apicomplexan parasite Cryptosporidium, Cryptosporidium parvum (C. parvum) causes cryptosporidiosis, a serious diarrheal disease worldwide that can be deadly to immunodeficient individuals, newborn children, and animals. Proteome-wide identification of protein–protein interactions (PPIs) has proven valuable for the systematic understanding of the genome-phenome relationship. However, the PPIs of C. parvum are largely unknown because few experimental studies have been carried out. Therefore, we took full advantage of three bioinformatics methods, i.e., interolog mapping (IM), domain-domain interaction (DDI)-based inference, and a machine learning (ML) method, to jointly predict PPIs of C. parvum. Due to the lack of experimental PPIs of C. parvum, we used the PPI data of Plasmodium falciparum (P. falciparum), which has the largest number of known PPIs in Apicomplexa, to train an ML model to infer C. parvum PPIs. We took the results consistently supported by these three methods as the predicted high-confidence PPI network, which contains 4,578 PPIs covering 554 proteins. To further explore the biological significance of the constructed PPI network, we also conducted network and protein functional analyses, mainly focusing on hub proteins and functional modules. We anticipate that the constructed PPI network can become an important data resource to accelerate functional genomics studies of C. parvum as well as offer new hints for target discovery in drug/vaccine development.
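The di-peptide composition representation mentioned above is a common fixed-length protein encoding: the relative frequency of each of the 400 ordered pairs of adjacent amino acids. A minimal sketch in Python follows. Two choices in it are assumptions for illustration, not necessarily the authors': skipping pairs that contain non-standard residues, and concatenating two vectors (the hypothetical `pair_features`) to encode a protein pair. The Doc2Vec half of their feature set is not shown.

```python
from itertools import product

# The 20 standard amino acids; all 400 ordered dipeptides in a fixed order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]
DIPEPTIDE_INDEX = {dp: i for i, dp in enumerate(DIPEPTIDES)}


def dipeptide_composition(sequence: str) -> list[float]:
    """Relative frequency of each adjacent residue pair (400-dimensional)."""
    seq = sequence.upper()
    counts = [0] * len(DIPEPTIDES)
    total = 0
    for left, right in zip(seq, seq[1:]):
        idx = DIPEPTIDE_INDEX.get(left + right)
        if idx is None:
            # Assumption: pairs containing non-standard residues (e.g. X, U) are skipped.
            continue
        counts[idx] += 1
        total += 1
    if total == 0:
        return [0.0] * len(DIPEPTIDES)
    return [c / total for c in counts]


def pair_features(seq_a: str, seq_b: str) -> list[float]:
    """Hypothetical pair encoding: concatenate both proteins' vectors (800 dims)."""
    return dipeptide_composition(seq_a) + dipeptide_composition(seq_b)


if __name__ == "__main__":
    vec = dipeptide_composition("ACAC")  # adjacent pairs: AC, CA, AC
    print(vec[DIPEPTIDE_INDEX["AC"]], vec[DIPEPTIDE_INDEX["CA"]])
```

Normalizing by the number of counted pairs makes vectors from proteins of different lengths comparable. This matters because a classifier trained on one organism's PPIs is being applied to another organism's proteome.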
3. Deciphering the Host-Pathogen Interactome of the Wheat-Common Bunt System: A Step towards Enhanced Resilience in Next Generation Wheat. Int J Mol Sci 2022; 23:ijms23052589. [PMID: 35269732 PMCID: PMC8910311 DOI: 10.3390/ijms23052589] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 01/31/2022] [Accepted: 02/09/2022] [Indexed: 02/05/2023]
Abstract
Common bunt, caused by two fungal species, Tilletia caries and Tilletia laevis, is one of the most potentially destructive diseases of wheat. Despite the availability of synthetic chemicals against the disease, organic agriculture relies greatly on resistant cultivars. Using two computational approaches (interolog and domain-based methods), a total of approximately 58 M and 56 M probable PPIs were predicted in the T. aestivum–T. caries and T. aestivum–T. laevis interactomes, respectively. We also identified 648 and 575 effectors among the interacting proteins of T. caries and T. laevis, respectively. The major host hubs belonged to the serine/threonine protein kinase, hsp70, and mitogen-activated protein kinase families, which are actively involved in plant immune signaling under stress conditions. Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analyses of the host proteins revealed significant GO terms (O-methyltransferase activity, regulation of response to stimulus, and plastid envelope) and pathways (NF-kappa B signaling and the MAPK signaling pathway) related to plant defense against pathogens. Subcellular localization suggested that most of the pathogen proteins target the host in the plastid. Furthermore, the unique proteins of T. caries and T. laevis were compared. We also identified novel host candidates associated with disease resistance. Additionally, host proteins that act as transcription factors were predicted.
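Interolog mapping, used in both the C. parvum and the wheat-common bunt studies, transfers a known interaction A-B in a template system to A'-B' when A' and B' are orthologs of A and B. The sketch below assumes that orthology has already been resolved into plain lookup tables. How the template interactions and ortholog assignments are obtained (which database or ortholog-detection tool) is not shown and is not taken from the papers. All protein identifiers are hypothetical.

```python
from itertools import product


def interolog_map(template_ppis, host_orthologs, pathogen_orthologs):
    """Transfer template host-pathogen interactions to a new host-pathogen pair.

    template_ppis: iterable of (template_host_protein, template_pathogen_protein)
    host_orthologs: dict mapping a template host protein to its target-host orthologs
    pathogen_orthologs: dict mapping a template pathogen protein to its
        target-pathogen orthologs
    Returns the set of predicted (target_host, target_pathogen) interactions.
    """
    predicted = set()
    for host, pathogen in template_ppis:
        # A template pair is transferred only if BOTH partners have orthologs;
        # every combination of their orthologs becomes a predicted interolog.
        for h, p in product(host_orthologs.get(host, ()), pathogen_orthologs.get(pathogen, ())):
            predicted.add((h, p))
    return predicted


if __name__ == "__main__":
    # Toy identifiers, purely hypothetical.
    templates = [("hostA", "pathX"), ("hostB", "pathY")]
    host_orth = {"hostA": {"TaA1", "TaA2"}}  # hostB has no ortholog, so it is dropped
    path_orth = {"pathX": {"TcX"}, "pathY": {"TcY"}}
    print(sorted(interolog_map(templates, host_orth, path_orth)))
```

Each template pair expands to every combination of orthologs. The predicted set can therefore become much larger than the template set when proteins have several orthologs.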
4. Ogawa K, Nakamura S, Oguri H, Ryu K, Yoneda T, Hosoki R. Effective Search of Triterpenes with Anti-HSV-1 Activity Using a Classification Model by Logistic Regression. Front Chem 2021; 9:763794. [PMID: 34796164 PMCID: PMC8593400 DOI: 10.3389/fchem.2021.763794] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 08/24/2021] [Accepted: 10/11/2021] [Indexed: 11/13/2022]
Abstract
Natural products are an excellent source of molecular scaffolds for drug discovery. Triterpenes and saponins are representative natural products that exhibit anti-herpes simplex virus type 1 (HSV-1) activity. However, comprehensive information on the anti-HSV-1 activity of triterpenes has been lacking. Therefore, expanding this information and improving the efficiency of triterpene exploration are urgently required. To improve the efficiency of developing anti-HSV-1 active compounds, we used machine learning methods to construct a model that predicts the anti-HSV-1 activity of triterpenes from information obtained in previous studies. In this study, we constructed a binary classification model (i.e., active or inactive) using a logistic regression algorithm. In the evaluation of the predictive model, the accuracy on the test data was 0.79 and the area under the curve (AUC) was 0.86. Additionally, to enrich the information on the anti-HSV-1 activity of triterpenes, a plaque reduction assay was performed on 20 triterpenes. As a result, chikusetsusaponin IVa (11: IC50 = 13.06 μM) was found to have potent anti-HSV-1 activity, along with three potentially anti-HSV-1 active triterpenes. The assay results were further used for external validation of the predictive model. The predictions for the compounds in the activity test showed high accuracy (0.83) and AUC (0.81). We also found that this predictive model was able to successfully narrow down the active compounds. This study provides more information on the anti-HSV-1 activity of triterpenes. Moreover, the predictive model can improve the efficiency of developing active triterpenes by integrating many previous studies to clarify potential relationships.
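The model above is a binary logistic regression classifier evaluated by accuracy and AUC. The dependency-free sketch below makes those two numbers concrete:

- a plain logistic regression fitted by full-batch gradient descent;
- accuracy at an assumed 0.5 threshold;
- ROC AUC computed as the fraction of (active, inactive) pairs in which the active compound scores higher, with ties counting half.

The learning rate, epoch count, threshold and toy data are assumptions for illustration. The descriptors, regularization and train/test split used in the study are not described in the abstract and are not reproduced here.

```python
import math


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.5, epochs=5000):
    """Plain (unregularized) logistic regression via full-batch gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        # Step along the gradient of the mean log-loss.
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(X, w, b):
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) for xi in X]


def accuracy(y_true, scores, threshold=0.5):
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == t for p, t in zip(preds, y_true)) / len(y_true)


def roc_auc(y_true, scores):
    """AUC as the probability that a random active outranks a random inactive."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one active and one inactive example")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical one-descriptor toy data: 1 = active, 0 = inactive.
    X = [[0.0], [1.0], [2.0], [3.0]]
    y = [0, 0, 1, 1]
    w, b = fit_logistic(X, y)
    probs = predict_proba(X, w, b)
    print(accuracy(y, probs), roc_auc(y, probs))
```

In practice a library implementation, such as scikit-learn's `LogisticRegression` and `roc_auc_score`, would normally be used. The pairwise formulation is shown because it makes explicit what an AUC of 0.86 or 0.81 measures: how often an active compound is ranked above an inactive one, independent of the chosen threshold.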
Affiliation(s)
- Keiko Ogawa
- Laboratory of Regulatory Science, College of Pharmaceutical Sciences, Ritsumeikan University, Kusatsu, Japan
- Seikou Nakamura
- Department of Pharmacognosy, Kyoto Pharmaceutical University, Kyoto, Japan
- Haruka Oguri
- Laboratory of Regulatory Science, College of Pharmaceutical Sciences, Ritsumeikan University, Kusatsu, Japan
- Kaori Ryu
- Department of Pharmacognosy, Kyoto Pharmaceutical University, Kyoto, Japan
- Taichi Yoneda
- Department of Pharmacognosy, Kyoto Pharmaceutical University, Kyoto, Japan
- Rumiko Hosoki
- Laboratory of Regulatory Science, College of Pharmaceutical Sciences, Ritsumeikan University, Kusatsu, Japan
5. Astore C, Zhou H, Jacob J, Skolnick J. Prediction of severe adverse events, modes of action and drug treatments for COVID-19's complications. Sci Rep 2021; 11:20864. [PMID: 34675303 PMCID: PMC8531388 DOI: 10.1038/s41598-021-00368-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/02/2021] [Accepted: 10/06/2021] [Indexed: 01/08/2023]
Abstract
Following SARS-CoV-2 infection, some COVID-19 patients experience severe host-driven adverse events. To treat these complications, their underlying etiology and drug treatments must be identified. Thus, a novel AI methodology, MOATAI-VIR, was developed that predicts disease-protein-pathway relationships and repurposes FDA-approved drugs to treat COVID-19's clinical manifestations. SARS-CoV-2-interacting human proteins and GWAS-identified respiratory failure genes provide the input from which the mode-of-action (MOA) proteins/pathways of the resulting disease comorbidities are predicted. These comorbidities are then mapped to their clinical manifestations. To assess the molecular basis of each manifestation, their prioritized shared proteins were subjected to global pathway analysis. Next, the molecular features associated with hallmark COVID-19 phenotypes, e.g. unusual neurological symptoms, cytokine storms, and blood clots, were explored. In practice, 24 of the 26 major clinical manifestations are successfully predicted. Three major uncharacterized manifestation categories, including neoplasms, are also found. The prevalence of neoplasms suggests that SARS-CoV-2 might be an oncovirus due to shared molecular mechanisms between oncogenesis and viral replication. Then, repurposed FDA-approved drugs that might treat COVID-19's clinical manifestations are predicted by virtual ligand screening of the most frequent comorbid protein targets. These drugs might help treat both COVID-19's severe adverse events and lesser ones such as loss of taste/smell.
Affiliation(s)
- Courtney Astore
- Center for the Study of Systems Biology, School of Biological Sciences, Georgia Institute of Technology, 950 Atlantic Drive, N.W., Atlanta, GA, 30332, USA
- Hongyi Zhou
- Center for the Study of Systems Biology, School of Biological Sciences, Georgia Institute of Technology, 950 Atlantic Drive, N.W., Atlanta, GA, 30332, USA
- Joshy Jacob
- Emory Vaccine Center, Emory University, Atlanta, GA, 30329, USA
- Yerkes National Primate Research Center, Emory University, Atlanta, GA, 30329, USA
- Department of Microbiology and Immunology, Emory Vaccine Center, School of Medicine, Emory University, Atlanta, GA, 30329, USA
- Jeffrey Skolnick
- Center for the Study of Systems Biology, School of Biological Sciences, Georgia Institute of Technology, 950 Atlantic Drive, N.W., Atlanta, GA, 30332, USA
6. Yang X, Yang S, Lian X, Wuchty S, Zhang Z. Transfer learning via multi-scale convolutional neural layers for human-virus protein-protein interaction prediction. Bioinformatics 2021; 37:4771-4778. [PMID: 34273146 PMCID: PMC8406877 DOI: 10.1093/bioinformatics/btab533] [Citation(s) in RCA: 27] [Impact Index Per Article: 9.0] [Received: 02/23/2021] [Revised: 06/03/2021] [Accepted: 07/16/2021] [Indexed: 11/20/2022]
Abstract
Motivation: To complement experimental efforts, machine learning-based computational methods are playing an increasingly important role in predicting human–virus protein–protein interactions (PPIs). Furthermore, transfer learning can effectively apply prior knowledge obtained from a large source dataset/task to a small target dataset/task, improving prediction performance. Results: To predict interactions between human and viral proteins, we combine evolutionary sequence profile features with a Siamese convolutional neural network (CNN) architecture and a multi-layer perceptron. Our architecture outperforms various feature encoding-based machine learning and state-of-the-art prediction methods. As our main contribution, we introduce two transfer learning methods (i.e. ‘frozen’ type and ‘fine-tuning’ type) that reliably predict interactions in a target human–virus domain based on training in a source human–virus domain, by retraining CNN layers. Finally, we utilize the ‘frozen’ type transfer learning approach to predict human–SARS-CoV-2 PPIs, indicating that our predictions are topologically and functionally similar to experimentally known interactions. Availability and implementation: The source codes and datasets are available at https://github.com/XiaodiYangCAU/TransPPI/. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Xiaodi Yang
- State Key Laboratory of Agrobiotechnology, College of Biological Sciences, China Agricultural University, Beijing 100193, China
- Shiping Yang
- State Key Laboratory of Plant Physiology and Biochemistry, College of Biological Sciences, China Agricultural University, Beijing 100193, China
- Xianyi Lian
- State Key Laboratory of Agrobiotechnology, College of Biological Sciences, China Agricultural University, Beijing 100193, China
- Stefan Wuchty
- Dept. of Computer Science, University of Miami, Miami, FL 33146, USA
- Dept. of Biology, University of Miami, Miami, FL 33146, USA
- Sylvester Comprehensive Cancer Center, University of Miami, Miami, FL 33136, USA
- Ziding Zhang
- State Key Laboratory of Agrobiotechnology, College of Biological Sciences, China Agricultural University, Beijing 100193, China
7. Lian X, Yang X, Yang S, Zhang Z. Current status and future perspectives of computational studies on human-virus protein-protein interactions. Brief Bioinform 2021; 22:6161422. [PMID: 33693490 DOI: 10.1093/bib/bbab029] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 11/10/2020] [Revised: 01/14/2021] [Accepted: 01/20/2021] [Indexed: 12/19/2022]
Abstract
The protein-protein interactions (PPIs) between humans and viruses mediate viral infection and host immunity processes. Therefore, the study of human-virus PPIs can help us understand the principles of human-virus relationships and can thus guide the development of highly effective drugs to break the transmission of viral infectious diseases. Recent years have witnessed the rapid accumulation of experimentally identified human-virus PPI data, which provides an unprecedented opportunity for bioinformatics studies of human-virus PPIs. In this article, we provide a comprehensive overview of computational studies on human-virus PPIs, focusing especially on the development of methods for human-virus PPI prediction. We briefly introduce the experimental detection methods and existing database resources for human-virus PPIs, and then discuss the progress in developing computational prediction methods. In particular, we elaborate on machine learning-based prediction methods and highlight the need to embrace state-of-the-art deep learning algorithms and new feature engineering techniques (e.g. the protein embedding technique derived from natural language processing). To further advance understanding of this research topic, we also outline the practical applications of the human-virus interactome in fundamental biological discovery and new antiviral therapy development.
Affiliation(s)
- Xianyi Lian
- State Key Laboratory of Agrobiotechnology, College of Biological Sciences, China Agricultural University, Beijing 100193, China
- Xiaodi Yang
- State Key Laboratory of Agrobiotechnology, College of Biological Sciences, China Agricultural University, Beijing 100193, China
- Shiping Yang
- State Key Laboratory of Plant Physiology and Biochemistry, College of Biological Sciences, China Agricultural University, Beijing 100193, China
- Ziding Zhang
- State Key Laboratory of Agrobiotechnology, College of Biological Sciences, China Agricultural University, Beijing 100193, China