1
Su Z, Wu Y, Cao K, Du J, Cao L, Wu Z, Wu X, Wang X, Song Y, Wang X, Duan H. APEX-pHLA: A novel method for accurate prediction of the binding between exogenous short peptides and HLA class I molecules. Methods 2024; 228:38-47. [PMID: 38772499 DOI: 10.1016/j.ymeth.2024.05.013] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/27/2024] [Revised: 04/28/2024] [Accepted: 05/18/2024] [Indexed: 05/23/2024]
Abstract
Human leukocyte antigen (HLA) molecules play a critical role in immunotherapy because of their capacity to recognize and bind exogenous antigens such as peptides and subsequently deliver them to immune cells. Predicting the binding between peptides and HLA molecules (pHLA) can expedite the screening of immunogenic peptides and facilitate vaccine design. However, traditional experimental methods are time-consuming and inefficient. In this study, an efficient deep learning method was developed for predicting peptide-HLA binding, which treated peptide sequences as linguistic entities. It combined the architectures of textCNN and BiLSTM to create a deep neural network model called APEX-pHLA. This model operated without limitations related to HLA class I allele variants and peptide segment lengths, enabling efficient encoding of sequence features for both HLA and peptide segments. On the independent test set, the model achieved Accuracy, ROC_AUC, F1, and MCC of 0.9449, 0.9850, 0.9453, and 0.8899, respectively. Similarly, on an external test set, the results were 0.9803, 0.9574, 0.8835, and 0.7863, respectively. These results outperformed those of fifteen methods previously reported in the literature. The accurate prediction capability of the APEX-pHLA model in peptide-HLA binding might provide valuable insights for future HLA vaccine design.
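For readers unfamiliar with the reported metrics, Accuracy, F1, and MCC can all be derived from a binary confusion matrix (ROC_AUC additionally requires ranked prediction scores). The following is a minimal sketch; the function name `binary_metrics` and the counts are hypothetical illustrations and are not taken from the paper.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Return (accuracy, f1, mcc) from binary confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total

    # Precision and recall fall back to 0.0 when their denominators are empty.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)

    # Matthews correlation coefficient; defined as 0.0 when any marginal is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return accuracy, f1, mcc


# Illustrative counts only (assumed, not from the APEX-pHLA test sets).
acc, f1, mcc = binary_metrics(tp=90, fp=10, tn=85, fn=15)
print(f"accuracy={acc:.4f} f1={f1:.4f} mcc={mcc:.4f}")
```

MCC uses all four cells of the confusion matrix, which is why it is often reported alongside Accuracy and F1 when positive and negative samples are imbalanced.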
Affiliation(s)
- Zhihao Su
  - College of Pharmaceutical Sciences, Zhejiang University of Technology, Hangzhou, Zhejiang 310014, China
- Yejian Wu
  - College of Pharmaceutical Sciences, Zhejiang University of Technology, Hangzhou, Zhejiang 310014, China
- Kaiqiang Cao
  - College of Pharmaceutical Sciences, Zhejiang University of Technology, Hangzhou, Zhejiang 310014, China
- Jie Du
  - College of Pharmaceutical Sciences, Zhejiang University of Technology, Hangzhou, Zhejiang 310014, China
- Lujing Cao
  - College of Pharmaceutical Sciences, Zhejiang University of Technology, Hangzhou, Zhejiang 310014, China
- Zhipeng Wu
  - College of Pharmaceutical Sciences, Zhejiang University of Technology, Hangzhou, Zhejiang 310014, China
- Xinyi Wu
  - College of Pharmaceutical Sciences, Zhejiang University of Technology, Hangzhou, Zhejiang 310014, China
- Xinqiao Wang
  - College of Pharmaceutical Sciences, Zhejiang University of Technology, Hangzhou, Zhejiang 310014, China
- Ying Song
  - College of Pharmaceutical Sciences, Zhejiang University of Technology, Hangzhou, Zhejiang 310014, China
- Xudong Wang
  - College of Pharmaceutical Sciences, Zhejiang University of Technology, Hangzhou, Zhejiang 310014, China
- Hongliang Duan
  - Faculty of Applied Sciences, Macao Polytechnic University, Macao 999078, China
2
Hong N, Jiang D, Wang Z, Sun H, Luo H, Bao L, Song M, Kang Y, Hou T. TransfIGN: A Structure-Based Deep Learning Method for Modeling the Interaction between HLA-A*02:01 and Antigen Peptides. J Chem Inf Model 2024; 64:5016-5027. [PMID: 38920330 DOI: 10.1021/acs.jcim.4c00678] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/27/2024]
Abstract
The intricate interaction between major histocompatibility complexes (MHCs) and antigen peptides with diverse amino acid sequences plays a pivotal role in immune responses and T cell activity. In recent years, deep learning (DL)-based models have emerged as promising tools for accelerating antigen peptide screening. However, most of these models rely solely on one-dimensional amino acid sequences, overlooking crucial information required for the three-dimensional (3-D) binding process. In this study, we propose TransfIGN, a structure-based DL model that is inspired by our previously developed framework, Interaction Graph Network (IGN), and incorporates sequence information from transformers to predict the interactions between HLA-A*02:01 and antigen peptides. Our model, trained on a comprehensive data set containing 61,816 sequences with 9,051 binding affinity labels and 56,848 eluted ligand labels, achieves an area under the curve (AUC) of 0.893 on the binary data set, better than state-of-the-art sequence-based models trained on larger data sets such as NetMHCpan4.1, ANN, and TransPHLA. Furthermore, when evaluated on the IEDB weekly benchmark data sets, our predictions (AUC = 0.816) are better than those of recommended methods such as the IEDB consensus (AUC = 0.795). Notably, the interaction weight matrices generated by our method highlight the strong interactions at specific positions within peptides, emphasizing the model's ability to provide physical interpretability. This capability to unveil binding mechanisms through intricate structural features holds promise for new immunotherapeutic avenues.
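As background for the AUC values quoted above, the area under the ROC curve equals the probability that a randomly chosen binder receives a higher score than a randomly chosen non-binder, with ties counted as one half. The sketch below illustrates that definition only; the function name `roc_auc`, the labels, and the scores are hypothetical and are unrelated to the TransfIGN data or code.

```python
def roc_auc(labels, scores):
    """Pairwise (Mann-Whitney) estimate of ROC AUC for binary labels 0/1."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Illustrative labels (1 = binder) and predicted scores; assumed values.
example_labels = [1, 0, 1, 0, 1]
example_scores = [0.9, 0.3, 0.6, 0.7, 0.8]
auc = roc_auc(example_labels, example_scores)
print(f"AUC = {auc:.4f}")
```

The pairwise form is quadratic in the number of samples, so it is suited to small illustrations; benchmark-scale evaluations typically use a sort-based implementation that gives the same value.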
Affiliation(s)
- Nanqi Hong
  - College of Computer Science and Technology, Zhejiang University, Hangzhou, Zhejiang 310027, China
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Zhejiang University, Hangzhou, Zhejiang 310058, China
- Dejun Jiang
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Zhejiang University, Hangzhou, Zhejiang 310058, China
- Zhe Wang
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Zhejiang University, Hangzhou, Zhejiang 310058, China
- Huiyong Sun
  - Department of Medicinal Chemistry, China Pharmaceutical University, Nanjing, Jiangsu 210009, China
- Hao Luo
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Zhejiang University, Hangzhou, Zhejiang 310058, China
- Lingjie Bao
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Zhejiang University, Hangzhou, Zhejiang 310058, China
- Mingli Song
  - College of Computer Science and Technology, Zhejiang University, Hangzhou, Zhejiang 310027, China
- Yu Kang
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Zhejiang University, Hangzhou, Zhejiang 310058, China
- Tingjun Hou
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Zhejiang University, Hangzhou, Zhejiang 310058, China
3
Giziński S, Preibisch G, Kucharski P, Tyrolski M, Rembalski M, Grzegorczyk P, Gambin A. Enhancing antigenic peptide discovery: Improved MHC-I binding prediction and methodology. Methods 2024; 224:1-9. [PMID: 38295891 DOI: 10.1016/j.ymeth.2024.01.016] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/05/2023] [Revised: 12/30/2023] [Accepted: 01/16/2024] [Indexed: 02/05/2024]
Abstract
The Major Histocompatibility Complex (MHC) is a critical element of the vertebrate cellular immune system, responsible for presenting peptides derived from intracellular proteins. MHC-I presentation is pivotal in the immune response and holds considerable potential for vaccine development and cancer immunotherapy. This study examines the limitations of current methods and benchmarks for MHC-I presentation. We introduce a novel benchmark designed to assess the generalization properties and reliability of models on unseen MHC molecules and peptides, with a focus on the Human Leukocyte Antigen (HLA), a specific subset of MHC genes present in humans. Finally, we introduce HLABERT, a pretrained language model that significantly outperforms previous methods on our benchmark and establishes a new state of the art on existing benchmarks.
Affiliation(s)
- Grzegorz Preibisch
  - Deepflare, Warsaw, Poland; University of Warsaw, Department of Mathematics Informatics and Mechanics, Warsaw, Poland
- Anna Gambin
  - University of Warsaw, Department of Mathematics Informatics and Mechanics, Warsaw, Poland
4
Wang M, Lei C, Wang J, Li Y, Li M. TripHLApan: predicting HLA molecules binding peptides based on triple coding matrix and transfer learning. Brief Bioinform 2024; 25:bbae154. [PMID: 38600667 PMCID: PMC11006794 DOI: 10.1093/bib/bbae154] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/26/2023] [Revised: 02/16/2024] [Accepted: 03/13/2024] [Indexed: 04/12/2024]
Abstract
Human leukocyte antigen (HLA) recognizes foreign threats and triggers immune responses by presenting peptides to T cells. Computationally modeling the binding patterns between peptides and HLA is very important for the development of tumor vaccines. However, accurately predicting which peptides bind to HLA molecules remains a major challenge. In this paper, we develop a new model, TripHLApan, for predicting peptide binding to HLA molecules by integrating a triple coding matrix, BiGRU + Attention models, and a transfer learning strategy. We have identified the main interaction site regions between HLA molecules and peptides, as well as the correlation between HLA encoding and binding motifs. Based on these findings, we bring the preprocessing and coding closer to the natural biological process. In addition, because the input is based on multiple types of features and the attention module focuses on the BiGRU hidden layer, TripHLApan learns more sequence-level binding information. The transfer learning strategy ensures accurate predictions for special lengths (peptides of length 8) and keeps the model scalable as data volumes grow. Compared with the current best-performing models, TripHLApan exhibits strong predictive performance in various prediction settings with different ratios of positive to negative samples. In addition, we validate the superiority and scalability of TripHLApan's predictive performance using additional recent data sets, ablation experiments, and binding reconstitution ability in samples from a melanoma patient. The results show that TripHLApan is a powerful tool for predicting peptide binding to HLA-I and HLA-II molecules for the synthesis of tumor vaccines. TripHLApan is publicly available at https://github.com/CSUBioGroup/TripHLApan.git.
Affiliation(s)
- Meng Wang
  - School of Computer Science and Engineering, Central South University, Changsha 410083, China
- Chuqi Lei
  - School of Computer Science and Engineering, Central South University, Changsha 410083, China
- Jianxin Wang
  - School of Computer Science and Engineering, Central South University, Changsha 410083, China
- Yaohang Li
  - Department of Computer Science, Old Dominion University, Norfolk, VA 23529, USA
- Min Li
  - School of Computer Science and Engineering, Central South University, Changsha 410083, China
5
Li Y, Wu X, Fang D, Luo Y. Informing immunotherapy with multi-omics driven machine learning. NPJ Digit Med 2024; 7:67. [PMID: 38486092 PMCID: PMC10940614 DOI: 10.1038/s41746-024-01043-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/14/2023] [Accepted: 02/14/2024] [Indexed: 03/18/2024]
Abstract
Progress in sequencing technologies and clinical experiments has revolutionized immunotherapy for solid and hematologic malignancies. However, the benefits of immunotherapy are limited to specific patient subsets, posing challenges for broader application. To improve its effectiveness, identifying biomarkers that can predict patient response is crucial. Machine learning (ML) plays a pivotal role in harnessing multi-omic cancer datasets and unlocking new insights into immunotherapy. This review provides an overview of cutting-edge ML models applied to omics data for immunotherapy analysis, including immunotherapy response prediction and identification of the immunotherapy-relevant tumor microenvironment. We elucidate how ML leverages diverse data types to identify significant biomarkers, enhance our understanding of immunotherapy mechanisms, and optimize the decision-making process. Additionally, we discuss current limitations and challenges of ML in this rapidly evolving field. Finally, we outline future directions aimed at overcoming these barriers and improving the efficiency of ML in immunotherapy research.
Affiliation(s)
- Yawei Li
  - Department of Preventive Medicine, Northwestern University, Feinberg School of Medicine, Chicago, IL, 60611, USA
  - Center for Collaborative AI in Healthcare, Northwestern University, Feinberg School of Medicine, Chicago, IL, 60611, USA
- Xin Wu
  - Department of Medicine, University of Illinois at Chicago, Chicago, IL, 60612, USA
- Deyu Fang
  - Department of Pathology, Northwestern University Feinberg School of Medicine, Chicago, IL, 60611, USA
- Yuan Luo
  - Department of Preventive Medicine, Northwestern University, Feinberg School of Medicine, Chicago, IL, 60611, USA
  - Center for Collaborative AI in Healthcare, Northwestern University, Feinberg School of Medicine, Chicago, IL, 60611, USA
6
Shahbazy M, Ramarathinam SH, Li C, Illing PT, Faridi P, Croft NP, Purcell AW. MHCpLogics: an interactive machine learning-based tool for unsupervised data visualization and cluster analysis of immunopeptidomes. Brief Bioinform 2024; 25:bbae087. [PMID: 38487848 PMCID: PMC10940831 DOI: 10.1093/bib/bbae087] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/06/2023] [Revised: 12/12/2023] [Accepted: 02/15/2024] [Indexed: 03/18/2024]
Abstract
The major histocompatibility complex (MHC) encodes a range of immune response genes, including the human leukocyte antigens (HLAs) in humans. These molecules bind peptide antigens and present them on the cell surface for T cell recognition. The repertoires of peptides presented by HLA molecules are termed immunopeptidomes. The highly polymorphic nature of the genes that encode the HLA molecules confers allotype-specific differences in the sequences of bound ligands. Allotype-specific ligand preferences are often defined by peptide-binding motifs. Individuals express up to six classical class I HLA allotypes, which likely present peptides displaying different binding motifs. Such complex datasets make the deconvolution of immunopeptidomic data into allotype-specific contributions and further dissection of binding specificities challenging. Herein, we developed MHCpLogics as an interactive machine learning-based tool for mining peptide-binding sequence motifs and visualizing immunopeptidome data across complex datasets. We showcase the functionalities of MHCpLogics by analyzing both in-house and published mono- and multi-allelic immunopeptidomics data. The visualization modalities of MHCpLogics allow users to inspect clustered sequences down to individual peptide components and to examine broader sequence patterns within multiple immunopeptidome datasets. MHCpLogics can deconvolute large immunopeptidome datasets, enabling the interrogation of clusters for the segregation of allotype-specific peptide sequence motifs, identification of sub-peptidome motifs, and the export of clustered peptide sequence lists. The tool facilitates rapid inspection of immunopeptidomes as a resource for the immunology and vaccine communities. MHCpLogics is a standalone application available via an executable installation at: https://github.com/PurcellLab/MHCpLogics.
Affiliation(s)
- Mohammad Shahbazy
  - Department of Biochemistry and Molecular Biology and Infection and Immunity Program, Biomedicine Discovery Institute, Monash University, Melbourne, VIC 3800, Australia
- Sri H Ramarathinam
  - Department of Biochemistry and Molecular Biology and Infection and Immunity Program, Biomedicine Discovery Institute, Monash University, Melbourne, VIC 3800, Australia
- Chen Li
  - Department of Biochemistry and Molecular Biology and Infection and Immunity Program, Biomedicine Discovery Institute, Monash University, Melbourne, VIC 3800, Australia
- Patricia T Illing
  - Department of Biochemistry and Molecular Biology and Infection and Immunity Program, Biomedicine Discovery Institute, Monash University, Melbourne, VIC 3800, Australia
- Pouya Faridi
  - Centre for Cancer Research, Hudson Institute of Medical Research, Clayton, VIC 3168, Australia
  - Monash Proteomics and Metabolomics Platform, Department of Medicine, School of Clinical Sciences, Monash University, Clayton, VIC 3800, Australia
- Nathan P Croft
  - Department of Biochemistry and Molecular Biology and Infection and Immunity Program, Biomedicine Discovery Institute, Monash University, Melbourne, VIC 3800, Australia
- Anthony W Purcell
  - Department of Biochemistry and Molecular Biology and Infection and Immunity Program, Biomedicine Discovery Institute, Monash University, Melbourne, VIC 3800, Australia
7
Conev A, Fasoulis R, Hall-Swan S, Ferreira R, Kavraki LE. HLAEquity: Examining biases in pan-allele peptide-HLA binding predictors. iScience 2024; 27:108613. [PMID: 38188519 PMCID: PMC10770483 DOI: 10.1016/j.isci.2023.108613] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/08/2023] [Revised: 11/13/2023] [Accepted: 11/29/2023] [Indexed: 01/09/2024]
Abstract
Peptide-HLA (pHLA) binding prediction is essential in screening peptide candidates for personalized peptide vaccines. Machine learning (ML) pHLA binding prediction tools are trained on vast amounts of data and are effective in screening peptide candidates. Most ML models report the ability to generalize to HLA alleles unseen during training ("pan-allele" models). However, the use of datasets with imbalanced allele content raises concerns about biased model performance. First, we examine the data bias of two ML-based pan-allele pHLA binding predictors. We find that the pHLA datasets overrepresent alleles from geographic populations of high-income countries. Second, we show that the identified data bias is perpetuated within ML models, leading to algorithmic bias and subpar performance for alleles expressed in low-income geographic populations. We draw attention to the potential therapeutic consequences of this bias, and we challenge the use of the term "pan-allele" to describe models trained with currently available public datasets.
Affiliation(s)
- Anja Conev
  - Department of Computer Science, Rice University, Houston, TX, USA
- Romanos Fasoulis
  - Department of Computer Science, Rice University, Houston, TX, USA
- Sarah Hall-Swan
  - Department of Computer Science, Rice University, Houston, TX, USA
- Rodrigo Ferreira
  - Department of Computer Science, Rice University, Houston, TX, USA
- Lydia E. Kavraki
  - Department of Computer Science, Rice University, Houston, TX, USA
8
Li F, Wang C, Guo X, Akutsu T, Webb GI, Coin LJM, Kurgan L, Song J. ProsperousPlus: a one-stop and comprehensive platform for accurate protease-specific substrate cleavage prediction and machine-learning model construction. Brief Bioinform 2023; 24:bbad372. [PMID: 37874948 DOI: 10.1093/bib/bbad372] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/30/2023] [Revised: 08/30/2023] [Accepted: 09/29/2023] [Indexed: 10/26/2023]
Abstract
Proteases contribute to a broad spectrum of cellular functions. Given a relatively limited amount of experimental data, developing accurate sequence-based predictors of substrate cleavage sites facilitates a better understanding of protease functions and substrate specificity. While many protease-specific predictors of substrate cleavage sites were developed, these efforts are outpaced by the growth of the protease substrate cleavage data. In particular, since data for 100+ protease types are available and this number continues to grow, it becomes impractical to publish predictors for new protease types, and instead it might be better to provide a computational platform that helps users to quickly and efficiently build predictors that address their specific needs. To this end, we conceptualized, developed, tested and released a versatile bioinformatics platform, ProsperousPlus, that empowers users, even those with no programming or little bioinformatics background, to build fast and accurate predictors of substrate cleavage sites. ProsperousPlus facilitates the use of the rapidly accumulating substrate cleavage data to train, empirically assess and deploy predictive models for user-selected substrate types. Benchmarking tests on test datasets show that our platform produces predictors that on average exceed the predictive performance of current state-of-the-art approaches. ProsperousPlus is available as a webserver and a stand-alone software package at http://prosperousplus.unimelb-biotools.cloud.edu.au/.
Affiliation(s)
- Fuyi Li
  - College of Information Engineering, Northwest A&F University, Shaanxi 712100, China
  - South Australian immunoGENomics Cancer Institute (SAiGENCI), Faculty of Health and Medical Sciences, The University of Adelaide, Adelaide, SA 5005, Australia
  - The Peter Doherty Institute for Infection and Immunity, The University of Melbourne, VIC 3000, Australia
- Cong Wang
  - College of Information Engineering, Northwest A&F University, Shaanxi 712100, China
- Xudong Guo
  - College of Information Engineering, Northwest A&F University, Shaanxi 712100, China
- Tatsuya Akutsu
  - Bioinformatics Center, Institute for Chemical Research, Kyoto University, Kyoto 611-0011, Japan
- Geoffrey I Webb
  - Monash Data Futures Institute, Monash University, VIC 3800, Australia
- Lachlan J M Coin
  - The Peter Doherty Institute for Infection and Immunity, The University of Melbourne, VIC 3000, Australia
- Lukasz Kurgan
  - Department of Computer Science, Virginia Commonwealth University, Richmond, VA 23284, USA
- Jiangning Song
  - Monash Data Futures Institute, Monash University, VIC 3800, Australia
  - Monash Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, VIC 3800, Australia
9
Qu W, You R, Mamitsuka H, Zhu S. DeepMHCI: an anchor position-aware deep interaction model for accurate MHC-I peptide binding affinity prediction. Bioinformatics 2023; 39:btad551. [PMID: 37669154 PMCID: PMC10516514 DOI: 10.1093/bioinformatics/btad551] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/10/2023] [Revised: 08/06/2023] [Accepted: 09/04/2023] [Indexed: 09/07/2023]
Abstract
MOTIVATION: Computationally predicting major histocompatibility complex class I (MHC-I) peptide binding affinity is an important problem in immunological bioinformatics, which is also crucial for the identification of neoantigens for personalized therapeutic cancer vaccines. Recent cutting-edge deep learning-based methods for this problem cannot achieve satisfactory performance, especially for non-9-mer peptides. This is because such methods generate the input by simply concatenating the two given sequences, a peptide and (the pseudo sequence of) an MHC class I molecule, which cannot precisely capture the anchor positions of the MHC binding motif for peptides of variable length. We thus developed an anchor position-aware and high-performance deep model, DeepMHCI, with a position-wise gated layer and a residual binding interaction convolution layer. This allows the model to control the information flow in peptides so that it is aware of anchor positions, and to model the interactions between peptides and the MHC pseudo (binding) sequence directly with multiple convolutional kernels.

RESULTS: The performance of DeepMHCI has been thoroughly validated by extensive experiments on four benchmark datasets under various settings, such as 5-fold cross-validation, validation with the independent testing set, external HPV vaccine identification, and external CD8+ epitope identification. Experimental results with visualization of binding motifs demonstrate that DeepMHCI outperformed all competing methods, especially in binding prediction for non-9-mer peptides.

AVAILABILITY AND IMPLEMENTATION: DeepMHCI is publicly available at https://github.com/ZhuLab-Fudan/DeepMHCI.
Affiliation(s)
- Wei Qu
  - Institute of Science and Technology for Brain-Inspired Intelligence and MOE Frontiers Center for Brain Science, Fudan University, Shanghai 200433, China
- Ronghui You
  - Institute of Science and Technology for Brain-Inspired Intelligence and MOE Frontiers Center for Brain Science, Fudan University, Shanghai 200433, China
- Hiroshi Mamitsuka
  - Bioinformatics Center, Institute for Chemical Research, Kyoto University, Uji, Kyoto Prefecture 611-0011, Japan
  - Department of Computer Science, Aalto University, 00076 Espoo, Finland
- Shanfeng Zhu
  - Institute of Science and Technology for Brain-Inspired Intelligence and MOE Frontiers Center for Brain Science, Fudan University, Shanghai 200433, China
  - Shanghai Qi Zhi Institute, Shanghai 200030, China
  - Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence, Fudan University, Ministry of Education, Shanghai 200433, China
  - Shanghai Key Lab of Intelligent Information Processing and Shanghai Institute of Artificial Intelligence Algorithm, Fudan University, Shanghai 200433, China
  - Zhangjiang Fudan International Innovation Center, Shanghai 200433, China
10
Kalemati M, Darvishi S, Koohi S. CapsNet-MHC predicts peptide-MHC class I binding based on capsule neural networks. Commun Biol 2023; 6:492. [PMID: 37147498 PMCID: PMC10162658 DOI: 10.1038/s42003-023-04867-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/02/2022] [Accepted: 04/24/2023] [Indexed: 05/07/2023]
Abstract
The Major Histocompatibility Complex (MHC) binds peptides derived from pathogens and presents them to killer T cells on the cell surface. Developing computational methods for accurate, fast, and explainable peptide-MHC binding prediction can facilitate immunotherapies and vaccine development. Various deep learning-based methods rely on separate feature extraction from the peptide and MHC sequences and ignore their pairwise binding information. This paper develops a capsule neural network-based method that efficiently captures features of the peptide-MHC complex to predict peptide-MHC class I binding. Various evaluations confirmed that our method outperforms the alternative methods and can provide accurate predictions when less data are available. Moreover, to provide precise insights into the results, we explored the essential features that contributed to the prediction. Since the simulation results were consistent with experimental studies, we concluded that our method can be used for accurate, rapid, and interpretable peptide-MHC binding prediction to assist biological therapies.
Affiliation(s)
- Mahmood Kalemati
  - Department of Computer Engineering, Sharif University of Technology, Tehran, Iran
- Saeid Darvishi
  - Department of Computer Engineering, Sharif University of Technology, Tehran, Iran
- Somayyeh Koohi
  - Department of Computer Engineering, Sharif University of Technology, Tehran, Iran
11
Ye Z, Li S, Mi X, Shao B, Dai Z, Ding B, Feng S, Sun B, Shen Y, Xiao Z. STMHCpan, an accurate Star-Transformer-based extensible framework for predicting MHC I allele binding peptides. Brief Bioinform 2023; 24:7147024. [PMID: 37122066 DOI: 10.1093/bib/bbad164] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 01/23/2023] [Revised: 03/22/2023] [Accepted: 04/06/2023] [Indexed: 05/02/2023]
Abstract
Peptide-major histocompatibility complex I (MHC I) binding affinity prediction is crucial for vaccine development, but existing methods face limitations such as small datasets, model overfitting due to excessive parameters, and suboptimal performance. Here, we present STMHCPan (STAR-MHCPan), an open-source package based on the Star-Transformer model, for MHC I binding peptide prediction. Our approach introduces an attention mechanism to improve the deep learning network architecture and performance in antigen prediction. Compared with classical deep learning algorithms, STMHCPan exhibits improved performance with fewer parameters in receptor affinity training. Furthermore, STMHCPan outperforms existing methods on ligand benchmark datasets identified by mass spectrometry. It can also handle peptides of arbitrary length and is highly scalable for predicting T-cell responses. Our software is freely available for use, training, and extension through GitHub (https://github.com/Luckysoutheast/STMHCPan.git).
Affiliation(s)
- Zheng Ye
  - State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing, Jiangsu 210096, China
- Shaohao Li
  - State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing, Jiangsu 210096, China
- Xue Mi
  - State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing, Jiangsu 210096, China
- Baoyi Shao
  - State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing, Jiangsu 210096, China
- Zhu Dai
  - State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing, Jiangsu 210096, China
- Bo Ding
  - Department of Obstetrics and Gynecology, Zhongda Hospital, School of Medicine, Southeast University, Nanjing 210009, China
- Songwei Feng
  - Department of Obstetrics and Gynecology, Zhongda Hospital, School of Medicine, Southeast University, Nanjing 210009, China
- Bo Sun
  - State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing, Jiangsu 210096, China
- Yang Shen
  - Department of Obstetrics and Gynecology, Zhongda Hospital, School of Medicine, Southeast University, Nanjing 210009, China
- Zhongdang Xiao
  - State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing, Jiangsu 210096, China
12
Akerman O, Isakov H, Levi R, Psevkin V, Louzoun Y. Counting is almost all you need. Front Immunol 2023; 13:1031011. [PMID: 36741395 PMCID: PMC9896581 DOI: 10.3389/fimmu.2022.1031011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/29/2022] [Accepted: 12/27/2022] [Indexed: 01/21/2023]
Abstract
The immune memory repertoire encodes the history of present and past infections and immunological attributes of the individual. As such, multiple methods were proposed to use T-cell receptor (TCR) repertoires to detect disease history. We here show that the counting method outperforms two leading algorithms. We then show that the counting can be further improved using a novel attention model to weigh the different TCRs. The attention model is based on the projection of TCRs using a Variational AutoEncoder (VAE). Both counting and attention algorithms predict better than current leading algorithms whether the host had CMV and its HLA alleles. As an intermediate solution between the complex attention model and the very simple counting model, we propose a new Graph Convolutional Network approach that obtains the accuracy of the attention model and the simplicity of the counting model. The code for the models used in the paper is provided at: https://github.com/louzounlab/CountingIsAlmostAllYouNeed.
Affiliation(s)
- Ofek Akerman
  - Department of Mathematics, Bar-Ilan University, Ramat Gan, Israel
  - Department of Computer Science, Bar-Ilan University, Ramat Gan, Israel
- Haim Isakov
  - Department of Mathematics, Bar-Ilan University, Ramat Gan, Israel
- Reut Levi
  - Department of Mathematics, Bar-Ilan University, Ramat Gan, Israel
- Vladimir Psevkin
  - Department of Mathematics, Bar-Ilan University, Ramat Gan, Israel
- Yoram Louzoun
  - Department of Mathematics, Bar-Ilan University, Ramat Gan, Israel

13
Cai Y, Chen R, Gao S, Li W, Liu Y, Su G, Song M, Jiang M, Jiang C, Zhang X. Artificial intelligence applied in neoantigen identification facilitates personalized cancer immunotherapy. Front Oncol 2023; 12:1054231. [PMID: 36698417 PMCID: PMC9868469 DOI: 10.3389/fonc.2022.1054231] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 09/26/2022] [Accepted: 12/16/2022] [Indexed: 01/10/2023] Open
Abstract
The field of cancer neoantigen investigation has developed swiftly in the past decade. Predicting novel and true neoantigens derived from large multi-omics data became difficult but critical challenges. The rise of Artificial Intelligence (AI) or Machine Learning (ML) in biomedicine application has brought benefits to strengthen the current computational pipeline for neoantigen prediction. ML algorithms offer powerful tools to recognize the multidimensional nature of the omics data and therefore extract the key neoantigen features enabling a successful discovery of new neoantigens. The present review aims to outline the significant technology progress of machine learning approaches, especially the newly deep learning tools and pipelines, that were recently applied in neoantigen prediction. In this review article, we summarize the current state-of-the-art tools developed to predict neoantigens. The standard workflow includes calling genetic variants in paired tumor and blood samples, and rating the binding affinity between mutated peptide, MHC (I and II) and T cell receptor (TCR), followed by characterizing the immunogenicity of tumor epitopes. More specifically, we highlight the outstanding feature extraction tools and multi-layer neural network architectures in typical ML models. It is noted that more integrated neoantigen-predicting pipelines are constructed with hybrid or combined ML algorithms instead of conventional machine learning models. In addition, the trends and challenges in further optimizing and integrating the existing pipelines are discussed.
Affiliation(s)
- Yu Cai
  - School of Medicine, Northwest University, Xi’an, Shaanxi, China
- Rui Chen
  - School of Medicine, Northwest University, Xi’an, Shaanxi, China
- Shenghan Gao
  - School of Medicine, Northwest University, Xi’an, Shaanxi, China
- Wenqing Li
  - School of Medicine, Northwest University, Xi’an, Shaanxi, China
- Yuru Liu
  - School of Medicine, Northwest University, Xi’an, Shaanxi, China
- Guodong Su
  - School of Medicine, Northwest University, Xi’an, Shaanxi, China
- Mingming Song
  - School of Medicine, Northwest University, Xi’an, Shaanxi, China
- Mengju Jiang
  - School of Medicine, Northwest University, Xi’an, Shaanxi, China
- Chao Jiang
  - Department of Neurology, The Second Affiliated Hospital of Xi’an Medical University, Xi’an, Shaanxi, China
  - *Correspondence: Chao Jiang; Xi Zhang
- Xi Zhang
  - School of Medicine, Northwest University, Xi’an, Shaanxi, China
  - *Correspondence: Chao Jiang; Xi Zhang

14
Immunolyser: A web-based computational pipeline for analysing and mining immunopeptidomic data. Comput Struct Biotechnol J 2023; 21:1678-1687. [PMID: 36890882 PMCID: PMC9988424 DOI: 10.1016/j.csbj.2023.02.033] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/23/2022] [Revised: 02/01/2023] [Accepted: 02/17/2023] [Indexed: 02/22/2023] Open
Abstract
Immunopeptidomics has made tremendous contributions to our understanding of antigen processing and presentation, by identifying and quantifying antigenic peptides presented on the cell surface by Major Histocompatibility Complex (MHC) molecules. Large and complex immunopeptidomics datasets can now be routinely generated using Liquid Chromatography-Mass Spectrometry techniques. The analysis of this data - often consisting of multiple replicates/conditions - rarely follows a standard data processing pipeline, hindering the reproducibility and depth of analysis of immunopeptidomic data. Here, we present Immunolyser, an automated pipeline designed to facilitate computational analysis of immunopeptidomic data with a minimal initial setup. Immunolyser brings together routine analyses, including peptide length distribution, peptide motif analysis, sequence clustering, peptide-MHC binding affinity prediction, and source protein analysis. Immunolyser provides a user-friendly and interactive interface via its webserver and is freely available for academic purposes at https://immunolyser.erc.monash.edu/. The open-access source code can be downloaded at our GitHub repository: https://github.com/prmunday/Immunolyser. We anticipate that Immunolyser will serve as a prominent computational pipeline to facilitate effortless and reproducible analysis of immunopeptidomic data.

15
Guo X, Li F, Song J. Predicting Pseudouridine Sites with Porpoise. Methods Mol Biol 2023; 2624:139-151. [PMID: 36723814 DOI: 10.1007/978-1-0716-2962-8_10] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/18/2023]
Abstract
Pseudouridine is a ubiquitous RNA modification and plays a crucial role in many biological processes. However, it remains a challenging task to identify pseudouridine sites using expensive and time-consuming experimental research. To this end, we present Porpoise, a computational approach to identify pseudouridine sites from RNA sequence data. Porpoise builds on a stacking ensemble learning framework with several informative features and achieves competitive performance compared with state-of-the-art approaches. This protocol elaborates on step-by-step use and execution of the local stand-alone version and the webserver of Porpoise. In addition, we also provide a general machine learning framework that can help identify the optimal stacking ensemble learning model using different combinations of feature-based features. This general machine learning framework can facilitate users to build their pseudouridine predictors using their in-house datasets.
Affiliation(s)
- Xudong Guo
  - College of Information Engineering, Northwest A&F University, Yangling, China
- Fuyi Li
  - College of Information Engineering, Northwest A&F University, Yangling, China
  - Department of Microbiology and Immunology, The Peter Doherty Institute for Infection and Immunity, The University of Melbourne, Melbourne, VIC, Australia
- Jiangning Song
  - Biomedicine Discovery Institute, Monash University, Melbourne, VIC, Australia
  - Monash Data Futures Institute, Monash University, Melbourne, VIC, Australia

16
iEnhancer-MRBF: Identifying enhancers and their strength with a multiple Laplacian-regularized radial basis function network. Methods 2022; 208:1-8. [DOI: 10.1016/j.ymeth.2022.10.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/12/2022] [Revised: 09/26/2022] [Accepted: 10/03/2022] [Indexed: 11/07/2022] Open

17
Zhu L, Wang X, Li F, Song J. PreAcrs: a machine learning framework for identifying anti-CRISPR proteins. BMC Bioinformatics 2022; 23:444. [PMID: 36284264 PMCID: PMC9597991 DOI: 10.1186/s12859-022-04986-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/26/2022] [Accepted: 10/14/2022] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: Anti-CRISPR proteins are potent modulators that inhibit the CRISPR-Cas immunity system and have huge potential in gene editing and gene therapy as a genome-editing tool. Extensive studies have shown that anti-CRISPR proteins are essential for modifying endogenous genes, promoting the RNA-guided binding and cleavage of DNA or RNA substrates. In recent years, identifying and characterizing anti-CRISPR proteins has become a hot and significant research topic in bioinformatics. However, as most anti-CRISPR proteins fall short in sharing similarities to those currently known, traditional screening methods are time-consuming and inefficient. Machine learning methods could fill this gap with powerful predictive capability and provide a new perspective for anti-CRISPR protein identification. RESULTS: Here, we present a novel machine learning ensemble predictor, called PreAcrs, to identify anti-CRISPR proteins from protein sequences directly. Three features and eight different machine learning algorithms were used to train PreAcrs. PreAcrs outperformed other existing methods and significantly improved the prediction accuracy for identifying anti-CRISPR proteins. CONCLUSIONS: In summary, the PreAcrs predictor achieved a competitive performance for predicting new anti-CRISPR proteins in terms of accuracy and robustness. We anticipate PreAcrs will be a valuable tool for researchers to speed up the research process. The source code is available at: https://github.com/Lyn-666/anti_CRISPR.git.
Affiliation(s)
- Lin Zhu
  - Institute for Advanced Study, Shenzhen University, Shenzhen, China
- Xiaoyu Wang
  - Monash Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800 Australia
- Fuyi Li
  - Department of Microbiology and Immunology, The Peter Doherty Institute for Infection and Immunity, The University of Melbourne, Melbourne, VIC Australia
- Jiangning Song
  - Monash Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800 Australia
  - Monash Data Futures Institute, Monash University, Melbourne, VIC 3800 Australia

18
Liu S, Cui C, Chen H, Liu T. Ensemble learning-based feature selection for phosphorylation site detection. Front Genet 2022; 13:984068. [PMID: 36338976 PMCID: PMC9634105 DOI: 10.3389/fgene.2022.984068] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 07/01/2022] [Accepted: 10/05/2022] [Indexed: 11/18/2022] Open
Abstract
SARS-COV-2 is prevalent all over the world, causing more than six million deaths and seriously affecting human health. At present, there is no specific drug against SARS-COV-2. Protein phosphorylation is an important way to understand the mechanism of SARS-COV-2 infection. It is often expensive and time-consuming to identify phosphorylation sites with specific modified residues through experiments. A method that uses machine learning to make predictions about them is proposed. As all the methods of extracting protein sequence features are knowledge-driven, these features may not be effective for detecting phosphorylation sites without a complete understanding of the mechanism of protein. Moreover, redundant features also have a great impact on the fitting degree of the model. To solve these problems, we propose a feature selection method based on ensemble learning, which firstly extracts protein sequence features based on knowledge, then quantifies the importance score of each feature based on data, and finally uses the subset of important features as the final features to predict phosphorylation sites.
Affiliation(s)
- Songbo Liu
  - School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
- Chengmin Cui
  - Beijing Institute of Control Engineering, China Academy of Space Technology, Beijing, China
- Huipeng Chen
  - School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
  - *Correspondence: Huipeng Chen
- Tong Liu
  - School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China

19
Illing PT, Ramarathinam SH, Purcell AW. New insights and approaches for analyses of immunopeptidomes. Curr Opin Immunol 2022; 77:102216. [PMID: 35716458 DOI: 10.1016/j.coi.2022.102216] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 03/11/2022] [Accepted: 05/10/2022] [Indexed: 11/03/2022]
Abstract
Human leucocyte antigen (HLA) molecules play a key role in health and disease by presenting antigen to T-lymphocytes for immunosurveillance. Immunopeptidomics involves the study of the collection of peptides presented within the antigen-binding groove of HLA molecules. Identifying their nature and diversity is crucial to understanding immunosurveillance especially during infection or for the recognition and potential eradication of tumours. This review discusses recent advances in the isolation, identification, and quantitation of these peptide antigens. New informatics approaches and databases have shed light on the extent of peptide antigens derived from unconventional sources including peptides derived from transcripts associated with frame shifts, long noncoding RNA, incorrectly annotated untranslated regions, post-translational modifications, and proteasomal splicing. Several challenges remain in successful analysis of immunopeptides, yet recent developments point to unexplored biology waiting to be unravelled.
Affiliation(s)
- Patricia T Illing
  - Department of Biochemistry and Molecular Biology and Infection and Immunity Program, Biomedicine Discovery Institute, Monash University, Melbourne, Victoria, Australia
- Sri H Ramarathinam
  - Department of Biochemistry and Molecular Biology and Infection and Immunity Program, Biomedicine Discovery Institute, Monash University, Melbourne, Victoria, Australia
- Anthony W Purcell
  - Department of Biochemistry and Molecular Biology and Infection and Immunity Program, Biomedicine Discovery Institute, Monash University, Melbourne, Victoria, Australia

20
Glazer N, Akerman O, Louzoun Y. Naive and memory T cells TCR-HLA-binding prediction. OXFORD OPEN IMMUNOLOGY 2022; 3:iqac001. [PMID: 36846560 PMCID: PMC9914496 DOI: 10.1093/oxfimm/iqac001] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 01/05/2022] [Revised: 05/01/2022] [Accepted: 05/17/2022] [Indexed: 11/12/2022] Open
Abstract
T cells recognize antigens through the interaction of their T cell receptor (TCR) with a peptide-major histocompatibility complex (pMHC) molecule. Following thymic-positive selection, TCRs in peripheral naive T cells are expected to bind MHC alleles of the host. Peripheral clonal selection is expected to further increase the frequency of antigen-specific TCRs that bind to the host MHC alleles. To check for a systematic preference for MHC-binding T cells in TCR repertoires, we developed Natural Language Processing-based methods to predict TCR-MHC binding independently of the peptide presented for Class I MHC alleles. We trained a classifier on published TCR-pMHC binding pairs and obtained a high area under curve (AUC) of over 0.90 on the test set. However, when applied to TCR repertoires, the accuracy of the classifier dropped. We thus developed a two-stage prediction model, based on large-scale naive and memory TCR repertoires, denoted TCR HLA-binding predictor (CLAIRE). Since each host carries multiple human leukocyte antigen (HLA) alleles, we first computed whether a TCR on a CD8 T cell binds an MHC from any of the host Class-I HLA alleles. We then performed an iteration, where we predict the binding with the most probable allele from the first round. We show that this classifier is more precise for memory than for naïve cells. Moreover, it can be transferred between datasets. Finally, we developed a CD4-CD8 T cell classifier to apply CLAIRE to unsorted bulk sequencing datasets and showed a high AUC of 0.96 and 0.90 on large datasets. CLAIRE is available through a GitHub at: https://github.com/louzounlab/CLAIRE, and as a server at: https://claire.math.biu.ac.il/Home.
Affiliation(s)
- Neta Glazer
  - Department of Mathematics, Bar-Ilan University, Ramat Gan, Israel
- Ofek Akerman
  - Department of Mathematics, Bar-Ilan University, Ramat Gan, Israel
- Yoram Louzoun
  - Correspondence address: Department of Mathematics, Bar-Ilan University, Ramat Gan, Israel

21
Dhall A, Patiyal S, Raghava GPS. HLAncPred: a method for predicting promiscuous non-classical HLA binding sites. Brief Bioinform 2022; 23:6587168. [PMID: 35580839 DOI: 10.1093/bib/bbac192] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 12/08/2021] [Revised: 03/23/2022] [Accepted: 04/27/2022] [Indexed: 12/25/2022] Open
Abstract
Human leukocyte antigens (HLA) regulate various innate and adaptive immune responses and play a crucial immunomodulatory role. Recent studies revealed that non-classical HLA-(HLA-E & HLA-G) based immunotherapies have many advantages over traditional HLA-based immunotherapy, particularly against cancer and COVID-19 infection. In the last two decades, several methods have been developed to predict the binders of classical HLA alleles. In contrast, limited attempts have been made to develop methods for predicting non-classical HLA binding peptides, due to the scarcity of sufficient experimental data. Of note, in order to facilitate the scientific community, we have developed an artificial intelligence-based method for predicting binders of class-Ib HLA alleles. All the models were trained and tested on experimentally validated data obtained from the recent release of IEDB. The machine learning models achieved more than 0.98 AUC for HLA-G alleles on validation dataset. Similarly, our models achieved the highest AUC of 0.96 and 0.94 on the validation dataset for HLA-E*01:01 and HLA-E*01:03, respectively. We have summarized the models developed in the past for non-classical HLA and validated the performance with the models developed in this study. Moreover, to facilitate the community, we have utilized our tool for predicting the potential non-classical HLA binding peptides in the spike protein of different variants of virus causing COVID-19, including Omicron (B.1.1.529). One of the major challenges in the field of immunotherapy is to identify the promiscuous binders or antigenic regions that can bind to a large number of HLA alleles. To predict the promiscuous binders for the non-classical HLA alleles, we developed a web server HLAncPred (https://webs.iiitd.edu.in/raghava/hlancpred) and standalone package.
Affiliation(s)
- Anjali Dhall
  - Department of Computational Biology, Indraprastha Institute of Information Technology, Okhla Phase 3, New Delhi-110020, India
- Sumeet Patiyal
  - Department of Computational Biology, Indraprastha Institute of Information Technology, Okhla Phase 3, New Delhi-110020, India
- Gajendra P S Raghava
  - Department of Computational Biology, Indraprastha Institute of Information Technology, Okhla Phase 3, New Delhi-110020, India

22
Hensen L, Illing PT, Rowntree LC, Davies J, Miller A, Tong SYC, Habel JR, van de Sandt CE, Flanagan K, Purcell AW, Kedzierska K, Clemens EB. T Cell Epitope Discovery in the Context of Distinct and Unique Indigenous HLA Profiles. Front Immunol 2022; 13:812393. [PMID: 35603215 PMCID: PMC9121770 DOI: 10.3389/fimmu.2022.812393] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/10/2021] [Accepted: 03/28/2022] [Indexed: 11/13/2022] Open
Abstract
CD8+ T cells are a pivotal part of the immune response to viruses, playing a key role in disease outcome and providing long-lasting immunity to conserved pathogen epitopes. Understanding CD8+ T cell immunity in humans is complex due to CD8+ T cell restriction by highly polymorphic Human Leukocyte Antigen (HLA) proteins, requiring T cell epitopes to be defined for different HLA allotypes across different ethnicities. Here we evaluate strategies that have been developed to facilitate epitope identification and study immunogenic T cell responses. We describe an immunopeptidomics approach to sequence HLA-bound peptides presented on virus-infected cells by liquid chromatography with tandem mass spectrometry (LC-MS/MS). Using antigen presenting cell lines that stably express the HLA alleles characteristic of Indigenous Australians, this approach has been successfully used to comprehensively identify influenza-specific CD8+ T cell epitopes restricted by HLA allotypes predominant in Indigenous Australians, including HLA-A*24:02 and HLA-A*11:01. This is an essential step in ensuring high vaccine coverage and efficacy in Indigenous populations globally, known to be at high risk from influenza disease and other respiratory infections.
Affiliation(s)
- Luca Hensen
  - Department of Microbiology and Immunology, University of Melbourne, at the Peter Doherty Institute for Infection and Immunity, Parkville, VIC, Australia
- Patricia T. Illing
  - Department of Biochemistry and Molecular Biology & Infection and Immunity Program, Biomedicine Discovery Institute, Monash University, Clayton, VIC, Australia
- Louise C. Rowntree
  - Department of Microbiology and Immunology, University of Melbourne, at the Peter Doherty Institute for Infection and Immunity, Parkville, VIC, Australia
- Jane Davies
  - Menzies School of Health Research, Darwin, NT, Australia
- Adrian Miller
  - Indigenous Engagement, CQUniversity, Townsville, QLD, Australia
- Steven Y. C. Tong
  - Menzies School of Health Research, Darwin, NT, Australia
  - Victorian Infectious Diseases Service, The Royal Melbourne Hospital at the Peter Doherty Institute for Infection and Immunity, Melbourne, VIC, Australia
  - Department of Infectious Diseases, The University of Melbourne at the Peter Doherty Institute for Infection and Immunity, Melbourne, VIC, Australia
- Jennifer R. Habel
  - Department of Microbiology and Immunology, University of Melbourne, at the Peter Doherty Institute for Infection and Immunity, Parkville, VIC, Australia
- Carolien E. van de Sandt
  - Department of Microbiology and Immunology, University of Melbourne, at the Peter Doherty Institute for Infection and Immunity, Parkville, VIC, Australia
  - Department of Hematopoiesis, Sanquin Research and Landsteiner Laboratory, Amsterdam UMC, University of Amsterdam, Amsterdam, Netherlands
- Katie L. Flanagan
  - Department of Infectious Diseases and Tasmanian Vaccine Trial Centre, Launceston General Hospital, Launceston, TAS, Australia
  - School of Health Sciences and School of Medicine, University of Tasmania, Launceston, TAS, Australia
  - Department of Immunology and Pathology, Monash University, Melbourne, VIC, Australia
  - School of Health and Biomedical Science, RMIT University, Melbourne, VIC, Australia
- Anthony W. Purcell
  - Department of Biochemistry and Molecular Biology & Infection and Immunity Program, Biomedicine Discovery Institute, Monash University, Clayton, VIC, Australia
- Katherine Kedzierska
  - Department of Microbiology and Immunology, University of Melbourne, at the Peter Doherty Institute for Infection and Immunity, Parkville, VIC, Australia
  - *Correspondence: Katherine Kedzierska
- E. Bridie Clemens
  - Department of Microbiology and Immunology, University of Melbourne, at the Peter Doherty Institute for Infection and Immunity, Parkville, VIC, Australia

23
Zhang Y, Zhu G, Li K, Li F, Huang L, Duan M, Zhou F. HLAB: learning the BiLSTM features from the ProtBert-encoded proteins for the class I HLA-peptide binding prediction. Brief Bioinform 2022; 23:6581432. [PMID: 35514183 PMCID: PMC9487590 DOI: 10.1093/bib/bbac173] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Received: 12/28/2021] [Revised: 03/29/2022] [Accepted: 04/18/2022] [Indexed: 12/11/2022] Open
Abstract
Human Leukocyte Antigen (HLA) is a type of molecule residing on the surfaces of most human cells and exerts an essential role in the immune system responding to the invasive items. The T cell antigen receptors may recognize the HLA-peptide complexes on the surfaces of cancer cells and destroy these cancer cells through toxic T lymphocytes. The computational determination of HLA-binding peptides will facilitate the rapid development of cancer immunotherapies. This study hypothesized that the natural language processing-encoded peptide features may be further enriched by another deep neural network. The hypothesis was tested with the Bi-directional Long Short-Term Memory-extracted features from the pretrained Protein Bidirectional Encoder Representations from Transformers-encoded features of the class I HLA (HLA-I)-binding peptides. The experimental data showed that our proposed HLAB feature engineering algorithm outperformed the existing ones in detecting the HLA-I-binding peptides. The extensive evaluation data show that the proposed HLAB algorithm outperforms all the seven existing studies on predicting the peptides binding to the HLA-A*01:01 allele in AUC and achieves the best average AUC values on the six out of the seven k-mers (k=8,9,...,14, respectively represent the prediction task of a polypeptide consisting of k amino acids) except for the 9-mer prediction tasks. The source code and the fine-tuned feature extraction models are available at http://www.healthinformaticslab.org/supp/resources.php.
Affiliation(s)
- Yaqi Zhang
  - College of Computer Science and Technology, and Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin 130012, P.R. China
- Gancheng Zhu
  - College of Computer Science and Technology, and Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin 130012, P.R. China
- Kewei Li
  - College of Computer Science and Technology, and Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin 130012, P.R. China
- Fei Li
  - College of Computer Science and Technology, and Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin 130012, P.R. China
- Lan Huang
  - College of Computer Science and Technology, and Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin 130012, P.R. China
- Meiyu Duan
  - College of Computer Science and Technology, and Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin 130012, P.R. China
- Fengfeng Zhou
  - College of Computer Science and Technology, and Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin 130012, P.R. China

24
Lu M, Xu L, Jian X, Tan X, Zhao J, Liu Z, Zhang Y, Liu C, Chen L, Lin Y, Xie L. dbPepNeo2.0: A Database for Human Tumor Neoantigen Peptides From Mass Spectrometry and TCR Recognition. Front Immunol 2022; 13:855976. [PMID: 35493528 PMCID: PMC9043652 DOI: 10.3389/fimmu.2022.855976] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 01/16/2022] [Accepted: 03/17/2022] [Indexed: 12/04/2022] Open
Abstract
Neoantigens are widely reported to induce T-cell response and lead to tumor regression, indicating a promising potential to immunotherapy. Previously, we constructed an open-access database, i.e., dbPepNeo, providing a systematic resource for human tumor neoantigens to storage and query. In order to expand data volume and application scope, we updated dbPepNeo to version 2.0 (http://www.biostatistics.online/dbPepNeo2). Here, we provide about 801 high-confidence (HC) neoantigens (increased by 170%) and 842,289 low-confidence (LC) HLA immunopeptidomes (increased by 107%). Notably, 55 class II HC neoantigens and 630 neoantigen-reactive T-cell receptor-β (TCRβ) sequences were firstly included. Besides, two new analytical tools are developed, DeepCNN-Ineo and BLASTdb. DeepCNN-Ineo predicts the immunogenicity of class I neoantigens, and BLASTdb performs local alignments to look for sequence similarities in dbPepNeo2.0. Meanwhile, the web features and interface have been greatly improved and enhanced.
Collapse
Affiliation(s)
- Manman Lu
- College of Food Science and Technology, Shanghai Ocean University, Shanghai, China; Shanghai-Ministry of Science and Technology (MOST) Key Laboratory of Health and Disease Genomics, Institute for Genome and Bioinformatics, Shanghai Institute for Biomedical and Pharmaceutical Technologies, Shanghai, China
- Linfeng Xu
- Shanghai-Ministry of Science and Technology (MOST) Key Laboratory of Health and Disease Genomics, Institute for Genome and Bioinformatics, Shanghai Institute for Biomedical and Pharmaceutical Technologies, Shanghai, China; School of Health Science and Engineering, University of Shanghai for Science and Technology, Shanghai, China
- Xingxing Jian
- Shanghai-Ministry of Science and Technology (MOST) Key Laboratory of Health and Disease Genomics, Institute for Genome and Bioinformatics, Shanghai Institute for Biomedical and Pharmaceutical Technologies, Shanghai, China; Bioinformatics Center, National Clinical Research Centre for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University, Changsha, China
- Xiaoxiu Tan
- Shanghai-Ministry of Science and Technology (MOST) Key Laboratory of Health and Disease Genomics, Institute for Genome and Bioinformatics, Shanghai Institute for Biomedical and Pharmaceutical Technologies, Shanghai, China; Department of Bioinformatics and Biostatistics, Shanghai Jiao Tong University, Shanghai, China
- Jingjing Zhao
- College of Food Science and Technology, Shanghai Ocean University, Shanghai, China; Shanghai-Ministry of Science and Technology (MOST) Key Laboratory of Health and Disease Genomics, Institute for Genome and Bioinformatics, Shanghai Institute for Biomedical and Pharmaceutical Technologies, Shanghai, China
- Zhenhao Liu
- Shanghai-Ministry of Science and Technology (MOST) Key Laboratory of Health and Disease Genomics, Institute for Genome and Bioinformatics, Shanghai Institute for Biomedical and Pharmaceutical Technologies, Shanghai, China
- Yu Zhang
- Shanghai-Ministry of Science and Technology (MOST) Key Laboratory of Health and Disease Genomics, Institute for Genome and Bioinformatics, Shanghai Institute for Biomedical and Pharmaceutical Technologies, Shanghai, China; School of Health Science and Engineering, University of Shanghai for Science and Technology, Shanghai, China
- Chunyu Liu
- College of Food Science and Technology, Shanghai Ocean University, Shanghai, China; Shanghai-Ministry of Science and Technology (MOST) Key Laboratory of Health and Disease Genomics, Institute for Genome and Bioinformatics, Shanghai Institute for Biomedical and Pharmaceutical Technologies, Shanghai, China
- Lanming Chen
- College of Food Science and Technology, Shanghai Ocean University, Shanghai, China
- Yong Lin
- School of Health Science and Engineering, University of Shanghai for Science and Technology, Shanghai, China
- Lu Xie
- College of Food Science and Technology, Shanghai Ocean University, Shanghai, China; Shanghai-Ministry of Science and Technology (MOST) Key Laboratory of Health and Disease Genomics, Institute for Genome and Bioinformatics, Shanghai Institute for Biomedical and Pharmaceutical Technologies, Shanghai, China; Bioinformatics Center, National Clinical Research Centre for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University, Changsha, China
|
25
|
A transformer-based model to predict peptide–HLA class I binding and optimize mutated peptides for vaccine design. NAT MACH INTELL 2022. [DOI: 10.1038/s42256-022-00459-7] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Indexed: 12/25/2022]
|
26
|
Li X, Lin X, Mei X, Chen P, Liu A, Liang W, Chang S, Li J. HLA3D: an integrated structure-based computational toolkit for immunotherapy. Brief Bioinform 2022; 23:6548371. [PMID: 35289353 PMCID: PMC9116210 DOI: 10.1093/bib/bbac076] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 12/20/2021] [Revised: 02/12/2022] [Accepted: 02/14/2022] [Indexed: 01/02/2023]
Abstract
Motivation: The human major histocompatibility complex (MHC), also known as human leukocyte antigen (HLA), plays an important role in the adaptive immune system by presenting non-self-peptides to T cell receptors. The MHC region has been shown to be associated with a variety of diseases, including autoimmune diseases, organ transplantation and tumours. However, structural analytic tools of HLA are still sparse compared to the number of identified HLA alleles, which hinders the disclosure of its pathogenic mechanism.
Results: To provide an integrative analysis of HLA, we first collected 1296 amino acid sequences, 256 Protein Data Bank structures, 120 000 frequency records of HLA alleles in different populations, 73 000 publications and 39 000 disease-associated single nucleotide polymorphism sites, as well as 212 modelled HLA heterodimer structures. Then, we put forward two new strategies for building up a toolkit for transplantation and tumour immunotherapy, designing a risk alignment pipeline and an antigenic peptide prediction pipeline by integrating different resources and bioinformatic tools. By integrating 100 000 calculated HLA conformation differences and online tools, the risk alignment pipeline provides users with structural alignment, sequence alignment, residue visualization and risk report generation for mismatched HLA molecules. For tumour antigen prediction, we first predicted 370 000 immunogenic peptides based on the affinity between peptides and MHC to generate the neoantigen catalogue for 11 common tumours. We then designed an antigenic peptide prediction pipeline that provides mutation prediction, peptide prediction, immunogenicity assessment and docking simulation. We also present a case study of hepatitis B virus mutations associated with liver cancer that demonstrates the validity of our antigenic peptide prediction process.
HLA3D, including different HLA analytic tools and the prediction pipelines, is available at http://www.hla3d.cn/.
Affiliation(s)
- Xingyu Li
- Key Laboratory of DGHD, MOE, School of Life Science and Technology, Southeast University, Nanjing, China
- Xue Lin
- Department of Bioinformatics, School of Biomedical Engineering and Informatics, Nanjing Medical University, Nanjing, China
- Xueyin Mei
- Key Laboratory of DGHD, MOE, School of Life Science and Technology, Southeast University, Nanjing, China
- Pin Chen
- Key Laboratory of DGHD, MOE, School of Life Science and Technology, Southeast University, Nanjing, China
- Anna Liu
- Key Laboratory of DGHD, MOE, School of Life Science and Technology, Southeast University, Nanjing, China
- Weicheng Liang
- Key Laboratory of DGHD, MOE, School of Life Science and Technology, Southeast University, Nanjing, China
- Shan Chang
- Institute of Bioinformatics and Medical Engineering, School of Electrical and Information Engineering, Jiangsu University of Technology, Changzhou, China
- Jian Li
- Key Laboratory of DGHD, MOE, School of Life Science and Technology, Southeast University, Nanjing, China
|
27
|
Cheng R, Xu Z, Luo M, Wang P, Cao H, Jin X, Zhou W, Xiao L, Jiang Q. Identification of alternative splicing-derived cancer neoantigens for mRNA vaccine development. Brief Bioinform 2022; 23:bbab553. [PMID: 35279714 DOI: 10.1093/bib/bbab553] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 10/05/2021] [Revised: 11/15/2021] [Accepted: 12/02/2021] [Indexed: 12/17/2023]
Abstract
Messenger RNA (mRNA) vaccines have shown great potential for anti-tumor therapy owing to their advantages in safety, efficacy and industrial production. However, it remains a challenge to identify suitable cancer neoantigens that can be targeted by mRNA vaccines. Abnormal alternative splicing occurs in a variety of tumors and may result in the translation of abnormal transcripts into tumor-specific proteins. High-throughput technologies make it possible to systematically characterize alternative splicing as a source of suitable target neoantigens for mRNA vaccine development. Here, we summarize the difficulties and challenges of identifying alternative splicing-derived cancer neoantigens from RNA-seq data and propose a conceptual framework for designing personalized mRNA vaccines based on these neoantigens. In addition, we present several points to spark further discussion on improving the identification of alternative splicing-derived cancer neoantigens.
Affiliation(s)
- Rui Cheng
- Harbin Institute of Technology, China
- Meng Luo
- Harbin Institute of Technology, China
|
28
|
Staem5: A novel computational approach for accurate prediction of m5C site. MOLECULAR THERAPY. NUCLEIC ACIDS 2021; 26:1027-1034. [PMID: 34786208 PMCID: PMC8571400 DOI: 10.1016/j.omtn.2021.10.012] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Received: 07/05/2021] [Revised: 08/27/2021] [Accepted: 10/06/2021] [Indexed: 12/25/2022]
Abstract
5-Methylcytosine (m5C) is an important post-transcriptional modification that has been extensively found in multiple types of RNAs. Many studies have shown that m5C plays vital roles in many biological functions, such as RNA structure stability and metabolism. Computational approaches act as an efficient way to identify m5C sites from high-throughput RNA sequence data and help interpret the functional mechanism of this important modification. This study proposed a novel species-specific computational approach, Staem5, to accurately predict RNA m5C sites in Mus musculus and Arabidopsis thaliana. Staem5 was developed by employing feature fusion tactics to leverage informatic sequence profiles, and a stacking ensemble learning framework combined five popular machine learning algorithms. Extensive benchmarking tests demonstrated that Staem5 outperformed state-of-the-art approaches in both cross-validation and independent tests. We provide the source code of Staem5, which is publicly available at https://github.com/Cxd-626/Staem5.git.
|
29
|
Li F, Dong S, Leier A, Han M, Guo X, Xu J, Wang X, Pan S, Jia C, Zhang Y, Webb GI, Coin LJM, Li C, Song J. Positive-unlabeled learning in bioinformatics and computational biology: a brief review. Brief Bioinform 2021; 23:6415313. [PMID: 34729589 DOI: 10.1093/bib/bbab461] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 08/30/2021] [Revised: 09/27/2021] [Accepted: 10/07/2021] [Indexed: 12/14/2022]
Abstract
Conventional supervised binary classification algorithms have been widely applied to address significant research questions using biological and biomedical data. This classification scheme requires two fully labeled classes of data (e.g. positive and negative samples) to train a classification model. However, in many bioinformatics applications, labeling data is laborious, and negative samples may be mislabeled because of the limited sensitivity of the experimental equipment. The positive-unlabeled (PU) learning scheme was therefore proposed to enable the classifier to learn directly from limited positive samples and a large number of unlabeled samples (i.e. a mixture of positive and negative samples). To date, several PU learning algorithms have been developed to address various biological questions, such as sequence identification, functional site characterization and interaction prediction. In this paper, we revisit a collection of 29 state-of-the-art PU learning bioinformatic applications that address various biological questions. Various important aspects are extensively discussed, including PU learning methodology, biological application, classifier design and evaluation strategy. We also comment on the existing issues of PU learning and offer our perspectives on the future development of PU learning applications. We anticipate that our work will serve as an instrumental guideline for better understanding the PU learning framework in bioinformatics and for developing next-generation PU learning frameworks for critical biological applications.
Affiliation(s)
- Fuyi Li
- Monash University, Australia
- André Leier
- Department of Genetics, UAB School of Medicine, USA
- Meiya Han
- Department of Biochemistry and Molecular Biology, Monash University, Australia
- Jing Xu
- Computer Science and Technology from Nankai University, China
- Xiaoyu Wang
- Department of Biochemistry and Molecular Biology and Biomedicine Discovery Institute, Monash University, Australia
- Shirui Pan
- University of Technology Sydney (UTS), Ultimo, NSW, Australia
- Cangzhi Jia
- College of Science, Dalian Maritime University, China
- Yang Zhang
- Northwestern Polytechnical University, China
- Geoffrey I Webb
- Faculty of Information Technology at Monash University, Australia
- Lachlan J M Coin
- Department of Clinical Pathology, University of Melbourne, Australia
- Chen Li
- Biomedicine Discovery Institute and Department of Biochemistry of Molecular Biology, Monash University, Australia
- Jiangning Song
- Monash Biomedicine Discovery Institute, Monash University, Melbourne, Australia
|
30
|
Lv H, Dao FY, Zulfiqar H, Lin H. DeepIPs: comprehensive assessment and computational identification of phosphorylation sites of SARS-CoV-2 infection using a deep learning-based approach. Brief Bioinform 2021; 22:6310410. [PMID: 34184738 PMCID: PMC8406875 DOI: 10.1093/bib/bbab244] [Citation(s) in RCA: 45] [Impact Index Per Article: 15.0] [Received: 04/08/2019] [Revised: 05/18/2020] [Accepted: 06/03/2021] [Indexed: 11/14/2022]
Abstract
The rapid spread of SARS-CoV-2 infection around the globe has caused a massive health and socioeconomic crisis. Identifying phosphorylation sites is an important step toward understanding the molecular mechanisms of SARS-CoV-2 infection and the changes within host cell pathways. In this study, we present DeepIPs, the first deep-learning architecture designed specifically to identify phosphorylation sites in host cells infected with SARS-CoV-2. DeepIPs combines a widely used word embedding method with a convolutional neural network-long short-term memory (CNN-LSTM) architecture to make the final prediction. The independent test demonstrates that DeepIPs improves prediction performance compared with existing tools for general phosphorylation site prediction. Based on the proposed model, a web server called DeepIPs was established and is freely accessible at http://lin-group.cn/server/DeepIPs. The source code of DeepIPs is freely available at the repository https://github.com/linDing-group/DeepIPs.
Affiliation(s)
- Hao Lv
- Center for Informational Biology at the University of Electronic Science and Technology of China, Chengdu 610054, China
- Fu-Ying Dao
- Center for Informational Biology at the University of Electronic Science and Technology of China, Chengdu 610054, China
- Hasan Zulfiqar
- Center for Informational Biology at the University of Electronic Science and Technology of China, Chengdu 610054, China
- Hao Lin
- Center for Informational Biology at the University of Electronic Science and Technology of China, Chengdu 610054, China
|